Message boards :
Number crunching :
Stderr Truncations
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
I grumbled a bit because they moved things around on the menu again and I had to hunt to find often used shut down client. And in general it works, though I still get taken aback every time I try to 'View' the event log, but find it's a 'Tool'. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
It was under File first, as I remember. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
It was under File first, as I remember. In v6.12.34, which I still keep running, it's 'Advanced'. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Interesting how little CPU time the NVidia OpenCL tasks need - and I'm typing from the machine running the test, with no sign of screen lag either. I'd held back an extra core just in case (running five CPU tasks plus intel_gpu), but I think I can safely release that. In the case of Einstein's low consumption, the puzzle was quite simple as I recall: kernels of a predictable size, 20 ms or so. With 20 ms kernels one can safely sleep for that time. Is there MW code to look at? If it just uses bigger kernels again, it's not "interesting" at all. |
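The point about predictable 20 ms kernels can be shown with a small sketch: if the kernel run time is known in advance, the host thread can sleep for that long and only then start checking for completion, instead of spinning on a CPU core. This is an illustration only. The names `wait_for_kernel`, `is_done` and `FakeKernel` are hypothetical and are not taken from the Einstein@Home or MilkyWay sources, and the clock is simulated so that the example runs without a GPU.

```python
import time
from typing import Callable


def wait_for_kernel(
    is_done: Callable[[], bool],
    predicted_s: float,
    poll_s: float = 0.001,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep through the predicted kernel run time, then poll until done.

    Returns the number of completion checks made. A pure busy-wait would
    make a very large number of checks over 20 ms; this makes a handful.
    """
    sleep(predicted_s)          # give the core away while the GPU works
    checks = 1
    while not is_done():        # kernel ran longer than predicted
        sleep(poll_s)
        checks += 1
    return checks


class FakeKernel:
    """Hypothetical stand-in for a GPU kernel, driven by a simulated clock."""

    def __init__(self, duration_s: float) -> None:
        self.now = 0.0
        self.duration_s = duration_s

    def sleep(self, seconds: float) -> None:
        self.now += seconds     # advance simulated time, no real waiting

    def is_done(self) -> bool:
        return self.now >= self.duration_s


# A kernel that takes exactly the predicted 20 ms needs a single check.
on_time = FakeKernel(duration_s=0.020)
checks_on_time = wait_for_kernel(on_time.is_done, 0.020, sleep=on_time.sleep)

# A kernel that overruns to 22.5 ms needs a few extra 1 ms polls.
late = FakeKernel(duration_s=0.0225)
checks_late = wait_for_kernel(late.is_done, 0.020, sleep=late.sleep)
```

With much longer or unpredictable kernels the same loop either oversleeps or falls back to frequent polling, which is the trade-off the post is asking about.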
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
To start my day here, I find that my xw9400, now running with BOINC v7.6.2, only coughed up one truncated Stderr last night, Task 4259104046. It's a typical truncation, not an empty Stderr, with the following contents:
Stderr output
<core_client_version>7.6.2</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 4 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2047 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 660, 2047 MiB, regsPerBlock 65536
computeCap 3.0, multiProcs 5
pciBusID = 88, pciSlotID = 0
Device 3: GeForce GTX 750 Ti, 2047 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 69, pciSlotID = 0
Device 4: GeForce GTX 660, 1535 MiB, regsPerBlock 65536
computeCap 3.0, multiProcs 6
pciBusID = 24, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 4
setiathome_CUDA: CUDA Device 4 specified, checking...
Device 4: GeForce GTX 660 is okay
SETI@home using CUDA accelerated device GeForce GTX 660
mbcuda.cfg, processpriority key detected
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to ABOVE_NORMAL successfully
Priority of worker thread set successfully
setiathome enhanced x41zc, Cuda 5.00
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.202907
Kepler GPU current clockRate = 992 MHz
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
</stderr_txt>
]]>
The Event Log shows:
10-Jul-2015 19:40:56 [SETI@home] [slot] assigning slot 6 to 17ja15aa.1758.15609.438086664198.12.247_0
10-Jul-2015 19:40:56 [---] [slot] removed file slots/6/init_data.xml
10-Jul-2015 19:40:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/Lunatics_x41zc_win32_cuda50.exe to slots/6/Lunatics_x41zc_win32_cuda50.exe
10-Jul-2015 19:40:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/cudart32_50_35.dll to slots/6/cudart32_50_35.dll
10-Jul-2015 19:40:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/cufft32_50_35.dll to slots/6/cufft32_50_35.dll
10-Jul-2015 19:40:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/mbcuda.cfg to slots/6/mbcuda.cfg
10-Jul-2015 19:40:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/17ja15aa.1758.15609.438086664198.12.247 to slots/6/work_unit.sah
10-Jul-2015 19:40:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/17ja15aa.1758.15609.438086664198.12.247_0_0 to slots/6/result.sah
10-Jul-2015 19:40:56 [---] [slot] removed file slots/6/boinc_temporary_exit
10-Jul-2015 19:40:56 [SETI@home] Starting task 17ja15aa.1758.15609.438086664198.12.247_0
10-Jul-2015 19:40:56 [SETI@home] [cpu_sched] Starting task 17ja15aa.1758.15609.438086664198.12.247_0 using setiathome_v7 version 700 (cuda50) in slot 6
...
10-Jul-2015 20:12:36 [---] [slot] cleaning out slots/6: handle_exited_app()
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/boinc_finish_called
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/boinc_task_state.xml
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/cudart32_50_35.dll
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/cufft32_50_35.dll
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/init_data.xml
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/Lunatics_x41zc_win32_cuda50.exe
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/mbcuda.cfg
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/result.sah
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/state.sah
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/stderr.txt
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/work_unit.sah
10-Jul-2015 20:12:36 [SETI@home] Computation for task 17ja15aa.1758.15609.438086664198.12.247_0 finished
10-Jul-2015 20:12:36 [---] [slot] cleaning out slots/6: get_free_slot()
10-Jul-2015 20:12:36 [SETI@home] [slot] assigning slot 6 to 13ja15aa.17180.18063.438086664202.12.151_0
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/init_data.xml
10-Jul-2015 20:12:36 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/Lunatics_x41zc_win32_cuda50.exe to slots/6/Lunatics_x41zc_win32_cuda50.exe
10-Jul-2015 20:12:36 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/cudart32_50_35.dll to slots/6/cudart32_50_35.dll
10-Jul-2015 20:12:36 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/cufft32_50_35.dll to slots/6/cufft32_50_35.dll
10-Jul-2015 20:12:36 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/mbcuda.cfg to slots/6/mbcuda.cfg
10-Jul-2015 20:12:36 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/13ja15aa.17180.18063.438086664202.12.151 to slots/6/work_unit.sah
10-Jul-2015 20:12:36 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/13ja15aa.17180.18063.438086664202.12.151_0_0 to slots/6/result.sah
10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/boinc_temporary_exit
10-Jul-2015 20:12:36 [SETI@home] Starting task 13ja15aa.17180.18063.438086664202.12.151_0
10-Jul-2015 20:12:36 [SETI@home] [cpu_sched] Starting task 13ja15aa.17180.18063.438086664202.12.151_0 using setiathome_v7 version 700 (cuda50) in slot 6
10-Jul-2015 20:12:38 [SETI@home] Started upload of 17ja15aa.1758.15609.438086664198.12.247_0_0
10-Jul-2015 20:12:40 [SETI@home] Finished upload of 17ja15aa.1758.15609.438086664198.12.247_0_0
Nothing to indicate a failure to delete/remove the stderr.txt file. It seems like this might indicate that truncated Stderr and empty Stderr result from slightly different behaviors. In S@h, of course, either one can lead to an Instant Invalid if it happens on an overflow with no Autocorrs. |
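Checking by eye whether stderr.txt was logged as removed gets tedious on a large log, so here is a minimal sketch of doing it with a script. It assumes the line layout seen in the event log quoted above; the regular expression and the function name `removed_files` are my own, not anything shipped with BOINC, and the sample is only an excerpt of that log.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/stderr.txt
REMOVED = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[[^\]]*\] \[slot\] "
    r"removed file slots/(?P<slot>\d+)/(?P<name>\S+)$"
)


def removed_files(log_lines):
    """Map (timestamp, slot) to the file names the client logged as removed."""
    removed = defaultdict(list)
    for line in log_lines:
        match = REMOVED.match(line.strip())
        if match:
            key = (match["stamp"], int(match["slot"]))
            removed[key].append(match["name"])
    return dict(removed)


# Lines copied from the event log quoted above (excerpt only).
sample = [
    "10-Jul-2015 20:12:36 [---] [slot] cleaning out slots/6: handle_exited_app()",
    "10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/result.sah",
    "10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/state.sah",
    "10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/stderr.txt",
    "10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/work_unit.sah",
    "10-Jul-2015 20:12:36 [SETI@home] Computation for task "
    "17ja15aa.1758.15609.438086664198.12.247_0 finished",
]

by_slot = removed_files(sample)
stderr_removed = "stderr.txt" in by_slot[("10-Jul-2015 20:12:36", 6)]
```

A logged removal only shows that the client deleted the file from the slot. It says nothing about what the file contained at that moment, which is the open question here.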
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Interesting how little CPU time the NVidia OpenCL tasks need - and I'm typing from the machine running the test, with no sign of screen lag either. I'd held back an extra core just in case (running five CPU tasks plus intel_gpu), but I think I can safely release that. The code is on github: https://github.com/Milkyway-at-home/milkywayathome_client If the question is kernel size, then the answer becomes choosing a development platform that suits the granularity of the problem under consideration. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
One thing to add to the MW/Seti Priority mix, is that if you reach a state of overcommit at any point, then that is where the probability of the application/boincapi side of the weaknesses happening (at least) should become higher. Examples of the most extreme situation would involve full utilisation of the CPUs. This pressure occurs most with GPU tasks starting up and shutting down, on top of existing CPU tasks, then pile on the Boinc client recalculating/simulating its schedule, other housekeeping/network and possibly other system tasks going on. Then I should be a perfect candidate for those conditions since I run SETI, MilkyWay and Einstein tasks concurrently on both of my GPUs. I also run 6 cores of 8 on the CPUs. The systems normally run the GPUs at 99% utilization and the CPUs between 65-95% utilization depending on what kind of tasks are on the cards. The Einstein and MilkyWay tasks use a lot less CPU than SETI. However I have noticed anytime a MW task is on the same card as running SETI tasks, then the SETI tasks take a lot longer to finish than normal. The MW tasks use more resources somewhere that hinder the SETI tasks. The problem could get worse if the MW devs actually release double sized tasks as they are proposing to make them run longer. Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
That's some tidy sources and structure actually. Nice to see no spaghetti code. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I grumbled a bit because they moved things around on the menu again and I had to hunt to find often used shut down client. Yes I found it after a few clicks. So you are to blame ;-} ... I like the Tools menu. It makes sense to me that the Event Log is in that category. Just have to forget muscle memory when I try to stop the client. Should become second nature in a week or so I guess. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Then I should be a perfect candidate for those conditions since I run SETI, MilkyWay and Einstein tasks concurrently on both of my GPUs. I also run 6 cores of 8 on the CPUs. The systems normally run the GPUs at 99% utilization and the CPUs between 65-95% utilization depending on what kind of tasks are on the cards. The Einstein and MilkyWay tasks use a lot less CPU than SETI. However I have noticed anytime a MW task is on the same card as running SETI tasks, then the SETI tasks take a lot longer to finish than normal. The MW tasks use more resources somewhere that hinder the SETI tasks. The problem could get worse if the MW devs actually release double sized tasks as they are proposing to make them run longer. Probably not the next release, but the one after for multibeam will have substantially more control over the loading (the next release will just be higher load). I probably can't recommend defaults that push OpenCL tasks aside, but can at least look at showing what something a bit more refined can provide as far as reconfigurability goes. Sharing the devices nicely will eventually be the name of the game, though for now it's all still a bit wild west out there. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
OK, I got an invalid with blank stderr: de_modfit_fast_15_3s_136_sim1Jun1_2_1434554402_8825659_0. Ran for just 96 seconds, but look what BOINC was doing in that time.
11-Jul-2015 17:06:17 [---] [slot] cleaning out slots/0: get_free_slot()
I make that slots 1, 2, 5 and 6 being recycled during the first third of that 96-second run, but nothing at all happened in the final minute. Where does that leave our timing theories? |
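Counting which slots were recycled during a task's run can also be scripted. The sketch below assumes the same event log layout as above and lists every "cleaning out" entry that falls inside a run window, with its offset in seconds from the start. The start time used here is an assumption (the timestamp of the one log line quoted in the post, since the task's actual start is not shown), and `slots_cleaned` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

CLEANING = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[[^\]]*\] \[slot\] "
    r"cleaning out slots/(?P<slot>\d+): (?P<caller>\w+)\(\)$"
)
# %b expects English month abbreviations under the default C locale.
STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"


def slots_cleaned(log_lines, start, run_seconds):
    """Return (offset_seconds, slot, caller) for cleanups inside the run window."""
    end = start + timedelta(seconds=run_seconds)
    hits = []
    for line in log_lines:
        match = CLEANING.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(match["stamp"], STAMP_FORMAT)
        if start <= when <= end:
            offset = int((when - start).total_seconds())
            hits.append((offset, int(match["slot"]), match["caller"]))
    return hits


# Two lines copied from logs quoted earlier in this thread.
sample = [
    "10-Jul-2015 20:12:36 [---] [slot] cleaning out slots/6: handle_exited_app()",
    "11-Jul-2015 17:06:17 [---] [slot] cleaning out slots/0: get_free_slot()",
]

# Assumed start of the 96-second run; the post does not give the real one.
run_start = datetime(2015, 7, 11, 17, 6, 17)
hits = slots_cleaned(sample, run_start, run_seconds=96)
```

Run against the full log, the offsets would show directly whether all the slot activity sits in the first third of the run.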
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I make that slots 1, 2, 5 and 6 being recycled during the first third of that 96-second run, but nothing at all happened in the final minute. Where does that leave our timing theories? Does any of that indicate the stderr disk file ever had content ? "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It's the granularity of the algorithm and of the device used to run it, not "granularity of platform". I'm not sure that a suggestion, especially a wrong one, can become an "answer". |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
If the question is kernel size, then the answer becomes choosing a development platform that suits the granularity of the problem under consideration. I accept the semantic correction. Granularity of algorithm, device, implementation, and runtime support libraries. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Richard, you also produced an invalid via the truncation mechanism Task 1184514362 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
One thing to consider is that the event log might not be in the exact order of events happening, since there is always a group of events with the same timestamp. |
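To see how much of a log is affected by this, one can group consecutive lines by their timestamp. With one-second resolution, any group larger than one is a set of events whose true order cannot be recovered from the timestamps alone. A minimal sketch, assuming the 20-character timestamp prefix seen in the logs quoted earlier; the function names are hypothetical.

```python
from itertools import groupby


def stamp(line):
    """Return the leading 'DD-Mon-YYYY HH:MM:SS' timestamp (20 characters)."""
    return line[:20]


def ambiguous_groups(log_lines):
    """Return {timestamp: line_count} for timestamps shared by several lines."""
    groups = {}
    for key, members in groupby(log_lines, key=stamp):
        count = len(list(members))
        if count > 1:
            groups[key] = groups.get(key, 0) + count
    return groups


# Lines copied from the event log quoted earlier in this thread.
sample = [
    "10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/stderr.txt",
    "10-Jul-2015 20:12:36 [---] [slot] removed file slots/6/work_unit.sah",
    "10-Jul-2015 20:12:36 [SETI@home] Computation for task "
    "17ja15aa.1758.15609.438086664198.12.247_0 finished",
    "10-Jul-2015 20:12:38 [SETI@home] Started upload of "
    "17ja15aa.1758.15609.438086664198.12.247_0_0",
    "10-Jul-2015 20:12:40 [SETI@home] Finished upload of "
    "17ja15aa.1758.15609.438086664198.12.247_0_0",
]

shared = ambiguous_groups(sample)
```

In this excerpt the three 20:12:36 lines form one such group, while the two upload lines are unambiguous.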
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
I make that slots 1, 2, 5 and 6 being recycled during the first third of that 96-second run, but nothing at all happened in the final minute. Where does that leave our timing theories? No, no indication that I know of either way. We're probably having to head towards:
int ACTIVE_TASK::copy_output_files() (app_start.cpp, line 469)
copy_output_files(); (app_control.cpp, line 591)
res.stderr_out.append(buf); (client_state.cpp, line 1845 - but that's in an error handler)
None of them point to any extra debug logging that might be available and helpful. And only the middle one might help identify how stderr.txt gets into the state structure ready for the report - I haven't fully unravelled that yet. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Richard, you also produced an invalid via the truncation mechanism Task 1184514362 Hmmm, could be winning. That *possibly* would place the truncated-with-content version as terminated between the stderr buffer flush and the data reaching disk, and the entirely blank one earlier, by about the time it would take to write that much data. To my mind the race still lies, after the flush, between the hard TerminateProcess cancellation and the preceding flush (the open before that would have created the empty file, or cleared existing content, because that's what happens with a memory mapped file: it is emptied on open for write). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
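The two states being discussed (an empty file versus a file with partial content) can be reproduced with ordinary buffered file I/O. The sketch below uses Python's buffering as a stand-in, which is an assumption: it is not the C runtime or the memory mapped path the science app actually uses, but the sequence is analogous. Opening for write empties the file at once, written text sits in a user-space buffer, and only a flush makes it visible in the file.

```python
import os
import tempfile


def observe_sizes(path, text):
    """Return the on-disk size after open, after write, and after flush."""
    with open(path, "w") as handle:          # opening for write truncates
        after_open = os.path.getsize(path)
        handle.write(text)                   # small write stays in the buffer
        after_write = os.path.getsize(path)
        handle.flush()                       # buffer handed to the OS
        after_flush = os.path.getsize(path)
    return after_open, after_write, after_flush


text = "setiathome_CUDA: Found 4 CUDA device(s):"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "stderr.txt")
    with open(path, "w") as old:
        old.write("content left over from an earlier run")
    sizes = observe_sizes(path, text)
```

A process killed between the open and the flush would leave the blank file; one killed after a partial flush would leave the truncated one.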
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Richard, you also produced an invalid via the truncation mechanism Task 1184514362 So I have - that happened while I was working on the previous one.
11-Jul-2015 17:26:40 [---] [slot] cleaning out slots/1: handle_exited_app()
Nothing much to go on there, either. Edit - the first one (totally blank stderr) happened while I was out of the house for a walk. The second one happened while I was at the console, opening and searching a 38 MB stdoutdae.txt file, and loading about 40 source code files into Notepad++. I'm not very good at cleaning out my working environment after use... |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Looking to see if there are any Visual Studio-looking settings in the milkyway CMake files (not sure yet). Joe *might* have a tool on hand capable of ramming commode.obj into the existing exe as a brutal anon platform test.... [Edit:] I see hints in there that Visual Studio is supported in the makefiles, though that doesn't dictate it was used for the stock build there. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
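For anyone wondering what commode.obj would change: as I understand the Microsoft C runtime documentation, linking it sets the global commit flag, so that a flush writes the buffer through to disk rather than just handing it to the operating system. A rough Python analogue of that "flush and commit" behaviour is a flush followed by `os.fsync`. This is a sketch of the idea only, not the app's code, and `write_committed` is a hypothetical name.

```python
import os
import tempfile


def write_committed(path, text):
    """Write text, then push it through both buffering layers before returning."""
    with open(path, "w") as handle:
        handle.write(text)
        handle.flush()               # user-space buffer -> operating system
        os.fsync(handle.fileno())    # operating system cache -> storage device
        return os.path.getsize(path)


text = "Thread call stack limit is: 1k"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "stderr.txt")
    size_on_disk = write_committed(path, text)
    with open(path) as check:
        read_back = check.read()
```

The cost is a synchronous disk write on every flush, so this is more of a diagnostic switch than something to leave on everywhere.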