Stderr Truncations
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Same: wait for event and sleep between polls. And clFinish after some (not time-critical?) kernels.

```c
#ifdef _WIN32
#include <windows.h>  /* Sleep(), DWORD */
#define mwMilliSleep(x) Sleep((DWORD) (x))
/* The usleep() in MinGW tries to round up to avoid sleeping for 0 */
#define mwMicroSleep(x) Sleep(((DWORD) (x) + 999) / 1000)
#else
#include <unistd.h>   /* usleep(), useconds_t */
#define mwMilliSleep(x) usleep((useconds_t) 1000 * (x))
#define mwMicroSleep(x) usleep((useconds_t)(x))
#endif /* _WIN32 */
```

Still nothing better than the old Sleep() for Win32. And Sleep requires a big enough kernel (because of its own granularity). A "not interesting" case. What is interesting (but unrelated to CPU usage) is the direct IL usage technique. Some of their kernels are written in IL, something like a macro-assembler for CPUs, but for the GPU. This could account for the great speed and GPU utilization of MW. |
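The "sleep between polls" pattern can be sketched in C as below. This is a generic illustration, not Milkyway's or the SETI apps' code: `poll_until_done`, the `done_fn` callback and `fake_kernel_done` are hypothetical names. In a real OpenCL host program the check would typically call clGetEventInfo() with CL_EVENT_COMMAND_EXECUTION_STATUS and compare the result against CL_COMPLETE. The microsecond helper reproduces the round-up used by the mwMicroSleep macro, so a 1 µs request becomes 1 ms instead of Sleep(0).

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
static void sleep_ms(unsigned ms) { Sleep((DWORD)ms); }
#else
#include <time.h>
static void sleep_ms(unsigned ms) {
    struct timespec ts;
    ts.tv_sec = (time_t)(ms / 1000u);
    ts.tv_nsec = (long)(ms % 1000u) * 1000000L;
    nanosleep(&ts, NULL);  /* may wake early on a signal; fine for a sketch */
}
#endif

/* Round microseconds up to whole milliseconds, like the mwMicroSleep macro,
   so a short non-zero wait never turns into Sleep(0). */
unsigned micro_to_milli_ceil(unsigned us) { return (us + 999u) / 1000u; }

/* Hypothetical completion check: returns non-zero once the GPU work is done. */
typedef int (*done_fn)(void *ctx);

/* Poll done() and sleep between polls instead of spinning; returns the number of polls. */
unsigned poll_until_done(done_fn done, void *ctx, unsigned sleep_us) {
    unsigned polls = 0;
    assert(done != NULL);
    for (;;) {
        ++polls;
        if (done(ctx)) return polls;
        sleep_ms(micro_to_milli_ceil(sleep_us));  /* yield the CPU between polls */
    }
}

/* Stand-in for an event query: reports "done" after polls_left unfinished polls. */
struct fake_kernel { unsigned polls_left; };
int fake_kernel_done(void *ctx) {
    struct fake_kernel *k = (struct fake_kernel *)ctx;
    if (k->polls_left == 0) return 1;
    --k->polls_left;
    return 0;
}
```

With `polls_left` set to 3, the loop sees three "not yet" answers, sleeps three times, and reports completion on the fourth poll. The CPU cost is low, but each sleep is only as fine-grained as the OS timer allows.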
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Richard, you also produced an invalid via the truncation mechanism Task 1184514362 Um...I know I'm not a Milkyway guy, but are those log entries actually from that particular task? I see ps_fast_15_3s_136_sim1Jun1_1_1434554402_8833447_0 in the task detail. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
If the question is kernel size, then the answer becomes choosing a development platform that suits the granularity of the problem under consideration. Only the first two really matter (the granularity of runtime support in this particular case is the Windows OS quantum and the granularity of the Sleep() call). The algorithm is bound to the task one is trying to solve. If it's large-area integration, for example, it has nothing to do with SETI. And if it's FFT-based signal modification, it has nothing to do with the N-body problem. And the device... well, the imperative here is to use any available computational device, not the one with the "right granularity" for a particular problem. So I'm still missing the point of your solution to the CPU usage issue for the OpenCL NV runtime. |
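To put rough numbers on the granularity point: if the host only checks for completion at t = 0, q, 2q, ... (q being the effective sleep quantum), a kernel that takes k ms is noticed only at ceil(k/q)·q ms. The helpers below compute that, assuming ideal, jitter-free wakeups; real Windows timers are coarser and less regular (the default timer interval is commonly cited as about 15.6 ms). The function names are made up for illustration.

```c
#include <assert.h>

/* First poll time (ms) at which a kernel of kernel_ms is seen as finished,
   when the host polls at t = 0, q, 2q, ... with q = quantum_ms. */
unsigned detect_time_ms(unsigned kernel_ms, unsigned quantum_ms) {
    if (quantum_ms == 0) return kernel_ms;  /* busy-wait: seen immediately */
    return ((kernel_ms + quantum_ms - 1) / quantum_ms) * quantum_ms;  /* ceil(k/q)*q */
}

/* Fraction of that wall time during which the GPU has finished but the host
   has not noticed yet, i.e. idle time caused purely by sleep granularity. */
double idle_fraction(unsigned kernel_ms, unsigned quantum_ms) {
    unsigned t = detect_time_ms(kernel_ms, quantum_ms);
    assert(t >= kernel_ms);
    return t == 0 ? 0.0 : (double)(t - kernel_ms) / (double)t;
}
```

With an illustrative 16 ms quantum, a 2 ms kernel is detected at 16 ms (the GPU idles 87.5% of the time), a 100 ms kernel at 112 ms, and a 160 ms kernel loses nothing. That is the sense in which Sleep() needs "big enough" kernels.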
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
Um...I know I'm not a Milkyway guy, but are those log entries actually from that particular task? I see ps_fast_15_3s_136_sim1Jun1_1_1434554402_8833447_0 in the task detail. Beg pardon, I never was good at multitasking. Try: 11-Jul-2015 17:37:04 [---] [slot] cleaning out slots/0: handle_exited_app() I did think the times looked a bit odd, too. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
... well, the imperative here to use any available computational device, not the one with "right granularity" for particular problem. So I still missing the point in your solution to CPU usage issue for OpenCL NV runtime. That takes us back to the nub of my question. Is it an imperative, and if so, to whom? I suspect it's less an imperative for the project than for a user who shelled out personal bucks for hardware and doesn't like to be told "it isn't the best device for this project, go try use it somewhere else" - as I believe has happened with single-precision hardware at Milkyway. I don't think you could say it was an imperative for Milkyway to re-write their algorithm in a form suitable for single-precision devices. But what's a "device", anyway? Is it pure hardware, hence the requirement for double-precision purchases for MW? Or is it the combination of the hardware and the software chosen to run on it? I believe it's mathematically possible to achieve double-precision accuracy on single-precision hardware, by software emulation. But it's horribly slow - so one doesn't choose to use that method, even if it would enable a wider range of (hardware) devices to be utilised. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Chuckle...no material difference anyway. Here's a new twist for you, though. I just went and looked at my xw9400 to see if Slot 16 had been created in the latest run and found that it had, although it was now empty. A check of the log found an AP task, #4260258349, with an apparently complete Stderr but which failed the slot clean out, causing the subsequent task to get assigned to Slot 16. Here are the relevant log entries:

```
11-Jul-2015 02:58:58 [---] [slot] cleaning out slots/3: get_free_slot()
11-Jul-2015 02:58:58 [SETI@home] [slot] assigning slot 3 to ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2
11-Jul-2015 02:58:58 [SETI@home] [cpu_sched] Preempting 12fe15aa.25759.18071.438086664205.12.13.vlar_1 (removed from memory)
11-Jul-2015 02:58:58 [---] [slot] removed file slots/3/init_data.xml
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe to slots/3/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/libfftw3f-3-3-4_x86.dll to slots/3/libfftw3f-3-3-4_x86.dll
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2887.cl to slots/3/AstroPulse_Kernels_r2887.cl
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_cmdline_win_x86_SSE2_OpenCL_NV.txt to slots/3/ap_cmdline.txt
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_NV_config.xml to slots/3/AstroPulse_NV_config.xml
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu to slots/3/in.dat
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0 to slots/3/pulse.out
11-Jul-2015 02:58:58 [---] [slot] removed file slots/3/boinc_temporary_exit
11-Jul-2015 02:58:58 [SETI@home] Starting task ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2
11-Jul-2015 02:58:58 [SETI@home] [cpu_sched] Starting task ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2 using astropulse_v7 version 705 (cuda_opencl_100) in slot 3
...
11-Jul-2015 03:39:56 [SETI@home] Message from task: 0
11-Jul-2015 03:39:56 [---] [slot] cleaning out slots/3: handle_exited_app()
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/ap_cmdline.txt
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/ap_state.dat0
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/ap_state.dat1: unlink() failed
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/AstroPulse_Kernels_r2887.cl
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/AstroPulse_NV_config.xml
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/boinc_finish_called
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/boinc_task_state.xml
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/in.dat
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/indices.txt
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/init_data.xml
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/libfftw3f-3-3-4_x86.dll
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/pulse.out
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/pulse.out0
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/pulse.out1
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/stderr.txt: unlink() failed
11-Jul-2015 03:39:56 [SETI@home] Computation for task ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2 finished
11-Jul-2015 03:39:56 [---] [slot] cleaning out slots/3: get_free_slot()
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/ap_state.dat1: unlink() failed
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/stderr.txt: unlink() failed
11-Jul-2015 03:39:56 [SETI@home] [slot] failed to clean out dir: unlink() failed
11-Jul-2015 03:39:56 [SETI@home] [slot] assigning slot 16 to ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0
11-Jul-2015 03:39:56 [---] [slot] removed file slots/16/init_data.xml
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe to slots/16/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/libfftw3f-3-3-4_x86.dll to slots/16/libfftw3f-3-3-4_x86.dll
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2887.cl to slots/16/AstroPulse_Kernels_r2887.cl
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_cmdline_win_x86_SSE2_OpenCL_NV.txt to slots/16/ap_cmdline.txt
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_NV_config.xml to slots/16/AstroPulse_NV_config.xml
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_04jn15aa_B2_P1_00172_20150710_03651.wu to slots/16/in.dat
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0_0 to slots/16/pulse.out
11-Jul-2015 03:39:56 [---] [slot] removed file slots/16/boinc_temporary_exit
11-Jul-2015 03:39:56 [SETI@home] Starting task ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0
11-Jul-2015 03:39:56 [SETI@home] [cpu_sched] Starting task ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0 using astropulse_v7 version 705 (cuda_opencl_100) in slot 16
11-Jul-2015 03:39:59 [SETI@home] Started upload of ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0
11-Jul-2015 03:40:02 [SETI@home] Finished upload of ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0
11-Jul-2015 03:40:02 [---] [slot] removed file projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0
11-Jul-2015 03:40:02 [---] [slot] removed file projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0.gz
11-Jul-2015 03:40:02 [---] [slot] removed file projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0.gzt
```

More grist for somebody's mill, perhaps? EDIT: Went back and highlighted the first "ap_state.dat1: unlink() failed", which I originally overlooked. |
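Jeff's log suggests a simple chain: slot 3's cleanup fails because stderr.txt and ap_state.dat1 can't be unlinked, so get_free_slot() passes over slot 3 and the next task is given a new slot 16. The toy C model below mirrors only that observed sequence. It is not BOINC's get_free_slot(); `pick_slot` and its two arrays are hypothetical, and the real client tracks more state than this.

```c
#include <assert.h>

/* Hypothetical model: in_use[i] marks slots owned by a task, cleans_ok[i]
   says whether slot i's directory could be emptied. */
int pick_slot(const int *in_use, const int *cleans_ok, int nslots) {
    assert(nslots >= 0);
    for (int i = 0; i < nslots; ++i) {
        if (in_use[i]) continue;     /* still owned by a running or suspended task */
        if (cleans_ok[i]) return i;  /* directory emptied: reuse this slot */
        /* cleanup failed (e.g. "unlink() failed" on stderr.txt): skip it */
    }
    return nslots;                   /* nothing reusable: open a new slot number */
}
```

If slot 2 of four is idle but cannot be cleaned, the model hands out new slot 4; if it can be cleaned, slot 2 is reused. That matches the pattern of the client moving on to slot 16 above.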
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
... well, the imperative here to use any available computational device, not the one with "right granularity" for particular problem. So I still missing the point in your solution to CPU usage issue for OpenCL NV runtime. 1) The imperative of BOINC's very existence: to use free (initially, CPU) cycles for doing science. And to select the right science by the available hardware? What nonsense. If one doesn't see the need for some research, or doesn't want to support it personally, why should one think about which hardware suits it best? 2) You're plain wrong here regarding double-precision emulation. That's exactly what was used for CUDA and later for all GPU SETI builds. Both AP and MB require double-precision calculations to keep precision in trigonometric functions, and emulation is used. Moreover, at least on some GPUs with double-precision hardware support, it happens to be faster. So, yep, we did it, and did it successfully. MW is another case, because there DP is required everywhere along the path, AFAIK. Well, just another reason not to directly compare different algorithms and their possible implementation issues. |
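Point 2 refers to emulating double precision on single-precision hardware. A common way to do that is to carry each value as an unevaluated sum of two floats (often called "double-single" or float-float arithmetic). The C sketch below shows the idea with Knuth's TwoSum and the simple ("sloppy") double-double style addition. It is not the SETI@home kernel code; the names `dsfloat`, `two_sum` and `ds_add` are made up, and it assumes IEEE-754 single precision with round-to-nearest, no excess precision and no -ffast-math.

```c
/* A double-single value: hi + lo, where lo holds the rounding error of hi. */
typedef struct { float hi, lo; } dsfloat;

/* Knuth's TwoSum: s = fl(a + b), and a + b == s + err exactly. */
static void two_sum(float a, float b, float *s, float *err) {
    float sum = a + b;
    float bb = sum - a;
    *err = (a - (sum - bb)) + (b - bb);
    *s = sum;
}

dsfloat ds_from_float(float x) { dsfloat r = { x, 0.0f }; return r; }

/* Add two double-single numbers, then renormalise so |lo| stays tiny. */
dsfloat ds_add(dsfloat a, dsfloat b) {
    float s, e;
    dsfloat r;
    two_sum(a.hi, b.hi, &s, &e);
    e += a.lo + b.lo;
    r.hi = s + e;          /* fast two-sum: valid because |s| >= |e| here */
    r.lo = e - (r.hi - s);
    return r;
}

double ds_to_double(dsfloat x) { return (double)x.hi + (double)x.lo; }
```

Adding 1e-8f to 1.0f in plain float loses the small term entirely, because it is below half an ulp of 1.0f. With the pair representation, hi stays 1.0f and lo keeps 1e-8f, at the cost of several float operations per add, which is the speed trade-off discussed above.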
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
A new one: 11-Jul-2015 18:57:59 [---] [slot] cleaning out slots/0: handle_exited_app() (stderr truncated) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
Error 32

From https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx:

ERROR_SHARING_VIOLATION (32): The process cannot access the file because it is being used by another process.

sandbox.cpp, lines 237-248:

```cpp
static int delete_project_owned_file_aux(const char* path) {
#ifdef _WIN32
    if (DeleteFile(path)) return 0;
    int error = GetLastError();
    if (error == ERROR_FILE_NOT_FOUND) {
        return 0;
    }
    if (error == ERROR_ACCESS_DENIED) {
        SetFileAttributes(path, FILE_ATTRIBUTE_NORMAL);
        if (DeleteFile(path)) return 0;
    }
    return error;
    /* ... quoted excerpt ends at line 248 ... */
```
|
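Read this way, delete_project_owned_file_aux() gives a second chance only for ERROR_ACCESS_DENIED; a sharing violation (the file is still open in some other process, possibly an app that has not fully exited) goes straight back to the caller. One hypothetical mitigation is to retry for a short while on ERROR_SHARING_VIOLATION. The C sketch below is illustrative only and is not BOINC code: `delete_with_retry`, `win_delete` and `fake_delete` are made-up names, the deleter is a callback so the retry logic can be exercised off Windows, and on non-Windows builds the value 32 is defined locally to mirror the Win32 code.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>              /* ERROR_SHARING_VIOLATION, DeleteFileA, Sleep */
static void pause_ms(unsigned ms) { Sleep((DWORD)ms); }
/* Real deleter on Windows: 0 on success, otherwise the GetLastError() code. */
int win_delete(const char *path, void *ctx) {
    (void)ctx;
    return DeleteFileA(path) ? 0 : (int)GetLastError();
}
#else
#include <time.h>
#define ERROR_SHARING_VIOLATION 32 /* local stand-in with the Win32 value */
static void pause_ms(unsigned ms) {
    struct timespec ts;
    ts.tv_sec = (time_t)(ms / 1000u);
    ts.tv_nsec = (long)(ms % 1000u) * 1000000L;
    nanosleep(&ts, NULL);
}
#endif

/* Deleter callback: returns 0 on success or a Win32-style error code. */
typedef int (*delete_fn)(const char *path, void *ctx);

/* Hypothetical helper: retry while the file is still held open elsewhere,
   pausing delay_ms between attempts; any other result is returned at once. */
int delete_with_retry(const char *path, delete_fn del, void *ctx,
                      int max_attempts, unsigned delay_ms) {
    int err = 0;
    assert(del != NULL);
    if (max_attempts < 1) max_attempts = 1;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        err = del(path, ctx);
        if (err != ERROR_SHARING_VIOLATION) return err; /* success or another error */
        if (attempt < max_attempts) pause_ms(delay_ms);
    }
    return err; /* still locked after the last attempt */
}

/* Test double: reports a sharing violation for the first `busy` calls. */
struct fake_file { int busy; };
int fake_delete(const char *path, void *ctx) {
    struct fake_file *f = (struct fake_file *)ctx;
    (void)path;
    if (f->busy > 0) { --f->busy; return ERROR_SHARING_VIOLATION; }
    return 0;
}
```

A file that is "busy" for two attempts is deleted on the third; one that stays busy longer than the attempt budget still reports 32, so the caller sees the same failure it does today, just later.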
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yeah, I bounced a suggestion in response to the global 'Anyone?' cry for ideas on the dev list. I'll see whether waiting for the 'I have no idea what's going on' flag was a suitable cue or not, I guess. [Edit:] for posterity ... Anyone have any ideas?... "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
David also asked that anyone planning to submit further evidence on this problem "set <task_debug> as well [as <slot_debug>]". I've done that, but Murphy's law dictates that I haven't seen a failure to delete stderr.txt since then. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
David also asked that anyone planning to submit further evidence on this problem "set <task_debug> as well [as <slot_debug>]". I've done that, but Murphy's law dictates that I haven't seen a failure to delete stderr.txt since then. Okay, will add that when I start the xw9400 back up this evening. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Ugh, did the boinc_dev list really strip the line endings from my emailed trial patch, or was it just my Hotmail? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for. We're looking for either/both, but most of all we're looking for evidence like those 'unlink failed' or 'error 32' which might hint at the underlying cause. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
Ugh, did that boinc_dev list really remove my end of lines in the email trial patch ? or just my hotmail did it ? I think it's the webmail interfaces we're both sending with. I'm using BT Internet, but it's piggybacking on a Yahoo service. Since we're copying direct to David, and he's using a text-only mail client, he'll see the clean version with line breaks and without the font variations. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for. That would be the one-second sleep followed by a debugbreak, which seems to be there because they didn't put the required wait in after TerminateProcess(). That debugbreak() will attempt to download symbols, so you could probably watch internet connections spark up in some scenarios at least. If you already had the requisite symbol PDB file in place, you could probably get an intact stderr with a BOINC debugger dump in it. |
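The point about "the required wait after TerminateProcess()" is that TerminateProcess() is asynchronous: it starts termination and returns, and the Win32 documentation recommends waiting on the process handle (WaitForSingleObject) when you need the process to be gone. A minimal C sketch of "terminate, then wait" follows, with a POSIX analogue (kill() plus waitpid()) so it can be tried on Linux. `terminate_and_wait` and `spawn_idle_child` are hypothetical names; this is not the BOINC code being discussed.

```c
#ifdef _WIN32
#include <windows.h>

/* TerminateProcess() only starts termination; block on the process handle
   until the process has actually exited before touching its files. */
int terminate_and_wait(HANDLE process) {
    if (!TerminateProcess(process, 1)) return -1;
    return WaitForSingleObject(process, INFINITE) == WAIT_OBJECT_0 ? 0 : -1;
}
#else
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* POSIX analogue for a child process: kill() returns once the signal is sent,
   waitpid() returns only after the child has exited and been reaped. */
int terminate_and_wait(pid_t pid) {
    int status = 0;
    if (pid <= 0) return -1;  /* never signal a process group or "all processes" */
    if (kill(pid, SIGKILL) != 0) return -1;
    for (;;) {
        pid_t r = waitpid(pid, &status, 0);
        if (r == pid) break;
        if (r == -1 && errno == EINTR) continue;  /* interrupted: wait again */
        return -1;
    }
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}

/* Demo helper: start a child that idles until it is killed (-1 on failure). */
pid_t spawn_idle_child(void) {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();
    }
    return pid;
}
#endif
```

The design choice is simply to make "terminated" mean "exited" before any cleanup runs, instead of relying on a fixed sleep that may or may not be long enough.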
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for. When we were looking at these early last year, I found that there was a tendency for a given machine or OS to produce either one type or the other, though it wasn't entirely consistent. My xw9400 (running Win XP) generally produces the truncated ones and I think my other XP machines did, also. My daily driver (Win Vista) and a T7400 (Win 8.1) generally put out the entirely empty variety. Which reminds me. I could reactivate that T7400 and install BOINC v7.6.2 on it if samples of empty Stderrs on S@h would be useful. (I'm not going to put v7.6.2 on my daily driver. The terms "MAY BE UNSTABLE" and daily driver don't go together in my world!) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
David also asked that anyone planning to submit further evidence on this problem "set <task_debug> as well [as <slot_debug>]". I've done that, but Murphy's law dictates that I haven't seen a failure to delete stderr.txt since then. Richard, curious and curiouser. I don't know if it's Murphy or not, but since I set <task_debug> along with the previous <slot_debug> and <cpu_sched> flags, I haven't produced an invalid at MW yet. I haven't had a really good run of 1.36 app tasks on either machine since I changed the flags, though. Will continue to monitor their progress. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
If the stable situation continues only with the flag set, then aside from a bug in the flag code itself, there can be effects because you're dealing with a statistical, time-related phenomenon. You're 'looking' at the process and affecting the outcome by altering the client's behaviour. IOW, reaching for your binoculars to check on Schrödinger's box gave the cat just enough time to decide it wasn't dead, and run away. (The BOINC client and the app share the system and take turns in timeslices [quanta].) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.