Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again)
Author | Message |
---|---|
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl

15-Sep-2016 23:59:13 [SETI@home] Starting task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0
15-Sep-2016 23:59:21 [SETI@home] task postponed 30.000000 sec:
15-Sep-2016 23:59:21 [SETI@home] task postponed 30.000000 sec:
15-Sep-2016 23:59:21 [SETI@home] Task postponed: CL file build failure
16-Sep-2016 01:28:50 [SETI@home] Computation for task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0 finished
16-Sep-2016 01:28:52 [SETI@home] Started upload of blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0_0
16-Sep-2016 01:28:55 [SETI@home] Finished upload of blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0_0

I could possibly see your argument about 2 identical GPUs starting tasks at nearly the same time, except that if that were the case I should have seen this error much more frequently. There have been times when I've seen both GPUs start tasks at the same or nearly the same time and complete successfully the first time around, regardless of whether they came from Arecibo or Green Bank or a mixture of the two.

I don't buy computers, I build them!! |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
One possible thought: since Cliff has two identical GPUs in the host, if two copies of the app tried to start at nearly the same instant, might one suffer an access problem? I had a somewhat similar timing problem over on Beta when I added a new GPU and tried to start 3 tasks at once. (See Message 59299)

Quote: I just got the third GTX 960 installed in my newly acquired xw9400 and am running stock for initial testing. The first 3 tasks D/L'd were 8.17 SoG, but 2 of them failed immediately because, apparently, all three of them couldn't compile the new Kernel at the same time. Go figure!

However, those actually resulted in errors, not just postponements. (See Task 24538029.) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: Will r3528 come out in beta -5 or do I need to do a stand-alone install?

So far it affects the best Gaussian only; all reportable signals were reported OK. Until something new appears it's not a show-stopper, and r3528 is still an RC. (Besides, Zalster runs SoG and the best Gaussian was reported correctly there.)

SETI apps news
We're not gonna fight them. We're gonna transcend them. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Cliff, I think they are right, look to your commandline. Bring it down a bit and see if these resolve |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: I could possibly see your argument about 2 identical GPUs starting tasks at nearly the same time, except that if that were the case I should have seen this error much more frequently. There have been times when I've seen both GPUs start tasks at the same or nearly the same time and complete successfully the first time around, regardless of whether they came from Arecibo or Green Bank or a mixture of the two.

After the initial binary cache build, those files are only ever read, so access is non-blocking and they can be read by as many instances as needed. What you encountered is a fairly rare event.

SETI apps news
We're not gonna fight them. We're gonna transcend them. |
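Raistmer's point, that the compiled kernel cache is written once and afterwards only read, can be illustrated with a generic build-once pattern. This is a minimal sketch, not the SoG app's actual code: `get_or_build_binary` and the `build` callable are hypothetical, and the temp-file-plus-rename approach is just one common way to keep concurrent readers from ever seeing a half-written file. It also shows where the race Jeff hit on Beta lives: instances that all miss the cache at the same moment will all try to build.

```python
import os
import tempfile


def get_or_build_binary(cache_path, build):
    """Return compiled kernel bytes from cache_path, building them once if absent.

    `build` is a hypothetical callable standing in for the OpenCL compile step.
    """
    try:
        # Normal case after the first run: the cache is only read, so any
        # number of task instances can open it at the same time.
        with open(cache_path, "rb") as cached:
            return cached.read()
    except FileNotFoundError:
        pass

    # Race window: instances that all miss the cache here will all build.
    data = build()

    # Publish via a temp file in the same directory plus an atomic rename, so
    # a concurrent reader sees either no file or a complete one, never a
    # half-written binary.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, cache_path)
    except OSError:
        # On Windows the rename can fail if another process holds the target
        # open; clean up the temp file and let the caller retry later.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return data
```

The first call pays for the build; every later call, from any number of instances, is a plain read.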
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Quote: Cliff, I think they are right, look to your commandline.

Already done, but I still don't think that is the solution to the problem. That cmd_line has been used since beta -4 was put on main, and I should have seen more of these errors.

I think I've found another task with a similar build failure, this one from Arecibo. I looked at stdoutdae.txt and found the build failure line missing from the log, so I don't think it got to the Notices tab either.

15-Sep-2016 19:30:19 [SETI@home] Starting task 22dc09ab.16767.2525.7.34.68_0
15-Sep-2016 19:30:26 [SETI@home] task postponed 30.000000 sec:
15-Sep-2016 19:30:26 [SETI@home] task postponed 30.000000 sec:
15-Sep-2016 20:09:38 [SETI@home] Computation for task 22dc09ab.16767.2525.7.34.68_0 finished

https://setiathome.berkeley.edu/result.php?resultid=5156804058

Info: BOINC provided OpenCL device ID used
ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299.
Waiting 30 sec before restart...

I don't buy computers, I build them!! |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Edited.. How many work units per card are you running? |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: Cliff, I think they are right, look to your commandline.

You've got lots of different "ERROR:" messages in your current task list, 24 according to my search. I think you're probably pushing some memory limits, but the problems only show up when multiple tasks are stretching the limit at the same time, possibly when trying to allocate buffers. Examples:

Task 5156804058
ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299.
Waiting 30 sec before restart...

Task 5117714628
ERROR: OpenCL kernel/call 'Enqueueing kernel:Transpose4_kernel_cl(pulse)' call failed (-4) in file ..\analyzePoT.cpp near line 4282.
Waiting 30 sec before restart...

Task 5125058453
ERROR: OpenCL kernel/call 'clCreateContext' call failed (-5) in file ..\..\..\src\GPU_lock.cpp near line 1299.
Waiting 30 sec before restart...

etc., etc.

EDIT: In fact, here's the whole list, just without the task links:

Task 5096108890
  320: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
Task 5115334440
  203: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
Task 5117714628
  196: ERROR: OpenCL kernel/call 'Enqueueing kernel:Transpose4_kernel_cl(pulse)' call failed (-4) in file ..\analyzePoT.cpp near line 4282.
Task 5125058453
  93: ERROR: OpenCL kernel/call 'clCreateContext' call failed (-5) in file ..\..\..\src\GPU_lock.cpp near line 1299.
Task 5129251397
  196: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512.
Task 5141904299
  225: ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299.
Task 5142324411
  205: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3006.
Task 5144145197
  211: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
  252: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
Task 5150396299
  199: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3006.
Task 5150672754
  202: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2062.
Task 5156316910
  208: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2767.
Task 5156330545
  204: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512.
Task 5156392608
  196: ERROR: OpenCL kernel/call ' oclFFT1: clEnqueueNDRangeKernel' call failed (-5) in file ..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 570.
Task 5156772740
  206: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512.
Task 5156773009
  203: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3977.
Task 5156804058
  93: ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299.
Task 5157134736
  205: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2767.
Task 5157247061
  196: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
Task 5157618461
  201: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2767.
Task 5157618618
  201: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
Task 5157658211
  196: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1793.
Task 5157969485
  205: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512.
Task 5157982995
  198: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3006.
Task 5157988929
  198: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512. |
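A list like Jeff's can also be tallied from pasted stderr text, with the standard OpenCL names attached to the numeric codes. The names for -4, -5 and -36 come from the Khronos OpenCL headers; -4 and -5 are allocation/resource failures, which would be consistent with (though not proof of) the memory-limit theory. This is a sketch assuming the stderr text has been saved as plain text; `tally_errors` and `describe` are hypothetical helpers, not part of BOINC or the Lunatics tools, and 999 is treated as an app-specific code because it is not a standard OpenCL status.

```python
import re
from collections import Counter

# Names for the standard OpenCL status codes seen in this thread
# (values as defined in the Khronos cl.h header). 999 is not a standard
# OpenCL code, so it is presumably reported by the app itself.
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Matches lines such as:
#   ERROR: OpenCL kernel/call 'clCreateContext' call failed (-5) in file ...
ERROR_RE = re.compile(r"OpenCL kernel/call '\s*([^']*?)\s*' call failed \((-?\d+)\)")


def tally_errors(log_text):
    """Count (call name, status code) pairs in a block of stderr text."""
    counts = Counter()
    for match in ERROR_RE.finditer(log_text):
        # Drop the buffer name, so 'clEnqueueMapBuffer(gpu_result_flag)' and
        # 'clEnqueueMapBuffer(gpu_GPUState)' both count as clEnqueueMapBuffer.
        call = match.group(1).split("(")[0]
        counts[(call, int(match.group(2)))] += 1
    return counts


def describe(code):
    """Return the OpenCL name for a status code, if it is one we know."""
    return OPENCL_ERRORS.get(code, "not a standard OpenCL code")


if __name__ == "__main__":
    sample = (
        "ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) "
        "in file ..\\analyzePoT.cpp near line 3215.\n"
        "ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) "
        "in file ..\\..\\..\\src\\GPU_lock.cpp near line 1299.\n"
    )
    for (call, code), n in tally_errors(sample).most_common():
        print(f"{n:3d}  {call}  {code} ({describe(code)})")
```

Grouping by call name as well as code helps separate context-creation failures (which happen at task start, when several instances may be competing) from failures that occur mid-run.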
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I've dropped my tasks to 2 per GPU w/ .5 CPU per task. I don't buy computers, I build them!! |
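For anyone reproducing Cliff's "2 per GPU w/ .5 CPU per task" setup, BOINC's per-project app_config.xml expresses it as a fractional `gpu_usage` (0.5 means two tasks share one GPU) plus a `cpu_usage` value. The sketch below only generates that file's contents; `make_app_config` is a hypothetical helper, the app name `setiathome_v8` is taken from the Event Log line quoted later in this thread, and on an anonymous-platform (Lunatics) install it is worth checking how this interacts with the counts in app_info.xml. With two GPUs, this setting budgets 4 tasks × 0.5 = 2 CPU cores for GPU work.

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name, tasks_per_gpu, cpu_per_task):
    """Build BOINC app_config.xml text for running several tasks per GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of one GPU each task claims: 0.5 -> 2 tasks per GPU.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    # cpu_usage is the CPU share BOINC budgets for each GPU task.
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Cliff's setting: 2 tasks per GPU, 0.5 CPU per task.
    print(make_app_config("setiathome_v8", tasks_per_gpu=2, cpu_per_task=0.5))
```

Grant's alternative of 1 task per GPU and a full core would be `make_app_config("setiathome_v8", 1, 1.0)`.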
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Quote: I've dropped my tasks to 2 per GPU w/ .5 CPU per task.

SoG does best with 1 CPU core per WU. On my systems running SoG I ended up going with just 1 WU at a time, because I was getting (very) intermittent "Finish file present too long" errors when running more than 1 WU, and the gain in output per hour from running more than 1 WU was so slight that it wasn't really worthwhile anyway.

I'm running with:
-tt 1500 -hp -period_iterations_num 3 -sbs 768 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

-period_iterations_num 3 causes a fair bit of keyboard/screen delay, but I can live with it. From memory, 5 resulted in almost none, and 10 in none.

Grant
Darwin NT |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: SoG does best with 1 CPU core per WU.

He's got -use_sleep in his cmdline, so I doubt a full core is necessary. At least I didn't find it so in my recent experiments with SoG. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Raistmer, . . What does RC mean please? Stephen . |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Quote: . . Hi Raistmer,

In this context RC = Release Candidate.

Grant
Darwin NT |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Quote: 15-Sep-2016 23:59:13 [SETI@home] Starting task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0

It is a pity the new/current BOINC versions don't show "Restarting task ..." by default. Which log flag has to be enabled for "Restarting task ..." to appear in the Event Log (without too much other clutter)?

I also wonder why BOINC says "task postponed 30.000000 sec: " twice in the same second. Is this for one task or two? And the string ends with "sec: " (yes, with a space after the colon), which suggests something is missing (maybe the task name) that would be shown by some log flag.

- ALF - "Find out what you don't do well ..... then don't do it!" :) |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Quote: Which log flag has to be enabled for "Restarting task ..." to appear in the Event Log (without too much other clutter)?

That would be cpu_sched.

9/15/2016 9:12:34 PM | SETI@home | [cpu_sched] Restarting task 26fe09ab.11025.10706.9.36.130_0 using setiathome_v8 version 800 (cuda50) in slot 0

Yeah, that's irritated me ever since they made the change. I even rolled back to an earlier BOINC for a long time, but eventually had to start giving in. It's not just from the "Restarting" standpoint either, but from the lack of slot info if you don't turn on cpu_sched. The downside is that you end up with two "Starting task" messages in the log for every task, one without the slot number and one with it. You can't just replace one with the other. Just something we have to accept if we want that additional info, I guess.

EDIT: Just found my original complaint about it, from March 2014: BOINC 7.2.42 - Reduced info in Event Log? |
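Jeff's answer corresponds to a `<cpu_sched>` element set to 1 under `<log_flags>` in BOINC's cc_config.xml (in the BOINC data directory). The sketch below only prints that XML; `make_cc_config` is a hypothetical helper, and if the host already has a cc_config.xml (for example with an `<options>` section) the flag should be merged into it rather than overwriting the file. After saving, Options > Read config files in BOINC Manager, or a client restart, should apply it.

```python
import xml.etree.ElementTree as ET


def make_cc_config(log_flags):
    """Build BOINC cc_config.xml text that turns on the given Event Log flags."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name in log_flags:
        # Each flag is a child element of <log_flags>; a value of 1 enables it.
        ET.SubElement(flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_cc_config(["cpu_sched"]))
```

As Jeff notes, enabling cpu_sched adds a second "Starting task" line per task (the one carrying the slot number) rather than replacing the default one.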
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
My 440 does not appear to be causing any problems, so I now have SETI@home beta running on it. |
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
[snip]

I must have remembered some adapters from somewhere other than that computer. No connectors are running hot. One 6-pin PCIe connector comes direct from the PSU; the other uses one adapter from two separate Molex connectors.

The valids only take one wingmate to appear. The invalids initially appear as inconclusive, and take a second wingmate to change the marking to invalid, perhaps one day later. There were several invalids yesterday; none have appeared yet today, probably due to this delay in marking them.

I'm about to install SIV for the computer with the 560. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote: My 440 does not appear to be causing any problems, so I now have SETI@home beta running on it.

. . Fair enough. I do not know how many machines Raistmer needs running r3528/V8.19 for homologation, but the more the merrier :)

. . I am waiting for Lunatics 0.45 Beta(5) from Richard so I can put it on my rig with the 970s, so I do not have to port them to Beta. That way they will have "real world" data to see how it performs. My little machine is running it on Beta at the moment.

Stephen
. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . OK I hope it helps solve the problem for you. Stephen . |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: . . Hi Raistmer,

RC = Release Candidate.

SETI apps news
We're not gonna fight them. We're gonna transcend them. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.