Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again)
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Will r3528 come out in beta -5 or do I need to do a stand-alone install? Raistmer has found something 'worth a deeper look', which suggests r3528 won't be the end of the line. So it's not worth hanging on for a full final release - I'll try and get a Beta5 out tomorrow, in the hope we can catch all the bugs in one go if we all combine forces. (A bit late to start that on a Friday night, this side of the pond) |
Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Can't copy/paste what isn't there, as there is no indication what task was involved. You can search your stdoutdae.txt and stdoutdae.old for: CL file build failure This will find the lines you already posted: 09/15/2016 23:59:21 | SETI@home | task postponed 30.000000 sec: 09/15/2016 23:59:21 | SETI@home | task postponed 30.000000 sec: 09/15/2016 23:59:21 | SETI@home | Task postponed: CL file build failure Maybe a few lines above is the "[SETI@home] Starting task ..." |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Seems like it might be Task 5157268689. I have a utility that can retrieve all task details for a host, which can then be searched. That's the only currently listed task of his that has anything with "CL file build" in it, and the time frame looks to be about right. |
Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl CL file build log on device GeForce GTX 750 Ti INFO: can't build program from binary kernels, code 0 , recompiling from source... Error : Building Program (binary, clBuildProgram):main kernels: not OK code -6 CL file build log on device GeForce GTX 750 Ti |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl But followed at the next attempt by CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl ar=0.012665 NumCfft=117119 NumGauss=0 NumPulse=47842204544 NumTriplet=60817138848 Currently allocated 585 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 and from there on, it completed and was validated - with no sign of re-compiling, so the binary file was there all along. It would be interesting if Cliff could search the log files BilBg suggested for blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar and post the whole history, from the first attempt at running to final completion. One possible thought: since Cliff has two identical GPUs in the host, if two copies of the app tried to start at nearly the same instant, might one suffer an access problem? |
Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl 15-Sep-2016 23:59:13 [SETI@home] Starting task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0 15-Sep-2016 23:59:21 [SETI@home] task postponed 30.000000 sec: 15-Sep-2016 23:59:21 [SETI@home] task postponed 30.000000 sec: 15-Sep-2016 23:59:21 [SETI@home] Task postponed: CL file build failure 16-Sep-2016 01:28:50 [SETI@home] Computation for task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0 finished 16-Sep-2016 01:28:52 [SETI@home] Started upload of blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0_0 16-Sep-2016 01:28:55 [SETI@home] Finished upload of blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0_0 I could possibly see your argument about 2 identical GPUs starting tasks at nearly the same time, except that if that were the case I should have seen this error much more frequently. There have been times when I've seen both GPUs start tasks at the same or nearly the same time and complete successfully the first time around, regardless of whether they came from Arecibo or Green Bank or a mixture of the two. I don't buy computers, I build them!! |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
One possible thought: since Cliff has two identical GPUs in the host, if two copies of the app tried to start at nearly the same instant, might one suffer an access problem? I had a somewhat similar timing problem over on Beta when I added a new GPU and tried to start 3 tasks at once. (See Message 59299) I just got the third GTX 960 installed in my newly acquired xw9400 and am running stock for initial testing. The first 3 tasks D/L'd were 8.17 SoG, but 2 of them failed immediately because, apparently, all three of them couldn't compile the new Kernel at the same time. Go figure! However, those actually resulted in errors, not just postponements. (See Task 24538029.) |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Will r3528 come out in beta -5 or do I need to do a stand-alone install? So far it's in the best Gaussian only; everything reportable was reported OK. Until something new appears it's not a show-stopper, and r3528 is still an RC (besides, Zalster runs SoG and the best was reported there). SETI apps news We're not gonna fight them. We're gonna transcend them. |
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Cliff, I think they are right, look to your commandline. Bring it down a bit and see if these resolve. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I could possibly see your argument about 2 identical GPUs starting tasks at nearly the same time, except if that was the case I should have seen this error on a much more frequent basis. There have been times when I've seen both GPUs start tasks at the same/nearly the same time and complete successfully the first time around, regardless if they came from Arecibo or Green Bank or a mixture of the two. After the initial binary cache build, those files are opened only for reading, so access is non-blocking and they can be read by as many instances as needed. What you encountered is a rare enough event. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Cliff, I think they are right, look to your commandline. Already done, but I still don't think that is the solution to the problem. That cmd_line has been used since beta -4 was put on main and I should have seen more of these errors. I think I've found another task with a similar build failure, this one from Arecibo. I looked at the stdoutdae.txt and found the build failure line missing in the log, so I don't think it got to the notice tab either. 15-Sep-2016 19:30:19 [SETI@home] Starting task 22dc09ab.16767.2525.7.34.68_0 15-Sep-2016 19:30:26 [SETI@home] task postponed 30.000000 sec: 15-Sep-2016 19:30:26 [SETI@home] task postponed 30.000000 sec: 15-Sep-2016 20:09:38 [SETI@home] Computation for task 22dc09ab.16767.2525.7.34.68_0 finished https://setiathome.berkeley.edu/result.php?resultid=5156804058 Info: BOINC provided OpenCL device ID used ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299. Waiting 30 sec before restart... I don't buy computers, I build them!! |
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Edited.. How many work units per card are you running? |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Cliff, I think they are right, look to your commandline. You've got lots of various "ERROR:" messages in your current task list, 24 according to my search. I think you're probably pushing some memory limits but problems only show up when multiple tasks are stretching the limit at the same time, possibly when trying to allocate buffers. Examples: Task 5156804058 ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299. Waiting 30 sec before restart... Task 5117714628 ERROR: OpenCL kernel/call 'Enqueueing kernel:Transpose4_kernel_cl(pulse)' call failed (-4) in file ..\analyzePoT.cpp near line 4282. Waiting 30 sec before restart... Task 5125058453 ERROR: OpenCL kernel/call 'clCreateContext' call failed (-5) in file ..\..\..\src\GPU_lock.cpp near line 1299. Waiting 30 sec before restart... etc., etc. EDIT: In fact, here's the whole list, just without the task links: Task 5096108890 320: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215. Task 5115334440 203: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215. Task 5117714628 196: ERROR: OpenCL kernel/call 'Enqueueing kernel:Transpose4_kernel_cl(pulse)' call failed (-4) in file ..\analyzePoT.cpp near line 4282. Task 5125058453 93: ERROR: OpenCL kernel/call 'clCreateContext' call failed (-5) in file ..\..\..\src\GPU_lock.cpp near line 1299. Task 5129251397 196: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512. Task 5141904299 225: ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299. Task 5142324411 205: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3006.
Task 5144145197 211: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215. 252: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215. Task 5150396299 199: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3006. Task 5150672754 202: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2062. Task 5156316910 208: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2767. Task 5156330545 204: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512. Task 5156392608 196: ERROR: OpenCL kernel/call ' oclFFT1: clEnqueueNDRangeKernel' call failed (-5) in file ..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 570. Task 5156772740 206: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512. Task 5156773009 203: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3977. Task 5156804058 93: ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299. Task 5157134736 205: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2767. Task 5157247061 196: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215. Task 5157618461 201: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2767. 
Task 5157618618 201: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215. Task 5157658211 196: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1793. Task 5157969485 205: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512. Task 5157982995 198: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3006. Task 5157988929 198: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512. |
Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I've dropped my tasks to 2 per GPU w/ .5 CPU per task. I don't buy computers, I build them!! |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
I've dropped my tasks to 2 per GPU w/ .5 CPU per task. SoG does best with 1 CPU core per WU. On my systems running SoG I ended up going with just 1 WU at a time, because I was getting (very) intermittent "Finish file present too long" errors when running more than 1 WU, and the gain in output per hour from running more than 1 WU was so slight that it wasn't really worthwhile anyway. I'm running with: -tt 1500 -hp -period_iterations_num 3 -sbs 768 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64. The -period_iterations_num 3 setting causes a fair bit of keyboard/screen delay, but I can live with it. From memory, 5 resulted in almost none, 10 in none. Grant Darwin NT |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
SoG does best with 1 CPU core per WU. He's got -use_sleep in his cmdline, so I doubt if a full core is necessary. At least I didn't find it so in my recent experiments with SoG. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Hi Raistmer, what does RC mean, please? Stephen |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
. . Hi Raistmer, In this context RC = Release Candidate. Grant Darwin NT |
Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
15-Sep-2016 23:59:13 [SETI@home] Starting task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0 It is a pity the new/current BOINC versions don't show "Restarting task ..." by default. Which log flag has to be enabled for "Restarting task ..." to appear in the Event Log (without too much other clutter)? I also wonder why BOINC says "task postponed 30.000000 sec: " twice in the same second: is this for one task or two? And the string ends with "sec: " (yes, a space after the colon), which suggests something is missing (maybe the task name) that some log flag would show. |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Which log flag have to be enabled for "Restarting task ..." to appear in Event Log (without too much other clutter)? That would be cpu_sched. 9/15/2016 9:12:34 PM | SETI@home | [cpu_sched] Restarting task 26fe09ab.11025.10706.9.36.130_0 using setiathome_v8 version 800 (cuda50) in slot 0 Yeah, that's irritated me from the time they made the change. I even rolled back to an earlier BOINC for a long time, but eventually had to start giving in. It's not just from the "Restarting" standpoint either, but from the lack of slot info if you don't turn on cpu_sched. The downside is that you end up with two "Starting task" messages in the log for every task, one without the slot number and one with it. You can't just replace one with the other. Just something we have to accept if we want that additional info, I guess. EDIT: Just found my original complaint about it, from March, 2014: BOINC 7.2.42 - Reduced info in Event Log? |
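
The log search suggested earlier in the thread (look for "CL file build failure" in stdoutdae.txt, then check a few lines above for the "Starting task" line) can be scripted. What follows is only a rough sketch: the function name `find_build_failures` and the embedded sample are made up for illustration, and it assumes the most recent "Starting task" or "Restarting task" line belongs to the task that was postponed. On a host running several tasks at once that assumption may be wrong, so treat the result as a hint.

```python
import re
from typing import Iterable, List, Tuple

# Matches both "Starting task X" and "[cpu_sched] Restarting task X ...".
START_RE = re.compile(r"(?:Restarting|Starting) task (\S+)")
MARKER = "CL file build failure"

# Sample lines copied from the log excerpt posted in this thread.
SAMPLE = """\
15-Sep-2016 23:59:13 [SETI@home] Starting task blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar_0
15-Sep-2016 23:59:21 [SETI@home] task postponed 30.000000 sec:
15-Sep-2016 23:59:21 [SETI@home] Task postponed: CL file build failure
"""


def find_build_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each build-failure line with the most recently started task.

    Assumption: the last (re)started task is the one that failed.
    """
    last_task = None
    hits = []
    for line in lines:
        match = START_RE.search(line)
        if match:
            last_task = match.group(1)
        elif MARKER in line:
            hits.append((last_task or "<unknown>", line.strip()))
    return hits


if __name__ == "__main__":
    for task, line in find_build_failures(SAMPLE.splitlines()):
        print(task, "->", line)
```

To run it against a real log, pass an open file instead of the sample, for example `find_build_failures(open("stdoutdae.txt", errors="replace"))`, and repeat for stdoutdae.old.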
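
For anyone wanting to summarise a list of "ERROR: OpenCL kernel/call ..." lines like the one posted above, here is a small sketch that tallies the numeric codes and attaches the standard names from the OpenCL headers (-4 `CL_MEM_OBJECT_ALLOCATION_FAILURE`, -5 `CL_OUT_OF_RESOURCES`, -6 `CL_OUT_OF_HOST_MEMORY`, -36 `CL_INVALID_COMMAND_QUEUE`). The helper names `tally_errors` and `describe` are invented for this example, and 999 is not a standard OpenCL code, so it is simply reported as such. The sample holds four lines taken from the list in this thread, not the whole list.

```python
import re
from collections import Counter

# Standard OpenCL error names for the codes that appear in this thread.
CL_ERROR_NAMES = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

ERROR_RE = re.compile(r"ERROR: OpenCL kernel/call '[^']*' call failed \((-?\d+)\)")

# Four lines copied from the error list posted above (raw string keeps the backslashes).
SAMPLE_ERRORS = r"""
ERROR: OpenCL kernel/call 'clCreateContext' call failed (999) in file ..\..\..\src\GPU_lock.cpp near line 1299.
ERROR: OpenCL kernel/call 'Enqueueing kernel:Transpose4_kernel_cl(pulse)' call failed (-4) in file ..\analyzePoT.cpp near line 4282.
ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3215.
ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 2512.
"""


def tally_errors(text: str) -> Counter:
    """Count how often each numeric error code appears in the text."""
    return Counter(int(code) for code in ERROR_RE.findall(text))


def describe(code: int) -> str:
    """Return the OpenCL name for a code, or a note if it is not a standard one."""
    return CL_ERROR_NAMES.get(code, "not a standard OpenCL code")


if __name__ == "__main__":
    for code, count in tally_errors(SAMPLE_ERRORS).most_common():
        print(f"{count:3d} x {code:4d}  {describe(code)}")
```

The -4 and -5 codes are resource-related, which fits the suggestion that memory limits are being pushed. The -36 code only says the command queue was no longer valid, so it may be a knock-on effect of an earlier failure rather than a cause.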