Task Status "Postponed" -- ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.
Author | Message |
---|---|
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I think I see a language usage issue here. One person thinks a "workaround" is not a "fix" although it does address the immediate problem. Another person considers a "workaround" that addresses the immediate problem to be a "fix". In production you WANT the workaround until a code change can fix it. In a testing environment you want to have it break to determine if the code change "worked". Tom A proud member of the OFA (Old Farts Association). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
In my terminology, a 'workround' is something that has to be applied individually: each user has to take action for him or her self. So the users who don't hear about it are left out. A 'fix' is something that is applied at a central point (SETI or NVidia), so it reaches everyone. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Since the “fix” is a new driver, each user has to individually apply it. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Ian&Steve C. wrote: "Since the “fix” is a new driver, each user has to individually apply it." But if they individually installed the faulty driver, we can have more confidence that they know how to do that, than we have about whether they will read these message boards. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
I worked for 2 hours tonight, with Richard Haselgrove, and we believe we have reproduced the following:
On the SAME work unit (28oc11aa.6787.6611.5.32.85 with <true_angle_range>2.7274446668827</true_angle_range>), using Windows 10 Insider Build 19002, NVIDIA drivers 436.51:
- MBbench210, using device 0, RTX 2080: Runs forever but does not use GPU or CPU, never completes, basically stalls with no exit.
- MBbench210, using device 1, GTX 980 Ti: Errors on clEnqueueMapBuffer(gpu_GPUState)
- MBbench210, using device 2, GTX 980: Errors on clEnqueueMapBuffer(gpu_GPUState)
My NVIDIA contact said that they are aware of an issue with Maxwell, but I'm uncertain if they are aware of an issue with the Turing RTX 2080. I will be trying to clean this up to post a link to shared zip files of inputs for testing ... as well as consider re-testing on various driver versions (like: 436.48, 435.80, 431.68, 431.60)
Regards, Jacob Klein |
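For readers wondering what the "(-36)" in the error message means: in the Khronos OpenCL headers, status -36 is CL_INVALID_COMMAND_QUEUE. A minimal lookup sketch follows. The table is only a subset of the standard codes, and the function name `describe_cl_status` is made up for illustration; it is not part of the SETI app or of any OpenCL binding.

```python
# Subset of the status codes defined in the Khronos OpenCL header (cl.h).
# Only a handful of codes are listed here; the header defines many more.
OPENCL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -12: "CL_MAP_FAILURE",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}


def describe_cl_status(code):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return OPENCL_STATUS_NAMES.get(code, f"unknown OpenCL status ({code})")


if __name__ == "__main__":
    # The code reported by clEnqueueMapBuffer(gpu_GPUState) in this thread.
    print(describe_cl_status(-36))
```

The name alone does not say why the queue became invalid; that still has to come from driver-side investigation.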
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Those results match the observations I made from looking at your tasks. Pascal has the same issue as Turing with regard to running forever; it’s not exclusive to Turing. In addition to many others with the same problem, you had the same issue on your 1050 Ti system. The task is still there: https://setiathome.berkeley.edu/result.php?resultid=8138238060 Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Thanks. Even though both problem types result in the BOINC task timing out .. the actual behavior is different, as I have described. Will work to get even more info, soon. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I must have missed where you mentioned anything about Pascal cards. I see your comments about the Maxwell cards causing the error and the Turing cards timing out. I was just pointing out that the Pascal cards are doing the same thing as the Turing cards. Now to narrow down whether this is an OpenCL problem or a Windows-specific driver problem. One thing is for certain: this problem doesn’t exist with Linux 440.26 drivers while running a CUDA app. I downloaded several Arecibo VHARs that errored out on your system and they ran fine on mine (Linux/CUDA). You’ll have to use an old/slow app, but it’s probably a good idea to download a Windows CUDA app and try to run the same WU through it and see if you get the same behavior. I’ll download the SoG app on Linux and see if the issue pops up on Linux with 440 drivers. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
You're right. I am tired and won't be able to test a ton over the next week. Would my input file help anybody, do you think? |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Feel free to post it, but any high AR Arecibo WU will have the same behavior. I have 3 of them that I pulled from your errored tasks last night. I'm running them on the Linux SoG app now, and it appears to be running fine: GPU utilization 100%, CPU utilization 100% of 1 core. But it will take some time for them to complete. Oops, just finished.

OS: Ubuntu 18.04.2 LTS
Driver: Nvidia Linux 440.26
App: MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu
Tool: KWSN-Bench-Linux-MBv7_v2.01.08

@TestBench:~$ nvidia-smi
Mon Oct 21 21:16:22 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8    10W / 130W |    379MiB /  5941MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1650    Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   38C    P8     4W /  75W |      1MiB /  3911MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       928      G   /usr/lib/xorg/Xorg                           258MiB |
|    0      1124      G   /usr/bin/gnome-shell                         118MiB |
+-----------------------------------------------------------------------------+

Compared to the CUDA 10 app results; all tests run on the same GTX 1660 (Turing).

WUs tested:
14no08af.20888.2526.6.33.250.wu [WU true angle range is : 3.185754]
25se15aa.17104.8247.5.32.31.wu [WU true angle range is : 1.996717]
28oc11aa.13583.13564.5.32.170.wu [WU true angle range is : 2.728229]

Results summary: 99+% results similarity

KWSN-Linux-MBbench v3.0 cache-keeping edition
Running on TestBench at Tue 22 Oct 2019 01:05:39 AM UTC
----------------------------------------------------------------
Starting benchmark run...
----------------------------------------------------------------
Suspending BOINC
Listing wu-file(s) in /testWUs :
14no08af.20888.2526.6.33.250.wu 25se15aa.17104.8247.5.32.31.wu 28oc11aa.13583.13564.5.32.170.wu
Listing executable(s) in /APPS :
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101
Listing executable in /REF_APPS :
MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu
----------------------------------------------------------------
Current WU: 14no08af.20888.2526.6.33.250.wu
----------------------------------------------------------------
Skipping default app MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu, displaying saved result(s)
Elapsed Time: ....................... 14 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
./setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs 3.34 sec 0.88 sec 0.39 sec
Elapsed Time : ...................... 3 seconds
Speed compared to default : ......... 466 %
-----------------
Comparing results
                 ------------- R1:R2 ------------    ------------- R2:R1 ------------
                 Exact  Super  Tight  Good   Bad     Exact  Super  Tight  Good   Bad
Spike                1     22     22    22     4         1     22     22    22     0
Autocorr             0      3      3     3     0         0      3      3     3     4
Gaussian             0      0      0     0     0         0      0      0     0     0
Pulse                0      0      0     0     0         0      0      0     0     0
Triplet              0      1      1     1     0         0      1      1     1     0
Best Spike           0      0      0     0     0         0      0      0     0     0
Best Autocorr        0      0      0     0     0         0      0      0     0     0
Best Gaussian        0      0      0     0     0         0      0      0     0     0
Best Pulse           0      0      0     0     0         0      0      0     0     0
Best Triplet         0      0      0     0     0         0      0      0     0     0
                  ----   ----   ----  ----  ----      ----   ----   ----  ----  ----
                     1     26     26    26     4         1     26     26    26     4
Unmatched signal(s) in R1 at line(s) 711 744 777 810
Unmatched signal(s) in R2 at line(s) 762 779 796 813
For R1:R2 matched signals only, Q= 99.92%
Result : Weakly similar.
----------------------------------------------------------------
Done with 14no08af.20888.2526.6.33.250.wu
====================================================================
Current WU: 25se15aa.17104.8247.5.32.31.wu
----------------------------------------------------------------
Skipping default app MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu, displaying saved result(s)
Elapsed Time: ....................... 205 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
./setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs 51.44 sec 40.25 sec 8.99 sec
Elapsed Time : ...................... 52 seconds
Speed compared to default : ......... 394 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.16%
----------------------------------------------------------------
Done with 25se15aa.17104.8247.5.32.31.wu
====================================================================
Current WU: 28oc11aa.13583.13564.5.32.170.wu
----------------------------------------------------------------
Skipping default app MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu, displaying saved result(s)
Elapsed Time: ....................... 202 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
./setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs 51.48 sec 40.27 sec 9.17 sec
Elapsed Time : ...................... 51 seconds
Speed compared to default : ......... 396 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.91%
----------------------------------------------------------------
Done with 28oc11aa.13583.13564.5.32.170.wu
====================================================================
Hosts CPU data ...
model name : Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
cpu cores : 4
cpu MHz : 3600.018
cache size : 6144 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
Done with Benchmark run!
Removing temporary files!
Resuming BOINC

I have the stderr.txt files if you want them, but I think this shows it's a problem on the Windows side, not necessarily just OpenCL vs CUDA, though if you're able to run a CUDA app on Windows, then it's more specifically a problem on the OpenCL part of the Windows drivers. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
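As a sanity check on the "Speed compared to default" figures in the benchmark output above, here is a small sketch that recomputes them from the whole-second elapsed times. The formula (reference time divided by candidate time, as a truncated percentage) is an assumption: the thread does not show how the bench tool derives the number, but this guess does reproduce all three reported values.

```python
def speed_percent(reference_seconds, candidate_seconds):
    """Speed of the candidate app relative to the reference app, in percent.

    Assumed formula: 100 * reference / candidate, truncated to an integer.
    """
    return (100 * reference_seconds) // candidate_seconds


# (reference SoG seconds, CUDA 10.1 seconds, percentage reported in the log)
runs = {
    "14no08af.20888.2526.6.33.250.wu": (14, 3, 466),
    "25se15aa.17104.8247.5.32.31.wu": (205, 52, 394),
    "28oc11aa.13583.13564.5.32.170.wu": (202, 51, 396),
}

if __name__ == "__main__":
    for wu, (ref_s, cand_s, reported) in runs.items():
        computed = speed_percent(ref_s, cand_s)
        print(f"{wu}: computed {computed} %, reported {reported} %")
```

Because the elapsed times are whole seconds, the percentage for the short 14-second task is much coarser than for the two 200-second tasks.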
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Closing in on additional results within the hour.. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Alright. Here is an updated summary, along with Repro steps, along with results on all 6 of my GPUs... that took me about 4 hours to put together. It shows that the SETI OpenCL is broken on NVIDIA drivers 436.02 through 436.51, for: Maxwell, Pascal, and Turing. The OneDrive link has access to all of my files, in case you want to look at them, including OpenCL SDK example tests which didn't give any new info. I will say this, from what I heard from a birdie -- It will be interesting what happens *later this week*, as I'm hopeful to get to re-test this <hint, hint>. Enjoy.
-----
Problem: Some SETI tasks have problems on R435 NVIDIA drivers
- Example work unit: 28oc11aa.6787.6611.5.32.85
- Drivers that fail: 436.02 through 436.51
- Drivers that work: 431.60 through 431.68
- Error behaviors:
  1) RTX 2080 / GTX 1050 Ti: Stalls forever
  2) GTX 970 / GTX 980 / GTX 980 Ti: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.
Files for Repro (OneDrive link): https://1drv.ms/u/s!AgP0NBEuAPQRp6Fr322LD1BXy6rdAg?e=tLWOYt
Repro steps:
- Go to the OneDrive link
- Download the "MBbench - OpenCL Testing" folder
- Run the appropriate .cmd file, for whatever GPU device you want to test (0, 1, 2)
- Expected results: GPU Usage should be high while the job completes successfully, with no error in the resulting "*benchMB.txt" file |
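For anyone running the repro on several devices, checking the result files for the error by hand gets tedious. Below is a minimal sketch of a scanner for the error line quoted above. The regular expression is based only on the single message format shown in this thread, and the function name `find_opencl_errors` is hypothetical; the real "*benchMB.txt" files may contain other formats this does not catch.

```python
import re

# Pattern modelled on the one error line quoted in this thread.
OPENCL_ERROR_RE = re.compile(
    r"ERROR: OpenCL kernel/call '(?P<call>[^']+)' call failed "
    r"\((?P<code>-?\d+)\) in file (?P<file>\S+) near line (?P<line>\d+)"
)


def find_opencl_errors(text):
    """Return one dict per OpenCL error line found in the given log text."""
    errors = []
    for match in OPENCL_ERROR_RE.finditer(text):
        errors.append(
            {
                "call": match.group("call"),
                "code": int(match.group("code")),
                "file": match.group("file"),
                "line": int(match.group("line")),
            }
        )
    return errors


if __name__ == "__main__":
    sample = (
        r"ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' "
        r"call failed (-36) in file ..\analyzeFuncs.cpp near line 1995."
    )
    print(find_opencl_errors(sample))
```

Note that an empty result only rules out the error case: the "stalls forever" behavior leaves no such line, so it has to be detected by the job never finishing.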
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I also tested the R435 Linux drivers. And they don’t have the same problem. They ran OpenCL on the Arecibo VHARs just as well. Did you ever test a CUDA app on the R435 Windows drivers? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Ian&Steve C. wrote: "Did you ever test a CUDA app on the R435 Windows drivers?" I'm planning to do that during maintenance this afternoon. That test will also provide data points for Windows 7, and GTX 750 Ti, neither of which have been mentioned so far, AFAIK. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
If someone could post the app files needed to run:
- the CUDA test that you recommend
- the non-SOG OpenCL app
... I can possibly attempt it during the week. Post a link to the files in this thread please, since I am monitoring it. Also, can I run those apps using the same input files that I have been testing? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I'll try to dig them out while my test is running. You need the apps and the matching support files (cudart and cufft DLLs for CUDA, FFTW and .cl files for OpenCL), but otherwise the same procedures as last night still apply. Yes, for the time being, it would be best to stick with the same input file that we already know triggers the problem - change one thing at a time. We can extend the scope to other input files later if needed. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Jacob Klein wrote: "If someone could post the app files needed to run:" Download the Lunatics installer here: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=507 You'll need to extract the app and all of the supporting files. Just extract them to whatever directory you want; I'd avoid extracting to your actual BOINC directory since this is only for testing. You can pull out the CUDA50, CUDA42, and CUDA32 apps this way. Yes, you can use the same input file for any MB app. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Unfortunately, we never included the non-SoG OpenCL app in a Lunatics installer, because it was much slower than SoG. And it doesn't look like it was ever deployed as stock for Windows, either. Unless anyone has a copy lying around, we'll have to give that one a miss. [opencl_nvidia_sah is available for Linux, but the Linux drivers don't need testing, according to this thread so far] |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I've run through the same tests as Jacob ran last night, with these differences:
* Windows 7, not Windows 10
* International (UK) drivers, not US
* Available cards GTX 970, GTX 750 Ti
No errors found. Task ran to completion with Q >99.98% under drivers 431.60, 436.02, 436.48. And it also ran to successful completion under brand new driver 440.97 released today. 440.97 seems significantly quicker on GTX 970 (same speed on 750), so I'll stick with it for a few days. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I don't see anything in the release notes of the 440.97 driver (I checked the long form PDF package) that indicates they fixed anything related to OpenCL compute. It's looking like a problem specific to Windows 10, then. If it ends up fixed in Win10 with 440 drivers, then it's either by accident (unlikely), or NVIDIA just isn't admitting there was a problem. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
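To make the narrowing-down in this thread easier to follow, here is a rough sketch that tabulates the individual test results posted so far and lists which operating systems showed any failure. The rows are transcribed from the posts above; the tuple layout and the helper name `platforms_with_failures` are illustrative assumptions, and Jacob's wider driver-range summary is not included because it was reported as a range rather than as individual runs.

```python
# (operating system, driver, GPU(s), outcome) as reported in this thread.
OBSERVATIONS = [
    ("Windows 10", "436.51", "RTX 2080", "stall"),
    ("Windows 10", "436.51", "GTX 980 Ti", "error -36"),
    ("Windows 10", "436.51", "GTX 980", "error -36"),
    ("Linux", "440.26", "GTX 1660", "ok"),
    ("Windows 7", "431.60", "GTX 970 / GTX 750 Ti", "ok"),
    ("Windows 7", "436.02", "GTX 970 / GTX 750 Ti", "ok"),
    ("Windows 7", "436.48", "GTX 970 / GTX 750 Ti", "ok"),
    ("Windows 7", "440.97", "GTX 970 / GTX 750 Ti", "ok"),
]


def platforms_with_failures(observations):
    """Return the set of operating systems with at least one non-ok result."""
    return {os_name for os_name, _driver, _gpu, outcome in observations
            if outcome != "ok"}


if __name__ == "__main__":
    print(sorted(platforms_with_failures(OBSERVATIONS)))
```

With only these rows, Windows 10 is the sole platform with failures, which matches the conclusion drawn above. No Windows 10 run on 440.97 has been reported in the thread yet, so that cell is still open.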
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.