Task Status "Postponed" -- ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2016113 - Posted: 20 Oct 2019, 23:40:42 UTC

I think I see a language usage issue here.

One person thinks a "workaround" is not a "fix" although it does address the immediate problem.
Another person considers a "workaround" that addresses the immediate problem to be a "fix".

In production you WANT the workaround until a code change can fix it. In a testing environment you want to have it break to determine if the code change "worked".

Tom
A proud member of the OFA (Old Farts Association).
ID: 2016113 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016139 - Posted: 21 Oct 2019, 8:42:46 UTC - in response to Message 2016113.  

In my terminology, a 'workround' is something that has to be applied individually: each user has to take action for himself or herself. So the users who don't hear about it are left out.

A 'fix' is something that is applied at a central point (SETI or NVidia), so it reaches everyone.
ID: 2016139 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016148 - Posted: 21 Oct 2019, 11:59:33 UTC - in response to Message 2016139.  

Since the “fix” is a new driver, each user has to individually apply it.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016148 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016155 - Posted: 21 Oct 2019, 12:52:13 UTC - in response to Message 2016148.  

Since the “fix” is a new driver, each user has to individually apply it.
But if they individually installed the faulty driver, we can have more confidence that they know how to do that, than we have about whether they will read these message boards.
ID: 2016155 · Report as offensive
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016190 - Posted: 21 Oct 2019, 23:25:27 UTC
Last modified: 21 Oct 2019, 23:26:46 UTC

I worked for 2 hours tonight, with Richard Haselgrove, and we believe we have reproduced the following:

On the SAME work unit (28oc11aa.6787.6611.5.32.85 with <true_angle_range>2.7274446668827</true_angle_range> ):
using Windows 10 Insider Build 19002, NVIDIA drivers 436.51:
- MBbench210, using device 0, RTX 2080: Runs forever but does not use GPU or CPU, never completes, basically stalls with no exit.
- MBbench210, using device 1, GTX 980 Ti: Errors on clEnqueueMapBuffer(gpu_GPUState)
- MBbench210, using device 2, GTX 980: Errors on clEnqueueMapBuffer(gpu_GPUState)

My NVIDIA contact said that they are aware of an issue with Maxwell, but I'm uncertain if they are aware of an issue with the Turing RTX 2080.
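
For reference, the (-36) in the thread title corresponds to CL_INVALID_COMMAND_QUEUE in the standard OpenCL header (CL/cl.h). The sketch below decodes a message of the form quoted in this thread. The table is only a small subset of the header's codes, and the helper name decode_opencl_error and its regular expression are illustrative assumptions, not part of the SETI app.

```python
import re

# Subset of error codes as defined in the standard OpenCL header (CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -12: "CL_MAP_FAILURE",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}

# Matches the message format seen in this thread, e.g.
#   ... 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ...
# (hypothetical pattern, written only against that one example)
_ERROR_RE = re.compile(r"'(?P<call>[^']+)' call failed \((?P<code>-?\d+)\)")


def decode_opencl_error(line):
    """Return (call, code, symbolic_name), or None if the line does not match."""
    m = _ERROR_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return m.group("call"), code, OPENCL_ERRORS.get(code, "unknown")


msg = ("ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' "
       "call failed (-36) in file ..\\analyzeFuncs.cpp near line 1995.")
print(decode_opencl_error(msg))
```

The symbolic name only says that the command queue was no longer valid when the map call was made; it does not by itself identify why the queue became invalid.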

I will try to clean this up and post a link to shared zip files of inputs for testing.
I will also consider re-testing on various driver versions (such as 436.48, 435.80, 431.68, 431.60).

Regards,
Jacob Klein
ID: 2016190 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016192 - Posted: 22 Oct 2019, 0:00:37 UTC - in response to Message 2016190.  
Last modified: 22 Oct 2019, 0:00:51 UTC

Those results match the observations I made from looking at your tasks.

Pascal has the same issue as Turing with regard to running forever; it’s not exclusive to Turing. Many others have the same problem, and you had the same issue on your 1050 Ti system. The task is still there: https://setiathome.berkeley.edu/result.php?resultid=8138238060
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016192 · Report as offensive
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016194 - Posted: 22 Oct 2019, 0:11:16 UTC - in response to Message 2016192.  
Last modified: 22 Oct 2019, 0:13:41 UTC

Thanks.

Even though both problem types result in the BOINC task timing out, the actual behavior is different, as I have described.

Will work to get even more info, soon.
ID: 2016194 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016195 - Posted: 22 Oct 2019, 0:23:58 UTC - in response to Message 2016194.  

I must have missed where you mentioned anything about Pascal cards. I see your comments about the Maxwell cards causing the error and the Turing cards timing out.

I was just pointing out that the Pascal cards are doing the same thing as the Turing cards.

Now to narrow down whether this is an OpenCL problem or a Windows-specific driver problem. One thing is for certain: this problem doesn’t exist with the Linux 440.26 drivers while running a CUDA app. I downloaded several Arecibo VHARs that errored out on your system and they ran fine on mine (Linux/CUDA).

You’ll have to use an old/slow app, but it’s probably a good idea to download a Windows CUDA app and try to run the same WU through it and see if you get the same behavior. I’ll download the SoG app on Linux and see if the issue pops up on Linux with 440 drivers.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016195 · Report as offensive
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016197 - Posted: 22 Oct 2019, 0:28:46 UTC - in response to Message 2016195.  

You're right. I am tired and won't be able to test a ton over the next week. Would my input file help anybody, do you think?
ID: 2016197 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016204 - Posted: 22 Oct 2019, 1:14:03 UTC - in response to Message 2016197.  
Last modified: 22 Oct 2019, 1:17:00 UTC

Feel free to post it, but any high-AR Arecibo WU will have the same behavior. I have 3 of them that I pulled from your errored tasks last night.

I'm running them on the Linux SoG app now, and it appears to be running fine: GPU utilization 100%, CPU utilization 100% of 1 core. I was going to say it would take some time for them to complete, but they just finished.

OS: Ubuntu 18.04.2 LTS
Driver: Nvidia Linux 440.26
App: MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu
Tool: KWSN-Bench-Linux-MBv7_v2.01.08

@TestBench:~$ nvidia-smi
Mon Oct 21 21:16:22 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8    10W / 130W |    379MiB /  5941MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1650    Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   38C    P8     4W /  75W |      1MiB /  3911MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       928      G   /usr/lib/xorg/Xorg                           258MiB |
|    0      1124      G   /usr/bin/gnome-shell                         118MiB |
+-----------------------------------------------------------------------------+


Below, the SoG results are compared to the CUDA 10 app results. All tests were run on the same GTX 1660 (Turing).

WUs tested:
14no08af.20888.2526.6.33.250.wu [WU true angle range is : 3.185754]
25se15aa.17104.8247.5.32.31.wu [WU true angle range is : 1.996717]
28oc11aa.13583.13564.5.32.170.wu [WU true angle range is : 2.728229]

results summary: 99+% results similarity
KWSN-Linux-MBbench v3.0 cache-keeping edition
Running on TestBench at Tue 22 Oct 2019 01:05:39 AM UTC
----------------------------------------------------------------
Starting benchmark run...
----------------------------------------------------------------
Suspending BOINC
Listing wu-file(s) in /testWUs :
14no08af.20888.2526.6.33.250.wu
25se15aa.17104.8247.5.32.31.wu
28oc11aa.13583.13564.5.32.170.wu

Listing executable(s) in /APPS :
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101

Listing executable in /REF_APPS :
MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu
----------------------------------------------------------------
Current WU: 14no08af.20888.2526.6.33.250.wu

----------------------------------------------------------------
Skipping default app MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu, displaying saved result(s)
Elapsed Time: ....................... 14 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
./setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs 3.34 sec 0.88 sec 0.39 sec
Elapsed Time : ...................... 3 seconds
Speed compared to default : ......... 466 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      1     22     22     22      4        1     22     22     22      0
     Autocorr      0      3      3      3      0        0      3      3      3      4
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      0        0      0      0      0      0
      Triplet      0      1      1      1      0        0      1      1      1      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Autocorr      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     26     26     26      4        1     26     26     26      4

Unmatched signal(s) in R1 at line(s) 711 744 777 810
Unmatched signal(s) in R2 at line(s) 762 779 796 813
For R1:R2 matched signals only, Q= 99.92%
Result      : Weakly similar.

----------------------------------------------------------------
Done with 14no08af.20888.2526.6.33.250.wu

====================================================================
Current WU: 25se15aa.17104.8247.5.32.31.wu

----------------------------------------------------------------
Skipping default app MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu, displaying saved result(s)
Elapsed Time: ....................... 205 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
./setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs 51.44 sec 40.25 sec 8.99 sec
Elapsed Time : ...................... 52 seconds
Speed compared to default : ......... 394 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.16%

----------------------------------------------------------------
Done with 25se15aa.17104.8247.5.32.31.wu

====================================================================
Current WU: 28oc11aa.13583.13564.5.32.170.wu

----------------------------------------------------------------
Skipping default app MBv8_8.23r3602_sse2_clNV_SoG_x86_64-pc-linux-gnu, displaying saved result(s)
Elapsed Time: ....................... 202 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
./setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs 51.48 sec 40.27 sec 9.17 sec
Elapsed Time : ...................... 51 seconds
Speed compared to default : ......... 396 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.91%

----------------------------------------------------------------
Done with 28oc11aa.13583.13564.5.32.170.wu

====================================================================
Hosts CPU data ...
model name	: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
cpu cores	: 4
cpu MHz		: 3600.018
cache size	: 6144 KB
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

Done with Benchmark run! Removing temporary files!
Resuming BOINC
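
The "Speed compared to default" figures in the log above look consistent with the ratio of the two whole-second elapsed times, truncated to an integer percentage. That is an inference from the three printed values, not something taken from the benchmark script, and the function name speed_vs_default is made up for this sketch.

```python
def speed_vs_default(default_seconds, test_seconds):
    """Percentage speed relative to the reference run, truncated to an integer.

    Assumption: the benchmark tool divides whole-second elapsed times and
    drops the fraction. This matches the three values in the log above.
    """
    return (100 * default_seconds) // test_seconds


# (reference elapsed, test elapsed) in whole seconds, taken from the log above.
runs = {
    "14no08af.20888.2526.6.33.250.wu": (14, 3),
    "25se15aa.17104.8247.5.32.31.wu": (205, 52),
    "28oc11aa.13583.13564.5.32.170.wu": (202, 51),
}

for wu, (ref, test) in runs.items():
    print(f"{wu}: {speed_vs_default(ref, test)} %")
```

This reproduces the 466 %, 394 % and 396 % shown in the log. Rounding to nearest would have given 467 % for the first work unit, which is why truncation is assumed here.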


I have the stderr.txt files if you want them, but I think this shows it's a problem on the Windows side, not necessarily just OpenCL vs. CUDA. If you're able to run a CUDA app successfully on Windows, though, then it's more specifically a problem in the OpenCL part of the Windows drivers.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016204 · Report as offensive
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016212 - Posted: 22 Oct 2019, 3:37:10 UTC

Closing in on additional results within the hour..
ID: 2016212 · Report as offensive
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016218 - Posted: 22 Oct 2019, 4:59:37 UTC
Last modified: 22 Oct 2019, 5:13:26 UTC

Alright. Here is an updated summary, with repro steps and results on all 6 of my GPUs, which took me about 4 hours to put together.

It shows that the SETI OpenCL app is broken on NVIDIA drivers 436.02 through 436.51, for Maxwell, Pascal, and Turing.

The OneDrive link has access to all of my files, in case you want to look at them, including OpenCL SDK example tests which didn't give any new info.

I will say this, from what I heard from a birdie -- It will be interesting to see what happens *later this week*, as I'm hopeful to get to re-test this <hint, hint>.

Enjoy.

-----
Problem:

Some SETI tasks have problems on R435 NVIDIA drivers
- Example work unit: 28oc11aa.6787.6611.5.32.85
- Drivers that fail: 436.02 through 436.51
- Drivers that work: 431.60 through 431.68
- Error behaviors:
1) RTX 2080 / GTX 1050 Ti: Stalls forever
2) GTX 970 / GTX 980 / GTX 980 Ti: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.
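
The driver ranges above can be written down as a small lookup, which may help when triaging reports from other hosts. This is only a sketch of the ranges stated in this post (observed on Windows 10); the function names and the three labels are illustrative assumptions, and versions outside both ranges are deliberately reported as not covered rather than guessed.

```python
def parse_driver(version):
    """Turn a driver string such as '436.02' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


# Ranges reported in this post; anything else was not tested here.
FAILING = (parse_driver("436.02"), parse_driver("436.51"))
WORKING = (parse_driver("431.60"), parse_driver("431.68"))


def classify_driver(version):
    """Classify a driver version against the ranges reported in this post."""
    v = parse_driver(version)
    if FAILING[0] <= v <= FAILING[1]:
        return "reported failing"
    if WORKING[0] <= v <= WORKING[1]:
        return "reported working"
    return "not covered by this report"


for v in ("431.60", "436.48", "440.97"):
    print(v, "->", classify_driver(v))
```

Comparing (major, minor) integer tuples avoids the pitfalls of comparing version strings or floats directly.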

Files for Repro (OneDrive link):
https://1drv.ms/u/s!AgP0NBEuAPQRp6Fr322LD1BXy6rdAg?e=tLWOYt

Repro steps:
- Go to the OneDrive link
- Download the "MBbench - OpenCL Testing" folder
- Run the appropriate .cmd file, for whatever GPU device you want to test (0, 1, 2)
- Expected results: GPU Usage should be high while the job completes successfully, with no error in the resulting "*benchMB.txt" file
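
After running the .cmd files, the result files can be checked in bulk. The sketch below assumes the error text quoted above appears verbatim in the "*benchMB.txt" output; that is an assumption about the benchmark's output format, and the function name find_failed_runs is made up for this example.

```python
from pathlib import Path

# Marker taken from the error message quoted in this thread.
ERROR_MARKER = "ERROR: OpenCL kernel/call"


def find_failed_runs(folder):
    """Return {filename: [error lines]} for *benchMB.txt files containing the marker."""
    failures = {}
    for path in sorted(Path(folder).glob("*benchMB.txt")):
        text = path.read_text(errors="replace")
        hits = [line.strip() for line in text.splitlines() if ERROR_MARKER in line]
        if hits:
            failures[path.name] = hits
    return failures


if __name__ == "__main__":
    # Scan the current directory; adjust the path to the downloaded folder.
    for name, lines in find_failed_runs(".").items():
        print(name)
        for line in lines:
            print("   ", line)
```

Note that this only catches error behavior 2. A stalled run (behavior 1) never completes, so it has to be spotted by watching GPU usage and elapsed time instead.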
ID: 2016218 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016240 - Posted: 22 Oct 2019, 11:56:33 UTC

I also tested the R435 Linux drivers, and they don’t have the same problem. They ran OpenCL on the Arecibo VHARs just as well.

Did you ever test a CUDA app on the R435 Windows drivers?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016240 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016243 - Posted: 22 Oct 2019, 12:15:39 UTC - in response to Message 2016240.  

Did you ever test a CUDA app on the R435 Windows drivers?
I'm planning to do that during maintenance this afternoon. That test will also provide data points for Windows 7 and the GTX 750 Ti, neither of which has been mentioned so far, AFAIK.
ID: 2016243 · Report as offensive
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016244 - Posted: 22 Oct 2019, 12:18:27 UTC - in response to Message 2016243.  
Last modified: 22 Oct 2019, 12:20:18 UTC

If someone could post the app files needed to run:
- the CUDA test that you recommend
- the non-SOG OpenCL app

... I can possibly attempt it during the week.
Post a link to the files in this thread please, since I am monitoring it.

Also, can I run those apps using the same input files that I have been testing?
ID: 2016244 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016246 - Posted: 22 Oct 2019, 12:27:58 UTC - in response to Message 2016244.  

I'll try to dig them out while my test is running. You need the apps and the matching support files (cudart and cufft DLLs for CUDA, FFTW and .cl files for OpenCL), but otherwise the same procedures as last night still apply. Yes, for the time being, it would be best to stick with the same input file that we already know triggers the problem - change one thing at a time. We can extend the scope to other input files later if needed.
ID: 2016246 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016247 - Posted: 22 Oct 2019, 12:41:29 UTC - in response to Message 2016244.  

If someone could post the app files needed to run:
- the CUDA test that you recommend
- the non-SOG OpenCL app

... I can possibly attempt it during the week.
Post a link to the files in this thread please, since I am monitoring it.

Also, can I run those apps using the same input files that I have been testing?


Download the Lunatics installer here: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=507

You'll need to extract the app and all of the supporting files. Just extract them to whatever directory you want; I'd avoid extracting to your actual BOINC directory since this is only for testing. You can pull out the CUDA50, CUDA42, and CUDA32 apps this way.

Yes, you can use the same input file for any MB app.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016247 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016250 - Posted: 22 Oct 2019, 16:35:47 UTC - in response to Message 2016247.  

Unfortunately, we never included the non-SoG OpenCL app in a Lunatics installer, because it was much slower than SoG. And it doesn't look like it was ever deployed as stock for Windows, either. Unless anyone has a copy lying around, we'll have to give that one a miss.

[opencl_nvidia_sah is available for Linux, but the Linux drivers don't need testing, according to this thread so far]
ID: 2016250 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016262 - Posted: 22 Oct 2019, 17:59:10 UTC

I've run through the same tests that Jacob ran last night, with these differences:

* Windows 7, not Windows 10
* International (UK) drivers, not US
* Available cards GTX 970, GTX 750 Ti

No errors found. Task ran to completion with Q >99.98% under drivers 431.60, 436.02, 436.48

And it also ran to successful completion under brand new driver 440.97 released today.

440.97 seems significantly quicker on GTX 970 (same speed on 750), so I'll stick with it for a few days.
ID: 2016262 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016271 - Posted: 22 Oct 2019, 18:50:54 UTC - in response to Message 2016262.  

I don't see anything in the release notes of the 440.97 driver (I checked the long-form PDF package) that indicates they fixed anything related to OpenCL compute.

It's looking like a problem specific to Windows 10, then. If it ends up fixed in Win10 with the 440 drivers, then it's either by accident (unlikely), or NVIDIA just isn't admitting there was a problem.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016271 · Report as offensive


 