Task Status "Postponed" -- ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.

Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016272 - Posted: 22 Oct 2019, 18:53:01 UTC

They routinely fix things that aren't explicitly listed. Release notes even say so. I will test this new driver in the upcoming hours.
ID: 2016272
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016279 - Posted: 22 Oct 2019, 19:38:26 UTC
Last modified: 22 Oct 2019, 19:56:17 UTC

Bad news.

On both my PCs, the new R440 driver 440.97 exhibits the exact same hangs and errors as the R435 drivers did.

Damn.

I will be reporting the bug to NVIDIA in the upcoming days.

If you want to report too, please find the driver feedback thread on the NVIDIA forums, and fill out the feedback form typically linked in the first post.

Thanks.
ID: 2016279
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016281 - Posted: 22 Oct 2019, 20:02:20 UTC - in response to Message 2016279.  

At least we've narrowed it down (assuming your CUDA tests are positive) to being Windows 10, OpenCL specific.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016281
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2016289 - Posted: 22 Oct 2019, 21:49:13 UTC
Last modified: 22 Oct 2019, 21:55:38 UTC

It appears the non-SoG Windows version was removed from Beta for some reason. The SoG Mac versions are still at Beta even though they stopped working years ago.
The last non-SoG Windows version I can find is here: https://setiathome.berkeley.edu/forum_thread.php?id=79765&postid=1801541#1801541
I don't have a clue how it works; I had stopped running Windows by then.
BTW, I had pegged this problem days ago by simply looking at the hundreds of results available to anyone. Just go down the list, look for the 436 driver, and check for errors: https://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=640
You can also find hosts running the CUDA app at Beta: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=87127&offset=180
ID: 2016289
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016366 - Posted: 23 Oct 2019, 12:52:30 UTC - in response to Message 2016247.  

If someone could post the app files needed to run:
- the CUDA test that you recommend
- the non-SOG OpenCL app

... I can possibly attempt it during the week.
Post a link to the files in this thread please, since I am monitoring it.

Also, can I run those apps using the same input files that I have been testing?


download the Lunatics installer here: http://lunatics.kwsn.info/index.php?action=downloads;sa=view;down=507

you'll need to extract the app and all of the supporting files. just extract them to whatever directory you want, I'd avoid extracting to your actual BOINC directory since this is only for testing. You can pull out the CUDA50, CUDA42, and CUDA32 apps this way.

yes you can use the same input file for any MB app.


What exact files do I need to test the 3 CUDA versions?
ID: 2016366
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2016368 - Posted: 23 Oct 2019, 12:55:26 UTC - in response to Message 2016366.  

I can't say which exactly, I've never used any of those apps. Richard listed some files a few posts back.

probably safe to just copy them all to your benchmark tool testing directory, same place you put the app files.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016368
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016370 - Posted: 23 Oct 2019, 13:04:08 UTC - in response to Message 2016366.  
Last modified: 23 Oct 2019, 13:26:37 UTC

Give me 20 minutes - just got back in. I assembled the installer, so I can give you precise chapter and verse. And I have the master copies of the actual files.

Cuda 50
Lunatics_x41zi_win32_cuda50.exe
cudart32_50_35.dll
cufft32_50_35.dll

Cuda 42
Lunatics_x41zi_win32_cuda42.exe
cudart32_42_9.dll
cufft32_42_9.dll

Cuda 32
Lunatics_x41zi_win32_cuda32.exe
cudart32_32_16.dll
cufft32_32_16.dll

OpenCL SoG
MB8_win_x86_SSE3_OpenCL_NV_SoG_r3557.exe
libfftw3f-3-3-4_x86.dll
MultiBeam_Kernels_r3557.cl

These names are very slightly different from the final stock deployment, but they are internally self-consistent and very little different in actual content.

There is also a Cuda 23, but it is probably inadvisable to run it on modern hardware. Note that these are 32-bit files (as are all the SETI GPU applications - smaller and more efficient memory footprint), but I have taken the names from the 64-bit installer.
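Before running a benchmark it may help to confirm that each app's supporting files actually made it into the test directory. A minimal sketch, assuming the file names listed above are used unchanged; the function name `missing_files` and the dictionary keys are made up for illustration and are not part of the installer:

```python
from pathlib import Path

# File sets copied from the list above (names taken from the 64-bit installer).
# The dictionary keys are hypothetical labels chosen for this sketch.
REQUIRED_FILES = {
    "cuda50": [
        "Lunatics_x41zi_win32_cuda50.exe",
        "cudart32_50_35.dll",
        "cufft32_50_35.dll",
    ],
    "cuda42": [
        "Lunatics_x41zi_win32_cuda42.exe",
        "cudart32_42_9.dll",
        "cufft32_42_9.dll",
    ],
    "cuda32": [
        "Lunatics_x41zi_win32_cuda32.exe",
        "cudart32_32_16.dll",
        "cufft32_32_16.dll",
    ],
    "opencl_sog": [
        "MB8_win_x86_SSE3_OpenCL_NV_SoG_r3557.exe",
        "libfftw3f-3-3-4_x86.dll",
        "MultiBeam_Kernels_r3557.cl",
    ],
}


def missing_files(test_dir, app):
    """Return the listed files for `app` that are not present in `test_dir`."""
    folder = Path(test_dir)
    return [name for name in REQUIRED_FILES[app] if not (folder / name).is_file()]
```

Calling `missing_files(r"C:\seti_test", "cuda50")` (the path is only an example) returns an empty list when all three Cuda 50 files are in place.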

Hint: if you have the installer, place it in a separate folder. Create a file in the same folder called Lunatics.ini, and make the contents

[LunaticsInstaller]
TestMode=1

Then run the installer. It will give you plenty of warning that it is running in test mode, and will unpack the files into a safe place. If you don't see the test mode warnings, back out and try again.
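For anyone who would rather script the hint than create the file by hand, here is a minimal sketch. The section and key come from the hint above; the helper name `write_test_mode_ini` is hypothetical, and how the installer reads the file is not something this sketch verifies:

```python
from pathlib import Path

# Contents exactly as given in the hint above.
INI_TEXT = "[LunaticsInstaller]\nTestMode=1\n"


def write_test_mode_ini(installer_dir):
    """Write Lunatics.ini next to the installer and return its path."""
    ini_path = Path(installer_dir) / "Lunatics.ini"
    ini_path.write_text(INI_TEXT, encoding="ascii")
    return ini_path
```

Point it at the separate folder that holds the installer, then run the installer and look for the test mode warnings as described.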
ID: 2016370
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016440 - Posted: 24 Oct 2019, 1:16:35 UTC

Could anyone having this problem please follow the steps below to report your findings to NVIDIA?

If NVIDIA gets more useful reports, they will fix it.

1) Find my repro steps here:
https://setiathome.berkeley.edu/forum_thread.php?id=84780&postid=2016218

2) Test the 431.60 drivers. Verify that they work for all your GPUs.

3) Test the 440.97 drivers. Verify that they FAIL for some of your GPUs.

4) Report your findings to NVIDIA, mentioning "SETI OpenCL", using the Driver Feedback form located here:
https://forms.gle/kJ9Bqcaicvjb82SdA

Hopefully the steps are straightforward enough to complete, but I admit they were written hastily.
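To make the pass/fail check in steps 2 and 3 less of an eyeball job, a small script can scan saved task output for the error in this thread's title. This is only a sketch: the regular expression is based on the one error line quoted in the title, the function name `find_opencl_failures` is made up, and where you save the output text is up to you. The code names are the standard OpenCL status codes from the Khronos `cl.h` header, where -36 is `CL_INVALID_COMMAND_QUEUE`.

```python
import re

# Pattern based on the error line quoted in this thread's title.
ERROR_RE = re.compile(r"OpenCL kernel/call '([^']+)' call failed \((-?\d+)\)")

# A few OpenCL status codes as defined in the Khronos cl.h header.
CL_ERROR_NAMES = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}


def find_opencl_failures(text):
    """Return (call, code, name) tuples for every failure line found in `text`."""
    failures = []
    for match in ERROR_RE.finditer(text):
        code = int(match.group(2))
        failures.append((match.group(1), code, CL_ERROR_NAMES.get(code, "unknown")))
    return failures
```

An empty list on 431.60 and a non-empty one on 440.97 for the same input file would be the kind of evidence worth including in the report.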

Thank you!
ID: 2016440
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016475 - Posted: 24 Oct 2019, 13:42:07 UTC

I put both a CUDA testing folder and my CUDA results folder in the OneDrive share.

Could somebody please look at the results to see if they look correct?

Thanks.
ID: 2016475
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2016562 - Posted: 25 Oct 2019, 11:42:14 UTC

I just heard back from NVIDIA, regarding my feedback about breaking SETI OpenCL, and NVIDIA is looking into it.

Hurray!
ID: 2016562
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13959
Credit: 208,696,464
RAC: 304
Australia
Message 2016632 - Posted: 25 Oct 2019, 21:35:38 UTC - in response to Message 2016562.  

I just heard back from NVIDIA, regarding my feedback about breaking SETI OpenCL, and NVIDIA is looking into it.

Hurray!
Thanks for all your effort to date.
Greatly appreciated.
Grant
Darwin NT
ID: 2016632
Holdolin

Joined: 10 Apr 19
Posts: 68
Credit: 88,777,750
RAC: 30
United States
Message 2016643 - Posted: 25 Oct 2019, 22:06:40 UTC - in response to Message 2016562.  

I just heard back from NVIDIA, regarding my feedback about breaking SETI OpenCL, and NVIDIA is looking into it.

Hurray!

Awesome sauce!! Thanks for your effort :0
ID: 2016643
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2017147 - Posted: 29 Oct 2019, 17:45:46 UTC
Last modified: 29 Oct 2019, 17:47:20 UTC

The 441.08 drivers do not fix the problem. We will have to continue waiting for a fix.
431.60 continues to be the workaround.
ID: 2017147
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2017711 - Posted: 3 Nov 2019, 16:08:37 UTC

When testing 440.97 under Windows 7 a few days ago, I mentioned that it seemed to be running significantly quicker than before. So I decided to take my life in my hands and update a sister machine, this time with 441.08. Bad mistake ...

All went smoothly, and BOINC picked up from where I'd shut it down. But a few minutes later, there was a huge grinding sound from under the desk, then silence, then the screen went black. Fortunately, the second screen on the second GPU stayed live, the mouse pointer moved, apps could be shut down - all seemed fine. So I did an orderly shut-down, waited for everything to cool down, and tried again. Same result.

So I ran it for a while without BOINC. Mostly OK, but minor sporadic grinding noises, and GPU-Z showed temperature rising, fan speed zero with bursts of activity. So now the machine is crunching on the second GPU only, and the first GPU is in bits on the kitchen table while I spray compressed air at it. Some debris, but not much - no real dust bunnies.

So - GTX 970, Asus, single-fan, short card labelled 'DirectCU Mini'. About 4 years old - I'll look up exact dates and model number later. Any suggestions? (The supplier is shut on a Sunday, so I can't replace/upgrade it immediately). I'm tempted to finish cleaning the fan, go back to the original driver, and see if it'll limp along gently for a day or two.
ID: 2017711
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017712 - Posted: 3 Nov 2019, 16:34:50 UTC - in response to Message 2017711.  

Sounds like a bad fan. A driver update would have nothing to do with that. Pure coincidence that it happened with the driver update. But as we know, correlation does not equal causation.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017712
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2017719 - Posted: 3 Nov 2019, 17:39:09 UTC - in response to Message 2017712.  

Except that I was deliberately trying a driver which previous experience had suggested to me might drive the card faster - and hence likely hotter, and thus demanding a higher fan speed.

But I put it back in, with some wrestling with misaligned securing screw holes - and probably bent something. Anyway, it was even louder when I started it back up, even without starting BOINC.

So I think I'll have to bite the bullet and change something - two case fans for a start, which I found had also seized. Hunting through the GPU lists now...
ID: 2017719
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017720 - Posted: 3 Nov 2019, 17:53:49 UTC - in response to Message 2017719.  
Last modified: 3 Nov 2019, 17:55:45 UTC

if the fan was in such a state that a couple degrees and a couple % fan speed increase (if that) pushed it over the edge, then it's still just a bad fan that was close to failure anyway. the driver update was at best a contributing factor, but not the root cause. take the case fans for example, do you think the GPU driver update caused them to fail also? of course not.

but it's more likely that the reboot and fan stop/start is what did it in. it's very possible, and not unheard of, for components that have been operating in steady state for a long time to have issues when stopped and started again, especially rotating mechanisms. It's the same reason we take such extreme caution when deciding to turn off/on components on our satellite that have been on continuously for years. there are no signs or symptoms that anything is wrong now, but we all know that if we turn it off, it's very possible it won't come on again. The risk of it not coming back on is just too high for HQ to accept, even if we are confident that a recycle of said component will solve some smaller problem we are having. or, as a more analogous example, it's the same reason we introduce additional biases to our reaction wheels during large spacecraft maneuvers to prevent them from spinning down to 0 rpm, for fear that they won't spin up, or will spin up with some additional friction or issues that could jeopardize the attitude control.

bad fan. replace fan, if you can find the same model (check part number on the fan, then google), or pull the fan/shroud off and get creative with some new fans and some zip ties.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017720
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2017721 - Posted: 3 Nov 2019, 17:54:39 UTC

Interesting. GTX 970 has dropped off the sales lists, of course. But there seems to be such a beast as a GTX 1660 SUPER (Asus dual fan) with fast GDDR6 memory, DVI output, and a Windows 7 driver (441.08 only) - and at a reasonable price for "new in". That ticks all my boxes (not ready to upgrade from DVI KVM to HDMI yet!). Worth a sleep on it and a re-check in the morning.
ID: 2017721
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017723 - Posted: 3 Nov 2019, 18:01:07 UTC - in response to Message 2017721.  

Interesting. GTX 970 has dropped off the sales lists, of course. But there seems to be such a beast as a GTX 1660 SUPER (Asus dual fan) with fast GDDR6 memory, DVI output, and a Windows 7 driver (441.08 only) - and at a reasonable price for "new in". That ticks all my boxes (not ready to upgrade from DVI KVM to HDMI yet!). Worth a sleep on it and a re-check in the morning.


the 1660s are nice cards. the only difference between the 1660 and 1660 super is that the super has GDDR6 memory, and the non-super has GDDR5. all other aspects are the same, same core count, etc. the super might be slightly faster.

also if you are relying on that DVI output to use one of those DVI-VGA adapters for your KVM, think again. Nvidia removed all analog output from their cards starting with the 10-series cards. there is no VGA passed through the DVI connector anymore. If your KVM does digital DVI natively, then you'll be fine of course.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017723
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2017725 - Posted: 3 Nov 2019, 18:11:00 UTC - in response to Message 2017723.  

If your KVM does digital DVI natively, then you'll be fine of course.
Yes, I like decent hi-res screens, so I upgraded the KVM to a native (and expensive) DVI model years ago. The 1660 in my Linux box works just fine through it.
ID: 2017725


 