NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

John M. Kendall
Joined: 15 May 99
Posts: 30
Credit: 11,800,561
RAC: 164
United States
Message 2026605 - Posted: 6 Jan 2020, 22:16:56 UTC

What can be done about GPU tasks that start out with a remaining time of about 18:00 and then build up to multiple days of remaining time?
The elapsed time will go above the initial remaining time amount.
The progress percentage never gets above one percent.

Using NVIDIA GeForce GTX 1070 with 26.21.14.4166 driver.

I hate wasting an hour or more of elapsed time when I could have finished four or five normal GPU tasks.
I certainly do not like babysitting the SETI@Home tasks so I can suspend and abort them.

You would think that BOINC Manager software could use information from the restart checkpoint process to detect these tasks and end them in a reasonable time frame.

If it is an NVIDIA driver problem, I sure hope they can get it fixed. It seems to have been going on for some time now.
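
The symptoms described above (elapsed time running past the initial estimate while progress stays under one percent) could in principle be checked mechanically. What follows is only a rough Python sketch of that heuristic. It is not something BOINC Manager is known to do, and `TaskSnapshot`, `looks_stalled` and the 1% threshold are made-up names and values for illustration. The 18:00 figure is read here as 18 minutes, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """One reading of a task's state (hypothetical structure, not a BOINC type)."""
    elapsed_s: float           # elapsed run time, in seconds
    initial_estimate_s: float  # remaining-time estimate shown when the task started
    fraction_done: float       # progress as a fraction, 0.0 to 1.0


def looks_stalled(snap: TaskSnapshot, min_fraction: float = 0.01) -> bool:
    """Flag a task whose elapsed time has passed the initial estimate
    while its progress is still below min_fraction (1% by default)."""
    overdue = snap.elapsed_s > snap.initial_estimate_s
    no_progress = snap.fraction_done < min_fraction
    return overdue and no_progress


# An hour of run time against an assumed 18-minute estimate, under 1% done.
stuck = TaskSnapshot(elapsed_s=3600, initial_estimate_s=18 * 60, fraction_done=0.005)
print(looks_stalled(stuck))
```

A real check would probably want a safety margin on top of the initial estimate, since normal tasks can also overrun it somewhat.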

Happy New Year
ID: 2026605
Wiggo
Joined: 24 Jan 00
Posts: 34802
Credit: 261,360,520
RAC: 489
Australia
Message 2026610 - Posted: 6 Jan 2020, 22:37:48 UTC - in response to Message 2026605.  

Once you go past driver version 431.60 you're in that trouble, so rolling back to 431.60 is the only solution at this time.
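
A note on the version numbers: Windows reports the driver as 26.21.14.4166, and the last five digits of that string give the NVIDIA release number (441.66 in this case), which is past 431.60. A small Python sketch of that conversion and comparison is below. The function names are invented for illustration, and the 431.60 cutoff is simply the figure quoted in this thread.

```python
def nvidia_version_from_windows(win_version: str) -> str:
    """Turn a Windows-style driver version such as '26.21.14.4166' into the
    NVIDIA release number ('441.66') using the last five digits."""
    digits = win_version.replace(".", "")[-5:]
    return f"{digits[:3]}.{digits[3:]}"


def is_past_last_good(nvidia_version: str, last_good: str = "431.60") -> bool:
    """True when nvidia_version is newer than last_good
    (431.60 is the cutoff reported in this thread)."""
    def key(version: str) -> tuple:
        major, minor = version.split(".")
        return int(major), int(minor)

    return key(nvidia_version) > key(last_good)


release = nvidia_version_from_windows("26.21.14.4166")
print(release, is_past_last_good(release))
```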

Cheers.
ID: 2026610
BoincSpy
Volunteer tester
Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 2026841 - Posted: 8 Jan 2020, 17:32:55 UTC
Last modified: 8 Jan 2020, 17:41:16 UTC

I noticed in the last couple of days that I have not been getting any "run time limit exceeded" errors (I am still getting GPU WUs). I am wondering if SETI is no longer sending Arecibo VHAR workunits.
ID: 2026841
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22214
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2026842 - Posted: 8 Jan 2020, 17:37:05 UTC

All that means is that there is no Arecibo VHAR work available; it does not mean that they are being "blocked".
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2026842
Wiggo
Joined: 24 Jan 00
Posts: 34802
Credit: 261,360,520
RAC: 489
Australia
Message 2026858 - Posted: 8 Jan 2020, 20:47:50 UTC - in response to Message 2026841.  

I noticed in the last couple of days that I have not been getting any "run time limit exceeded" errors (I am still getting GPU WUs). I am wondering if SETI is no longer sending Arecibo VHAR workunits.
This says otherwise, but the number of VHARs being sent has dropped a bit (or someone else is getting them instead of me).

Cheers.
ID: 2026858
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2026868 - Posted: 8 Jan 2020, 22:42:25 UTC - in response to Message 2026858.  
Last modified: 8 Jan 2020, 23:39:53 UTC

No new VHARs will be created unless a tape comes in from Arecibo - and that may be interrupted because of the Puerto Rico earthquake (power outages - I haven't looked for any observatory news). *

But there'll be a background trickle of resends.

* - edit: Arecibo's Facebook page says:
Good morning! Due to safety inspection protocols after an earthquake, the Angel Ramos Foundation Science and Visitor Center will remain closed. Keep posted, for more details.
ID: 2026868
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2028484 - Posted: 19 Jan 2020, 13:43:07 UTC

NVIDIA released 442.01 drivers on 1/6/2020.
I tested them, and they still have the "SETI OpenCL SoG VHAR on Windows 10" problems:

Maxwell:
> Tasks crash with error.
>ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.

Pascal/Turing:
> Tasks run indefinitely with no load on the GPU.

431.60 are the last drivers that work correctly for those specific SETI tasks on Windows 10.
NVIDIA is aware, and per NVIDIA, we must continue to be patient for a driver version that includes a fix.
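
For anyone reading the Maxwell error line above: in the OpenCL headers, error code -36 is `CL_INVALID_COMMAND_QUEUE`. The Python sketch below pulls the failing call and the code out of such a line and looks up the name. The regular expression is written against the single line quoted in this post, so it is an assumption that other lines from the application follow the same shape, and the lookup table covers only a few codes.

```python
import re

# A few OpenCL error codes and their names from the standard cl.h header.
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -12: "CL_MAP_FAILURE",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}

# Pattern modelled on the one error line quoted in this thread.
ERROR_LINE = re.compile(
    r"OpenCL kernel/call '(?P<call>[^']+)' call failed \((?P<code>-?\d+)\)"
)


def decode_error(line: str):
    """Return (call, code, name) for a matching error line, or None."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("call"), code, OPENCL_ERRORS.get(code, "unknown")


# Raw string so the backslash in the file path is kept literally.
sample = (r"ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' "
          r"call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.")
print(decode_error(sample))
```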
ID: 2028484
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028487 - Posted: 19 Jan 2020, 14:17:10 UTC
Last modified: 19 Jan 2020, 14:19:36 UTC

Looks like that's a Hotfix driver, not a full release. The hotfix was released 1/17/2020 and is based on 441.87, which was released on 1/6.


GeForce Hotfix Driver version 442.01
Updated 01/17/2020 12:03 PM

GeForce Hotfix display driver version 442.01 is based on our latest Game Ready Driver 441.87.

This Hotfix driver resolves the following issues:

[Call of Duty Modern Warfare] Streaming of gameplay using OBS will randomly stop
[The Witcher 3: Wild Hunt – Blood and Wine] Game may crash when a user reaches a specific cut scene
[SLI+G-SYNC Stutter] User may experience minor stuttering when using NVIDIA SLI in combination with G-SYNC.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028487
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2028488 - Posted: 19 Jan 2020, 14:35:22 UTC

True that. But I wanted to report the result since I did do the test.
ID: 2028488
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2028501 - Posted: 19 Jan 2020, 16:01:05 UTC

I've done a quick update to the Lunatics installer, to include the new ATI files and mention the NVidia driver bug in the documentation. If anyone with one of the affected cards is willing to give it a quick test, could they PM me for a download link, please? (64-bit only so far). Ta.
ID: 2028501
EdwardPF
Volunteer tester
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 2029212 - Posted: 25 Jan 2020, 19:42:28 UTC

for what it's worth (if anything)

I'm a new Windows 10 user (sigh!) and have an NVIDIA GeForce GTX 1660 SUPER installed, so I THINK I'm stuck with the new driver and the eternal looping induced by the interplay of the driver, Windows 10, and SoG.

My simple-minded solution, which has been working all day now, was to re-install CUDA50 from the old Lunatics procedure.

This runs a lot slower, BUT SoG consumed 90%+ of the CUDA cores and CUDA50 consumes about 55%, so I just run 2 CUDA50 tasks at a time and have had NO hanging.
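
The post does not say how the two-at-a-time setup was configured. One common way to run two tasks per GPU in BOINC is an `app_config.xml` file in the project directory with `gpu_usage` set to 0.5. The Python sketch below only generates such a file's contents; the app name `setiathome_v8` and the `cpu_usage` value are assumptions, and with a Lunatics (anonymous platform) install the equivalent setting may instead live in `app_info.xml`.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Build app_config.xml text that lets tasks_per_gpu tasks share one GPU.
    The function name and defaults are illustrative, not part of BOINC."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/tasks_per_gpu of a GPU, so 2 tasks gives 0.5.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_per_task)
    return ET.tostring(root, encoding="unicode")


# Assumed app name; check the project's applications page for the real one.
print(build_app_config("setiathome_v8", tasks_per_gpu=2))
```

The client has to re-read its config files (or be restarted) before a change like this takes effect.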

For what it's worth.

Ed F

P.S. I have had several tasks "die" about 15 seconds into the run ... are these the ones that would have looped?
ID: 2029212
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2029214 - Posted: 25 Jan 2020, 19:52:11 UTC - in response to Message 2029212.  

Hi Ed, I had a look at a couple of the tasks that ran for around 15 seconds. These tasks are from blc35 data, and this is normal behaviour for these types of tasks. I say keep processing them while you can.
That is an interesting workaround for the driver problem. Thanks for sharing.
ID: 2029214
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13739
Credit: 208,696,464
RAC: 304
Australia
Message 2029254 - Posted: 25 Jan 2020, 22:18:12 UTC - in response to Message 2029212.  

This runs a lot slower, BUT SoG consumed 90%+ of the CUDA cores and CUDA50 consumes about 55%, so I just run 2 CUDA50 tasks at a time and have had NO hanging.

For what it's worth.
If you have the right driver, SOG is better as it will produce much more work.
If you don't have the right driver then the old CUDA application is the way to go.
Grant
Darwin NT
ID: 2029254
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029271 - Posted: 25 Jan 2020, 22:42:50 UTC

Can anyone tell us how to figure out the download server fanout for the URL of the SAH application? That would also solve the issue.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029271
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2029291 - Posted: 25 Jan 2020, 23:28:05 UTC - in response to Message 2029271.  
Last modified: 25 Jan 2020, 23:28:22 UTC

Maybe since Richard is re-packing the Lunatics installer, he can include the sah app in it. That would make the install a lot easier for the Windows guys.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2029291
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029308 - Posted: 26 Jan 2020, 2:09:17 UTC

That would be an excellent option. If the user desires to use a newer Windows driver for gaming, then limit the choice to the SAH application only and exclude the SoG application.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029308
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029337 - Posted: 26 Jan 2020, 9:24:33 UTC - in response to Message 2029308.  

I'm only doing an installer for Windows, and there's no nvidia_sah app for Windows on the applications page - only Linux. I'll have a look through my archives and see how far Raistmer got through testing before abandoning sah in favour of SoG.
ID: 2029337
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 2029339 - Posted: 26 Jan 2020, 9:47:43 UTC

The latest sah app I found for NV is r3430, which still has the screen lag issue.


With each crime and every kindness we birth our future.
ID: 2029339
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029360 - Posted: 26 Jan 2020, 12:40:51 UTC - in response to Message 2029339.  

The latest sah app I found for NV is r3430, which still has the screen lag issue.
That one was a bugfix on 31 March 2016 - it's a long way behind the current r3584 dated 09 December 2016. I wouldn't want to lose r3551 and r3556 - described as 'improvement of validation rate on overflows' and 'massive improvement in overflow validation rate' respectively.
ID: 2029360
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2029362 - Posted: 26 Jan 2020, 13:23:48 UTC - in response to Message 2029360.  

When the new server version was released around Christmas time, the project was sending out the sah app. And as far as I’m aware, it’s sending out the sah app at Beta also.

You can probably get the app and the required files by just attaching to beta and waiting for it.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2029362


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.