NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Message boards : Number crunching : NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units
LumenDan

Joined: 4 Jan 01
Posts: 5
Credit: 9,247,086
RAC: 13
Australia
Message 2018499 - Posted: 11 Nov 2019, 9:51:47 UTC - in response to Message 2017979.  

Observing when a VHAR work unit appears to stall on my machine, I have noticed that the progress will always stop counting at 0.603%.
If this is consistent between other computers it may help isolate what the client program is doing when it stops performing as intended.

LumenDan
ID: 2018499
Stephen Thomas Home - Delhi

Joined: 16 Sep 99
Posts: 11
Credit: 507,164,357
RAC: 357
India
Message 2018667 - Posted: 12 Nov 2019, 15:18:23 UTC

I just installed NVIDIA 441.20 on 3 Windows 10 machines and ran some units. It is looking good. Perhaps the problem is solved?
ID: 2018667
Stephen Thomas Home - Delhi

Joined: 16 Sep 99
Posts: 11
Credit: 507,164,357
RAC: 357
India
Message 2018668 - Posted: 12 Nov 2019, 15:23:11 UTC - in response to Message 2018667.  

Sorry, sorry, I was too optimistic!! The problem still exists.
ID: 2018668
robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2018681 - Posted: 12 Nov 2019, 16:19:43 UTC
Last modified: 12 Nov 2019, 16:26:42 UTC

It looks like I've had several tasks go into an endless loop, stopped only by the time limit.

https://setiathome.berkeley.edu/result.php?resultid=8226545671

https://setiathome.berkeley.edu/result.php?resultid=8226545679

https://setiathome.berkeley.edu/result.php?resultid=8226545233

https://setiathome.berkeley.edu/result.php?resultid=8226545289

https://setiathome.berkeley.edu/result.php?resultid=8226545586

https://setiathome.berkeley.edu/result.php?resultid=8226545594

https://setiathome.berkeley.edu/result.php?resultid=8226545596

https://setiathome.berkeley.edu/result.php?resultid=8226545599

https://setiathome.berkeley.edu/result.php?resultid=8226545603

https://setiathome.berkeley.edu/result.php?resultid=8226545604

https://setiathome.berkeley.edu/result.php?resultid=8226545609

For those that someone else completed, the status is either "Completed, waiting for validation", or two others produced a valid result using some other application version.

I'm using the 441.12 driver.

I just got a notice that the 441.20 driver is available (released today), and am now downloading it.

The last few of these stopped counting progress at 0.603%, except the one now running at 0.605%.

How do I tell if any of these were Arecibo VHAR workunits?
ID: 2018681
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2018685 - Posted: 12 Nov 2019, 19:27:54 UTC - in response to Message 2018681.  
Last modified: 12 Nov 2019, 19:44:19 UTC

How do I tell if any of these were Arecibo VHAR workunits?


In the first one, in the stderr output: WU true angle range is : 8.615825

I've seen as little as 1.1 be described as VHAR. (Thanks Keith.)

Also, you can tell it's an Arecibo work unit simply by its name not starting with "blc".
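
The two checks described here could be sketched in Python roughly as follows. This is only an illustration, not project code: the helper names `true_angle_range` and `is_arecibo` and the sample work unit names are hypothetical. The only things taken from the thread are that the stderr output contains a line of the form `WU true angle range is : <number>` and that non-Arecibo work unit names start with "blc".

```python
import re

# Matches the stderr line quoted in the thread, e.g.
# "WU true angle range is : 8.615825"
AR_PATTERN = re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)")


def true_angle_range(stderr_text):
    """Return the angle range found in a task's stderr text, or None."""
    match = AR_PATTERN.search(stderr_text)
    return float(match.group(1)) if match else None


def is_arecibo(workunit_name):
    """Per the thread: Arecibo work unit names do not start with 'blc'."""
    return not workunit_name.startswith("blc")


if __name__ == "__main__":
    sample = "some header\nWU true angle range is : 8.615825\nmore output"
    print(true_angle_range(sample))
    # Hypothetical names, used only to exercise the prefix check.
    print(is_arecibo("blc_hypothetical_name"))
    print(is_arecibo("hypothetical_arecibo_name"))
```

The stderr text has to be copied from the task's result page by hand; the sketch does not fetch anything from the project servers.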
ID: 2018685
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2018686 - Posted: 12 Nov 2019, 19:40:19 UTC

The problem has been seen on tasks with Angle Range as low as 1.1.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2018686
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018688 - Posted: 12 Nov 2019, 19:43:46 UTC - in response to Message 2018685.  

How do I tell if any of these were Arecibo VHAR workunits?


In the first one, in the stderr output: WU true angle range is : 8.615825

I've seen as little as 2 be described as VHAR.


yeah all of his quoted tasks are VHAR. I was under the impression that anything >1 was a VHAR.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018688
rcthardcore

Joined: 23 Nov 08
Posts: 48
Credit: 1,306,006
RAC: 0
United States
Message 2018693 - Posted: 12 Nov 2019, 20:07:25 UTC

Has anybody reported this problem to NVIDIA??? If not, they need to do so immediately.
ID: 2018693
Profile Wiggo
Joined: 24 Jan 00
Posts: 36569
Credit: 261,360,520
RAC: 489
Australia
Message 2018694 - Posted: 12 Nov 2019, 20:09:38 UTC - in response to Message 2018693.  

Has anybody reported this problem to NVIDIA??? If not, they need to do so immediately.
It has been by several here.

Cheers.
ID: 2018694
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018695 - Posted: 12 Nov 2019, 20:14:50 UTC - in response to Message 2018693.  

Has anybody reported this problem to NVIDIA??? If not, they need to do so immediately.


It’s been reported by several people. But it’s low priority for them.

Use the older drivers that work.
Or use the SAH app that works.
Or don’t use Windows 10.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018695
Cameron
Joined: 27 Nov 02
Posts: 110
Credit: 5,082,471
RAC: 17
Australia
Message 2018722 - Posted: 13 Nov 2019, 0:57:19 UTC - in response to Message 2018694.  

Has anybody reported this problem to NVIDIA??? If not, they need to do so immediately.
It has been by several here.

Cheers.


I believe others here when they say it's been reported to NVIDIA, or that the latest drivers haven't solved the problem.
I did try some of the later 436.xx drivers, which didn't work, so I'm back to 431.60.

Checking the release notes for the last couple of drivers, I don't see it listed as a known open issue, so I'm not certain.
Nvidia could just try to patch it on the quiet.
ID: 2018722
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2018725 - Posted: 13 Nov 2019, 1:04:46 UTC - in response to Message 2018688.  

yeah all of his quoted tasks are VHAR. I was under the impression that anything >1 was a VHAR.

Yes, somewhere buried in the attic of this forum is a post that actually defines what range constitutes VLAR and VHAR and a proposal of what should probably be called medium angle range. I didn't have time to go search for it earlier.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2018725
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2018729 - Posted: 13 Nov 2019, 1:30:12 UTC
Last modified: 13 Nov 2019, 1:46:32 UTC

NVIDIA released 441.20 drivers today.
I tested them, and they still have the "SETI OpenCL SoG VHAR on Windows 10" problems:

Maxwell:
> Tasks crash with error.
>ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995.

Pascal/Turing:
> Tasks run indefinitely with no load on the GPU.

431.60 are the last drivers that work correctly for those specific SETI tasks on Windows 10.
NVIDIA is aware, and per NVIDIA, we must continue to be patient for a driver version that includes a fix.
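
A quick way to check an installed driver against this report is to compare version numbers. The sketch below assumes driver versions are plain `major.minor` strings such as `441.20`, and it treats 431.60 as the last known-good version only because that is what this post reports. The function names are hypothetical, and a later driver that includes a fix would make the check out of date.

```python
# Last driver reported in the thread as working for these tasks on Windows 10.
LAST_KNOWN_GOOD = (431, 60)


def parse_driver_version(version):
    """Split a 'major.minor' driver string into a tuple of ints."""
    major, minor = version.strip().split(".")
    return int(major), int(minor)


def is_newer_than_last_known_good(version, last_good=LAST_KNOWN_GOOD):
    """True if the driver is newer than the last version reported as working."""
    return parse_driver_version(version) > last_good


if __name__ == "__main__":
    for candidate in ("431.60", "436.02", "441.12", "441.20"):
        print(candidate, is_newer_than_last_known_good(candidate))
```

A result of True only means the driver is newer than 431.60. Whether a given task actually stalls still depends on the GPU generation and the task's angle range, as described above.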
ID: 2018729
robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2018744 - Posted: 13 Nov 2019, 2:42:58 UTC

Another VHAR failed with 441.12

https://setiathome.berkeley.edu/result.php?resultid=8226545610

I installed 441.20 without restarting Windows; the next task ran in a different slot, but had the same problem. Also progress froze at 0.605%.

The VHAR that failed with 441.20

https://setiathome.berkeley.edu/result.php?resultid=8226545584

Two things to consider for this type of problem:

Allow sorting all tasks or only the error tasks by timestamp of completion.

Allow telling the server not to send me any VHAR tasks.

I just reinstalled 431.60. Tasks appear to be completing normally, even if VHAR.

Could .vhar be inserted into the names of VHAR tasks? Also .mar for medium angle range, if that is defined?
ID: 2018744
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2018745 - Posted: 13 Nov 2019, 2:43:06 UTC - in response to Message 2018725.  

yeah all of his quoted tasks are VHAR. I was under the impression that anything >1 was a VHAR.

Yes, somewhere buried in the attic of this forum is a post that actually defines what range constitutes VLAR and VHAR and a proposal of what should probably be called medium angle range. I didn't have time to go search for it earlier.

OK. Found the post and it was from HAL9000. https://setiathome.berkeley.edu/forum_thread.php?id=79990&postid=1805455


VHARs >1.0 (aka "Shorties")
Mid-range (0.12 - 0.99) (aka "MARs"?)
VLARs <0.12 (aka "OMG Why are these SO SLOW!")


And from Richard's post. https://setiathome.berkeley.edu/forum_thread.php?id=79990&postid=1805543

Technically, VLAR was used to denote AR 0.05 and below, and VHAR for AR 1.1275 and above. When GPUs came along, we extended VLAR to a somewhat arbitrary AR 0.12
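
The ranges quoted above can be written as a small classifier. This is a sketch under stated assumptions: the defaults follow HAL9000's list (VLAR below 0.12, VHAR above 1.0), values that fall in the gaps the list leaves (for example exactly 1.0) are treated as mid-range, and the function name `classify_angle_range` is hypothetical. Passing `vhar_above=1.1275` and `vlar_below=0.05` would give the older definitions from Richard's post instead.

```python
def classify_angle_range(ar, vlar_below=0.12, vhar_above=1.0):
    """Label an angle range as 'VLAR', 'mid-range' or 'VHAR'.

    Boundary handling is an assumption: anything that is neither
    below vlar_below nor above vhar_above counts as mid-range.
    """
    if ar < vlar_below:
        return "VLAR"
    if ar > vhar_above:
        return "VHAR"
    return "mid-range"


if __name__ == "__main__":
    # 8.615825 is the angle range quoted earlier in the thread.
    for value in (0.05, 0.5, 1.1, 8.615825):
        print(value, classify_angle_range(value))
    # With the stricter VHAR threshold from Richard's post:
    print(1.1, classify_angle_range(1.1, vhar_above=1.1275))
```

Under the default thresholds an angle range of 1.1 counts as VHAR, but under the 1.1275 threshold it does not, so the label depends on which definition is in use.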

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2018745
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018750 - Posted: 13 Nov 2019, 2:57:56 UTC

I remember at one time the project servers were set up to NOT send a certain type of task to Nvidia GPUs. I believe it was Arecibo VLARs.

I’d assume something like that could be implemented again, only for VHARs this time. Just set it up to not send any to those on SoG and on Windows 10.
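
As a rough illustration of the kind of rule being suggested, here is a standalone Python predicate. It is not the actual BOINC scheduler logic. The `Host` fields, the function name `should_skip_task`, the NVIDIA check and the 1.0 VHAR threshold are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of a volunteer host; field names are invented."""
    os_name: str      # e.g. "Windows 10"
    gpu_vendor: str   # e.g. "NVIDIA"
    app_name: str     # e.g. "SoG"


def should_skip_task(host, angle_range, vhar_above=1.0):
    """True if a VHAR task should not be sent to this host under the proposed rule."""
    is_vhar = angle_range > vhar_above
    is_affected_host = (
        host.os_name == "Windows 10"
        and host.gpu_vendor == "NVIDIA"
        and host.app_name == "SoG"
    )
    return is_vhar and is_affected_host


if __name__ == "__main__":
    win10_sog = Host("Windows 10", "NVIDIA", "SoG")
    linux_sog = Host("Linux", "NVIDIA", "SoG")
    print(should_skip_task(win10_sog, 8.615825))
    print(should_skip_task(win10_sog, 0.4))
    print(should_skip_task(linux_sog, 8.615825))
```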
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018750
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2018751 - Posted: 13 Nov 2019, 3:01:38 UTC - in response to Message 2018750.  
Last modified: 13 Nov 2019, 3:02:10 UTC

I remember at one time the project servers were set up to NOT send a certain type of task to Nvidia GPUs. I believe it was Arecibo VLARs.

I’d assume something like that could be implemented again, only for VHARs this time. Just set it up to not send any to those on SoG and on Windows 10.

A rule not to send any WUs to crunch on the 5700 could be added too.
ID: 2018751
Penguin

Joined: 5 Sep 14
Posts: 13
Credit: 16,544,489
RAC: 360
United States
Message 2019374 - Posted: 17 Nov 2019, 4:55:12 UTC

Same problems here, like others, glad I found this thread. Was not sure what the cause was. I will try reverting the drivers now.
ID: 2019374
Profile Bruce N. Goren

Joined: 1 Jul 99
Posts: 15
Credit: 11,329,118
RAC: 32
United States
Message 2019624 - Posted: 19 Nov 2019, 4:41:56 UTC

Yup, my daily credits have dropped by 50% since updating my driver. Can't revert, I need the latest for my main tasks, SETI is just for idle time. Let's hope someone comes up with a fix soon. Thanks for this thread, really had me scratching my head wondering what was broken!
ID: 2019624
Profile Bruce N. Goren

Joined: 1 Jul 99
Posts: 15
Credit: 11,329,118
RAC: 32
United States
Message 2019638 - Posted: 19 Nov 2019, 7:51:08 UTC - in response to Message 2019624.  
Last modified: 19 Nov 2019, 7:52:10 UTC

I just installed the NVidia Studio Driver version 441.28. I'll let you know if my SETI performance is restored.
ID: 2019638
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.