NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Wiggo
Joined: 24 Jan 00
Posts: 34581
Credit: 261,360,520
RAC: 489
Australia
Message 2017644 - Posted: 3 Nov 2019, 1:14:12 UTC

They may even have to get their heads together with M$ heads to find out what has gone wrong.

Cheers.
ID: 2017644
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13689
Credit: 208,696,464
RAC: 304
Australia
Message 2017645 - Posted: 3 Nov 2019, 1:15:50 UTC - in response to Message 2017640.  

> I'm just saying that it wouldn't be the first time that M$ itself has broken something driver-wise over the years by throwing in an undocumented update, and it would not surprise me if that has happened again.
>
> Sorry, but ATM I can't wholly lay the blame on Nvidia as yet when it's only the 1 OS being affected. ;-)

True, but this issue is affecting Win10 systems across multiple builds. If it affected only systems after a particular build, then you could put the blame on M$, or some of the blame on M$ as well (even then, it could be a case of them fixing something that was broken, and Nvidia got caught out by making use of what was known to be a bug).
At this stage, everything is pointing to Nvidia (of course, if Nvidia fixed an issue with their driver and this is the result, then things will get very ugly).
Grant
Darwin NT
ID: 2017645
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017656 - Posted: 3 Nov 2019, 3:00:08 UTC - in response to Message 2017629.  

> You can see this behavior in that Win 8.1 user's task, on their GTX 960 (Pascal):

GTX 960 (and all 900 series) is Maxwell, not Pascal.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017656
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2017661 - Posted: 3 Nov 2019, 3:25:35 UTC - in response to Message 2017656.  
Last modified: 3 Nov 2019, 3:26:13 UTC

You're right, I had a typo there. And it looks like I can't fix it. Dang.
ID: 2017661
Wiggo
Joined: 24 Jan 00
Posts: 34581
Credit: 261,360,520
RAC: 489
Australia
Message 2017671 - Posted: 3 Nov 2019, 9:05:12 UTC - in response to Message 2017661.  
Last modified: 3 Nov 2019, 9:16:45 UTC

> You're right, I had a typo there. And it looks like I can't fix it. Dang.

I wouldn't worry about that too much. ;-)

But at least we can narrow the problem down to being endemic to just Win10 rigs running the latest set of drivers.

Cheers.
ID: 2017671
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 2017675 - Posted: 3 Nov 2019, 11:14:10 UTC

I am running Arecibo tasks via Science United on a Windows 8.1 PC with a GTX 1050 Ti and 436.30 driver. No problem.
Tullio
ID: 2017675
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017677 - Posted: 3 Nov 2019, 11:28:43 UTC - in response to Message 2017675.  
Last modified: 3 Nov 2019, 11:28:58 UTC

> I am running Arecibo tasks via Science United on a Windows 8.1 PC with a GTX 1050 Ti and 436.30 driver. No problem.
> Tullio

This is the host he’s talking about.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8815395

But as has been stated, the problem only affects Windows 10 with drivers 436+ (so far).
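
The condition stated above (Windows 10 together with a 436.xx or later driver) can be written down as a quick check. This is only a minimal Python sketch: the function names and the string formats for the OS and driver version are assumptions for illustration, not something BOINC or SETI@home provides, and the rule reflects only what has been reported in this thread so far.

```python
def driver_major(version: str) -> int:
    """Return the major branch of an NVIDIA driver string such as '436.30'."""
    return int(version.split(".")[0])


def matches_reported_condition(os_name: str, driver_version: str) -> bool:
    """True if a host matches what this thread reports so far:
    Windows 10 together with a 436.xx or later driver."""
    on_win10 = "windows 10" in os_name.lower()
    return on_win10 and driver_major(driver_version) >= 436


# Example inputs (hypothetical strings in the style of a BOINC host page)
print(matches_reported_condition("Microsoft Windows 10 Core x64 Edition", "441.08"))  # True
print(matches_reported_condition("Windows 8.1", "436.30"))  # False
```

A match only means the host fits the reported pattern; it does not prove the host will actually show the problem.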
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017677
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 2017682 - Posted: 3 Nov 2019, 12:15:08 UTC

Right. It is also running Milkyway@home (ten thousand tasks so far) and Asteroids@home (one thousand tasks), and it also hosts a Linux virtual machine with SuSE Tumbleweed and kernel 5.7.1, updated very frequently. The VM also runs Science United tasks, including a long-range climateprediction.net task, which is a rare Linux task for Climateprediction.net.
Tullio
ID: 2017682
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2017706 - Posted: 3 Nov 2019, 15:05:22 UTC
Last modified: 3 Nov 2019, 15:08:13 UTC

Here is another machine at Beta showing the non-SoG App 8.16 working while the SoG App 8.22 fails: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=88461&offset=40
Coprocessors: NVIDIA TITAN X (Pascal) (4095MB) driver: 441.08 OpenCL: 1.2
Operating System: Microsoft Windows 10 Core x64 Edition, (10.00.18362.00)
You could probably find a few more if you look a little more.
ID: 2017706
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2017816 - Posted: 4 Nov 2019, 18:33:17 UTC

NVIDIA released 441.12 drivers today.
I tested them, and they still have the "SETI OpenCL SoG VHAR" problems.
Maxwell: Tasks crash with error.
Pascal/Turing: Tasks run indefinitely with no load on the GPU.

We must continue to be patient.
ID: 2017816
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 2017979 - Posted: 6 Nov 2019, 14:03:09 UTC

After all my Milkyway@home and SETI@home tasks started producing computation errors on driver 436.30, I installed 441.12, and at least the SETI@home tasks are working. This is on a Science United PC with Windows 8.1.
Tullio
ID: 2017979
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017982 - Posted: 6 Nov 2019, 14:36:36 UTC - in response to Message 2017979.  
Last modified: 6 Nov 2019, 14:41:40 UTC

> After all my Milkyway@home and SETI@home tasks started producing computation errors on driver 436.30, I installed 441.12, and at least the SETI@home tasks are working. This is on a Science United PC with Windows 8.1.
> Tullio

The problem doesn’t exist on Win 8.1, only on Windows 10. The 441 driver changes nothing for this issue; it doesn’t fix anything.

Looking at your specific errors on that machine, it looks like your whole system had an issue or you had the driver crash. A simple reboot would have likely resolved your problem.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017982
Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 2018319 - Posted: 9 Nov 2019, 21:40:28 UTC

Unfortunately, the old drivers have major security vulnerabilities that have been patched in driver 441.12. I have no choice but to recommend that we update to 441.12 and then crunch with the CPU for now. The older insecure drivers have escalation of privilege, information disclosure, and denial of service vulnerabilities. See https://nvidia.custhelp.com/app/answers/detail/a_id/4907 for details.
ID: 2018319
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018323 - Posted: 9 Nov 2019, 22:04:06 UTC - in response to Message 2018319.  

> Unfortunately, the old drivers have major security vulnerabilities that have been patched in driver 441.12. I have no choice but to recommend that we update to 441.12 and then crunch with the CPU for now. The older insecure drivers have escalation of privilege, information disclosure, and denial of service vulnerabilities. See https://nvidia.custhelp.com/app/answers/detail/a_id/4907 for details.

The opencl_nvidia_sah app still works. You do not need to stop GPU crunching, and you can use the newest drivers.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018323
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13689
Credit: 208,696,464
RAC: 304
Australia
Message 2018349 - Posted: 10 Nov 2019, 2:29:23 UTC - in response to Message 2018319.  

> Unfortunately, the old drivers have major security vulnerabilities that have been patched in driver 441.12. I have no choice but to recommend that we update to 441.12 and then crunch with the CPU for now. The older insecure drivers have escalation of privilege, information disclosure, and denial of service vulnerabilities. See https://nvidia.custhelp.com/app/answers/detail/a_id/4907 for details.

And they are only an issue if you give the people who want to attack your system physical access to it to set up the attack. No physical access, no breach possible.
Grant
Darwin NT
ID: 2018349
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 2018359 - Posted: 10 Nov 2019, 7:04:20 UTC
Last modified: 10 Nov 2019, 7:07:13 UTC

> And they are only an issue if you give the people who want to attack your system physical access to it to set up the attack. No physical access, no breach possible.

Perhaps Nvidia need to make things a lot clearer then, as in the CVE list only one vulnerability is described as "The attacker requires local system access." and that CVE does not apply to GeForce on Windows. With no similar statement in the 5 CVEs that do affect GeForce cards on Windows, I am definitely in the "better safe than sorry" camp.

Also, as a gamer who uses the GeForce Experience in-game overlay in online games, I will always have up-to-date drivers. All I have done is re-run the Lunatics installer and select Cuda 50. It now takes my 1660 Ti about 5 times longer to crunch, but that is better than nothing, and hopefully it won't suffer the problem.
ID: 2018359
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13689
Credit: 208,696,464
RAC: 304
Australia
Message 2018360 - Posted: 10 Nov 2019, 7:16:50 UTC - in response to Message 2018359.  

> All I have done is re-run the Lunatics installer and select Cuda 50. It now takes my 1660 Ti about 5 times longer to crunch, but that is better than nothing, and hopefully it won't suffer the problem.

If running the CUDA50 application, it would be worth seeing how 2 WUs at a time go; while not a high-end card, the GTX 1660 Ti is very capable (particularly compared to what was high-end when CUDA50 made its appearance).
2 WUs at a time could result in more work per hour than just 1 WU.

My cheat sheet.
WUs/h  10     15     20     30     40     50     60     70     80     90
1x      6      4      3      2      1.5    1.2    1      0.86   0.75   0.67
2x     12      8      6      4      3      2.4    2      1.7    1.5    1.3
3x     18     12      9      6      4.5    3.6    3      2.57   2.25   2

10, 15, 20 etc. are the number of WUs per hour. 1x, 2x, 3x are the number of WUs running at once. The 6, 4, 1, 0.75 etc. are the runtimes in minutes (0.75 = 45 sec).
So, for example, if it takes 4 min to process a WU one at a time, that's 15 WUs per hour. When processing 2 at a time, you'd want the run times to be less than 8 min to get more than 15 WUs per hour. At 3 WUs at a time, run times of less than 12 min would be needed to make it worthwhile.
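
The arithmetic behind the cheat sheet is just 60 minutes divided by the runtime, multiplied by the number of WUs running at once. Here is a minimal Python sketch of that calculation; the function names are made up for illustration, and it assumes every concurrent WU takes the same wall-clock time.

```python
def wus_per_hour(runtime_min: float, concurrent: int = 1) -> float:
    """WUs completed per hour when `concurrent` WUs each take `runtime_min` minutes."""
    return concurrent * 60.0 / runtime_min


def break_even_runtime(target_wus_per_hour: float, concurrent: int) -> float:
    """Longest per-WU runtime (minutes) that still reaches the target rate."""
    return concurrent * 60.0 / target_wus_per_hour


# Example from the text: 4 min per WU, one at a time
baseline = wus_per_hour(4)                 # 15 WUs per hour
print(break_even_runtime(baseline, 2))     # 8.0 min when running 2 at a time
print(break_even_runtime(baseline, 3))     # 12.0 min when running 3 at a time
```

Running more WUs at once only pays off if the measured runtime stays below the break-even value.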
Grant
Darwin NT
ID: 2018360
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 2018363 - Posted: 10 Nov 2019, 7:37:15 UTC

> If running the CUDA50 application, it would be worth seeing how 2 WUs at a time go; while not a high-end card, the GTX 1660 Ti is very capable (particularly compared to what was high-end when CUDA50 made its appearance).
> 2 WUs at a time could result in more work per hour than just 1 WU.

I decided to see how it performed first, but I had intended to try running with 2 WUs.
ID: 2018363
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13689
Credit: 208,696,464
RAC: 304
Australia
Message 2018369 - Posted: 10 Nov 2019, 8:36:23 UTC - in response to Message 2018363.  

> I decided to see how it performed first, but I had intended to try running with 2 WUs.

Makes sense: if you don't have a 1 WU baseline, you'll have no idea if more is better or worse. Although it would be easier to get an idea of the baseline if we were back to just the 1 or 2 files of the same data being split at a time again...
On my GTX 1070, I found 2 at a time generally gave the best performance, unless you had an Arecibo & GBT WU running together. Then the Arecibo WU would take 2.5 to 3 times longer than usual to finish.
Grant
Darwin NT
ID: 2018369
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2018374 - Posted: 10 Nov 2019, 12:33:21 UTC

You can check the performance of the different Apps at Beta; just look at the Application Details page. Here are a few hosts running the three eligible Apps:
https://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=81711
SETI@home v8 8.01 windows_intelx86 (cuda50) Average processing rate: 179.51 GFLOPS
SETI@home v8 8.16 windows_intelx86 (opencl_nvidia_sah) Average processing rate: 647.25 GFLOPS
SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) Average processing rate: 599.67 GFLOPS

https://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=88461
SETI@home v8 8.01 windows_intelx86 (cuda50) Average processing rate: 101.27 GFLOPS
SETI@home v8 8.16 windows_intelx86 (opencl_nvidia_sah) Average processing rate: 427.21 GFLOPS
SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) Average processing rate: 425.44 GFLOPS

https://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=71641
SETI@home v8 8.00 windows_intelx86 (cuda50) Average processing rate: 161.09 GFLOPS
SETI@home v8 8.16 windows_intelx86 (opencl_nvidia_sah) Average processing rate: 213.59 GFLOPS
SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) Average processing rate: 333.04 GFLOPS

https://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=88527
SETI@home v8 8.01 windows_intelx86 (cuda50) Average processing rate: 136.00 GFLOPS
SETI@home v8 8.16 windows_intelx86 (opencl_nvidia_sah) Average processing rate: 519.36 GFLOPS
SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) Average processing rate: 358.09 GFLOPS

Well, I know which App I wouldn't be using...
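
To compare the apps on each host, the figures above can be turned into ratios against cuda50. This is a minimal Python sketch that uses only the GFLOPS values quoted in this post; the dictionary layout and the function name are just for illustration.

```python
# Average processing rates (GFLOPS) as quoted above, keyed by Beta host ID
rates = {
    81711: {"cuda50": 179.51, "opencl_nvidia_sah": 647.25, "opencl_nvidia_SoG": 599.67},
    88461: {"cuda50": 101.27, "opencl_nvidia_sah": 427.21, "opencl_nvidia_SoG": 425.44},
    71641: {"cuda50": 161.09, "opencl_nvidia_sah": 213.59, "opencl_nvidia_SoG": 333.04},
    88527: {"cuda50": 136.00, "opencl_nvidia_sah": 519.36, "opencl_nvidia_SoG": 358.09},
}


def speedup_over_cuda50(host_rates: dict) -> dict:
    """Ratio of each app's rate to the cuda50 rate on the same host."""
    base = host_rates["cuda50"]
    return {app: rate / base for app, rate in host_rates.items() if app != "cuda50"}


for host_id, host_rates in rates.items():
    ratios = speedup_over_cuda50(host_rates)
    summary = ", ".join(f"{app}: {r:.2f}x" for app, r in ratios.items())
    print(f"host {host_id}: {summary}")
```

On all four hosts both OpenCL apps come out above 1.0x, which is to say cuda50 has the lowest average processing rate in every case.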
ID: 2018374