A word of warning about Nvidia driver 340.52

Message boards : Number crunching : A word of warning about Nvidia driver 340.52

Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1550190 - Posted: 30 Jul 2014, 22:55:02 UTC
Last modified: 30 Jul 2014, 23:17:47 UTC

I use the OpenCL NV r2058 version of Astropulse, from Raistmer and Lunatics.

I upgraded to Nvidia driver 340.52 today, and with that all Nvidia AP-tasks started using 100% CPU.

Of course, that's what they used to do before Raistmer made r2058.

On downgrade to driver version 337.88, the problem seems to have gone away.

Compare run times and cpu times for Nvidia Astropulse tasks on this host.
ID: 1550190
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1550210 - Posted: 30 Jul 2014, 23:27:05 UTC

I haven't tried 340.52 yet, but as far as I know the high CPU usage is a characteristic of all recent NV drivers (at least without using the use_sleep option). I don't see anything that stands out as drastically different between your 337.88 and 340.52 tasks (run time can still be greater than CPU time even with 100% CPU usage for a given task).
Soli Deo Gloria
ID: 1550210
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1550223 - Posted: 30 Jul 2014, 23:45:19 UTC - in response to Message 1550210.  

Oh, but it is drastically different. Samples:

Task 1, before upgrade: Run time 4,922.92, Cpu time 2,429.77

Task 2, with 340.52 driver: Run time 3,208.58, Cpu time 3,201.88

Task 3, (partly) after downgrade: Run time 5,118.14, Cpu time 3,654.19

Task 3 is slightly misrepresented in the result file, since it was started with 340.52, but was completed with the old (337.88) driver (restarted at 49.55%)

None of these tasks have any blanking, so blanking does not mess up the results.
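
To put numbers on the three samples above, the share of run time spent on the CPU can be computed directly. This is only a quick sketch: the task labels and the `cpu_share` helper are informal names chosen for illustration, not anything reported by BOINC.

```python
# Run time and CPU time (seconds) for the three sample tasks quoted above.
samples = {
    "Task 1 (337.88)": (4922.92, 2429.77),
    "Task 2 (340.52)": (3208.58, 3201.88),
    "Task 3 (started on 340.52, finished on 337.88)": (5118.14, 3654.19),
}


def cpu_share(run_time: float, cpu_time: float) -> float:
    """Fraction of the wall-clock run time during which a CPU core was busy."""
    return cpu_time / run_time


for label, (run_time, cpu_time) in samples.items():
    print(f"{label}: {cpu_share(run_time, cpu_time):.1%} of run time spent on the CPU")
```

That works out to about 49% for Task 1, nearly 100% for Task 2, and about 71% for the mixed Task 3.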
ID: 1550223
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1550232 - Posted: 31 Jul 2014, 0:04:09 UTC

I installed it yesterday, and have had no problems with my AP tasks and I'm using Lunatics.


I don't buy computers, I build them!!
ID: 1550232
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1550236 - Posted: 31 Jul 2014, 0:12:00 UTC - in response to Message 1550232.  

@Cliff: You are using an older version (1843) of the Astropulse program, where run time and cpu time are approximately the same.

I got the 2058 version from Raistmer's Twitter feed. With this version, due to Raistmer's wizardry, you only use half the CPU time. Which is nice.
ID: 1550236
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1550241 - Posted: 31 Jul 2014, 0:22:42 UTC

Go for the latest NV build r2399 at Mike's site: http://mikesworldnet.de/home
ID: 1550241
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1550244 - Posted: 31 Jul 2014, 0:29:09 UTC - in response to Message 1550241.  

> Go for the latest NV build r2399 at Mike's site: http://mikesworldnet.de/home


Thanks, I'll have a look at that as soon as more AP work becomes available. Right now I've just run out of AP-work for the GTX680...
ID: 1550244
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1550266 - Posted: 31 Jul 2014, 0:59:57 UTC

Thanks @Oddbjornik @Juan -

I was waiting for a new consolidated Lunatics installer (0.42?), but I downloaded all of the V7 builds from Mike's site since they are newer builds than what I have now. Hopefully, I can remember how to install them manually, as it's been quite a while since I did one.


I don't buy computers, I build them!!
ID: 1550266
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1550269 - Posted: 31 Jul 2014, 1:04:58 UTC
Last modified: 31 Jul 2014, 1:05:15 UTC

It may be a case of different hardware, different results, but I was using r2180 before r2399 and still had CPU time closely match run time with NV 337.88: Task 3651527633.

As I mentioned, you can get CPU time < run time with CPU usage still being 100%, especially if your CPU is overloaded. If your CPU is still showing < 100% usage, then great. If it's a case of something having changed between r2058 and later revisions, then it might be a good idea to find out what changed to return CPU usage to 100% on NV GPUs.
Soli Deo Gloria
ID: 1550269
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1550279 - Posted: 31 Jul 2014, 1:44:16 UTC
Last modified: 31 Jul 2014, 1:51:51 UTC

Did you use the -use_sleep switch? It makes the CPU usage drop. Look at the CPU times of my already crunched AP WUs. With it, the CPU time drops to about 10-20% of the GPU time, depending on the % blanked, versus 100% without the switch. Note: AFAIK this switch is to be used on NV GPUs only.
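
For anyone wondering why a sleep option would cut CPU time at all, here is a minimal sketch of the general idea. It is not the Astropulse app's code, and the thread does not say how -use_sleep is implemented; the assumption is simply that the host thread has to wait for the GPU to finish, and can do so either by spinning or by sleeping between checks. The GPU is simulated by a fixed deadline, and `wait_busy`, `wait_sleeping` and `measure` are hypothetical names.

```python
import time


def wait_busy(duration: float) -> None:
    """Spin on the clock until the (simulated) GPU work is done."""
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass  # keeps one CPU core fully occupied while waiting


def wait_sleeping(duration: float, poll_interval: float = 0.001) -> None:
    """Check the clock, then give the CPU back between checks."""
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        time.sleep(poll_interval)


def measure(wait_fn, duration: float = 0.2):
    """Return (wall-clock seconds, CPU seconds) spent inside wait_fn."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    wait_fn(duration)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


busy_wall, busy_cpu = measure(wait_busy)
sleep_wall, sleep_cpu = measure(wait_sleeping)
print(f"busy wait:  wall {busy_wall:.3f}s, cpu {busy_cpu:.3f}s")
print(f"sleep wait: wall {sleep_wall:.3f}s, cpu {sleep_cpu:.3f}s")
```

On an otherwise idle machine the busy wait should report CPU time close to its wall time, while the sleeping wait should report very little CPU time. The sleeping version can also overshoot the moment the work finishes, which is one plausible reason total run time goes up with such an option.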

Installing manually, if you already have Lunatics installed, is simple: just unzip the builds to the project folder and run aimerge.
ID: 1550279
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1550286 - Posted: 31 Jul 2014, 2:18:36 UTC

I mentioned the use_sleep option earlier. I generally find the increase in overall run time to be not worth it - at least while there remains the potential for high-blanked AP tasks. I may reconsider it if/when the BLANKIT branch is released.
Soli Deo Gloria
ID: 1550286
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1553658 - Posted: 8 Aug 2014, 13:47:05 UTC
Last modified: 8 Aug 2014, 13:51:36 UTC

I can confirm this bug/feature with driver 340.52.

Before, with driver 337.88, both Nvidia cards on AP6 r2399 were happily being fed by one core.
Now they need both(!!) cores to the limit crunching AP6 with only ~2-4% blanking on r2399. That's quite a difference... :/

[edit]
System is a Core2Duo E6600 running Win XP 32-bit with a Zotac GT 640 2GB PCIe and a Zotac GT 430 512MB PCI.
Aloha, Uli

ID: 1553658
JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1553683 - Posted: 8 Aug 2014, 14:48:27 UTC
Last modified: 8 Aug 2014, 14:49:40 UTC

But is there a bug or something with the r2399 app? Using the -use_sleep option doesn't seem to work, it's using a full CPU core as if I didn't use the option.

btw I'm using 335.32 drivers.
ID: 1553683
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1553698 - Posted: 8 Aug 2014, 15:15:10 UTC - in response to Message 1553683.  

> But is there a bug or something with the r2399 app? Using the -use_sleep option doesn't seem to work, it's using a full CPU core as if I didn't use the option.
>
> btw I'm using 335.32 drivers.

I'm using 337.88 & r2399. -use_sleep works perfectly on all my NV hosts.
ID: 1553698
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1553725 - Posted: 8 Aug 2014, 16:17:08 UTC

If I use the -use_sleep parameter with r2399 & driver 340.52, I get exactly 50% usage per GPU, which isn't enough. Even if I raise the -ffa_xx parameters to ridiculous values, GPU usage doesn't rise. The CPU usage is negligible in that case. So to work around this problem I let 2 instances run per GPU, resulting in nearly 100% GPU usage again. That's really strange. :?
Aloha, Uli

ID: 1553725
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1553737 - Posted: 8 Aug 2014, 16:32:26 UTC

I think this idle/sleep problem is because of the poor implementation of OpenCL in Nvidia's drivers as some kind of adapter to native CUDA. Would it be possible to build a native CUDA 5/6 Astropulse cruncher like the Lunatics x41 executables, which don't have these idle/sleep problems? :?
Aloha, Uli

ID: 1553737
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1553747 - Posted: 8 Aug 2014, 16:59:12 UTC - in response to Message 1553737.  
Last modified: 8 Aug 2014, 17:00:29 UTC

> I think this idle/sleep problem is because of the poor implementation of OpenCL in Nvidia's drivers as some kind of adapter to native CUDA. Would it be possible to build a native CUDA 5/6 Astropulse cruncher like the Lunatics x41 executables, which don't have these idle/sleep problems? :?

As I recall, Nvidia basically wrote the original CUDA app for us. Then the Lunatics did all of their hard work improving that base code. There is no such CUDA Astropulse application to build upon.
So far no one seems to want to spend the large effort to whip up a CUDA Astropulse app from scratch. I am sure they would be interested in more help if you are volunteering. :)
Also the sleep issue could be the nature of processing astropulse data on the hardware & not OpenCL related.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1553747
JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1553748 - Posted: 8 Aug 2014, 16:59:18 UTC - in response to Message 1553698.  

>> But is there a bug or something with the r2399 app? Using the -use_sleep option doesn't seem to work, it's using a full CPU core as if I didn't use the option.
>>
>> btw I'm using 335.32 drivers.
>
> I'm using 337.88 & r2399. -use_sleep works perfectly on all my NV hosts.

OK, strange.
ID: 1553748
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1553856 - Posted: 8 Aug 2014, 20:45:36 UTC - in response to Message 1553748.  
Last modified: 8 Aug 2014, 20:52:55 UTC

>>> But is there a bug or something with the r2399 app? Using the -use_sleep option doesn't seem to work, it's using a full CPU core as if I didn't use the option.
>>>
>>> btw I'm using 335.32 drivers.
>>
>> I'm using 337.88 & r2399. -use_sleep works perfectly on all my NV hosts.
>
> OK, strange.

Are you sure you are using the r2399 app? I ask because your already crunched WU said: AstroPulse v6 Windows x86 rev 2058.

Yes, something is making -use_sleep not work as desired if you install the 340.52 driver.
ID: 1553856
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1554069 - Posted: 9 Aug 2014, 2:20:27 UTC - in response to Message 1553856.  

>>>> But is there a bug or something with the r2399 app? Using the -use_sleep option doesn't seem to work, it's using a full CPU core as if I didn't use the option.
>>>>
>>>> btw I'm using 335.32 drivers.
>>>
>>> I'm using 337.88 & r2399. -use_sleep works perfectly on all my NV hosts.
>>
>> OK, strange.
>
> Are you sure you are using the r2399 app? I ask because your already crunched WU said: AstroPulse v6 Windows x86 rev 2058.
>
> Yes, something is making -use_sleep not work as desired if you install the 340.52 driver.


Ok, maybe I'm missing something here. I manually installed r2399, but did not insert the -use_sleep option, with this result: http://setiathome.berkeley.edu/result.php?resultid=3668615240. I exited BOINC, inserted the sleep option, and got this: http://setiathome.berkeley.edu/result.php?resultid=3668739329. The CPU time went from 1:17:28 to 0:55:22, which was the object of this exercise, wasn't it? With the modest increase in total run time and the massive decrease in kernel thrashing, I can live with this. Also, because of the decreased CPU usage, I was able to increase % of CPU usage to 90% without an increase in CPU temps.
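
As a rough check on the size of that saving, the two CPU times quoted above can be converted to seconds and compared. This is only a sketch; `to_seconds` is a hypothetical helper, and the figures are the ones from this post rather than values read from the linked result pages.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


cpu_without_sleep = to_seconds("1:17:28")  # r2399 without -use_sleep
cpu_with_sleep = to_seconds("0:55:22")     # r2399 with -use_sleep

saved = cpu_without_sleep - cpu_with_sleep
print(f"CPU time saved: {saved} s ({saved / cpu_without_sleep:.1%})")
```

That is 1,326 seconds, or roughly 28.5% less CPU time for this pair of tasks. Two tasks are a small sample, so treat the figure as indicative only.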


I don't buy computers, I build them!!
ID: 1554069
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.