Uneven usage of GPUs

Message boards : Number crunching : Uneven usage of GPUs

Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 33,135,855
RAC: 3
Germany
Message 1298381 - Posted: 24 Oct 2012, 9:21:44 UTC
Last modified: 24 Oct 2012, 9:23:14 UTC

OK, my assumption was wrong.

Then try freeing one CPU core, as previously mentioned.
In BOINC Manager - Computing preferences, set "On multiprocessor systems, use at most" to 84% and see if the times drop.

Edit:
(Oops, I was a little late...)
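
For reference, the percentage in that preference maps to a whole number of cores. Here is a minimal sketch of the arithmetic, assuming the client rounds down and never goes below one core. The function name `usable_cores` and the 6-core total are assumptions for illustration, not taken from BOINC's source.

```python
def usable_cores(total_cores: int, percent: float) -> int:
    """Cores left for CPU tasks, assuming round-down with a floor of one core."""
    return max(1, int(total_cores * percent / 100))


# Hypothetical 6-core host: show what a few settings would allow.
for pct in (100, 95, 84, 50, 1):
    print(f"{pct:>3}% -> {usable_cores(6, pct)} core(s)")
```

Under those assumptions, 84% on a 6-core host leaves one core free, which is what the suggestion is after.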


- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1298381 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1298382 - Posted: 24 Oct 2012, 9:27:26 UTC
Last modified: 24 Oct 2012, 10:07:18 UTC

I had already set it to 95%, but I have now changed it to 50% as you suggested, BilBg/Highlander. I will see how it goes.

Update: Setting aside 3 cores doesn't seem to have any effect.
ID: 1298382 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,596,598
RAC: 1,132
Bulgaria
Message 1298386 - Posted: 24 Oct 2012, 10:43:02 UTC - in response to Message 1298382.  
Last modified: 24 Oct 2012, 11:39:18 UTC


What CPU load do you see (in Windows Task Manager - Performance tab) on all cores when this is set to 50%?
Do you see only 3 CPU tasks running (in BOINC Manager - Tasks, in Windows Task Manager - Processes)?

Try an even lower setting, such as 1% (I'm not sure whether this will use 0 or 1 core, i.e. no CPU tasks running at all or just one CPU task).

As a test, change to <count>1</count> (make a copy of the original app_info.xml so you can easily revert).
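
A minimal sketch of that test change, assuming app_info.xml keeps the per-task GPU share in `<coproc><count>` elements under `<app_version>`. The sample fragment and the helper name `set_gpu_count` are made up for illustration, and a real file has many more entries. The function returns a new string and leaves the original text untouched.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only, not the poster's actual app_info.xml.
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <coproc>
      <type>CUDA</type>
      <count>0.5</count>
    </coproc>
  </app_version>
</app_info>"""


def set_gpu_count(xml_text: str, new_count: str) -> str:
    """Return a copy of the XML with every <coproc><count> set to new_count."""
    root = ET.fromstring(xml_text)
    for coproc in root.iter("coproc"):
        for count in coproc.findall("count"):
            count.text = new_count
    return ET.tostring(root, encoding="unicode")


print(set_gpu_count(SAMPLE, "1"))
```

With a real file, save a backup copy before writing the result, and expect to restart the BOINC client so that it picks up the edited app_info.xml.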


Also check the 'angle range' (AR) and only compare times for tasks with a "similar" AR. You can find it as:
"WU true angle range is" in stderr.txt (in ....\BOINC\slots\ while the GPU tasks run)
<true_angle_range> in the task/WU file
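
A rough sketch of pulling the AR out of either source so that run times are only compared between similar tasks. The exact layout of the stderr.txt line, the helper names, and the 0.05 tolerance are assumptions. Only the two marker strings come from the post.

```python
import re
from typing import Optional

# Patterns for the two places the angle range is reported (layout assumed).
AR_STDERR = re.compile(r"WU true angle range is\s*:?\s*([0-9]*\.?[0-9]+)")
AR_XML = re.compile(r"<true_angle_range>\s*([0-9]*\.?[0-9]+)\s*</true_angle_range>")


def angle_range(text: str) -> Optional[float]:
    """Return the first angle range found in stderr or WU text, else None."""
    for pattern in (AR_STDERR, AR_XML):
        match = pattern.search(text)
        if match:
            return float(match.group(1))
    return None


def similar_ar(a: float, b: float, tolerance: float = 0.05) -> bool:
    """Treat two tasks as comparable when their ARs differ by <= tolerance."""
    return abs(a - b) <= tolerance
```

For example, `angle_range` can be applied to the text of stderr.txt for each running slot, and only pairs for which `similar_ar` is true would then be compared on elapsed time.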

There are very few long running GPU tasks, in fact I found only one (after a few minutes browsing):
http://setiathome.berkeley.edu/result.php?resultid=2659629186
Can you give more links to long running GPU tasks?

jason_gee (the main programmer of the CUDA app you are using): http://setiathome.berkeley.edu/show_user.php?userid=8534984
... often suggests running DPC Latency Checker to check for badly behaving drivers in the system:
http://www.thesycon.de/deu/latency_check.shtml

http://setiathome.berkeley.edu/forum_thread.php?id=68151&postid=1240292#1240292
http://setiathome.berkeley.edu/forum_thread.php?id=67270&postid=1207226#1207226
http://setiathome.berkeley.edu/forum_thread.php?id=66787&postid=1189977#1189977

http://setiathome.berkeley.edu/forum_thread.php?id=66241&postid=1185224#1185224
http://setiathome.berkeley.edu/forum_thread.php?id=66241&postid=1184322#1184322





- ALF - "Find out what you don't do well ..... then don't do it!" :)
ID: 1298386 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1298395 - Posted: 24 Oct 2012, 11:45:26 UTC
Last modified: 24 Oct 2012, 11:47:57 UTC

With 3 cores active, the load varied between 50 and 55%. Setting it to run just 1 core (the 1% setting) gave a load between 16 and 21%. Changing the count to 1 didn't seem to bring about any change either.
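
As a quick sanity check on those readings, the total load can be converted back into an approximate number of busy cores. This is only a sketch: the 6-core total is an assumption inferred from the 50% setting giving 3 cores, and `busy_cores` is a hypothetical helper.

```python
def busy_cores(load_percent: float, total_cores: int) -> float:
    """Approximate number of fully busy cores for a given total CPU load."""
    return load_percent / 100 * total_cores


TOTAL_CORES = 6  # assumption: 50% of the cores was reported as 3 cores

observed = {
    "50% setting": (50, 55),  # load range reported with 3 cores active
    "1% setting": (16, 21),   # load range reported with the 1% setting
}
for label, (low, high) in observed.items():
    print(f"{label}: about {busy_cores(low, TOTAL_CORES):.1f} to "
          f"{busy_cores(high, TOTAL_CORES):.1f} busy cores")
```

On that assumption, the 1% setting corresponds to roughly one busy core rather than zero.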

Here are a few more WUs with similar AR:

http://setiathome.berkeley.edu/result.php?resultid=2659629612
http://setiathome.berkeley.edu/result.php?resultid=2659629560
http://setiathome.berkeley.edu/result.php?resultid=2659619623
http://setiathome.berkeley.edu/result.php?resultid=2659619621

I will have to wait until I get back home before I can use the DPC Latency Checker.
ID: 1298395 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,596,598
RAC: 1,132
Bulgaria
Message 1298396 - Posted: 24 Oct 2012, 12:25:38 UTC - in response to Message 1298395.  


If you'd like to experiment with video drivers: many people have suggested in the past that this driver worked best
(I can't confirm this from personal experience, as I do not have a CUDA GPU)

NVIDIA DRIVERS 266.58 WHQL - Windows XP 64-bit
http://www.nvidia.com/object/winxp64-266.58-whql-driver.html





- ALF - "Find out what you don't do well ..... then don't do it!" :)
ID: 1298396 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1298510 - Posted: 25 Oct 2012, 4:01:49 UTC

Here are my observations after running DPC Latency Checker.

1. I was getting high readings (max of 22k microsec) at frequent intervals. Following their suggestion, I disabled the network card, and both the frequency of the spikes and the maximum values dropped.

2. I also noticed that EVGA Precision X was using a high % of CPU, so I shut that down as well, which further reduced the latency.

3. Next to be disabled was the nVidia HD Audio. After this I started getting low readings, with a max of 1850 microsec.

I left the system running for some time, during which a blank screen saver that I had set kicked in. Upon returning, I logged back in, and although the latency was still under 1000 microsec, one of the GPUs had started a slow motion stunt.

So the next step was to disable the screen saver.

I did a drivesweep and a fresh installation of the LAN driver. And just to be sure, I did a clean installation of the graphics driver with both cards in the system. I enabled the network card, and the latency was within permissible levels, with occasional spikes.

Finally, I set the max CPU cores to be used back to 100%, and both GPUs have, as of now, been crunching without any hiccups since last night.

Currently disabled items - nVidia HD Audio, screen saver, EVGA Precision X.
ID: 1298510 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7490
Credit: 91,162,254
RAC: 46,319
Australia
Message 1298537 - Posted: 25 Oct 2012, 5:35:05 UTC - in response to Message 1298510.  

Currently disabled items - nVidia HD Audio, screen saver, EVGA Precision X.

From past experience, I'd blame the screen saver.
I haven't bothered with one for 10+ years.
Grant
Darwin NT
ID: 1298537 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1298542 - Posted: 25 Oct 2012, 5:53:52 UTC - in response to Message 1298537.  

From past experiences, i'd blame the screen saver.
I haven't bothered with one for 10+ years.


I never had any issues with screensavers before and never thought it would cause such a massive overhead on the system. I will continue to leave it disabled and see how things go.
ID: 1298542 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7490
Credit: 91,162,254
RAC: 46,319
Australia
Message 1298553 - Posted: 25 Oct 2012, 7:06:18 UTC - in response to Message 1298542.  
Last modified: 25 Oct 2012, 7:06:39 UTC

From past experiences, i'd blame the screen saver.
I haven't bothered with one for 10+ years.


I never had any issues with screensavers before and never thought it would cause such a massive overhead on the system. I will continue to leave it disabled and see how things go.

Not so much a massive overhead, just that its priority is most likely higher than Seti's, so it gets the resources.

Also, not running the Seti screen saver has long been the best way to boost crunching performance. Not running a 3rd party one is likely to help almost as much.
Grant
Darwin NT
ID: 1298553 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1298610 - Posted: 25 Oct 2012, 12:28:13 UTC

The rig has been under observation for many hours now and still seems to be doing better than before; I hope it stays that way. I'm not really sure if the nVidia HD Audio has any adverse effects, but I shall keep it disabled, along with the screensaver. The network card has to be active for obvious reasons.

I have to thank you all for your advice and expertise. And thanks to BilBg for the Latency Checker. There are still some spikes now and then but the crunching looks smooth.
ID: 1298610 · Report as offensive
Profile ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,321
RAC: 0
Korea, North
Message 1298690 - Posted: 25 Oct 2012, 18:05:20 UTC - in response to Message 1298610.  

Screensaver, you say? That might be a clue. Do you have your power settings set to never turn off the monitor, and any screensaver disabled, including the BOINC/SETI ones?
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

ID: 1298690 · Report as offensive
Profile Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 1122
Credit: 1,952,304
RAC: 231
Greece
Message 1298744 - Posted: 25 Oct 2012, 21:17:23 UTC - in response to Message 1298352.  

This is a dedicated crunching machine running 24/7.


Then you don't need the Audio and PhysX, 3D stuff...

When installing the driver choose manual/custom/advanced or whatever it is that it says and then untick all the boxes (3-4 boxes). The actual graphics driver (if I'm remembering this correctly) you can't uncheck even if you wanted to. The rest of the stuff you don't need.

And of course, don't forget to check/tick the "clean install" box/option. Wish I could be more help with the problem you are having but I don't have the experience.

Good luck
ID: 1298744 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,596,598
RAC: 1,132
Bulgaria
Message 1298949 - Posted: 26 Oct 2012, 13:09:51 UTC - in response to Message 1298610.  

... Latency Checker. There are still some spikes now and then but the crunching looks smooth.

Did you also see/read about the "PCI latency timer" BIOS entry (mine is set to 128):
http://setiathome.berkeley.edu/forum_thread.php?id=67270&postid=1207226#1207226
http://setiathome.berkeley.edu/forum_thread.php?id=66787&postid=1189985#1189985





- ALF - "Find out what you don't do well ..... then don't do it!" :)
ID: 1298949 · Report as offensive
Profile cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,267,805
RAC: 758
Canada
Message 1299236 - Posted: 27 Oct 2012, 5:05:32 UTC

Here's another data point. Phenom II 945, Ripjaws F3-10666CL7D-8GBXH, ASRock N68C-GS FX (1000MHz HT, 1600MHz IMC).

Running DPC Latency Checker gave me values around 80-110 microseconds.

I changed my BIOS memory settings to all auto, for both speed and timings. Previously it was running at 1333 7-7-7-21 (which it is rated for under an XMP profile); "auto" set it to 1066 7-7-7-20, which is JEDEC #2 (so says CPU-Z).

Latency Checker then reported values of 15-30.

I tried tightening the timings to 6-6-6-auto, which runs, but the latency goes back up to ~100.

Maybe the Phenom IMC works better with standard JEDEC timings?
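
To put the two stated memory configurations side by side, here is a small sketch that converts CAS cycles into nanoseconds. It assumes "1333" and "1066" are the nominal DDR3 data rates in MT/s, and the function name is made up for illustration. The 6-6-6 trial is left out because the post does not say at which speed it ran.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds for DDR memory.

    DDR transfers twice per clock, so the clock in MHz is data_rate / 2
    and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cl_cycles * 2000.0 / data_rate_mts


# (CAS cycles, assumed data rate in MT/s) for the two profiles in the post.
configs = {
    "XMP 1333 CL7": (7, 1333),
    "JEDEC 1066 CL7": (7, 1066),
}
for name, (cl, rate) in configs.items():
    print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
```

By this measure the "auto" JEDEC profile is the slower one in absolute terms, so the lower DPC latency is not explained by faster RAM. That fits the guess that the memory controller simply behaves better with standard timings.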
ID: 1299236 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1299269 - Posted: 27 Oct 2012, 6:20:29 UTC - in response to Message 1298949.  
Last modified: 27 Oct 2012, 6:47:46 UTC

... Latency Checker. There are still some spikes now and then but the crunching looks smooth.
Did you see/read also about "PCI latency timer BIOS entry" (mine is set to 128):

Unfortunately the BIOS does not give the option to set the PCI latency timer. I checked the motherboard site and it has the latest version (Award BIOS). And all the memory settings are at auto.

In spite of that, the rig has been behaving fine.
ID: 1299269 · Report as offensive
Profile cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,267,805
RAC: 758
Canada
Message 1299395 - Posted: 27 Oct 2012, 15:06:45 UTC - in response to Message 1299269.  
Last modified: 27 Oct 2012, 15:07:00 UTC

Unfortunately the BIOS does not give the option to set the PCI latency timer.


I do have a timer control in my BIOS. I tried many different settings over a few hours last night and didn't see any qualitative difference in processing speed.
ID: 1299395 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1300681 - Posted: 31 Oct 2012, 12:04:22 UTC
Last modified: 31 Oct 2012, 12:07:01 UTC

Has anyone come across the following message under status in the Tasks list - 'Waiting to run (0.04 CPUs + x.xx NVIDIA GPUs) (waiting for GPU memory)', even though GPU-Z reports memory usage at around 50%?
ID: 1300681 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,562,301
RAC: 7,819
Panama
Message 1300687 - Posted: 31 Oct 2012, 12:26:00 UTC - in response to Message 1300681.  

Has anyone come across the following message under status in the Tasks list - 'Waiting to run (0.04 CPUs + x.xx NVIDIA GPUs) (waiting for GPU memory)', even though GPU-Z reports memory usage at around 50%?

I had this issue before. In my case it happens when, for some reason, the NVIDIA driver is corrupted, for example if you try to do remote access with RDP. Normally, restarting the computer solves the problem.
ID: 1300687 · Report as offensive
Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 276
Credit: 152,469,509
RAC: 39,819
India
Message 1300692 - Posted: 31 Oct 2012, 12:36:53 UTC - in response to Message 1300687.  

I had this issue before, in my case happens when for some reason the nvidia driver is corrupted, for example if you try to do remote access with RDP. Normaly restarting the computers solve the problem.


I use RealVNC to connect to this rig; however, it never used to give this message. And yes, restarting the rig solves the issue until I access it again. Hopefully a clean reinstallation of the graphics driver will solve this.
ID: 1300692 · Report as offensive
Profile Bill G
Project Donor
Joined: 1 Jun 01
Posts: 576
Credit: 86,276,458
RAC: 61,775
United States
Message 1300702 - Posted: 31 Oct 2012, 13:46:24 UTC - in response to Message 1300692.  

I had this issue before, in my case happens when for some reason the nvidia driver is corrupted, for example if you try to do remote access with RDP. Normaly restarting the computers solve the problem.


I use RealVNC to connect to this rig, however, it never used to give this message. And yes, restarting the rig solves the issue until I access it again. Hopefully a clean reinstallation of the graphics driver will solve this.


This may not be a problem with your video driver; it may be a problem with the graphics drivers that RealVNC uses, just as if you were using RD. I say this because you are describing exactly the symptoms I had when using Windows RD. I found that LogMeIn (also free) did not cause this problem. It was members of this group that pointed me in the right direction at the time.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1300702 · Report as offensive


 