Uneven usage of GPUs


Message boards : Number crunching : Uneven usage of GPUs

Highlander
Joined: 5 Oct 99
Posts: 146
Credit: 31,446,058
RAC: 10,968
Germany
Message 1298381 - Posted: 24 Oct 2012, 9:21:44 UTC
Last modified: 24 Oct 2012, 9:23:14 UTC

k, my assumption was wrong..

Then try to free one CPU core, as previously mentioned.
In BOINC Manager, under computing preferences, set "On multiprocessor systems, use at most" to 84% and see whether the times drop.

edit:
(oops, I was a little late...)
____________

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1298382 - Posted: 24 Oct 2012, 9:27:26 UTC
Last modified: 24 Oct 2012, 10:07:18 UTC

I had already set it to 95%, but have now changed it to 50% as you suggested, BilBg/Highlander. Will see how it goes.

Update: Setting aside 3 cores doesn't seem to have any effect

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2806
Credit: 6,331,007
RAC: 7,374
Bulgaria
Message 1298386 - Posted: 24 Oct 2012, 10:43:02 UTC - in response to Message 1298382.
Last modified: 24 Oct 2012, 11:39:18 UTC


What CPU load do you see (in Windows Task Manager - Performance tab) on all cores when this is set to 50%?
Do you see only 3 CPU tasks running (in BOINC Manager - Tasks, in Windows Task Manager - Processes)?

Try an even lower setting such as 1% (I'm not sure whether this will use 0 or 1 core, i.e. no CPU tasks running at all or just one CPU task).

For a test, change to <count>1</count> (make a copy of the original app_info.xml for easy return).
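
A rough Python sketch of that test change, for anyone who prefers to script it. It assumes the usual anonymous-platform layout where each <app_version> holds a <coproc> block with a <count> inside; the function name, the ".bak" suffix and the path handling are only illustrative, not part of BOINC.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_coproc_count(app_info_path, new_count="1"):
    """Back up app_info.xml, then set every <coproc>/<count> to new_count.

    Returns the number of <count> elements that were changed.
    """
    path = Path(app_info_path)
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    if not backup.exists():
        # Keep the very first backup so the original is never overwritten.
        shutil.copy2(path, backup)

    tree = ET.parse(path)
    changed = 0
    # Assumed layout: <app_info><app_version><coproc><count>0.5</count>...
    for coproc in tree.getroot().iter("coproc"):
        count = coproc.find("count")
        if count is not None and (count.text or "").strip() != new_count:
            count.text = new_count
            changed += 1

    if changed:
        tree.write(path, encoding="utf-8")
    return changed
```

Note that ElementTree drops XML comments when it rewrites the file, so the backup copy is the way back to the original. BOINC only reads app_info.xml at startup, so the client has to be restarted for the change to take effect.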


Also check the 'angle range' (AR) and only compare times for tasks with a similar AR:
"WU true angle range is" in stderr.txt (in ....\BOINC\slots\ while the GPU tasks run)
<true_angle_range> in the task/WU file
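
If you want to pull the AR out automatically, something like the Python below might do. It is only a sketch: the exact spacing and punctuation of the stderr.txt line is an assumption, and the 5% "similar" tolerance is an arbitrary value picked for illustration.

```python
import re

# A plain decimal number, optionally with an exponent.
_NUM = r"([0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)"

# Line in stderr.txt (exact formatting of the separator is assumed).
STDERR_AR = re.compile(r"WU true angle range is\s*:?\s*" + _NUM)
# Tag in the task/WU file.
WU_AR = re.compile(r"<true_angle_range>\s*" + _NUM + r"\s*</true_angle_range>")


def angle_range(text):
    """Return the angle range found in stderr.txt or WU text, else None."""
    for pattern in (STDERR_AR, WU_AR):
        match = pattern.search(text)
        if match:
            return float(match.group(1))
    return None


def similar_ar(a, b, rel_tol=0.05):
    """True if two angle ranges differ by at most rel_tol (assumed 5%)."""
    return abs(a - b) <= rel_tol * max(a, b)
```

Feed it the contents of stderr.txt from a slot directory, or of a WU file, and only compare run times for tasks where similar_ar() returns True.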

There are very few long running GPU tasks, in fact I found only one (after a few minutes browsing):
http://setiathome.berkeley.edu/result.php?resultid=2659629186
Can you give more links to long running GPU tasks?

jason_gee (the main programmer of the CUDA app you are using): http://setiathome.berkeley.edu/show_user.php?userid=8534984
... often suggests running DPC Latency Checker to check for badly behaving drivers in the system:
http://www.thesycon.de/deu/latency_check.shtml

http://setiathome.berkeley.edu/forum_thread.php?id=68151&postid=1240292#1240292
http://setiathome.berkeley.edu/forum_thread.php?id=67270&postid=1207226#1207226
http://setiathome.berkeley.edu/forum_thread.php?id=66787&postid=1189977#1189977

http://setiathome.berkeley.edu/forum_thread.php?id=66241&postid=1185224#1185224
http://setiathome.berkeley.edu/forum_thread.php?id=66241&postid=1184322#1184322


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1298395 - Posted: 24 Oct 2012, 11:45:26 UTC
Last modified: 24 Oct 2012, 11:47:57 UTC

With 3 cores active, I was getting the load that varied between 50 and 55%. Setting it to run just 1 core (1% setting) gave the load between 16 and 21%. Changing the count to 1 too didn't seem to bring about any change.
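
Those load figures fit a 6-core CPU. The core count is not stated anywhere in the thread; it is inferred from 50% giving 3 cores. A small Python sketch of the arithmetic, assuming the client rounds the percentage down to whole cores and never goes below one (which matches the 1% result above, but is not taken from the BOINC source):

```python
def usable_cores(total_cores, percent):
    """Cores BOINC may use for a given 'use at most N% of processors' value.

    Assumption: the result is truncated to a whole core, with a minimum of 1.
    """
    return max(1, (total_cores * percent) // 100)


def expected_cpu_load(total_cores, percent):
    """Rough total CPU load (in %) if every usable core is fully busy."""
    return 100.0 * usable_cores(total_cores, percent) / total_cores


if __name__ == "__main__":
    for pct in (1, 50, 84, 95, 100):
        cores = usable_cores(6, pct)
        load = expected_cpu_load(6, pct)
        print(f"{pct:3d}% -> {cores} core(s), about {load:.1f}% load")
```

On the assumed 6 cores, 50% gives 3 cores (about 50% load) and 1% gives 1 core (about 16.7% load), which is close to the lower end of the ranges reported above; the few percent on top would be the GPU feeder processes and everything else running on the system.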

Here are a few more WUs with similar AR:

http://setiathome.berkeley.edu/result.php?resultid=2659629612
http://setiathome.berkeley.edu/result.php?resultid=2659629560
http://setiathome.berkeley.edu/result.php?resultid=2659619623
http://setiathome.berkeley.edu/result.php?resultid=2659619621

I will have to wait until I get back home before I can use the DPC Latency Checker.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2806
Credit: 6,331,007
RAC: 7,374
Bulgaria
Message 1298396 - Posted: 24 Oct 2012, 12:25:38 UTC - in response to Message 1298395.


If you like to play with video drivers - many people suggested in the past that this driver worked best
(I can't confirm by personal experience, I do not have CUDA GPU)

NVIDIA DRIVERS 266.58 WHQL - Windows XP 64-bit
http://www.nvidia.com/object/winxp64-266.58-whql-driver.html


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1298510 - Posted: 25 Oct 2012, 4:01:49 UTC

Here are my observations after running DPC Latency Checker.

1. I was getting high readings (max of 22k microsec) at frequent intervals. Following their suggestion, I disabled the network card, and both the frequency and the maximum values dropped.

2. I also noticed that EVGA Precision X was using a high % of CPU, so I shut that down as well, which further reduced the latency.

3. Next to be disabled was the nVidia HD Audio. After this I started getting low readings with a max of 1850 microsec.

I left the system running for some time, during which a blank screen saver that I had set kicked in. Upon returning, I logged back in and, although the latency was still under 1000 microsec, one of the GPUs had started a slow motion stunt.

So the next step was to disable the screen saver.

I did a drivesweep and fresh installation of the LAN driver. And just to be sure, I did a clean installation of the graphics driver with both the cards in the system. I enabled the network card and the latency was within permissible levels with occasional spikes.

Finally I put the max CPU cores to be used back at 100%, and both GPUs have been crunching without any hiccups since last night.

Currently disabled items - nVidia HD Audio, screen saver, EVGA Precision X.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,620,282
RAC: 47,519
Australia
Message 1298537 - Posted: 25 Oct 2012, 5:35:05 UTC - in response to Message 1298510.

Currently disabled items - nVidia HD Audio, screen saver, EVGA Precision X.

From past experiences, i'd blame the screen saver.
I haven't bothered with one for 10+ years.
____________
Grant
Darwin NT.

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1298542 - Posted: 25 Oct 2012, 5:53:52 UTC - in response to Message 1298537.

From past experiences, i'd blame the screen saver.
I haven't bothered with one for 10+ years.


I never had any issues with screensavers before and never thought it would cause such a massive overhead on the system. I will continue to leave it disabled and see how things go.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,620,282
RAC: 47,519
Australia
Message 1298553 - Posted: 25 Oct 2012, 7:06:18 UTC - in response to Message 1298542.
Last modified: 25 Oct 2012, 7:06:39 UTC

From past experiences, i'd blame the screen saver.
I haven't bothered with one for 10+ years.


I never had any issues with screensavers before and never thought it would cause such a massive overhead on the system. I will continue to leave it disabled and see how things go.

Not so much a massive overhead, just that its priority is most likely higher than Seti's, so it gets the resources.

Also, not running the Seti screen saver has long been the best way to boost crunching performance. Not running a 3rd party one is likely to help almost as much.
____________
Grant
Darwin NT.

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1298610 - Posted: 25 Oct 2012, 12:28:13 UTC

The rig has been under observation for many hours now and still seems to be doing better than before; I hope it stays that way. I'm not really sure whether the nVidia HD Audio has any adverse effects, but I shall keep it disabled, along with the screensaver. The network card has to be active for obvious reasons.

I have to thank you all for your advice and expertise. And thanks to BilBg for the Latency Checker. There are still some spikes now and then but the crunching looks smooth.

Profile ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,321
RAC: 0
Korea, North
Message 1298690 - Posted: 25 Oct 2012, 18:05:20 UTC - in response to Message 1298610.

Screensaver, you say? That might be a clue. Do you have your power settings set to never turn off the monitor? And always disable any screensaver, including the BOINC/SETI ones.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Profile Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 553
Credit: 1,667,736
RAC: 575
Greece
Message 1298744 - Posted: 25 Oct 2012, 21:17:23 UTC - in response to Message 1298352.

This is a dedicated crunching machine running 24/7.


Then you don't need the Audio and PhysX, 3D stuff...

When installing the driver choose manual/custom/advanced or whatever it is that it says and then untick all the boxes (3-4 boxes). The actual graphics driver (if I'm remembering this correctly) you can't uncheck even if you wanted to. The rest of the stuff you don't need.

And of course, don't forget to check/tick the "clean install" box/option. Wish I could be more help with the problem you are having but I don't have the experience.

Good luck

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2806
Credit: 6,331,007
RAC: 7,374
Bulgaria
Message 1298949 - Posted: 26 Oct 2012, 13:09:51 UTC - in response to Message 1298610.

... Latency Checker. There are still some spikes now and then but the crunching looks smooth.

Did you see/read also about "PCI latency timer BIOS entry" (mine is set to 128):
http://setiathome.berkeley.edu/forum_thread.php?id=67270&postid=1207226#1207226
http://setiathome.berkeley.edu/forum_thread.php?id=66787&postid=1189985#1189985


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Profile cov_route
Joined: 13 Sep 12
Posts: 296
Credit: 7,432,177
RAC: 12,987
Canada
Message 1299236 - Posted: 27 Oct 2012, 5:05:32 UTC

Here's another data point. Phenom II 945, Ripjaws F3-10666CL7D-8GBXH, ASRock N68C-GS FX (1000MHz HT, 1600MHz IMC).

Running dpc latency checker gave me values around 80 - 110 micro-s.

I changed my BIOS memory settings to all auto (both speed and timings). Previously it was running at 1333 7-7-7-21 (which it is rated for under an XMP profile); "auto" set it to 1066 7-7-7-20, which is JEDEC #2 (so says CPU-Z).

Latency checker then reported values 15-30.

I tried tightening the timings to 6-6-6-auto, which runs, but the latency goes back up to ~100.

Maybe the Phenom IMC works better with standard JEDEC timings?

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1299269 - Posted: 27 Oct 2012, 6:20:29 UTC - in response to Message 1298949.
Last modified: 27 Oct 2012, 6:47:46 UTC

... Latency Checker. There are still some spikes now and then but the crunching looks smooth.
Did you see/read also about "PCI latency timer BIOS entry" (mine is set to 128):

Unfortunately the BIOS does not give the option to set the PCI latency timer. I checked the motherboard site and it already has the latest version (Award BIOS). All the memory settings are at auto.

In spite of that, the rig has been behaving fine.

Profile cov_route
Joined: 13 Sep 12
Posts: 296
Credit: 7,432,177
RAC: 12,987
Canada
Message 1299395 - Posted: 27 Oct 2012, 15:06:45 UTC - in response to Message 1299269.
Last modified: 27 Oct 2012, 15:07:00 UTC

Unfortunately the BIOS does not give the option to set the PCI latency timer.


I do have a timer control in my BIOS. I tried many different settings over a few hours last night and didn't see any qualitative difference in processing speed.

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1300681 - Posted: 31 Oct 2012, 12:04:22 UTC
Last modified: 31 Oct 2012, 12:07:01 UTC

Has anyone come across the following message under status in the Tasks list - 'Waiting to run (0.04 CPUs + x.xx NVIDIA GPUs) (waiting for GPU memory)', even though GPU-Z reports memory usage at around 50%?

juan BFB
Project donor
Volunteer tester
Joined: 16 Mar 07
Posts: 5414
Credit: 306,647,596
RAC: 330,765
Brazil
Message 1300687 - Posted: 31 Oct 2012, 12:26:00 UTC - in response to Message 1300681.

Has anyone come across the following message under status in the Tasks list - 'Waiting to run (0.04 CPUs + x.xx NVIDIA GPUs) (waiting for GPU memory)', even though GPU-Z reports memory usage at around 50%?

I had this issue before. In my case it happens when, for some reason, the NVIDIA driver is corrupted, for example if you try to do remote access with RDP. Normally restarting the computer solves the problem.
____________

Profile Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,314,649
RAC: 74,661
India
Message 1300692 - Posted: 31 Oct 2012, 12:36:53 UTC - in response to Message 1300687.

I had this issue before. In my case it happens when, for some reason, the NVIDIA driver is corrupted, for example if you try to do remote access with RDP. Normally restarting the computer solves the problem.


I use RealVNC to connect to this rig, however, it never used to give this message. And yes, restarting the rig solves the issue until I access it again. Hopefully a clean reinstallation of the graphics driver will solve this.

Profile Bill GProject donor
Joined: 1 Jun 01
Posts: 349
Credit: 43,152,075
RAC: 48,028
United States
Message 1300702 - Posted: 31 Oct 2012, 13:46:24 UTC - in response to Message 1300692.

I had this issue before. In my case it happens when, for some reason, the NVIDIA driver is corrupted, for example if you try to do remote access with RDP. Normally restarting the computer solves the problem.


I use RealVNC to connect to this rig, however, it never used to give this message. And yes, restarting the rig solves the issue until I access it again. Hopefully a clean reinstallation of the graphics driver will solve this.


This may not be a problem with your video driver; it may be a problem with the graphics drivers that RealVNC uses, just as if you were using RD. The reason I say this is that you are displaying the exact symptoms I had when using Windows RD. I found that LogMeIn (also free) did not cause this problem. It was members of this group that pointed me in the right direction at the time.
____________


Copyright © 2014 University of California