Uneven usage of GPUs
Author | Message |
---|---|
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
Hi all, apologies if this question has been raised previously, but a quick search did not yield any results. I seem to have some issues with one of my rigs that is running two GTX 480s. Prior to adding the second card a couple of weeks back, the existing 480 used to show progress of around 0.1 - 0.5% every second (average max of 10 min per WU). However, after I added the second card, the progress indicator for this card (device 0) seems to move at a snail's pace, taking up to 30 min per WU, while the newer card (device 1) is crunching much faster. The first card is in an x16 PCI-E slot while the second is in an x8. I have observed this on many WUs and hence don't think it is an isolated case. GPU-Z indicates both to be loaded at around 98%. I have also checked the task list for this rig and it shows 153 tasks under validation inconclusive, most of which I have discovered are due to wingmates using a 560 Ti with the stock application. Do I need to tweak any settings to get both cards to crunch evenly? I do have another rig with two GTX 260s that is performing well without any fiddling, so I am confused as to what went wrong with this one. |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Swap the cards around to see if it's a slot problem or a card problem. The fact that the second card is in an x8 slot should not make any difference to the crunching speed. GPU-Z will tell you the clock speeds of each card. Are they running at the same speed, or has card 2 "downclocked"? What NVIDIA driver version are you using? T.A. |
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
As per GPU-Z, both the cards are running the same shader clock speed of 1401 MHz, and both are 98-99% loaded. Driver version is 285.58. I will try swapping the cards when I get back from work. |
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
Update: I have now swapped the GPUs and the issue persists. I have also observed that both cards exhibit this behavior, which is erratic in nature. For some time the crunching seems fine, and then for no apparent reason one of them goes into slow mode. Still can't figure out what is causing this issue. Will keep investigating... |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
How big a power supply are you using? In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Try to free at least one CPU core. I fear you need to free 2. With each crime and every kindness we birth our future. |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
The reason I ask about the PSU is that there may not be enough power being delivered for both GPUs to work at their optimum. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Highlander Send message Joined: 5 Oct 99 Posts: 167 Credit: 37,987,668 RAC: 16 |
and what about some "normal" long running tasks like http://setiathome.berkeley.edu/result.php?resultid=2660698769 this is one on my machine, runtime also half an hour, Angle Rate 0.274226 from a tape beginning with 22no10ab. But with this AR, the runtime is pretty normal. - Performance is not a simple linear function of the number of CPUs you throw at the problem. - |
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
|
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
The rig is powered by a 1200W Gigabyte Odin, which I am guessing should be more than enough for the two 480s. The motherboard is a Gigabyte GA-MA790X-UD4P. I have also noticed that the system as a whole is sluggish in responding to any commands, be it a right-click menu or opening and closing folders. I have killed all unnecessary background programs and even changed the antivirus from Avast to AVG (both free versions), as I had noticed the aggressive behavior of Avast. Windows itself was reinstalled last month. Things go back to normal if I run only one card in either slot. A few of the tasks are now taking up to 90 minutes on that card. I am not very sure about allocating the CPU cores to the GPU with the Swan_sync command. Would anyone be able to guide me through it? I will have to check the BIOS settings later after getting back from work. |
tbret Send message Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
This is from one of your result files:
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 480, 1535 MiB, regsPerBlock 32768 computeCap 2.0, multiProcs 15 clockRate = 1401000
Device 2: GeForce GTX 480, 1535 MiB, regsPerBlock 32768 computeCap 2.0, multiProcs 15 clockRate = 1401000
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
All early cuFft plans postponed, to parallel with first chirp.
It doesn't look like your card is downclocking, which is what you have said and what GPU-Z is telling you. Looking through your tasks and trying to compare your times with those of your wingmates, I don't see anything that looks slow. BUT BUT BUT BUT that depends on how many work units you are crunching at one time per card. This is the closest thing I can find to "slow" and it isn't slow, depending on the number you crunch at once: http://setiathome.berkeley.edu/workunit.php?wuid=1095067650
Other than what you see on the progress indicator, is there anything else that makes you think one card is slow? |
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
Thanks for that in-depth analysis, tbret. What I am observing is that while one card is churning out the WUs at an average of 10-15 min, the other card is usually taking about 30 and sometimes even 90 min to finish one. Both cards perform well when alone no matter which slot is used, but the moment they are put in together, things slow down. Here is a workunit that took 4,274.27 seconds to finish, and here is another one that took 3,203.11 seconds. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"Both the cards perform well when alone no matter which slot is used, but the moment they are put in together, things slow down."
What are the CPU & GPU temperatures with the cards by themselves, and with the cards in there together? Grant Darwin NT |
tbret Send message Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
"Thanks for that in-depth analysis tbret. What I am observing is that while one card is churning out the WUs at an average of 10-15 min, the other card is usually taking about 30 and sometimes even 90 min to finish one. Both the cards perform well when alone no matter which slot is used, but the moment they are put in together, things slow down."
Ok, this is going to drive us nuts:
1) What sized PSU are you using? (Either one is fast, but one is slow with two? It may be power, but unless you have a multi-rail PSU and you're starving a card, I can't guess why it would be both, but not either, that gives you trouble.)
2) Did you do a Custom/CLEAN driver reinstall with both cards in the computer? (go to 301 or 306)
3) What's the temperature of the cards? (may be the heat of both)
4) Either runs fast no matter which slot, just so long as it is only one?
This is a weird one. OH, has that computer ever had an ATI driver on it? If so, you may need to use DriverSweeper to get rid of any remaining "pieces". Several of us have sort-of had what you are talking about happen to us. I've had to reinstall drivers on occasion (clean). I had to get rid of MSI Afterburner (uninstall) one time and that cleared it up (don't ask me why, could have been coincidence). What else, if anything, is running? By the way, GPU-Z does not always show a downclock after a driver crash, even though the card is downclocked. |
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
The PSU is a Gigabyte Odin 1200W, and each GPU is connected to a separate rail.
Yup, it was a fresh installation of Windows as well as the NVIDIA driver. Was using version 285.58 before, so just stuck with it. However, I think I had only one card in when installing the driver, and later popped in the other. Would I need to do a clean reinstall of the driver with both cards in?
EVGA Precision reports the temperatures at 47 deg C for card 1 and 52 deg C for card 2. Card 1, I am guessing, is the inner one, which is slow.
I don't own any ATI cards, so never installed those drivers.
This is a dedicated crunching machine running 24/7. All I have apart from BOINC are the AVG antivirus, EVGA Precision, WinRAR, TeamViewer and VNC.
I used to get driver crashes on this rig earlier, which were fixed by a reinstall. And even if it is happening now, wouldn't it affect the performance of both cards? Or would it make just one of the cards slow down? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"3) What's the temperature of the cards? (may be the heat of both)"
What is the ambient temperature? My GTX 560Ti & GTX 460 both run at over 70°C with the fans running at almost full speed, but the ambient temperature is in the mid 30s °C. When the temperature drops below 30°C, they run at about 70°C, but with the fans only running at about 70% of maximum possible speed. Grant Darwin NT |
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
The room temperature is set at 23 on the controls and I have set up the rigs so that the air blows directly over them. Plus I have Zalman VF3000F on both the cards. The maximum I have noticed on them is around 60 Deg C. |
tbret Send message Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
I don't know. I had a machine with a pair of 660Tis in it and one of them... Hey... I had a situation with *that* machine that I had to plug *both* cards into a monitor to get the one without the monitor *not* to down-clock. And that's a shot in the dark because in that machine right now, neither 660Ti has a monitor on it and the cards aren't down-clocked. (the video is coming from a 670 also installed in that one) I'm really at a loss and grasping at straws. PURE desperation guesswork:
A) Uninstall Precision. (maybe reinstall with both cards installed?)
B) I'd update the driver and see what happens. You can always go back, it's not like the change is irreversible. Strange and unusual things appear in the "fix" lists between versions of the drivers. And I think, with your strangeness, I might download and run DriverSweeper even though I don't think I've ever had to do it with just NVIDIA cards in the computer. Still, something is screwing it up. Are you using Precision or Precision X and are both cards at the same clock there? (Synch-ed?)
C) Try it with both cards plugged into a monitor (even the same monitor; DVI and HDMI or whatever combination works for your equipment).
This is a weird one. I guess that's why you're asking for help, huh?
EDIT: You don't need me making things more stupid. If something else occurs to me I'll come back and mention it, and I'll read what happens with interest, but obviously I don't have anything useful to suggest that I can assign a causal connection. |
Vipin Palazhi Send message Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
Thanks for the tips tbret, and you never know what the culprit is. I shall try all that once I get back and see if there is any improvement. And I am using Precision X 3.0.3 and both the cards are synced. I just pulled up a screen cap from this rig, and the difference in crunch time is clearly evident. New observation is that the progress indicator for the GPU in question just stops, as if paused, for a while before picking up again. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
You still didn't follow the simplest advice by Mike: http://setiathome.berkeley.edu/forum_thread.php?id=69788&postid=1298241#1298241
You don't know how to "free at least one CPU core"? The setting is:
- if you use web preferences: http://setiathome.berkeley.edu/prefs.php?subset=global "On multiprocessors, use at most 100% of the processors"
- if you use local preferences (do the change in BOINC Manager if you already use local preferences): http://boinc.berkeley.edu/wiki/Local_preferences "On multiprocessor systems, use at most [ 100.00 ] % of the processors"
To see whether this has any effect, change it to 50%. This will free 3 cores on an "AMD Phenom(tm) II X6 1055T Processor", meaning that only 3 (instead of 6) CPU tasks will be started/run by BOINC.
If you see an 'effect', next try 99% (this will free 1 core on any CPU with up to 100 cores).
If you see the same 'effect' as with 3 cores free, leave it at 99%.
If the 'effect' is less, next try 2 cores free:
% = 100 * (AllCores - FreeCores) / AllCores
% = 100 * (6 - 1) / 6 = 84% (always round UP)
% = 100 * (6 - 2) / 6 = 67%
So:
- anything 67...83% will free 2 cores on a six-core Processor (six-thread Processor in case of Intel)
- anything 50...66% will free 3 cores on a six-core Processor
- ALF - "Find out what you don't do well ..... then don't do it!" :) |
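
For anyone who wants to check the percentage arithmetic from the last post for a different core count, here is a minimal Python sketch. The function names `percent_for_free_cores` and `cores_used` are made up for illustration, and the assumption that BOINC rounds the allowed core count down (but never below 1) is inferred from the ranges quoted in that post, not taken from BOINC documentation.

```python
def percent_for_free_cores(all_cores: int, free_cores: int) -> int:
    """Smallest whole percentage that keeps (all_cores - free_cores) busy.

    Implements % = 100 * (AllCores - FreeCores) / AllCores, rounded UP,
    using integer arithmetic to avoid floating-point surprises.
    """
    if not 0 <= free_cores < all_cores:
        raise ValueError("free_cores must be between 0 and all_cores - 1")
    busy = all_cores - free_cores
    return -(-100 * busy // all_cores)  # integer ceiling division


def cores_used(all_cores: int, percent: int) -> int:
    """Cores BOINC would use at a given percentage.

    ASSUMPTION: the result is rounded down, with a minimum of one core.
    """
    return max(1, all_cores * percent // 100)


if __name__ == "__main__":
    cores = 6  # e.g. the six-core Phenom II X6 mentioned in the thread
    for free in (1, 2, 3):
        pct = percent_for_free_cores(cores, free)
        print(f"free {free} core(s): set at most {pct}% "
              f"-> {cores_used(cores, pct)} CPU tasks")
```

Under that rounding assumption, any setting from the computed value up to just below the next one frees the same number of cores, which matches the 67...83% and 50...66% ranges given above.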
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.