Uneven usage of GPUs |
Message boards : Number crunching : Uneven usage of GPUs
| Author | Message |
|---|---|
|
Hi all, apologies if this question has been raised previously but a quick search did not yield any results. | |
| ID: 1298145 · | |
|
Swap the cards around to see if it's a slot problem or a card problem. | |
| ID: 1298154 · | |
|
As per GPU-Z, both the cards are running the same shader clock speed of 1401 MHz, and both are 98-99% loaded. Driver version is 285.58. I will try swapping the cards when I get back from work. | |
| ID: 1298155 · | |
|
Update: I have now swapped the GPUs and the issue persists. I have also observed that both cards exhibit this behavior, which is erratic in nature. For some time the crunching seems fine, and then for no apparent reason one of them goes into slow mode. Still can't figure out what is causing this issue. Will keep investigating... | |
| ID: 1298218 · | |
|
how big of a power supply are you using? | |
| ID: 1298219 · | |
|
Try to free at least one CPU core. | |
| ID: 1298241 · | |
|
The reason I ask about the PSU is that it may not be delivering enough power for both GPUs to work at their optimum. | |
| ID: 1298254 · | |
|
and what about some "normal" long running tasks like http://setiathome.berkeley.edu/result.php?resultid=2660698769 | |
| ID: 1298261 · | |
|
Check your PCIe settings in BIOS, as some mobos can keep 1st PCIe bus in | |
| ID: 1298266 · | |
|
The rig is powered by a 1200W Gigabyte Odin, which I am guessing should be more than enough for the two 480s. The motherboard is Gigabyte GA-MA790X-UD4P. | |
| ID: 1298318 · | |
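The PSU reasoning in this exchange can be checked with rough arithmetic. The sketch below is only an estimate under stated assumptions: the 1200 W rating comes from the thread, 250 W is NVIDIA's published TDP for the GTX 480, and the figure for the rest of the system is a hypothetical placeholder, since the thread does not name the CPU or other components.

```python
# Rough power-budget estimate for the rig described in the thread.
PSU_RATED_W = 1200       # Gigabyte Odin rating, as stated in the thread
GTX_480_TDP_W = 250      # NVIDIA's published TDP for one GTX 480
NUM_GPUS = 2
REST_OF_SYSTEM_W = 200   # hypothetical: CPU, motherboard, drives, fans


def power_headroom(psu_w, gpu_tdp_w, num_gpus, rest_w):
    """Return (estimated load, remaining headroom, load as a fraction of the rating)."""
    load = gpu_tdp_w * num_gpus + rest_w
    return load, psu_w - load, load / psu_w


load_w, headroom_w, fraction = power_headroom(
    PSU_RATED_W, GTX_480_TDP_W, NUM_GPUS, REST_OF_SYSTEM_W
)
print(f"Estimated load: {load_w} W, headroom: {headroom_w} W "
      f"({fraction:.0%} of the PSU rating)")
```

A total like this says nothing about per-rail limits, which is the multi-rail concern raised later in the thread; checking that would need the rail ratings printed on the PSU label.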
This is from one of your result files:

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 480, 1535 MiB, regsPerBlock 32768 computeCap 2.0, multiProcs 15 clockRate = 1401000
Device 2: GeForce GTX 480, 1535 MiB, regsPerBlock 32768 computeCap 2.0, multiProcs 15 clockRate = 1401000
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
All early cuFft plans postponed, to parallel with first chirp.

It doesn't look like your card is downclocking, which is what you have said and what GPU-Z is telling you.

Looking through your tasks and trying to compare your times with those of your wingmates, I don't see anything that looks slow. BUT BUT BUT BUT that depends on how many work units you are crunching at one time per card.

This is the closest thing I can find to "slow" and it isn't slow, depending on the number you crunch at once: http://setiathome.berkeley.edu/workunit.php?wuid=1095067650

Other than what you see on the progress indicator, is there anything else that makes you think one card is slow? | |
| ID: 1298319 · | |
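The downclock check described in the post above can be automated by reading the `clockRate` values out of a task's stderr text. This is a minimal sketch, not part of BOINC or the SETI@home application: the function names are made up for illustration, it assumes each device is reported on a single line as in the excerpt, and it assumes `clockRate` is in kHz (which agrees with the 1401 MHz that GPU-Z reports earlier in the thread).

```python
import re

# Device lines copied from the stderr excerpt quoted in the thread.
STDERR_TXT = """
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 480, 1535 MiB, regsPerBlock 32768 computeCap 2.0, multiProcs 15 clockRate = 1401000
Device 2: GeForce GTX 480, 1535 MiB, regsPerBlock 32768 computeCap 2.0, multiProcs 15 clockRate = 1401000
Device 1: GeForce GTX 480 is okay
"""

# Assumes one device per line; lines without "MiB" and "clockRate" are ignored.
DEVICE_RE = re.compile(r"Device (\d+): (.+?), (\d+) MiB, .*?clockRate = (\d+)")


def parse_devices(text):
    """Return {device_number: (name, clock_mhz)} for each device line found."""
    devices = {}
    for match in DEVICE_RE.finditer(text):
        number, name, _mem_mib, clock_khz = match.groups()
        devices[int(number)] = (name, int(clock_khz) / 1000)  # kHz -> MHz
    return devices


def downclocked(devices, expected_mhz, tolerance_mhz=5.0):
    """List device numbers whose reported clock is below the expected value."""
    return [n for n, (_name, mhz) in devices.items()
            if mhz < expected_mhz - tolerance_mhz]


devices = parse_devices(STDERR_TXT)
print(devices)
print(downclocked(devices, expected_mhz=1401))
```

Running the same check over the stderr of a slow task and a fast task would show whether the clock reported at task start differs between them.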
|
Thanks for that in-depth analysis, tbret. What I am observing is that while one card is churning out the WUs at an average of 10-15 min each, the other card is usually taking about 30 and sometimes even 90 min to finish one. Both cards perform well when alone, no matter which slot is used, but the moment they are put in together, things slow down. | |
| ID: 1298338 · | |
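One way to put numbers on "one card is slower" is to group finished tasks by GPU and compare the typical run time per card. The sketch below is an illustration only: the sample values are hypothetical, loosely modelled on the figures in the post above (10-15 min versus 30-90 min), and the 2x threshold is an arbitrary choice. The median is used so that a single 90-minute outlier does not dominate the comparison.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (device_number, run_time_minutes) samples, not real task data.
TASKS = [(1, 32), (2, 12), (1, 90), (2, 14), (1, 30), (2, 11)]


def median_runtime_by_device(tasks):
    """Group run times by device and return the median for each device."""
    grouped = defaultdict(list)
    for device, minutes in tasks:
        grouped[device].append(minutes)
    return {device: median(times) for device, times in grouped.items()}


def slow_devices(medians, factor=2.0):
    """Return devices whose median run time exceeds factor x the fastest median."""
    fastest = min(medians.values())
    return sorted(d for d, m in medians.items() if m > factor * fastest)


medians = median_runtime_by_device(TASKS)
print(medians)
print(slow_devices(medians))
```

As noted earlier in the thread, run times only compare fairly when both cards crunch the same number of work units at once.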
Both the cards perform well when alone no matter which slot is used, but the moment they are put in together, things slow down.

What are the CPU & GPU temperatures with the cards by themselves, and the cards in there together?

____________
Grant Darwin NT. | |
| ID: 1298343 · | |
Thanks for that in-depth analysis tbret. What I am observing is that while one card is churning out the WUs at an average of 10-15 min, the other card is usually taking about 30 and sometimes even 90 min to finish one. Both the cards perform well when alone no matter which slot is used, but the moment they are put in together, things slow down.

Ok, this is going to drive us nuts:

1) What sized PSU are you using? (either one is fast, but one is slow with two? may be power, but unless you have a multi-rail PSU and you're starving a card, I can't guess why it would be both, but not either, that gives you trouble)
2) Did you do a Custom/CLEAN driver reinstall with both cards in the computer? (go to 301 or 306)
3) What's the temperature of the cards? (may be the heat of both)
4) Either runs fast no matter which slot, just so long as it is only one?

This is a weird one. OH, has that computer ever had an ATI driver on it? If so, you may need to use DriverSweeper to get rid of any remaining "pieces".

Several of us have sort-of had what you are talking about happen to us. I've had to reinstall drivers on occasion (clean). I had to get rid of MSI Afterburner (uninstall) one time and that cleared it up (don't ask me why, could have been coincidence). What else, if anything, is running?

By the way, GPU-Z does not always show a downclock after a driver crash, even though the card is downclocked. | |
| ID: 1298347 · | |
The PSU is a Gigabyte Odin 1200W, and each GPU is connected to a separate rail.
Yup, it was a fresh installation of Windows as well as the NVIDIA driver. I was using version 285.58 before, so I just stuck with it. However, I think I had only one card in when installing the driver, and later popped in the other. Would I need to do a clean reinstall of the driver with both cards in?
EVGA Precision reports the temperatures at 47 deg C for card 1 and 52 deg C for card 2. Card 1, I am guessing, is the inner one, which is slow.
I don't own any ATI cards, so I never installed those drivers.
This is a dedicated crunching machine running 24/7. All I have apart from BOINC are the AVG antivirus, EVGA Precision, WinRAR, TeamViewer and VNC.
I used to get driver crashes on this rig earlier, which were fixed by a reinstall. And even if it is happening now, wouldn't it affect the performance of both cards? Or would it make just one of the cards slow down? | |
| ID: 1298352 · | |
3) What's the temperature of the cards? (may be the heat of both)

What is the ambient temperature? My GTX 560Ti & GTX 460 both run at over 70°C with the fans running at almost full speed, but the ambient temperature is in the mid 30s °C. When the temperature drops below 30°C, they run at about 70°C, but with the fans only running at about 70% of maximum possible speed.

____________
Grant Darwin NT. | |
| ID: 1298365 · | |
The room temperature is set at 23 on the controls and I have set up the rigs so that the air blows directly over them. Plus I have Zalman VF3000F on both the cards. The maximum I have noticed on them is around 60 Deg C. | |
| ID: 1298366 · | |
I don't know. I had a machine with a pair of 660Tis in it and one of them...

Hey... I had a situation with *that* machine that I had to plug *both* cards into a monitor to get the one without the monitor *not* to down-clock. And that's a shot in the dark because in that machine right now, neither 660Ti has a monitor on it and the cards aren't down-clocked. (the video is coming from a 670 also installed in that one)

I'm really at a loss and grasping at straws. PURE desperation guesswork:

A) Uninstall Precision. (maybe reinstall with both cards installed?)
B) I'd update the driver and see what happens. You can always go back, it's not like the change is irreversible. Strange and unusual things appear in the "fix" lists between versions of the drivers. And I think, with your strangeness, I might download and run DriverSweeper even though I don't think I've ever had to do it with just NVIDIA cards in the computer. Still, something is screwing it up. Are you using Precision or Precision X and are both cards at the same clock there? (Synch-ed?)
C) Try it with both cards plugged into a monitor (even the same monitor; DVI and HDMI or whatever combination works for your equipment).

This is a weird-one. I guess that's why you're asking for help, huh?

EDIT: You don't need me making things more stupid. If something else occurs to me I'll come back and mention it, and I'll read what happens with interest, but obviously I don't have anything useful to suggest that I can assign a causal connection. | |
| ID: 1298369 · | |
|
Thanks for the tips tbret, and you never know what the culprit is. I shall try all that once I get back and see if there is any improvement. | |
| ID: 1298375 · | |
|
| |
| ID: 1298380 · | |
| Copyright © 2013 University of California |