Dual graphics card issues

_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1491196 - Posted: 19 Mar 2014, 13:26:14 UTC
Last modified: 19 Mar 2014, 13:39:43 UTC

Hello everyone,

I have had some issues in the past with running two NVIDIA GeForce GTX 650 Ti cards in the same rig. If I ran only 1 card, it would produce valid results. If I ran two cards, one of the cards would more often than not generate inconclusives, which would likely turn into invalids.

After chatting with everyone here, I came to the conclusion that it must be my old power supply that was causing the issue. After months of putting it off/thinking about it, I ordered a new power supply. I installed it and it seems to work great; however, it did not ultimately solve my problem.

The one card seems to produce valid results, whereas the second card still produces mostly inconclusives that I would expect will turn into invalids.

This is becoming an unfortunately costly endeavor. Let's assume that my PCI-E slots and graphics cards are in working condition. (I don't really want to buy more cards, or replace my motherboard.) With this said, can anyone think of any other reason that this type of scenario would occur? It does not seem to be an old power supply issue, which I was almost convinced of. As a footnote, I have run two cards successfully on this machine in the past. They were just older cards. Additionally, the device manager says that both graphics cards are functioning properly. If it wasn't for SETI, I would assume that the entire rig was working properly. But when I want to do SETI on both cards, only then is there some problem with the validity of the work (on one card).

Any and all advice or thoughts are much appreciated. I don't think I can spend any more money on this problem, LOL.

Edit: Forgot maybe something helpful. Task List for the computer in question. You may notice that from the work units from the last day or so, the valids are mostly from Device 1, and the inconclusives are mostly from Device 2.
ID: 1491196 · Report as offensive
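The per-device split described in the post above can be put into numbers rather than eyeballed. Below is a minimal Python sketch, assuming the task list has been copied by hand into (device, outcome) pairs. The sample records and the function names are hypothetical and are not results from the host in this thread.

```python
from collections import Counter, defaultdict


def tally_by_device(tasks):
    """Count validation outcomes per GPU device.

    tasks: iterable of (device_number, outcome) pairs, where outcome is
    a string such as "valid", "inconclusive", or "invalid".
    """
    counts = defaultdict(Counter)
    for device, outcome in tasks:
        counts[device][outcome] += 1
    return counts


def inconclusive_rate(counter):
    """Fraction of one device's tasks that are currently inconclusive."""
    total = sum(counter.values())
    return counter["inconclusive"] / total if total else 0.0


# Hypothetical sample, NOT real results from the host in this thread.
sample_tasks = [
    (1, "valid"), (1, "valid"), (1, "valid"), (1, "inconclusive"),
    (2, "inconclusive"), (2, "inconclusive"), (2, "valid"), (2, "invalid"),
]

counts = tally_by_device(sample_tasks)
for device in sorted(counts):
    rate = inconclusive_rate(counts[device])
    print(f"Device {device}: {dict(counts[device])}, inconclusive rate {rate:.0%}")
```

A large gap between the two rates over a day or more of work would support the impression that one device is responsible; a small gap would suggest the split is closer to chance.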
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1491209 - Posted: 19 Mar 2014, 14:27:50 UTC

Assuming your PSU is good and can drive both cards at the same time.

Did you try to switch the GPUs? If you do that, does the problem follow the GPU or remain in the slot?
ID: 1491209 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1491211 - Posted: 19 Mar 2014, 14:33:09 UTC - in response to Message 1491196.  

Are you overclocking your GPUs? And if so, by how much? Sometimes when you OC too much, it will cause the GPUs to throw out invalids.
ID: 1491211 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1491212 - Posted: 19 Mar 2014, 14:35:19 UTC - in response to Message 1491196.  
Last modified: 19 Mar 2014, 14:54:06 UTC

As I remember it, you had an AMD 7750 & NV8800 previously. If you still have the 7750, have you tried replacing the problem 650 with it? I think you've already swapped slots with the 650s, but have you tried matching a different card with one 650? If one 650 works fine with the other card but the other 650 doesn't, I'd consider returning the 650 that doesn't work with the other non-650. If both 650s work fine with the other card, you've entered the twilight zone.
ID: 1491212 · Report as offensive
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1491213 - Posted: 19 Mar 2014, 14:50:35 UTC

Assuming your PSU is good and can drive both cards at the same time.

Did you try to switch the GPUs? If you do that, does the problem follow the GPU or remain in the slot?


The power supply is an 860W, so I have to believe that it is fine in that regard.

I did try switching the slots with my old power supply, but I unfortunately forget the result. I can try again, but I was really hoping to explore any solutions that didn't require replacing more hardware. I would assume that if the problem followed the card, or stuck to the PCI-E slot, the only option would be to replace whichever piece of the puzzle is the culprit.

Are you overclocking your GPUs? And if so, by how much? Sometimes when you OC too much, it will cause the GPUs to throw out invalids.


I'll have to double check the default stats for the card, but I did play with this in the past. Whichever the case, both cards should have the same settings. At least, that is what I am trying to tell them to do via my GPU Tweak program.

As I remember it, you had an AMD 7750 & NV8800 previously. If you still have the 7750, have you tried replacing the problem 650 with it? I think you've already swapped slots with the 650s, but have you tried matching a different card with one 650?


That is correct! You have a very good memory. I have not tried replacing the potentially strange 650 with the 7750. The reason being that I think at that point, I would need different work unit types for each of the cards. Would that be true? I had to babysit the type of work units in the past, and it is something I wouldn't like to repeat with the 100WU GPU limit.

Thanks everyone for commenting.
ID: 1491213 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1491214 - Posted: 19 Mar 2014, 15:00:20 UTC - in response to Message 1491213.  

Supposedly if you set the cache to around 0.75 days you won't have to babysit the WUs. In any case, it would just be a short while as the cards are being tested. If you can verify one particular 650 as the problem, you can see about replacing it.
ID: 1491214 · Report as offensive
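The 0.75-day cache mentioned above is normally set in the BOINC computing preferences. As an alternative, here is a hedged Python sketch that builds the same setting as a local preferences override. It assumes the client reads a global_prefs_override.xml file from its data directory and honours the work_buf_min_days and work_buf_additional_days tags; the helper name is hypothetical, and the file location should be checked against your own installation.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(min_days, extra_days=0.0):
    """Return the XML text for a BOINC preferences override.

    min_days   -> work_buf_min_days (how much work to keep on hand)
    extra_days -> work_buf_additional_days (extra buffer on top of that)
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(extra_days)
    return ET.tostring(root, encoding="unicode")


# 0.75 days is the value suggested in the post above.
override_xml = build_prefs_override(0.75)
print(override_xml)
```

The text would then be saved as global_prefs_override.xml and the client told to re-read its local preferences. Whether 0.75 days really avoids babysitting the work unit mix is the poster's suggestion, not something this sketch verifies.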
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1491215 - Posted: 19 Mar 2014, 15:00:30 UTC - in response to Message 1491213.  

Assuming your PSU is good and can drive both cards at the same time.

Did you try to switch the GPUs? If you do that, does the problem follow the GPU or remain in the slot?


The power supply is an 860W, so I have to believe that it is fine in that regard.

I did try switching the slots with my old power supply, but I unfortunately forget the result. I can try again, but I was really hoping to explore any solutions that didn't require replacing more hardware. I would assume that if the problem followed the card, or stuck to the PCI-E slot, the only option would be to replace whichever piece of the puzzle is the culprit.

An 860W PSU must be OK.

Unfortunately, you need to check whether the problem is in the card or the slot. Post what happens when you switch the cards, so we can try something else.
ID: 1491215 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1491216 - Posted: 19 Mar 2014, 15:04:26 UTC - in response to Message 1491215.  

One more question: how are the temps and fan speeds on that GPU?
ID: 1491216 · Report as offensive
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1491217 - Posted: 19 Mar 2014, 15:16:07 UTC - in response to Message 1491216.  

One more question: how are the temps and fan speeds on that GPU?


Quite low. Never saw either card above 60C.

I will probably let these inconclusives invalidate themselves before switching the cards, and then post again in the thread. Thanks everyone.
ID: 1491217 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1491868 - Posted: 20 Mar 2014, 15:52:07 UTC - in response to Message 1491217.  

While the cards are in your hands you may take the opportunity to inspect them the Sherlock Holmes way - with a magnifying glass and strong light.

Compare the soldering, traces, slot contacts (which you can clean with alcohol), pins where power cables connect, bulged capacitors, ...
Look also for any tiny conducting particles/material: metallic, graphite/soot, electrolyte ...

Even if you don't see such things, consider using a natural-bristle brush to clean the invisible 'normal' dust.

Check the mainboard the same way, especially the slots.
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1491868 · Report as offensive
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1493377 - Posted: 22 Mar 2014, 13:36:43 UTC - in response to Message 1491868.  

While the cards are in your hands you may take the opportunity to inspect them the Sherlock Holmes way - with a magnifying glass and strong light.

Compare the soldering, traces, slot contacts (which you can clean with alcohol), pins where power cables connect, bulged capacitors, ...
Look also for any tiny conducting particles/material: metallic, graphite/soot, electrolyte ...

Even if you don't see such things, consider using a natural-bristle brush to clean the invisible 'normal' dust.

Check the mainboard the same way, especially the slots.


Thanks for the suggestion Bill, I haven't paid too much attention to the finer details like that.

I am back to report some experiments.

1) After posting here, I let the two cards run for about 12 hours or so. Device 2 was mostly responsible for a large number of inconclusives. Even though the number of inconclusives was alarmingly high, not very many of them turned into invalids.

2) I switched the cards, and let them run for another 12 hours. The same alarmingly large number of inconclusives appeared, however this time it seemed to come from Device 1. More valid WUs came from Device 1 than came from Device 2 in experiment one, but that may be a coincidence.

The actual number of invalids doesn't seem large compared to the high number of inconclusives that stack up quite quickly. However, I'm not 100% sure how this works, and maybe as more time went on I would get more and more invalids (I got 4 overnight).

Here is the task list if interested

So long story short, this does seem to point to the PCI slot being the culprit. However I am confused by this, considering as I've stated I was running two cards for months with no issue.

I'm also confused by the voltage reports going into the cards. Using GPU Tweak, if I tell the cards to run 1100 volts (or mV? I forget), neither card will. It will be about 1080 or 1065. However, if I just have 1 card present, it will run the 1100 I've instructed. This situation was what made me and some other folks think that replacing the power supply would fix my problem. However, the same issue remains.

I fear that my only option now is to get a new motherboard. But again, I feel like the fact that I've run two cards fine before should mean the PCI slot is fine. But my experiments sort of suggest otherwise.

Sigh. Thanks to everyone for reading my thread and offering suggestions. I'm a bit lost as to what would be my next step other than just replacing more and more hardware.
ID: 1493377 · Report as offensive
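The swap test reported above only separates "slot" from "card" if you know which physical slot and which physical card the inconclusives came from in each run. The thread does not establish how BOINC's Device 1 and Device 2 map onto the slots, so that mapping is left as an input here. A minimal Python sketch of the inference, with hypothetical slot and card labels:

```python
def interpret_swap(run_before, run_after):
    """Interpret a card-swap test.

    Each run is a (slot, card) pair naming the physical slot and the
    physical card that produced most of the inconclusives in that run.
    """
    slot_before, card_before = run_before
    slot_after, card_after = run_after
    same_slot = slot_before == slot_after
    same_card = card_before == card_after

    if same_slot and not same_card:
        return "fault stays with the slot"
    if same_card and not same_slot:
        return "fault follows the card"
    # Either nothing moved (cards were not really swapped) or both moved.
    return "test is inconclusive"


# Hypothetical labels: slots "upper"/"lower", cards "A"/"B".
print(interpret_swap(("lower", "B"), ("lower", "A")))
print(interpret_swap(("lower", "B"), ("upper", "B")))
```

Before drawing a conclusion from "Device 2, then Device 1", it is worth confirming which slot each device number refers to, for example by running with one card removed and seeing which number remains.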
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1493384 - Posted: 22 Mar 2014, 13:57:11 UTC - in response to Message 1493377.  

I know you have said that both cards are running cool. But, it is possible that it is a position thing with air flow or cooling. I sometimes wonder if the sensors really are accurate in all cases. You might want to try an extra fan or open case with fan blowing right on the cards and see if that makes any difference. I know in one of my computers one GPU shows 60C and the other GPU shows 80C and that is with the fans running 90% which is the highest I can manually crank them up. (and there are extra case fans as well)

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1493384 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1493385 - Posted: 22 Mar 2014, 13:59:10 UTC - in response to Message 1493377.  
Last modified: 22 Mar 2014, 14:06:05 UTC

So long story short, this does seem to point to the PCI slot being the culprit. However I am confused by this, considering as I've stated I was running two cards for months with no issue.

Yes, now the PCI-e slot is the major suspect.

There are a few things you could do to try to fix that if it really is the PCI-e slot itself. One is to take a close look at the connector and see if it has a bent or partially melted pin. A melted or eroded pin is normally produced when you use a card that draws some of its power from the PCIe slot itself, something usual in low-end or old GPUs.

Another common cause of problems like this is a leaking capacitor that serves one of the PCI-e slots. That could easily explain why everything worked for months and doesn't now. In this case you need some technical help to replace the leaking component, but if that is the case, my suggestion is to change the MB: when one capacitor starts leaking on an MB, others will soon do the same.

If you have the technical skill, you could use a contact cleaning solution to clean the electrical contacts on the PCI-e slot connector; a bad contact could be the cause of your problem.

Bad mechanical fixation could produce that too. It is extremely rare on today's computers, but it does happen, especially if you live close to a train line, subway or construction site that could vibrate the host and produce a bad contact.

If the problem remains, you will have no other option than to mark the PCI-e slot as bad, stop using it, and continue with only one GPU (unless you try a PCI extender to use another PCI slot), or buy another MB.
ID: 1493385 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1493389 - Posted: 22 Mar 2014, 14:33:01 UTC - in response to Message 1493377.  

So long story short, this does seem to point to the PCI slot being the culprit. However I am confused by this, considering as I've stated I was running two cards for months with no issue.

With plug-in cards the slightest thing can cause an issue: a bit of moisture or dust, the slightest movement of the card, or even moving the computer. It could have been a fingerprint on a contact coupled with humidity that did it, and it happens only during certain operations or combinations of operations.
ID: 1493389 · Report as offensive
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1493391 - Posted: 22 Mar 2014, 14:44:43 UTC

I know you have said that both cards are running cool. But, it is possible that it is a position thing with air flow or cooling. I sometimes wonder if the sensors really are accurate in all cases. You might want to try an extra fan or open case with fan blowing right on the cards and see if that makes any difference. I know in one of my computers one GPU shows 60C and the other GPU shows 80C and that is with the fans running 90% which is the highest I can manually crank them up. (and there are extra case fans as well)


Not a bad idea. I just checked the computer after it ran all night and it seemed like half of the WUs were frozen in BOINC, while the other half kept going. It wasn't split down the middle per device either, I don't think. I've restarted the computer and have a fan going right inside the case, so we'll see if there are any improvements there.


Thank you Juan for your advice. I'll have to consider which move I would like to make. I'm not very technical, so the most obvious thing would be to get a new motherboard, but I'll likely put that off for a while and see if I can improve anything in the meantime.



With plug-in cards the slightest thing can cause an issue: a bit of moisture or dust, the slightest movement of the card, or even moving the computer. It could have been a fingerprint on a contact coupled with humidity that did it, and it happens only during certain operations or combinations of operations.


This computer is a Frankenstein, and neither card is screwed in as tight as can be. The plug-in cards are the only type I know, but it is unfortunate for them to be as sensitive as you say!
ID: 1493391 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1493397 - Posted: 22 Mar 2014, 15:19:03 UTC - in response to Message 1493391.  
Last modified: 22 Mar 2014, 15:22:06 UTC

This computer is a Frankenstein, and neither card is screwed in as tight as can be.

That could be another possible cause of your problem. A card that is not tightly fixed can vibrate when its internal fan (which is not 100% balanced and produces a small vibration) runs at high speed, and that can produce random, brief bad contacts that could cause the problem. Making sure the card (and everything else) is properly fixed is mandatory for a problem-free computer.

Extra fans are always welcome, but as you said before, your temps never pass 60C, so I don't really believe temperature is your problem. BTW, the 650 runs really cool (I have one).

As I always say, fixing hardware-related problems is not an exact science; it's always a nightmare of "coulds" & "ifs".

Yes, I know it's a total pain in the... but as you said, the problem appears to be focused on one PCI-e slot, so I still bet my chips on a leaking capacitor or a bad PCI-e contact/solder joint.

Actually, few people care about the impact on the connector's life of the high current GPUs use when they draw their power from the PCI-e slot... You should never underestimate Ohm's law.
ID: 1493397 · Report as offensive
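To put rough numbers on the Ohm's law point above, here is a small Python sketch of the voltage lost and heat generated at a resistive contact. It assumes a single lumped contact carrying 5.5 A (taken here as the 12 V current limit of an x16 slot; in practice the current is shared across several pins), and the two resistance values are purely illustrative, not measurements from any board.

```python
def contact_loss(current_a, resistance_ohm):
    """Return (voltage_drop_V, power_W) across a resistive contact."""
    voltage_drop = current_a * resistance_ohm   # V = I * R
    power = current_a ** 2 * resistance_ohm     # P = I^2 * R
    return voltage_drop, power


# Assumed worst case: all slot 12 V current through one lumped contact.
SLOT_12V_CURRENT_A = 5.5

# Illustrative resistances only (hypothetical values, in ohms).
contacts = [("clean contact", 0.005), ("degraded contact", 0.050)]

for label, resistance in contacts:
    drop, heat = contact_loss(SLOT_12V_CURRENT_A, resistance)
    print(f"{label}: drop {drop * 1000:.1f} mV, heat {heat:.2f} W")
```

Because the heat scales with the square of the current, a contact whose resistance has risen tenfold dissipates ten times the power at the same load, which is the mechanism behind the eroded or melted pins described earlier in the thread.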
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1493499 - Posted: 22 Mar 2014, 18:55:40 UTC - in response to Message 1493377.  

Since your other cards work fine in the PCIe slots, there is no reason to suspect the slot may be the problem. There are many people running cards hanging off a 1x PCIe cable without any problems. I still think it would be a relevant test to run 1 NV650 at a time with one of your other cards. Other than that, if you haven't run the Driver Cleaner yet you might give that a go. If you still have the AMD driver installed, try removing both and then just installing the NV driver. If that doesn't work, try matching 1 of your other cards with one 650.
Display Driver Uninstaller (DDU) / Cleaner made for Display Drivers.(NVIDIA/AMD)
ID: 1493499 · Report as offensive
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1493579 - Posted: 22 Mar 2014, 20:24:24 UTC - in response to Message 1493499.  

Thanks again for the comment, TBar.

If I have already uninstalled the AMD drivers, do you think the driver wiper would still be a good idea?

I'm hesitant to try matching the current cards with older ones, even though I know I should. I've been uninstalling the old drivers every time I get a new card, so it would be quite the headache to figure all of that out again.

If I did try an old card with one of my current ones, and it worked fine, what would that suggest? I'm not sure.
ID: 1493579 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1493614 - Posted: 22 Mar 2014, 21:05:25 UTC - in response to Message 1493579.  

I also noticed you haven't updated to Windows 8.1. Considering that, I'd run the Driver Cleaner, install the NV driver and try that. If that doesn't help, run the Windows 8.1 Updater. After that, I'd pull one NV650, replace it with the AMD7750, and install the AMD driver. That would be basically what you had before that worked Perfectly Fine, one NV card and one AMD card. Then see what happens.

You had a Perfectly working System before you introduced the Two 650s. Since the Power supply didn't help, I'd concentrate on what you changed that started the problems, the two 650s. If you don't have the problems with the other cards, I'd send the 650s back before I bought a new system only to find it still didn't work. The AMD 7750 release date is only about 6 months earlier than the NV650, so you really can't call it an 'Old' card. Here's how my 7750 is doing in Windows 8.1, http://setiathome.berkeley.edu/show_host_detail.php?hostid=6796475
ID: 1493614 · Report as offensive
_
Joined: 15 Nov 12
Posts: 299
Credit: 9,037,618
RAC: 0
United States
Message 1495083 - Posted: 25 Mar 2014, 13:07:35 UTC - in response to Message 1493614.  
Last modified: 25 Mar 2014, 13:08:31 UTC


Just thought I would post to give a slight update.

I've run the Driver Cleaner. I uninstalled the AMD drivers (which I thought I had already uninstalled!), and then uninstalled the NV drivers. Then I reinstalled the latest NV drivers.

After about 2 days, here is how the machine is doing. Not terrible, but not ideal either. I know that not all inconclusives will necessarily end up as invalids, but it just seems like a very high number of inconclusives nonetheless. I've turned the machine off this morning just to gather my thoughts on the subject.

Thanks for the suggestions, TBar. Next on your list is to upgrade to Windows 8.1. I think that I will try that tonight and see if anything improves. I may also attempt to seat the cards more sturdily in the PCI slots. My case is mangled, so they aren't screwed in and are just kind of sitting in it. Not loose, but not being physically held down or in place the way they would be if they were screwed in.

Thanks again to everyone who has been following my story here. I appreciate the dialog very much.
ID: 1495083 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.