My 3 headed gtx 1070ti setup is dropping a gpu.

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976400 - Posted: 22 Jan 2019, 6:55:44 UTC

Nvidia-smi says that the GPU at "0000:02:00.0" is lost. I guess I am going to have to swap the newest card with one of the older cards and see if the address changes.

If it does, the new card isn't up to the task(s), or to the temperature.

All three cards are pegged at 83C. So I tried reducing the power limit to 120 watts and the cards all seemed to be running cooler. But the problem recurred anyway. :(
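
A rough sketch of how the "which address went missing" check could be scripted, so that swapping cards between slots is easier to follow. It assumes the `--query-gpu=pci.bus_id` option of nvidia-smi; the helper names and the list of expected addresses are hypothetical and would have to be filled in from a run where all three cards are visible.

```python
import re
import subprocess

# Matches "0000:02:00.0" as well as the 8-digit-domain form "00000000:02:00.0".
_BUS_ID = re.compile(r"(?:[0-9a-f]{4,8}:)?([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])")


def normalize_bus_id(bus_id):
    """Reduce a PCI bus ID to lowercase 'bus:device.function', dropping the domain."""
    text = bus_id.strip().lower()
    match = _BUS_ID.fullmatch(text)
    return match.group(1) if match else text


def missing_gpus(expected, query_output):
    """Return the expected bus IDs that do not appear in the query output."""
    seen = set()
    for line in query_output.splitlines():
        match = _BUS_ID.fullmatch(line.strip().lower())
        if match:  # ignore error text or anything that is not a bare bus ID
            seen.add(match.group(1))
    return sorted(b for b in expected if normalize_bus_id(b) not in seen)


def query_bus_ids():
    """Ask nvidia-smi for the bus IDs it can currently see (not called here)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Calling `missing_gpus(expected, query_bus_ids())` before and after the swap would show whether the missing address stays with the slot or follows the card.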

Darn.
Tom
A proud member of the OFA (Old Farts Association).
ID: 1976400
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 20697
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1976403 - Posted: 22 Jan 2019, 7:33:11 UTC

83C is way too hot. Are they overclocked? Have they adequate cooling air getting to them? Is there adequate space for the exhaust air to get out (and not be drawn back in again)?
The last time I had a GPU drop "regularly" it was down to a faulty PSU, which finally blew up, taking the GPU it was dropping with it :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1976403
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4241
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1976479 - Posted: 22 Jan 2019, 16:13:15 UTC

are they plugged directly into the motherboard? or are they running on risers?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1976479
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976500 - Posted: 22 Jan 2019, 23:13:36 UTC - in response to Message 1976400.  

Nvidia-smi says that the GPU at "0000:02:00.0" is lost. I guess I am going to have to swap the newest card with one of the older cards and see if the address changes.

If it does, the new card isn't up to the task(s), or to the temperature.

All three cards are pegged at 83C. So I tried reducing the power limit to 120 watts and the cards all seemed to be running cooler. But the problem recurred anyway. :(

Darn.
Tom


I got to thinking. Maybe it is a problem with the GPU slot. The PCIe x1 slot also has a lot of cables under it. So I am trying to mash all the GPUs together, with an external fan on them so they stay "cooler". Will report back.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1976500
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976501 - Posted: 22 Jan 2019, 23:16:42 UTC - in response to Message 1976479.  

are they plugged directly into the motherboard? or are they running on risers?


Directly into the MB, but the bottom slot has a lot of cables under it. I specifically bought a shorter card for that slot, but maybe I didn't get it seated well enough.

One other solution that would allow better cooling is to move one video card outside the chassis. If I can't get them to stay below 80C I will look into that. I did try lowering the power limit and it was amazing how much cooler they got. I think the GPUs slowed down, though.
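
For anyone wanting to repeat the power-limit experiment, here is a minimal sketch that builds the nvidia-smi call for one card. It assumes the `-i` (GPU index) and `-pl` (power limit in watts) options of nvidia-smi; the function names are hypothetical, the 120 W figure is simply the value tried in this thread, and the command normally needs root.

```python
import subprocess


def power_limit_command(gpu_index, watts):
    """Build the nvidia-smi argument list that caps one GPU's power draw."""
    if gpu_index < 0:
        raise ValueError("gpu_index must be zero or greater")
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(gpu_index, watts):
    """Run the command and return its exit code (0 means nvidia-smi accepted it)."""
    result = subprocess.run(
        power_limit_command(gpu_index, watts),
        capture_output=True, text=True, check=False,
    )
    return result.returncode
```

Something like `apply_power_limit(1, 120)` would cap only the card at index 1, which allows the hottest card to be limited while the others stay at stock.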

Tom
A proud member of the OFA (Old Farts Association).
ID: 1976501
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976502 - Posted: 22 Jan 2019, 23:18:07 UTC - in response to Message 1976403.  

83C is way too hot. Are they overclocked? Have they adequate cooling air getting to them? Is there adequate space for the exhaust air to get out (and not be drawn back in again)?
The last time I had a GPU drop "regularly" it was down to a faulty PSU, which finally blew up, taking the GPU it was dropping with it :-(


I expect it's inadequate cooling. They are running "stock", so any OC is their own doing.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1976502
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976662 - Posted: 23 Jan 2019, 21:58:31 UTC
Last modified: 23 Jan 2019, 22:01:11 UTC

After running all three GTX 1070 Ti's in slots that didn't have cables under them, I am reasonably confident that the original issue was the GPU not being fully seated in the socket/slot.

With all three GPUs cheek by jowl I didn't have an issue with them "dropping out". But everything was even hotter than it had been before.

Except I had to move the 3-fan Gigabyte GPU to the bottommost slot (the MB stands on its side), because I couldn't get its middle fan to stop hitting the GPU below it. Even tried a wedge :(

So the 3-fan Gigabyte on the bottom was now running at 67C, the middle GPU at 83C, and the "top" one was up to 89C. That was last night. I dropped the power limit on the "hot" GPU to 120 watts and it got back down to 83C, so I left it running overnight.

This afternoon I moved the GPU that is in between the other two gpus onto a riser card setup. So far there has been no visible difference in performance.

The goal is to allow all three GPUs to run at stock power levels while not getting so darned hot.

The problem is I may have to drop the power limit on the Zotac GPU that is still in the case, and maybe even on the one that is outside the case. This is because when I was running two GTX 1070 Ti's and a GTX 1060 3GB in the bottom, cable-infested slot, all three were running at about 83C.

First report, after 5-10 minutes in, is NO ONE is running above 67C. I need to go away for at least an hour and see how it looks then. I really hope this first report holds up. Yes, it looks a bit funky with the case open, a fan blowing on the gpus in the case and another card sitting on a riser card base just in front of the case. But this is in my house away from high traffic areas so it should be safe for the expensive gpu.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1976662
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13138
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1976684 - Posted: 24 Jan 2019, 0:08:17 UTC

You must not have adequate case ventilation. I have a host with three EVGA GTX 1070Ti Black Edition cards that run, from bottom to top, at 65°, 68°, and 48° C right now. The bottom and middle cards are cheek by jowl and the top card has a one-slot spacing advantage over the middle card. There is a 120mm fan on the bottom of the case blowing on the bottom-slot card, three 120mm front intake fans, and a final 120mm rear exhaust fan. The CPU has a 280mm AIO exhausting through the roof. Worst case for that middle card in summer was about 74° C.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1976684
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4241
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1976698 - Posted: 24 Jan 2019, 0:39:19 UTC - in response to Message 1976684.  

You must not have adequate case ventilation.


or not enough GPU fan speed. If I recall correctly, you run 100% GPU fan speed. I personally run 85% for any card blocked by another card, and it certainly makes a big difference in temps. I think the blocked 1060 and 2070 run at about 70C. No big deal at all, even overclocked.

If he's just running the cards at stock fan curves, then they will use pretty low fan RPMs and just thermal throttle around 83C.
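
A small sketch for spotting cards that are sitting at that point. It assumes the output of `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits`, which prints one "index, temperature" line per card; the function name is hypothetical and the 83C threshold is the figure quoted in this thread, not a value read from the card.

```python
import csv
import io

THROTTLE_C = 83  # figure quoted in this thread, not a manufacturer spec


def hot_gpus(query_output, limit_c=THROTTLE_C):
    """Return (index, temperature) pairs for GPUs at or above limit_c."""
    hot = []
    for row in csv.reader(io.StringIO(query_output)):
        if len(row) < 2:
            continue  # blank or malformed line
        index, temp = row[0].strip(), row[1].strip()
        if not (index.isdigit() and temp.isdigit()):
            continue  # e.g. a sensor that reports "[N/A]"
        if int(temp) >= limit_c:
            hot.append((int(index), int(temp)))
    return hot
```

Fed the temperatures reported earlier in the thread (67C, 83C and 89C), it would flag the second and third cards.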
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1976698
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13138
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1976719 - Posted: 24 Jan 2019, 2:09:14 UTC - in response to Message 1976698.  

Very true, the cards will not run without thermal throttling on the stock fan curve. Because the video processor is not running at all on BOINC, the fan control mechanism has no input and does not see the need to ramp fan speeds. Some of the later iCX cards do have more sensors and can provide more inputs to the fan speed control algorithm even when the video processor clocks are idling.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1976719
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976782 - Posted: 24 Jan 2019, 18:13:41 UTC

New issue. Apparently, I got myself upgraded to Linux version 18.04 and it started with a "Boinc Manager crashed 3 times in 4 minutes" error. Then I saw a "system has internal error" message and ran the software updater, which announced I was "up to date".

Then rebooted and can't log back in. :(

It accepts the login credentials and then cycles back to the login screen after a few seconds.

Darn.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1976782
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1976794 - Posted: 24 Jan 2019, 19:16:57 UTC - in response to Message 1976782.  
Last modified: 24 Jan 2019, 19:18:56 UTC

New issue. Apparently, I got myself upgraded to Linux version 18.04 and it started with a "Boinc Manager crashed 3 times in 4 minutes" error. Then I saw a "system has internal error" message and ran the software updater, which announced I was "up to date".

Then rebooted and can't log back in. :(

It accepts the login credentials and then cycles back to the login screen after a few seconds.

Darn.

Tom


I can get to a command line in recovery mode. There it offers version 14.xx.

I don't suppose there is any EASY way to roll it back to 14.xx?

I suppose I can always reinstall, but I don't appear to have Lubuntu 14.xxx any more.
Edit: found the version in the archive at the website.

Hmmmmm.....

Tom
A proud member of the OFA (Old Farts Association).
ID: 1976794
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13138
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1976801 - Posted: 24 Jan 2019, 19:53:05 UTC

Uggh, the old login loop problem. I've had it a couple of times now myself since updating to 18.04 LTS. Only occurred somewhere after the conversion and hasn't ever reappeared.

Here is the solution, which involves removing the .Xauthority file and recreating it with the correct ownership.

https://askubuntu.com/questions/1059458/ubuntu-18-04-on-login-loop-even-with-correct-password
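
Before deleting anything, it may be worth confirming that ownership really is the problem. A minimal sketch, assuming the file lives at ~/.Xauthority and that "correct ownership" means it is owned by the user who is logging in; the function name is hypothetical and the script only reports, it changes nothing.

```python
import os
from pathlib import Path


def xauthority_status(path, uid=None):
    """Report whether the file is 'missing', 'ok', or has the 'wrong-owner'."""
    if uid is None:
        uid = os.getuid()  # default: the user running this check
    path = Path(path)
    if not path.exists():
        return "missing"
    # st_uid is the numeric owner of the file
    return "ok" if path.stat().st_uid == uid else "wrong-owner"
```

From a recovery shell running as root, the uid of the affected account has to be passed explicitly, for example `xauthority_status("/home/tom/.Xauthority", uid=1000)`, where both the path and the uid are placeholders.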
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1976801
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5121
Credit: 276,046,078
RAC: 462
Message 1981668 - Posted: 22 Feb 2019, 16:14:42 UTC - in response to Message 1976801.  

Uggh, the old login loop problem. I've had it a couple of times now myself since updating to 18.04 LTS. Only occurred somewhere after the conversion and hasn't ever reappeared.

Here is the solution, which involves removing the .Xauthority file and recreating it with the correct ownership.

https://askubuntu.com/questions/1059458/ubuntu-18-04-on-login-loop-even-with-correct-password


I have come back to the above URL several times. I have had 1 success, following what I think was a secondary suggestion. :(

When you factor in the amount of time the reboots were taking, it's almost faster to do a reinstall. :(

Tom
A proud member of the OFA (Old Farts Association).
ID: 1981668