GPU Lockups

Message boards : Number crunching : GPU Lockups

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1861698 - Posted: 15 Apr 2017, 19:39:53 UTC - in response to Message 1861671.  
Last modified: 15 Apr 2017, 19:58:48 UTC

I've replaced the pair of 980 Tis with a single 1080 Ti, and single-tasked it's doing about 1900 CPH with the same command line below. It's only been cooking for a few days now, so I'll have to give it time to see if it stays stable.


The new Ti is an amazing performer, isn't it?
How are your temps and fan %?

1060 https://setiathome.berkeley.edu/workunit.php?wuid=2505759191
1080 https://setiathome.berkeley.edu/workunit.php?wuid=2505759161
and another pair
1080 http://setiathome.berkeley.edu/workunit.php?wuid=2505776592
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1861735 - Posted: 15 Apr 2017, 23:01:23 UTC

After a night of cooking it was hovering around 65C, with the fans at about 40-50%, I believe. It was single-tasked, though, so you can see it cooling off between tasks; I bet it'll run a bit louder when I switch to two tasks at a time. NVSMI was also suggesting it was nowhere near its TDP, so I suspect there's still room to push it harder.
Darrell Wilcox Project Donor
Volunteer tester
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 1861766 - Posted: 16 Apr 2017, 1:11:47 UTC - in response to Message 1861735.  

@Shaggie76:

I also have a 5960X and was getting BSODs anywhere from once every 2-3 days to twice a day. It turned out to be Nvidia driver 378.88. Since updating to 378.92 I haven't had a single BSOD (knock on wood) in 10 days. Worth checking out?
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1861818 - Posted: 16 Apr 2017, 6:16:29 UTC - in response to Message 1861735.  

I had to raise the power cap from 250 W to 285 W so that power draw is not a limiting factor; it sometimes hits 272-280 W. To keep the temperature below 67C I had to push the fan to 98%. The GPU load hovers at 93-99%. In the P2 state the GPU clock is set to 2088 MHz and runs at 2040 MHz, and the memory clock is 10690 MHz. I cannot get the GPU to run CUDA at P0.
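A minimal sketch of the power-cap change described above, assuming Linux, root privileges, GPU index 0 and the 285 W target from this post. The `-pl`, `-pm` and `--query-gpu` options are standard nvidia-smi options; the `in_range` helper and the `GPU`/`WANT`/`APPLY` variables are hypothetical names for this example. It checks the requested cap against the board's own min/max limits before applying it, then prints the performance state and clocks so you can see whether the card stays in P2.

```shell
#!/usr/bin/env bash
# Sketch: raise a GPU's power cap only if the board's limits allow it.
# Assumptions: Linux, run as root, GPU index 0, 285 W target.
# GPU, WANT, APPLY and in_range are hypothetical names for this example.
set -u

# in_range WANT: reads "min, max" (watts) on stdin; exit status 0 if WANT fits.
# Non-numeric fields such as "[N/A]" compare as strings and fail the check.
in_range() {
  awk -F', *' -v want="$1" '{ exit !(want >= $1 && want <= $2) }'
}

GPU=${GPU:-0}
WANT=${WANT:-285}

# Nothing is changed unless APPLY=1 is set explicitly.
if [ "${APPLY:-0}" = 1 ]; then
  limits=$(nvidia-smi -i "$GPU" --query-gpu=power.min_limit,power.max_limit \
           --format=csv,noheader,nounits)
  if echo "$limits" | in_range "$WANT"; then
    nvidia-smi -i "$GPU" -pm 1          # persistence mode: keep the driver initialised between jobs
    nvidia-smi -i "$GPU" -pl "$WANT"    # new power limit in watts
  else
    echo "requested ${WANT} W is outside the allowed range: ${limits}" >&2
  fi
  # Show the resulting performance state, cap and clocks (P0 vs P2).
  nvidia-smi -i "$GPU" --query-gpu=pstate,power.limit,clocks.sm,clocks.mem --format=csv
fi
```

From a root shell, something like `APPLY=1 WANT=285 ./powercap.sh` would apply it. Checking against `power.max_limit` first means a typo such as 2850 is rejected instead of being handed to the driver.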
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1861905 - Posted: 16 Apr 2017, 18:46:33 UTC

I found data from my first night of crunching:

[graph not preserved]

So the temps were better than I remembered, but I'm guessing double-tasking will push it harder. I'll give it another week or so before I do that.

I've been doing a bunch of reading about Tesla Compute Cluster (TCC) mode: you can enable it if your primary graphics is something else, such as an integrated GPU (IGP). Since you have multiple GPUs you might be able to enable it in your setup, and it might get you to P0. If you try it, let me know; I'm designing my next build and considering a chip with an IGP if that means I can use TCC mode for the discrete GPUs.
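A hedged sketch for checking which GPUs could be switched, assuming a Windows machine with a bash-compatible shell such as Git Bash, and a driver that actually allows TCC on the card (which may not be the case for GeForce boards). As far as I know TCC is a Windows driver model; petri33's Linux attempt in this thread is refused as "not supported on this platform". The `driver_model.current` and `display_active` query fields and the `-dm` option come from nvidia-smi; the `tcc_candidates` helper is hypothetical. It only prints the switch commands, for GPUs on WDDM that are not driving a display, so the display GPU is never touched.

```shell
#!/usr/bin/env bash
# Sketch: list GPUs that look like TCC candidates (Windows driver model only).
# tcc_candidates is a hypothetical helper; it prints commands, it does not run them.

# Reads "index, driver_model.current, display_active" CSV lines on stdin and
# prints a driver-model switch command (1 = TCC) for each GPU that is on WDDM
# and not driving a display.
tcc_candidates() {
  awk -F', *' '{ sub(/\r$/, "") }   # strip a trailing CR (Windows line endings)
               $2 == "WDDM" && $3 == "Disabled" { print "nvidia-smi -i " $1 " -dm 1" }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # On Linux driver_model.current should read [N/A], so nothing is printed.
  nvidia-smi --query-gpu=index,driver_model.current,display_active \
    --format=csv,noheader | tcc_candidates
fi
```

Any printed command would be run from an elevated (Administrator) prompt, and a reboot is typically needed before the new driver model takes effect.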
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1861909 - Posted: 16 Apr 2017, 19:13:25 UTC
Last modified: 16 Apr 2017, 19:13:49 UTC

Running a GUPPI vlar right now:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.65                 Driver Version: 381.65                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108... WDDM  | 0000:01:00.0      On |                  N/A |
| 40%   69C    P2   151W / 250W |   1534MiB / 11264MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+


Definitely some power headroom there; hopefully double-tasking eats that up.
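The headroom can be read straight off the table: 151 W drawn against a 250 W cap is about 60% of the limit, leaving roughly 99 W. A small sketch to compute that for every GPU, assuming a bash-compatible shell; `power.draw` and `power.limit` are nvidia-smi query fields, while `power_headroom` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report power draw as a share of the current power cap, per GPU.
# power_headroom is a hypothetical helper name.

# Reads "index, power.draw, power.limit" (watts, no units) on stdin.
# Rows whose limit is not a positive number (e.g. "[N/A]") are skipped.
power_headroom() {
  awk -F', *' '($3 + 0) > 0 {
    printf "GPU%s %.0f W / %.0f W = %.1f%% of cap, %.0f W headroom\n",
           $1, $2, $3, 100 * $2 / $3, $3 - $2
  }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,power.draw,power.limit \
    --format=csv,noheader,nounits | power_headroom
fi
```

Fed the reading from the table above (`0, 151.00, 250.00`), it would print `GPU0 151 W / 250 W = 60.4% of cap, 99 W headroom`.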
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1861940 - Posted: 16 Apr 2017, 23:09:45 UTC - in response to Message 1861909.  
Last modified: 16 Apr 2017, 23:10:01 UTC

40% 69C

Almost 70°C and the fan running at only 40%? Those fan profiles really do go for quiet over cool.
On my Gigabyte GTX 1070s the fan speed is 48% with the temperatures at 64°C (there are 3 fans per GPU, so they shift a lot of air). Power consumption is about 60% of maximum. Ambient temperature is 30°C.
65°C seems to be the target temperature for these cards. When the air conditioner is running, the fan speed drops right back and the GPU temps stay in the 62-64°C region.
Grant
Darwin NT
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1861952 - Posted: 17 Apr 2017, 0:02:38 UTC - in response to Message 1861940.  

I don't know if the temperature displayed by Open HW Monitor is the same as what NVSMI reports; as you can see in the graph lower down, it flirted with 60C during a night of crunching. So are you measuring with NVSMI or something else?
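One way to compare NVSMI with another monitoring tool is to log nvidia-smi's own readings overnight and summarise them next to the other tool's graph. A sketch, assuming a bash-compatible shell and a single GPU at index 0; `temperature.gpu`, `fan.speed` and `power.draw` are nvidia-smi query fields and `-l` sets the sampling interval in seconds, while `log_gpu` and `temp_stats` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: log nvidia-smi readings, then summarise the temperature column.
# log_gpu and temp_stats are hypothetical helper names; GPU index 0 assumed.

# log_gpu [SECONDS]: prints one CSV line per sample until interrupted (Ctrl+C).
log_gpu() {
  nvidia-smi -i 0 --query-gpu=timestamp,temperature.gpu,fan.speed,power.draw \
    --format=csv,noheader,nounits -l "${1:-60}"
}

# temp_stats: reads "timestamp, temp C, fan %, power W" lines on stdin and
# prints the minimum, maximum and mean temperature.
temp_stats() {
  awk -F', *' '
    { sub(/\r$/, ""); t = $2 + 0; sum += t }
    NR == 1 || t < min { min = t }
    NR == 1 || t > max { max = t }
    END { if (NR) printf "min %dC max %dC mean %.1fC over %d samples\n", min, max, sum / NR, NR }'
}

# Usage: log_gpu 60 >> gpu_log.csv   (leave running overnight, then Ctrl+C)
#        temp_stats < gpu_log.csv
```

Sampling once a minute keeps an overnight log to a few hundred lines, and the max/mean figures can be checked directly against the peaks in the other tool's graph.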
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1861956 - Posted: 17 Apr 2017, 0:16:05 UTC - in response to Message 1861952.  

True, I hadn't noticed the difference between the graph and text readings.

I'm on Windows and use GPU-Z.
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1862009 - Posted: 17 Apr 2017, 6:48:55 UTC - in response to Message 1861905.  
Last modified: 17 Apr 2017, 6:54:10 UTC

I'll google TCC mode. Thanks.

EDIT: Did some googling and tried it:
root@Linux1:~/KWSN-Bench-Linux-MBv7_v2.01.08# nvidia-smi -g 2 -dm 1
Changing driver models is not supported for GPU 0000:09:00.0 on this platform.
Treating as warning and moving on.
All done.
root@Linux1:~/KWSN-Bench-Linux-MBv7_v2.01.08# nvidia-smi -g 3 -dm 1
Changing driver models is not supported for GPU 0000:0A:00.0 on this platform.
Treating as warning and moving on.
All done.
root@Linux1:~/KWSN-Bench-Linux-MBv7_v2.01.08# nvidia-smi -g 1 -dm 1
Changing driver models is not supported for GPU 0000:06:00.0 on this platform.
Treating as warning and moving on.
All done.


©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.