1080 underclocking

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1807934 - Posted: 8 Aug 2016, 12:15:04 UTC - in response to Message 1807920.  

I posted this early last month; could this be the problem you are experiencing?

GPU FLOPS: Theory vs Reality - msg 1800841

Maybe, but if it's anything like my card then even with 3 WUs at a time its power consumption rarely gets up to 65% of its maximum, let alone above it.

The problem might be that, although the overall indications are that there is plenty of power, the way that power is connected leaves some parts with an abundance while other parts are not getting enough.

Suppose one or more of the regulators are connected to the output stages only. We don't use the output stages when crunching, so for our purposes that power is wasted or not available.

I know from CPUs and DSPs that they can have many power connections, and that these power connections are not rejoined internally.
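
For anyone who wants to log that percentage rather than eyeball it in a monitoring tool, here is a rough sketch. It assumes `nvidia-smi` is installed and that the driver reports the `power.draw` and `power.limit` query fields for the card; the helper names `power_pct` and `report_power` are made up for illustration. It only sees whole-board power, so it cannot show the kind of per-rail imbalance speculated about above.

```shell
#!/bin/sh
# Sketch: express board power draw as a percentage of the enforced limit.
# Assumption: the driver supports the power.draw and power.limit fields.

# power_pct DRAW_WATTS LIMIT_WATTS -> percentage with one decimal place
power_pct() {
    awk -v draw="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", 100 * draw / limit }'
}

# report_power reads "index, draw, limit" CSV lines on stdin
report_power() {
    while IFS=', ' read -r idx draw limit; do
        echo "GPU $idx: $(power_pct "$draw" "$limit")% of power limit"
    done
}

# Usage on a machine with the NVIDIA driver:
#   nvidia-smi --query-gpu=index,power.draw,power.limit \
#              --format=csv,noheader,nounits | report_power
```

The functions are kept separate from the `nvidia-smi` call so they can be tried on canned input first, e.g. `printf '0, 117.00, 180.00\n' | report_power` (illustrative numbers, not measurements from this thread).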
ID: 1807934
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1807950 - Posted: 8 Aug 2016, 15:22:58 UTC - in response to Message 1807899.  

Thanks Grant, I've made that change in the Nvidia control panel.

I'll check the P state later today when it comes back online. Powered down right now due to heat during the day.
ID: 1807950
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1807952 - Posted: 8 Aug 2016, 15:25:19 UTC - in response to Message 1807918.  

Hi WinterKnight,

I did notice last night that it was reporting 65% of power at max load.

I'll have to keep an eye on it.

Thanks everyone.

Zalster
ID: 1807952
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1810561 - Posted: 20 Aug 2016, 0:11:52 UTC - in response to Message 1807952.  

Any progress with this issue?
Grant
Darwin NT
ID: 1810561
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1810563 - Posted: 20 Aug 2016, 0:21:30 UTC - in response to Message 1810561.  

Not really.

I've had to keep an additional overclock on it to keep it at its normal speed.

At this point, as long as it stays there, I'll just keep an eye on it.

Z
ID: 1810563
_heinz
Volunteer tester
Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 1810666 - Posted: 20 Aug 2016, 7:48:52 UTC - in response to Message 1807710.  
Last modified: 20 Aug 2016, 7:51:31 UTC

@Zalster,
sometimes when a job fails, the card drops its frequency to the standard frequency without OC. Mostly I have to reset the cards by restarting the machine.
This has happened sometimes on my 3 Titans.

edit: I think this happens when a job crashes
D5400XS V8-Xeon
ID: 1810666
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1810673 - Posted: 20 Aug 2016, 8:03:57 UTC - in response to Message 1810666.  

@Zalster,
sometimes when a job fails, the card drops its frequency to the standard frequency without OC. Mostly I have to reset the cards by restarting the machine.
This has happened sometimes on my 3 Titans.

edit: I think this happens when a job crashes

The problem with this card isn't an overclock, just that for some reason it will run at its factory boost speed, then drop down to its base speed for no particularly obvious reason.
Grant
Darwin NT
ID: 1810673
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1810676 - Posted: 20 Aug 2016, 8:11:56 UTC - in response to Message 1810673.  

@Zalster,
sometimes when a job fails, the card drops its frequency to the standard frequency without OC. Mostly I have to reset the cards by restarting the machine.
This has happened sometimes on my 3 Titans.

edit: I think this happens when a job crashes

The problem with this card isn't an overclock, just that for some reason it will run at its factory boost speed, then drop down to its base speed for no particularly obvious reason.


Had this loose/distant YouTube comment discussion with Jayz2Cents about GPU Boost not very long ago. People misunderstand what boost clocks are. The only guaranteed clock is the base clock. Anything higher depends on how stable the hardware is and the quality of the conditions you can provide.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1810676
_heinz
Volunteer tester
Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 1810679 - Posted: 20 Aug 2016, 9:05:06 UTC - in response to Message 1810676.  

@Zalster,
particularly obvious reason.


Had this loose/distant YouTube comment discussion with Jayz2Cents about GPU Boost not very long ago. People misunderstand what boost clocks are. The only guaranteed clock is the base clock. Anything higher depends on how stable the hardware is and the quality of the conditions you can provide.


Really, that's true, Jason.
D5400XS V8-Xeon
ID: 1810679
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1810682 - Posted: 20 Aug 2016, 9:21:34 UTC - in response to Message 1810676.  

Had this loose/distant YouTube comment discussion with Jayz2Cents about GPU Boost not very long ago. People misunderstand what boost clocks are. The only guaranteed clock is the base clock. Anything higher depends on how stable the hardware is and the quality of the conditions you can provide.

I have to disagree on that.
They charge a premium for the function; it would be nice if it delivered.
I'd consider a card unable to sustain its rated Boost speed, when thermal & power limits aren't a factor, to be a problem.

If the card is well under its maximum possible thermal load, and well under its maximum possible power load, then there is no reason for it to drop its clock speed down.
And once any power &/or thermal issues have been resolved, it should then crank back up to its maximum possible clock speed, in this case the Boost value.

A card dropping from its maximum Base speed when power & thermal loads aren't even close to their limits would be considered a problem (didn't we have a series of cards that were down clocking for a while there?). The fact that with a manual overclock there are no power or heat issues, nor a raft of invalids, indicates that it should be able to maintain its rated Boost speed IMHO.
Grant
Darwin NT
ID: 1810682
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1810685 - Posted: 20 Aug 2016, 9:38:56 UTC - in response to Message 1810682.  

Yes, also true, which is why they may be able to charge a premium for the same chip with different fans/cooler, and why everyone has equal difficulty sustaining over 2.1 GHz stable. Not as many mysteries as it might seem: the base clocks are in the engineering domain, and the boost clocks are in the overclocking domain. No surprise the technology would eventually get better at the task than humans. That leaves no headroom, and plenty of situations where the AI would throttle. Plenty of special cases that could bend that, or break it, but nonetheless a marketing point.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1810685
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1810693 - Posted: 20 Aug 2016, 11:07:24 UTC - in response to Message 1810682.  
Last modified: 20 Aug 2016, 11:08:28 UTC

If the card is well under its maximum possible thermal load, and well under its maximum possible power load, then there is no reason for it to drop its clock speed down.


But you actually don't know that.

As I indicated in my msg 1807934, the GPU might have reached its limit on one or more power input pins, while others, like those for the video output (which we don't use in our BOINC/SETI crunching), are drawing only the minimum power necessary.

Also on temps, there is probably more than one sensor, probably all connected to the same circuit that interrupts overclocking. These sensors act immediately, much faster than the one that measures and reports the GPU temperature. So again, you wouldn't know that one small part of the GPU has reached its temperature limit, except from the fact that the GPU has reduced its clock speed.
And even though it is not Nvidia, most internal chip temperature sensor circuits work in similar ways. There is an Intel FAQ that explains temperature limits and how they are applied: http://www.intel.com/content/www/us/en/support/processors/000005597.html
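
The driver does report a coarse reason when it holds clocks down, which may help narrow this down. Below is a minimal sketch that decodes the bitmask returned by the `clocks_throttle_reasons.active` query field of `nvidia-smi`. The bit values follow the `nvmlClocksThrottleReason*` constants in `nvml.h`; the function name `decode_throttle` is hypothetical, and whether a given driver sets these bits for a localized sensor trip like the one described above is an assumption, not something this confirms.

```shell
#!/bin/sh
# Sketch: decode a clocks_throttle_reasons.active bitmask (hex string).
# Bit meanings are taken from nvml.h; only the first five are listed here.

decode_throttle() {
    mask=$1
    out=""
    for pair in 1:gpu_idle 2:applications_clocks_setting 4:sw_power_cap \
                8:hw_slowdown 16:sync_boost; do
        bit=${pair%%:*}    # numeric bit value before the colon
        name=${pair#*:}    # label after the colon
        if [ $(( $mask & $bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    out=${out# }           # drop the leading space
    echo "${out:-none}"
}

# Usage on a machine with the NVIDIA driver:
#   decode_throttle "$(nvidia-smi -i 0 \
#       --query-gpu=clocks_throttle_reasons.active --format=csv,noheader)"
```

A result of `sw_power_cap` would point at the power limit and `hw_slowdown` at a hardware protection event, which is about as close as the tooling gets to separating the two cases argued here.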
ID: 1810693
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1810697 - Posted: 20 Aug 2016, 11:34:02 UTC

Hi,

My GTX 1080s run in the P2 state. I have to do the following:
nvidia-settings -a "[GPU:0]/GPUOverVoltageOffset=16000"
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:0]/GPUGraphicsClockOffset[3]=200"
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-smi -i 1 -pl 215
/usr/bin/nvidia-smi -i 1 -ac 5005,1911

nvidia-settings -a "[GPU:1]/GPUOverVoltageOffset=16000"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:1]/GPUGraphicsClockOffset[3]=200"
nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-smi -i 2 -pl 215
/usr/bin/nvidia-smi -i 2 -ac 5005,1911

nvidia-settings -a "[GPU:2]/GPUOverVoltageOffset=16000"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:2]/GPUGraphicsClockOffset[3]=200"
nvidia-settings -a "[gpu:2]/GPUFanControlState=1" -a "[fan:2]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-smi -i 3 -pl 215
/usr/bin/nvidia-smi -i 3 -ac 5005,1911

nvidia-settings -a "[GPU:3]/GPUOverVoltageOffset=16000"
/usr/bin/nvidia-settings -a "[gpu:3]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:3]/GPUGraphicsClockOffset[3]=200"
nvidia-settings -a "[gpu:3]/GPUFanControlState=1" -a "[fan:3]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-smi -i 4 -pl 215
/usr/bin/nvidia-smi -i 4 -ac 5005,1911


The latest driver does not allow all these settings.
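
The four near-identical blocks above could also be generated by a small loop. This is only a sketch: every value (16000, 1100, 200, 90, 215, 5005,1911) is copied from the commands above, the pairing of `nvidia-settings` index N with `nvidia-smi -i N+1` is kept exactly as posted without checking it against any particular machine, and the function names are made up. It prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: print the per-card commands from the post instead of repeating
# them by hand. Nothing is executed here; the output is for review.

gpu_cmds() {
    n=$1
    smi=$(( n + 1 ))   # the post uses nvidia-smi index N+1 for card N
    cat <<EOF
nvidia-settings -a "[GPU:$n]/GPUOverVoltageOffset=16000"
/usr/bin/nvidia-settings -a "[gpu:$n]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:$n]/GPUGraphicsClockOffset[3]=200"
nvidia-settings -a "[gpu:$n]/GPUFanControlState=1" -a "[fan:$n]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-smi -i $smi -pl 215
/usr/bin/nvidia-smi -i $smi -ac 5005,1911
EOF
}

# Commands for all four cards, in the same order as the post.
all_gpu_cmds() {
    for n in 0 1 2 3; do
        gpu_cmds "$n"
    done
}
```

Running `all_gpu_cmds` shows the full list; piping it to `sh` would apply it, which typically needs a running X session for `nvidia-settings` and root for the `nvidia-smi` power and clock options.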
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1810697
M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1810722 - Posted: 20 Aug 2016, 14:01:34 UTC - in response to Message 1810697.  

Yep, I noticed this some time ago, and so far there is no known workaround to push it back to P0 during crunching... Seems like nVidia purposely locked compute to P2 with a lower memory clock.

So effectively, for compute tasks (where it matters the most) nVidia is limiting memory bandwidth to lower than the advertised 320 GB/s. Why, I don't know. This was never the case with the 7x0 or earlier GPU series, but it was first seen recently on 9x0 (workaround using smi possible) and now on 10x0 (workaround not possible yet).
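
For reference, the 320 GB/s figure is just the per-pin data rate times the bus width: 10 Gbit/s on a 256-bit bus. A small sketch of that arithmetic follows; the function name is made up, and the second call uses a hypothetical reduced rate purely to illustrate the effect of a lower memory clock, not a measured P2 value.

```shell
#!/bin/sh
# mem_bandwidth_gbs EFFECTIVE_RATE_MBPS BUS_WIDTH_BITS
# bandwidth (GB/s) = per-pin data rate (Mbit/s) * bus width (bits) / 8 / 1000
mem_bandwidth_gbs() {
    awk -v rate="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", rate * bits / 8 / 1000 }'
}

mem_bandwidth_gbs 10000 256   # advertised GTX 1080 configuration
mem_bandwidth_gbs 9000 256    # hypothetical lower rate, for illustration only
```

Because the bus width is fixed, any drop in memory clock translates one-for-one into lost bandwidth.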
ID: 1810722
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1810737 - Posted: 20 Aug 2016, 15:03:11 UTC - in response to Message 1810722.  

Coming late to this discussion because I just picked up two 1070s and needed to see the current ideas on how to get the cards to run closer to stock P0 settings for distributed computing. Looks like Nvidia has hamstrung their GPGPU performance yet again, like with Maxwell. So, I attacked the problem as I did with my 970, by using Nvidia Inspector. Added a mild +50 MHz to core speed and +400 MHz to memory speed. GPU-Z has the cards running at 1923 and 1911 MHz on the core clock and an effective memory clock speed of 8400 MHz. I am happy that I could still use the existing tools to get the cards running optimally for GPGPU computing for the SETI project.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1810737
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1810748 - Posted: 20 Aug 2016, 15:43:49 UTC - in response to Message 1810697.  

Hi,

My GTX 1080s run in the P2 state. I have to do the following:
...
The latest driver does not allow all these settings.


Did you ever come across any documented reason for the drop to the P2 state under compute workload? I mean, it seems obvious after 560 Ti factory overclocks were causing large-scale issues at one point before the change, though I never saw 'official' reasons...
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1810748
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1810749 - Posted: 20 Aug 2016, 15:50:49 UTC - in response to Message 1810737.  
Last modified: 20 Aug 2016, 15:51:16 UTC

Coming late to this discussion because I just picked up two 1070s and needed to see the current ideas on how to get the cards to run closer to stock P0 settings for distributed computing. Looks like Nvidia has hamstrung their GPGPU performance yet again, like with Maxwell. So, I attacked the problem as I did with my 970, by using Nvidia Inspector. Added a mild +50 MHz to core speed and +400 MHz to memory speed. GPU-Z has the cards running at 1923 and 1911 MHz on the core clock and an effective memory clock speed of 8400 MHz. I am happy that I could still use the existing tools to get the cards running optimally for GPGPU computing for the SETI project.


For Windows, I also use an nVidia Inspector shortcut for my 980 to up the memory clock. Traditionally that information and control was available via nvapi, though it is not present/exposed in the same way on Mac and Linux. I did spot in the CUDA 8 RC a newly included library called nvml, which supposedly exposes what the nvidia-smi command line utility uses (now also present on Windows, it seems). Most likely a new breed of tools will then evolve, possibly cross-platform.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1810749
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1810753 - Posted: 20 Aug 2016, 16:04:48 UTC - in response to Message 1810748.  

Hi,

My GTX 1080s run in the P2 state. I have to do the following:
...
The latest driver does not allow all these settings.


Did you ever come across any documented reason for the drop to the P2 state under compute workload? I mean, it seems obvious after 560 Ti factory overclocks were causing large-scale issues at one point before the change, though I never saw 'official' reasons...



I remember reading some time back that they said they couldn't guarantee accurate results at a given speed for scientific work, so they decreased the speed to what they figured was a safer speed, i.e. what is now the P2 state.

If I can go back and find the statement I'll link it, but that was way back when we first started to see the reduced speeds.
ID: 1810753
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1810756 - Posted: 20 Aug 2016, 16:10:53 UTC - in response to Message 1810753.  

Hi,

My GTX 1080s run in the P2 state. I have to do the following:
...
The latest driver does not allow all these settings.


Did you ever come across any documented reason for the drop to the P2 state under compute workload? I mean, it seems obvious after 560 Ti factory overclocks were causing large-scale issues at one point before the change, though I never saw 'official' reasons...



I remember reading some time back that they said they couldn't guarantee accurate results at a given speed for scientific work, so they decreased the speed to what they figured was a safer speed, i.e. what is now the P2 state.

If I can go back and find the statement I'll link it, but that was way back when we first started to see the reduced speeds.


Thanks. Will be interested, since that's the conclusion I came to from what was going on, though I never did see confirmation of it myself.

It's one of those things I would like to come across in one of the CUDA manuals or some such; I'm still working through the latest (8.0 RC) updates and haven't spotted a specific mention yet.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1810756
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1810765 - Posted: 20 Aug 2016, 16:43:55 UTC

Hi,

My settings are a workaround to make P2 performance equal P0.
The latest driver does not copy P0 settings to P2 but the one at the time of the initial release (late May/early June) does. That is why I do not use the latest driver.

A couple of years ago I bought a 780 and it did the same thing, i.e. P2 on compute. The later drivers fixed that. I'm waiting for a new driver that will allow P0 for compute workloads.
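
To check whether a given driver still parks the cards in P2 while crunching, something like the following sketch might do. It assumes `nvidia-smi` supports the `pstate`, `clocks.sm` and `clocks.mem` query fields on the installed driver; `check_pstate` is a hypothetical helper name, and the sample numbers in the usage note are placeholders, not readings from these cards.

```shell
#!/bin/sh
# Sketch: flag cards that are not in P0.
# Input on stdin: CSV lines "index, pstate, sm_clock, mem_clock", e.g. from
#   nvidia-smi --query-gpu=index,pstate,clocks.sm,clocks.mem \
#              --format=csv,noheader,nounits

check_pstate() {
    while IFS=', ' read -r idx pstate sm mem; do
        if [ "$pstate" = "P0" ]; then
            echo "GPU $idx: P0 (core $sm MHz, mem $mem MHz)"
        else
            echo "GPU $idx: $pstate, not P0 (core $sm MHz, mem $mem MHz)"
        fi
    done
}
```

It can be tried without a GPU by feeding it a canned line such as `printf '0, P2, 1911, 4513\n' | check_pstate`, and run before and after a driver change to compare.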
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1810765


 