1080 underclocking
Author | Message |
---|---|
W-K 666 Joined: 18 May 99 Posts: 19048 Credit: 40,757,560 RAC: 67 |
I posted this early last month; could this be the problem you are experiencing? Although the overall indications are that there is plenty of power, the way that power is connected could leave some parts with an abundance while other parts are not getting enough. Suppose one or more of the regulators are connected to the output stages only. We don't use the output stages when crunching, so for our purposes that power is wasted or not available. I know from CPUs and DSPs that they can have many power connections, and these power connections are not rejoined internally. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Thanks Grant, I've made that change in the Nvidia Control Panel. I'll check the P state later today when it comes back online. It's powered down right now due to heat during the day. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hi WinterKnight, I did notice last night that it was reporting 65% of power at max load. I'll have to keep an eye on it. Thanks everyone. Zalster |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Any progress with this issue? Grant Darwin NT |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Not really. I've had to keep an additional overclock on it to keep it at its normal speed. At this point, as long as it stays there, I'll just keep an eye on it. Z |
_heinz Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
@Zalster, sometimes when a job fails the card sets its frequency down to the standard frequency without the OC. Mostly I have to reset the cards by restarting the machine. This has happened sometimes on my 3 Titans. Edit: I think this happens when a job crashes. D5400XS V8-Xeon |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
@Zalster, The problem with this card isn't an overclock; it's just that for some reason it will run at its factory boost speed, then drop down to its base speed for no particularly obvious reason. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
@Zalster, I had a loose, distant YouTube comment discussion with Jayz2Cents about GPU Boost not very long ago. People misunderstand what boost clocks are. The only guaranteed clock is the base clock. Anything higher depends on how stable the hardware is and the quality of the conditions you can provide. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
_heinz Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
@Zalster, really, that's true, Jason. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Quoting jason_gee: "Had this loose/distant youtube comment discussion with Jayz2Cents about GPU Boost not very long ago. People misunderstand what boost clocks are. The only guaranteed clocks are the base clock. Anything higher depends on how stable the hardware and the quality of the conditions you can provide." I have to disagree on that. They charge a premium for the function; it would be nice if it delivered. I'd consider a card unable to sustain its rated Boost speed, when thermal & power limits aren't a factor, to be a problem. If the card is well under its maximum possible thermal load, and well under its maximum possible power load, then there is no reason for it to drop its clock speed down. And once any power &/or thermal issues have been resolved, it should then crank back up to its maximum possible clock speed, in this case the Boost value. A card dropping from its maximum Base speed when thermal & power loads aren't even close to their limits would be considered a problem (didn't we have a series of cards that were down clocking for a while there?). The fact that with a manual overclock there are no power or heat issues, nor a raft of invalids, indicates that it should be able to maintain its rated Boost speed, IMHO. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yes, also true, which is why they may be able to charge a premium for the same chip with different fans/cooler, and why everyone has equal difficulty sustaining over 2.1GHz stable. Not as many mysteries as it might seem: the base clocks are in the engineering domain, and the boost clocks in the overclocking domain. No surprise the technology would eventually get better at the task than humans. That leaves no headroom, and plenty of situations where the AI would throttle. Plenty of special cases that could bend that, or break it, but nonetheless a marketing point. |
W-K 666 Joined: 18 May 99 Posts: 19048 Credit: 40,757,560 RAC: 67 |
Quoting Grant (SSSF): "If the card is well under its maximum possible thermal load, and well under its maximum possible power load then there is no reason for it to drop its clock speed down." But you actually don't know that. As I indicated in my msg 1807934, the GPU might have reached its limit on one or more power input pins, while others, like the video output, are drawing the minimum power necessary because we don't use them in our BOINC/Seti crunching. Also, on temps, there is probably more than one sensor, probably all connected to the same circuit that interrupts overclocking; these sensors act immediately, much faster than the one that measures and reports the GPU temperature. So again, you don't know that one small part of the GPU has reached its temperature limit, except for the fact that the GPU has reduced its clock speed. And even if it is not Nvidia, most internal chip temperature sensor circuits work in similar ways. There is an Intel FAQ that explains temperature limits and how they are applied. http://www.intel.com/content/www/us/en/support/processors/000005597.html |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, my GTX 1080s run at P2 state. I have to do the following: `nvidia-settings -a "[GPU:0]/GPUOverVoltageOffset=16000"` `/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:0]/GPUGraphicsClockOffset[3]=200"` `nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=90"` `/usr/bin/nvidia-smi -i 1 -pl 215` `/usr/bin/nvidia-smi -i 1 -ac 5005,1911` `nvidia-settings -a "[GPU:1]/GPUOverVoltageOffset=16000"` `/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:1]/GPUGraphicsClockOffset[3]=200"` `nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=90"` `/usr/bin/nvidia-smi -i 2 -pl 215` `/usr/bin/nvidia-smi -i 2 -ac 5005,1911` `nvidia-settings -a "[GPU:2]/GPUOverVoltageOffset=16000"` `/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:2]/GPUGraphicsClockOffset[3]=200"` `nvidia-settings -a "[gpu:2]/GPUFanControlState=1" -a "[fan:2]/GPUTargetFanSpeed=90"` `/usr/bin/nvidia-smi -i 3 -pl 215` `/usr/bin/nvidia-smi -i 3 -ac 5005,1911` `nvidia-settings -a "[GPU:3]/GPUOverVoltageOffset=16000"` `/usr/bin/nvidia-settings -a "[gpu:3]/GPUMemoryTransferRateOffset[3]=1100" -a "[gpu:3]/GPUGraphicsClockOffset[3]=200"` `nvidia-settings -a "[gpu:3]/GPUFanControlState=1" -a "[fan:3]/GPUTargetFanSpeed=90"` `/usr/bin/nvidia-smi -i 4 -pl 215` `/usr/bin/nvidia-smi -i 4 -ac 5005,1911` The latest driver does not allow all these settings. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
M_M Joined: 20 May 04 Posts: 76 Credit: 45,752,966 RAC: 8 |
Yep, I noticed this some time ago, and so far there is no known workaround to push it back to P0 during crunching... It seems nVidia purposely locked compute to P2 with a lower memory clock. So, effectively, for compute tasks (where it matters the most) nVidia is limiting memory bandwidth to lower than the advertised 320GB/sec. Why, I don't know; this was never the case with the 7x0 or earlier GPU series, but it was first seen recently on 9x0 (workaround using smi possible) and now on 10x0 (workaround not possible yet). |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Coming late to this discussion because I just picked up two 1070s and needed to see the current ideas on how to get the cards to run closer to stock P0 settings for distributed computing. Looks like Nvidia has hamstrung their GPGPU performance yet again, like with Maxwell. So, I attacked the problem as I did with my 970 by using Nvidia Inspector. Added a mild +50 MHz to core speed and +400 MHz to memory speed. GPU-Z has the cards running at 1923 and 1911 MHz on the core clock and an effective memory clock speed of 8400 MHz. I am happy that I could still use the existing tools to get the card running optimally for GPGPU computing for the SETI project. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi, Did you ever come across any documented reason for the drop to P2 state under compute workload? I mean, it seems obvious after 560 Ti factory overclocks were causing large-scale issues at one point before the change, though I never saw 'official' reasons... |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Quoting Keith Myers: "Coming late to this discussion because I just picked up two 1070 and needed to see the current ideas of how to get the cards to run closer to stock P0 settings for distributed computing. Looks like Nvidia has hamstrung their GPGPU performance yet again like with Maxwell. So, I attacked the problem as I did with my 970 by using Nvidia Inspector. Added a mild +50 MHz to core speed and a +400 MHz to memory speed. GPU-Z has the cards running at 1923 and 1911 MHz on the core clock and effective memory clock speed of 8400 MHz. I am happy that I could still use the existing tools to get the card running optimally for GPGPU computing for the SETI project." For Windows, I also use an nVidia Inspector shortcut for my 980 to up the memory clock. Traditionally that information and control was via nvapi, though not present/exposed in the same way on Mac and Linux. I did spot in Cuda 8 rc a new library included called nvml, which supposedly exposes what the nvidia-smi command line utility uses (now also present on Windows, it seems). Most likely a new breed of tools will then evolve, possibly cross-platform. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hi, I remember reading some time back that they said they couldn't guarantee accurate results at a given speed for scientific work, so they decreased the speed to what they figured was a safer speed, i.e. what is now the P2 state. If I can go back and find the statement I'll link it, but that was way back when we first started to see the reduced speeds. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi, Thanks. Will be interested, since that's the conclusion I came to from what was going on, though I never did see confirmation of it myself. It's one of those things I would like to come across in one of the Cuda manuals or somesuch, though, still working through the latest (8.0rc) updates, I haven't spotted a specific mention yet. |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, My settings are a workaround to make P2 performance equal P0. The latest driver does not copy P0 settings to P2, but the one from the time of the initial release (late May/early June) does. That is why I do not use the latest driver. A couple of years ago I bought a 780 and it did the same thing, i.e. P2 on compute. Later drivers fixed that. I'm waiting for a new driver that will allow P0 for compute workloads. |
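
Several posts above talk about checking the P state and about limits that the reported temperature and power figures may not show. The following is a minimal sketch of how that could be inspected on Linux with `nvidia-smi`, not something any poster in the thread used. The function names `decode_reasons` and `report` are hypothetical, and the bit values are assumed from NVML's clocks-throttle-reason constants, so check them against your driver's documentation before relying on them.

```shell
#!/bin/sh
# Sketch: report each GPU's performance state and why its clocks are held back.

# Decode a throttle-reason bitmask (hex string printed by nvidia-smi) into names.
# Bit values are ASSUMED from NVML's nvmlClocksThrottleReasons constants.
decode_reasons() {
    case "$1" in
        0x*) mask=$(( $1 )) ;;
        *)   printf 'unknown\n'; return 0 ;;   # e.g. N/A on unsupported cards
    esac
    out=""
    [ $(( mask & 0x01 )) -ne 0 ] && out="$out idle"
    [ $(( mask & 0x02 )) -ne 0 ] && out="$out app_clocks"
    [ $(( mask & 0x04 )) -ne 0 ] && out="$out sw_power_cap"
    [ $(( mask & 0x08 )) -ne 0 ] && out="$out hw_slowdown"
    [ $(( mask & 0x20 )) -ne 0 ] && out="$out sw_thermal"
    [ $(( mask & 0x40 )) -ne 0 ] && out="$out hw_thermal"
    [ -z "$out" ] && out=" none"
    printf '%s\n' "${out# }"
}

# Read "index, pstate, mem_mhz, reasons" CSV rows on stdin and summarise them.
report() {
    while IFS=', ' read -r idx pstate mem mask; do
        [ -n "$idx" ] || continue
        printf 'GPU %s: %s, mem %s MHz, limits: %s\n' \
            "$idx" "$pstate" "$mem" "$(decode_reasons "$mask")"
    done
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,pstate,clocks.mem,clocks_throttle_reasons.active \
               --format=csv,noheader,nounits | report
fi
```

Run while a work unit is crunching, a card stuck in P2 with no limits listed would be consistent with the driver-imposed compute state described by M_M and petri33, whereas a power or thermal entry would point more towards W-K 666's explanation.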
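
petri33's list of commands repeats the same five steps for each card, so it can be written as a loop. This is only a sketch of that idea: the helper names `run` and `tune_gpu` and the `DRY_RUN` switch are hypothetical, and the pairing of `nvidia-settings` index N with `nvidia-smi` index N+1 is copied from the post and may be specific to that machine. All values (16000, 1100, 200, 90, 215, 5005,1911) are the ones posted, not recommendations.

```shell
#!/bin/sh
# Sketch: generate petri33's five per-GPU commands in a loop.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"      # show the command instead of executing it
    else
        "$@"
    fi
}

# Print or run the settings for one card.
# $1 = nvidia-settings index, $2 = nvidia-smi index (the post pairs 0 with 1, 1 with 2, ...).
tune_gpu() {
    gpu="$1"
    smi="$2"
    run nvidia-settings -a "[GPU:$gpu]/GPUOverVoltageOffset=16000"
    run nvidia-settings -a "[gpu:$gpu]/GPUMemoryTransferRateOffset[3]=1100" \
                        -a "[gpu:$gpu]/GPUGraphicsClockOffset[3]=200"
    run nvidia-settings -a "[gpu:$gpu]/GPUFanControlState=1" \
                        -a "[fan:$gpu]/GPUTargetFanSpeed=90"
    run nvidia-smi -i "$smi" -pl 215         # power limit in watts
    run nvidia-smi -i "$smi" -ac 5005,1911   # application clocks: memory,graphics in MHz
}

for gpu in 0 1 2 3; do
    tune_gpu "$gpu" "$((gpu + 1))"
done
```

As petri33 notes, newer drivers may reject some of these attributes, so it is worth reviewing the dry-run output before setting `DRY_RUN=0`.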
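
M_M's point about the advertised 320GB/sec can be made concrete with a little arithmetic: peak memory bandwidth is the bus width in bytes multiplied by the effective transfer rate. The sketch below assumes the GTX 1080's 256-bit bus and 10 Gbps per-pin rate for the advertised figure; the second, lower rate is a hypothetical value chosen only to show how a reduced P2 memory clock scales the result, not a measured P2 clock. The function name `bandwidth_gbs` is also hypothetical.

```shell
#!/bin/sh
# Peak memory bandwidth in GB/s = (bus width in bits / 8) * effective rate in MT/s / 1000.
bandwidth_gbs() {
    awk -v bits="$1" -v mts="$2" 'BEGIN { printf "%.1f\n", bits / 8 * mts / 1000 }'
}

bandwidth_gbs 256 10000   # GTX 1080 as advertised: prints 320.0
bandwidth_gbs 256 9000    # hypothetical reduced rate, for comparison only: prints 288.0
```

Because the relationship is linear, any percentage drop in the effective memory clock is the same percentage drop in peak bandwidth.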