i7 970 Power Tuning Failures

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834292 - Posted: 5 Dec 2016, 15:03:11 UTC

I recently put together an AMD FX-6300 hex-core system to drive a set of three GTX 1070s (2 tasks each, no CPU tasks). I was delighted to see that this new machine was humming along at a cool 400 watts despite churning out as much credit as a similar triple-GTX 1070 machine I'd been running for months.

The older machine is an i7-970 hex-core system with HyperThreading, and next to the new machine's cool 400 watts, its nearly 500-watt load was rather disappointing!

I was curious to see if I could bring it down to comparable levels despite the fact that the i7-970 (140 W TDP) was released in Q3 2010 and the AMD FX-6300 (95 W TDP) in Q4 2012.

Initially the draw on the i7 was over 500 W. After analyzing the credit/hour of the CPU tasks sharing hyper-threaded cores with GPU tasks, I realized they were an utter waste of time: maybe 30 CPH for the 6 CPU tasks vs the nearly 3000 CPH I was getting from the 6 GPU tasks, so dropping them cost me about 1% of throughput. I disabled CPU work and let the queue drain, and this got me maybe 20-30 W of savings (it's hard to tell because the signal is noisy).

The i7 had been optimized for hard crunching, so it had previously had SpeedStep disabled; re-enabling it didn't seem to make much difference. I also disabled Intel TurboBoost and that didn't make a big difference either -- I'm guessing that the CPU side of the GPU tasks may not trigger it enough to notice. After enabling SpeedStep I looked at the cores with Open Hardware Monitor and saw that even without CPU tasks, a full rack of GPU tasks put all 6 cores at max frequency.

I also tried disabling HyperThreading and this made no difference to the power draw either (which makes sense, since a hyper-thread isn't a real core).

The last thing I tried was increasingly aggressive underclocking -- based on my experiences overclocking, I'd hoped that simply reducing the frequency would save me some power; tragically, even cutting the CPU frequency in half (and disabling SpeedStep so it would stay there) didn't make any significant difference! I wasn't ballsy enough to try undervolting to go with it -- stability is more important to me than saving a few watts.

At the end of it all I left HT on, SpeedStep on, TurboBoost off, and the default clock ratios. I'm now dreaming of the next generation of hex-cores and am super curious how cool they can run!
ID: 1834292
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1834323 - Posted: 5 Dec 2016, 17:27:58 UTC

When you said SpeedStep didn't make any difference, how did you have your OS processor power settings set?
For my i7-860 I have found that even at idle, dropping the clock reduced the power usage. However, one of the biggest power drops came from reducing the memory voltage from 1.5 V to 1.35 V. I had DIMMs with a profile that supported that voltage; I don't know if all DIMMs will be OK that low.
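
For reference, on Windows you can pin those processor states from an elevated command prompt with powercfg; a minimal sketch (SCHEME_CURRENT and SUB_PROCESSOR are built-in aliases, and the 5/50 values here are just examples):

    :: Set minimum processor state to 5% and cap the maximum at 50%
    :: on the currently active power plan (AC power).
    powercfg /setacvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMIN 5
    powercfg /setacvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMAX 50
    powercfg /setactive SCHEME_CURRENT

The same settings live under Power Options > Processor power management in the Control Panel GUI.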
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1834323
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834324 - Posted: 5 Dec 2016, 17:37:40 UTC

That's a good point but I just checked and the min setting was set to 5% -- I guess the GPU app spinning makes the OS think it needs the cores running full-speed? I could try capping the max but to be fair I should do the same to the AMD system too!

I never would have guessed that the memory made that much of a difference -- when I built the new machine I got higher-end RAM assuming that it might help keep the GPU fed but maybe next time I should get min-spec speed!
ID: 1834324
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834325 - Posted: 5 Dec 2016, 17:39:18 UTC

I suppose the other thing I never bothered trying was -use_sleep; I think I'll let these new settings soak in for a week before changing anything else though.
ID: 1834325
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1834326 - Posted: 5 Dec 2016, 17:47:43 UTC - in response to Message 1834324.  

That's a good point but I just checked and the min setting was set to 5% -- I guess the GPU app spinning makes the OS think it needs the cores running full-speed? I could try capping the max but to be fair I should do the same to the AMD system too!

No, you shouldn't. The AMD and NV OpenCL runtimes differ hugely in the sync mode they choose:
AMD can yield the CPU, while NV just sits in a spin-wait loop, consuming CPU in vain.
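
Roughly, in host code the difference looks like this (a C sketch only -- the real choice is made inside the vendor runtimes, and SoG's -use_sleep option softens the spin by sleeping inside the loop):

    /* Two ways for an OpenCL host app to wait on a kernel's event. */
    #include <CL/cl.h>

    /* AMD-style cooperative wait: block inside the driver until the
       kernel completes; the CPU core is free for other work. */
    void wait_yielding(cl_event ev)
    {
        clWaitForEvents(1, &ev);
    }

    /* NV-style spin-wait: poll the event status in a tight loop; the
       core shows 100% load while doing no useful work.  (Negative
       status values mean an error; ignored in this sketch.) */
    void wait_spinning(cl_event ev)
    {
        cl_int status = CL_QUEUED;
        do {
            clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
                           sizeof(status), &status, NULL);
        } while (status != CL_COMPLETE);
    }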
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1834326
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1834462 - Posted: 6 Dec 2016, 12:24:25 UTC

I just tried disabling CPU work on my 6700K with the AMD 290 and Nvidia 780; between them, the GPUs didn't even amount to one core being busy. There is a pretty big discrepancy in per-core performance between the i7-970 and 6700K, though. If I had to guesstimate, I'd say it was using maybe 40% of one core for both GPUs, although it's a bit difficult to tell with the bars constantly bouncing around on all 8 cores (er, threads; it's hyper-threading, so 4 cores).

What I typically do is put the CPU at 50%, no matter how many GPU tasks I'm running, and it all seems to work out fine. That means I run 6x on the 980x system and 4x on the 6700K system. I keep an eye on my GPU usage, TDP %, and memory/core/bus utilization, and that keeps both the 290 and 780 at 99%-ish on the core and usually 65-85% on the memory. Bus utilization is typically quite low, rarely going over 15%, but that's a PCI-E 3.0 x8 bus, so it's a ton of bandwidth that probably won't ever be used.
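
If you want to pin that 50% locally instead of through the web preferences, BOINC also honors an override file; a minimal sketch (save it as global_prefs_override.xml in the BOINC data directory, then tell the Manager to read local prefs):

    <!-- Cap BOINC at half of the logical CPUs -->
    <global_preferences>
       <max_ncpus_pct>50.0</max_ncpus_pct>
    </global_preferences>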

On the 980x system, the 1070s' TDP % is rarely over 50%, no matter how many tasks I run concurrently. GPU utilization is 99%, memory hovers around 50-75%, and bus utilization rarely goes above 25% (PCI-E 2.0 x8 on this computer).

Once I get a stable RAC on Linux with the 980x, I'll move the 1070s over to the 6700K to see if it makes a difference, but I suspect that it won't.
ID: 1834462
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834468 - Posted: 6 Dec 2016, 12:58:33 UTC

You can check the actual CPU time in the task summary here on the website; I have scripts that work out stats for my tasks, including how much of a CPU thread each one used, and on all my machines CPU use per task for my 1070s is around 85-90%. Maybe I'm using more aggressive SoG command-line parameters?
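
The per-task math in those scripts boils down to something like this (a minimal C sketch, not my actual script, assuming you've already pulled run time, CPU time, and credit for each task off the website):

    /* Read "run_seconds cpu_seconds credit" triples from stdin and
       print the CPU-thread use and credit/hour for each task. */
    #include <stdio.h>

    int main(void)
    {
        double run, cpu, credit;
        while (scanf("%lf %lf %lf", &run, &cpu, &credit) == 3) {
            if (run <= 0.0)
                continue; /* skip malformed rows */
            printf("CPU use %5.1f%%  credit/hour %8.1f\n",
                   100.0 * cpu / run, credit / (run / 3600.0));
        }
        return 0;
    }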

I've also heard that what we see as 100% GPU utilization may not indicate that all CUs are active; i.e., it says "something" is running 100% of the time, but it may only be using 1 of the available CUs. Despite the GPU being at "100% load" I've observed significant power fluctuations at the wall depending on what kind of tasks are running, so I'm inclined to believe it.
ID: 1834468
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1834584 - Posted: 7 Dec 2016, 10:02:49 UTC - in response to Message 1834468.  

Shaggy, you should try switching one of your machines over to Linux and watching the RAC for a month, with the "special sauce" application that's floating around. I think you'll be amazed at the RAC you'll achieve under Linux vs what you're getting under Windows. I was getting about 40k RAC under Win10 with my 2x 1070s, but that same system under Linux was #18 on the Top Computers list, with a RAC approaching 80k.

If you want to give it a try, download Manjaro "Cinnamon" edition. It will boot up and install with the Nvidia drivers already installed if you choose "non-free drivers" at the boot screen.

The command-line parameters I'm using on my Linux box for the 2x 1070s are: "-unroll 5 -pfb 8 -pfp 128" (you only run 1 task per GPU under Linux because it uses the GPU very efficiently). What command-line parameters are you using?
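
For anyone wondering where those flags go: with an anonymous-platform setup they can sit in the <cmdline> element of app_info.xml; a trimmed sketch (the app name, version, and plan class here are placeholders for whatever build you actually installed):

    <app_version>
        <app_name>setiathome_v8</app_name>
        <version_num>800</version_num>
        <plan_class>cuda</plan_class>
        <cmdline>-unroll 5 -pfb 8 -pfp 128</cmdline>
        <coproc>
            <type>NVIDIA</type>
            <count>1</count>
        </coproc>
    </app_version>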
-baron_iv
Proud member of:
GPU Users Group
ID: 1834584
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834611 - Posted: 7 Dec 2016, 13:03:31 UTC - in response to Message 1834584.  
Last modified: 7 Dec 2016, 13:04:09 UTC

Shaggy, you should try switching one of your machines over to Linux and watching the RAC for a month, with the "special sauce" application that's floating around. I think you'll be amazed at the RAC you'll achieve under Linux vs what you're getting under Windows. I was getting about 40k RAC under Win10 with my 2x 1070s, but that same system under Linux was #18 on the Top Computers list, with a RAC approaching 80k.

I tried Linux for a few weeks with one of my systems with dual 1070s: the stock app was old (no SoG) and slow, then TBar built me an app closer to the latest, but it didn't support multiple tasks at once and was significantly slower (and for some reason he still didn't build for SoG despite it being the clear winner on all my Windows systems). I built the latest NVidia drivers from their source to make sure I had everything up to date, and even then I would get the occasional crash when running multiple tasks at once (possibly because the Linux version of the app doesn't seem to support "-instances_per_device 2").

I started looking into the source to see if I could build the latest (and possibly fix instances_per_device); I sent in patches to clean up the source so it would compile with GCC 6.2 (included with the newest Ubuntu) as a test of how my help would be received. My patch wasn't taken, and I concluded that if trivial changes like that were too much bother to accept, then I wasn't going to bother trying to address bigger issues.

I switched to Win10 and have been getting much better results. I didn't need to wait a month for the RAC to settle because the scripts I wrote can analyze credit/hour without waiting.

I'm also very apprehensive about Petri's special sauce -- if it isn't good enough to integrate into the public branch, then I have serious reservations about the results it produces. If it's ever integrated into the public release I'll be happy to try it, but given how long it's incubated I have nagging doubts about whether that will actually happen.

So no thanks; I gave Linux more than a casual try and ended up buying Windows 10 licenses instead.
ID: 1834611
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1834684 - Posted: 7 Dec 2016, 20:55:02 UTC

You might want to revisit the Linux special sauce app that TBar just posted on the Crunchers Anonymous site. It seems to be running quite well for him.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1834684
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1834688 - Posted: 7 Dec 2016, 21:14:01 UTC

Both the TBar & Petri brews are designed to be single-task specials -- using all the GPU processing power they can, in a manner so efficient that it all but rules out running multiple concurrent tasks without losing a mountain of performance. They are CUDA applications, unlike Raistmer's OpenCL "SoG" app, which needs a fair contribution from the CPU to work properly. From what I've read they more than compensate for being "single-minded" by dint of running several times faster than SoG does on the same GPU. Since they use so little CPU, I think they should be much less dependent on the type of CPU in use.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1834688
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1834690 - Posted: 7 Dec 2016, 21:17:08 UTC

Check my computer with the dual 1070s in about a month and compare it to your Windows system. I suspect that my 2 cards under Linux, running one instance per GPU, will outperform your 3-GPU system. Nobody who runs the special sauce app on Linux runs more than 1 task at a time, because it's not necessary; the app uses the GPU more thoroughly.

I also purchased a monster today, an AMD R9 Fury (3584 cores). They're on sale on Newegg for $259. On the AMD side, the Furys (Fiji) pretty much rule the roost. I'm going to put it in my 6700K system with the AMD 290 and see what they can do. That's almost 6000 GPU cores of computing power (~14 teraflops). I don't think they'll beat my 1070s, though...and my power bill is about to go up. Once AMD gets the issues with GCN 1 cards squared away with their drivers, I might move that system to Linux too. I'm really not a fan of Windows.
-baron_iv
Proud member of:
GPU Users Group
ID: 1834690
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1834704 - Posted: 7 Dec 2016, 22:42:45 UTC - in response to Message 1834690.  
Last modified: 7 Dec 2016, 22:43:01 UTC

Check my computer with the dual 1070s in about a month and compare it to your Windows system. I suspect that my 2 cards under Linux, running one instance per GPU, will outperform your 3-GPU system. Nobody who runs the special sauce app on Linux runs more than 1 task at a time, because it's not necessary; the app uses the GPU more thoroughly.

I also purchased a monster today, an AMD R9 Fury (3584 cores). They're on sale on Newegg for $259. On the AMD side, the Furys (Fiji) pretty much rule the roost. I'm going to put it in my 6700K system with the AMD 290 and see what they can do. That's almost 6000 GPU cores of computing power (~14 teraflops). I don't think they'll beat my 1070s, though...and my power bill is about to go up. Once AMD gets the issues with GCN 1 cards squared away with their drivers, I might move that system to Linux too. I'm really not a fan of Windows.

I've not been impressed with the performance of the Fury cards compared to my R9 390x, but at that price I'm really tempted to grab one to play with. There is a $20 rebate on it as well, which would basically cover the tax for me.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1834704
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 1834751 - Posted: 8 Dec 2016, 5:23:06 UTC - in response to Message 1834690.  

Nobody who runs the special sauce app on Linux runs more than 1 task at a time, because it's not necessary; the app uses the GPU more thoroughly.

Keep in mind the CPU usage & performance don't necessarily have anything to do with the OS being used as such. It's the application. Were the application to be ported to Windows, those systems would see similar benefits.
Grant
Darwin NT
ID: 1834751
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1834786 - Posted: 8 Dec 2016, 11:11:27 UTC - in response to Message 1834704.  

Check my computer with the dual 1070s in about a month and compare it to your Windows system. I suspect that my 2 cards under Linux, running one instance per GPU, will outperform your 3-GPU system. Nobody who runs the special sauce app on Linux runs more than 1 task at a time, because it's not necessary; the app uses the GPU more thoroughly.

I also purchased a monster today, an AMD R9 Fury (3584 cores). They're on sale on Newegg for $259. On the AMD side, the Furys (Fiji) pretty much rule the roost. I'm going to put it in my 6700K system with the AMD 290 and see what they can do. That's almost 6000 GPU cores of computing power (~14 teraflops). I don't think they'll beat my 1070s, though...and my power bill is about to go up. Once AMD gets the issues with GCN 1 cards squared away with their drivers, I might move that system to Linux too. I'm really not a fan of Windows.

I've not been impressed with the performance of the Fury cards compared to my R9 390x, but at that price I'm really tempted to grab one to play with. There is a $20 rebate on it as well, which would basically cover the tax for me.


If you look at the "top GPUs" list, the Fiji GPUs are the highest performers, followed by the RX 4xx series (at less than 90% of the performance of the Fiji cards), and it's downhill from there. The 390x has over 2800 cores, IIRC, so the Fury should out-perform it by a bit, but I doubt it will be a huge gain and I'd be very, very surprised if it's more than 10-12%.

The price went up to $299 (down to $279 with the rebate), which is still a pretty good deal.
-baron_iv
Proud member of:
GPU Users Group
ID: 1834786
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834795 - Posted: 8 Dec 2016, 12:56:13 UTC - in response to Message 1834786.  

If you look at the "top GPUs" list, the Fiji GPUs are the highest performers, followed by the RX 4xx series (at less than 90% of the performance of the Fiji cards), and it's downhill from there. The 390x has over 2800 cores, IIRC, so the Fury should out-perform it by a bit, but I doubt it will be a huge gain and I'd be very, very surprised if it's more than 10-12%.

My recent analysis puts the RX 480 (Ellesmere) at around 75% of the throughput of the Fiji cards. They really were quite a disappointment.
ID: 1834795
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1834800 - Posted: 8 Dec 2016, 13:25:50 UTC - in response to Message 1834786.  

Check my computer with the dual 1070s in about a month and compare it to your Windows system. I suspect that my 2 cards under Linux, running one instance per GPU, will outperform your 3-GPU system. Nobody who runs the special sauce app on Linux runs more than 1 task at a time, because it's not necessary; the app uses the GPU more thoroughly.

I also purchased a monster today, an AMD R9 Fury (3584 cores). They're on sale on Newegg for $259. On the AMD side, the Furys (Fiji) pretty much rule the roost. I'm going to put it in my 6700K system with the AMD 290 and see what they can do. That's almost 6000 GPU cores of computing power (~14 teraflops). I don't think they'll beat my 1070s, though...and my power bill is about to go up. Once AMD gets the issues with GCN 1 cards squared away with their drivers, I might move that system to Linux too. I'm really not a fan of Windows.

I've not been impressed with the performance of the Fury cards compared to my R9 390x, but at that price I'm really tempted to grab one to play with. There is a $20 rebate on it as well, which would basically cover the tax for me.


If you look at the "top GPUs" list, the Fiji GPUs are the highest performers, followed by the RX 4xx series (at less than 90% of the performance of the Fiji cards), and it's downhill from there. The 390x has over 2800 cores, IIRC, so the Fury should out-perform it by a bit, but I doubt it will be a huge gain and I'd be very, very surprised if it's more than 10-12%.

The price went up to $299 (down to $279 with the rebate), which is still a pretty good deal.

The "Top GPUs" should always be taken with a large grain of salt. The ranking isn't necessarily a true reflection of a GPUs performance. Case in point is the #1 and #2 rankings. "AMD Radeon (TM) R9 Fury Series" is the GPU description that anyone with BOINC 7.6.23+ will get for the Fury card & "Fiji" is what anyone with BOINC 7.6.22 or older will get. So since those are all the same cards why are the "Fiji" ones ~40% slower?

With the 390x's 2816 cores vs the Fury's 3584 (about 27% more), you would think the Fury would be at least 25% faster, but the fact is that the task run times are within a few seconds of each other.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1834800
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1834817 - Posted: 8 Dec 2016, 15:03:52 UTC

Shaggy, I've been keeping an eye on my Linux box this morning and my 1070s are completing the VLAR/GUPPI tasks in around 4 minutes each. It was taking about twice that on Win10. I can't tell exactly how many results the GPUs alone did yesterday, but the computer (2x 1070 and 5 cores running on a 980x) did just a shade under 90k yesterday. When that computer was in 18th place I was using it as my main desktop; now it's free to devote all of its resources to SETI, with the exception of Squid Web Cache, which won't take that big of a toll on CPU usage. I bet that computer gets me into the top 15 this time, with only 2x 1070s.

It's going to make you jealous enough to switch to Linux. :D
-baron_iv
Proud member of:
GPU Users Group
ID: 1834817
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1834827 - Posted: 8 Dec 2016, 16:00:50 UTC - in response to Message 1834817.  

It's going to make you jealous enough to switch to Linux. :D

If the results don't match stock then I'm not interested; I'm in this for science not internet points :)
ID: 1834827
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1834846 - Posted: 8 Dec 2016, 17:48:19 UTC - in response to Message 1834817.  
Last modified: 8 Dec 2016, 17:49:53 UTC

Shaggy, I've been keeping an eye on my Linux box this morning and my 1070s are completing the VLAR/GUPPI tasks in around 4 minutes each. It was taking about twice that on Win10. I can't tell exactly how many results the GPUs alone did yesterday, but the computer (2x 1070 and 5 cores running on a 980x) did just a shade under 90k yesterday. When that computer was in 18th place I was using it as my main desktop; now it's free to devote all of its resources to SETI, with the exception of Squid Web Cache, which won't take that big of a toll on CPU usage. I bet that computer gets me into the top 15 this time, with only 2x 1070s.

It's going to make you jealous enough to switch to Linux. :D

If you look at your app details you will see "Number of tasks today". The reset for that seems to be ~18:00 UTC. For SETI@home v8 (anonymous platform, NVIDIA GPU) you have 442 for your Linux box at the moment.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1834846