To Hyperthread or not to Hyperthread

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1687875 - Posted: 4 Jun 2015, 23:07:03 UTC

I know we have been through this before, but I never really paid much attention because at the time I didn't have a chip where I could do this.

So the question: is there any advantage to enabling hyperthreading and using all the virtual cores, or is it better to turn hyperthreading off and work only with the physical cores?

Does Hyperthreading impede GPU crunching in any way?

Thanks,

Zalster
ID: 1687875 · Report as offensive
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1687880 - Posted: 4 Jun 2015, 23:16:52 UTC - in response to Message 1687875.  

I am sure there are lots of opinions, and even valid reasons why one way is better than the other, but in my limited experience, when I ran 8 virtual cores the tasks took twice as long as with 4 cores, so it was a wash. The only advantage I could see was that when I left 1 core free to support the GPU, I still had 7/8 of the CPU available for crunching with hyperthreading, instead of only 3/4 of the CPU without it. So I run with hyperthreading. YMMV.
ID: 1687880 · Report as offensive
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1687882 - Posted: 4 Jun 2015, 23:35:35 UTC

Zalster, it will depend on the chip you use.

The newer the chip, the less it affects the times.

With an i5, i7 or above I would use it; with something like a Core 2 Duo or a Pentium 4, no, I would not use it.

The speed of the chip also makes a difference. If it is about 3.2 GHz or faster then I would use it; at 2.3 GHz or slower, not so much.

A simple version of what happens is that when you have HT on, it cuts the speed in half and uses half the speed for one core and half the speed for the virtual core.

There's more to it, but that's a simplistic way to understand it. The latest chips are much better at this, though, and don't slow times down as much.
ID: 1687882 · Report as offensive
The Jedi Alliance - Ranger
Joined: 27 Dec 00
Posts: 72
Credit: 60,982,863
RAC: 0
United States
Message 1687889 - Posted: 5 Jun 2015, 0:09:30 UTC

I set up hyper threading for another reason...a larger work queue. With an i7 and a GTX 970 GPU I got 200 WU in my queue. It didn't matter how fast I was burning through WUs. With hyper threading I run 3 virtual Ubuntu machines, 1 virtual Server 2008 machine and 1 virtual Win7 machine, 1 cpu on the physical i7 and the GPU. I now get 700 WU in my queue. I have noticed no BOINC performance degradation using the virtual machines on my i7 4770 running at 3.4 GHz using Hyper-V. I have also tried hyper threading on my Core 2 Quad 9550 running at 2.83 GHz, also running virtual Ubuntu but using Virtual Box. The jury is still out on those machines because they were set up just before a widespread outage and have suffered through two smaller outages since.

Regarding the GPU, with or without hyper threading, as long as I leave 1 physical cpu for the gpu work I have been able to run 6 simultaneous gpu tasks. I leave 1 physical cpu for myself.

I'm a fan of hyper threading.
ID: 1687889 · Report as offensive
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1687896 - Posted: 5 Jun 2015, 0:34:15 UTC - in response to Message 1687889.  

The Core 2 series lacked HyperThreading. Only the Pentium 4/D and Core i5/i7 series have SMT.


As to the topic at hand, I do much more on my machines than straight-out SETI, so I use HyperThreading for the additional threading power.
ID: 1687896 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1687933 - Posted: 5 Jun 2015, 2:42:43 UTC

With my i3-390 & six i7-860s I run HT on & use all cores for SETI@home, as it results in more work done per day. However, I don't know about the latest generation, as all of my new CPUs are sans HT. I have been wanting to get a new i3 to run tests with.
If there turned out to be no advantage to using HT on the current-gen CPUs, I would leave HT on & limit BOINC to only the physical cores, then let the HT cores run the OS & support any GPUs.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1687933 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1687934 - Posted: 5 Jun 2015, 2:45:51 UTC - in response to Message 1687933.  

I don't have HT on right now for the i7 but even with all the work units I'm doing, I don't exceed 6 cores so it leaves 2 free cores as is.

My thought was, is it worth turning it on and then running some work on the HT cores? Might not be worth it but I thought to ask before I tried it

Zalster
ID: 1687934 · Report as offensive
The Jedi Alliance - Ranger
Joined: 27 Dec 00
Posts: 72
Credit: 60,982,863
RAC: 0
United States
Message 1687951 - Posted: 5 Jun 2015, 3:51:41 UTC

I've just analyzed the WU completed by the physical i7 and the virtual i7 machines. It appears as though the virtual machines are completing work slightly faster, although I'm not sure if this is due to the prowess of Hyper-V or if it is just the mix of WU received. Where I do see a significant benefit is in the total number of WU in my queue.

Let's say you're running an i7 with 8 cores and all 8 cores are running s@h at an average of 3 hours per WU. The maximum number of WU you will get in your queue is 100. That is a total of 300 hours of potential work which, divided among the 8 cores, will be completed in 37.5 hours. Let's say that s@h goes dry or down for 2-3 days. You're running on empty in about 1.5 days.

Now, let's say you're running the same i7 but with 8 Hyper-V virtual machines, each one allocated 1 physical core. My results suggest you will still complete the WU at an average of 3 hours each, but now you will have 800 WU in your queue. Each virtual machine will have 300 hours of work, but for only 1 CPU. You will be able to withstand a 12.5 day drought.

While it is true that a 12+ day drought of WUs is rare, a 2-3 day drought is fairly common. For me using Hyper-V was just a way around the 100 WU limit.
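For anyone who wants to plug in their own numbers, here is a minimal sketch of the queue arithmetic above. It assumes the figures used in this post (a 100 WU limit per host, 3 hours per WU, and each virtual machine counting as its own host); the function name is made up for the example and is not part of BOINC.

```python
def hours_until_dry(queue_limit, hours_per_wu, cores):
    """Hours of work in a full queue when `cores` tasks run at once."""
    return queue_limit * hours_per_wu / cores

# One host: 100 WU shared by 8 cores.
single_host = hours_until_dry(queue_limit=100, hours_per_wu=3.0, cores=8)

# One virtual machine: its own 100 WU queue, but only 1 core to work on it.
per_vm = hours_until_dry(queue_limit=100, hours_per_wu=3.0, cores=1)

print(f"single host: {single_host:.1f} h ({single_host / 24:.2f} days)")
print(f"each of 8 VMs: {per_vm:.1f} h ({per_vm / 24:.2f} days)")
```

The total work done per day is the same in both cases; only the size of the buffer changes.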
ID: 1687951 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1688040 - Posted: 5 Jun 2015, 10:11:30 UTC
Last modified: 5 Jun 2015, 10:14:03 UTC

The Intel HT gives a BIG boost in that you can claim to have x2 the number of real physical CPU cores available (and call them virtual cores).


The physical reality is that each physical core is 'multiplexed' to alternately (simplistically) give one CPU cycle to one virtual core and then the next CPU cycle is given to the second virtual core, and so on. Hence, your virtual CPU cores each run at half speed if both are being used.

You suffer a slight slowdown in the extra overhead for the OS to manage the Intel HT (or even to badly ignore it altogether).

You gain a small boost from being able to opportunistically execute more threads if you have software that takes advantage of that.


So... My brief summary from my experience is:

If you are looking for performance, then you need to optimise for the HT just as you need to optimise for using a GPU for example.

In very brief summary, you get somewhere between a x0.9 and a x1.3 speedup in THROUGHPUT if you utilise all available threads.

For single threaded tasks, switching HT off can give you a better/faster response.
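As a rough illustration of the throughput figures above, here is a small sketch that converts a throughput factor into a per-task run time. It assumes HT doubles the number of tasks running at once and that the x0.9 to x1.3 range applies to total work done; the function name is invented for this example.

```python
def per_task_slowdown(throughput_factor, threads_ratio=2.0):
    """How much longer each task takes when `threads_ratio` times as many
    tasks run at once and total throughput changes by `throughput_factor`."""
    return threads_ratio / throughput_factor

for factor in (0.9, 1.0, 1.3):
    slowdown = per_task_slowdown(factor)
    print(f"throughput x{factor:.1f}: each task takes x{slowdown:.2f} as long")
```

A throughput factor of x1.0 means each task takes twice as long, which is the "it was a wash" case reported earlier in this thread.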


(Note that for the recent AMD CPUs, you have real physical cores for each thread and there is an FPU shared between two cores. Very good structure for general use and for server use. For s@h, you only need half the cores computing to max out the FPUs. The other cores should then be used to support GPUs or for general running. There's some good opportunity there for performance comparisons ;-) )


Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1688040 · Report as offensive
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1688059 - Posted: 5 Jun 2015, 10:56:42 UTC

ID: 1688059 · Report as offensive
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1688060 - Posted: 5 Jun 2015, 11:00:51 UTC - in response to Message 1688059.  

ID: 1688060 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1688558 - Posted: 6 Jun 2015, 15:16:03 UTC - in response to Message 1688060.  
Last modified: 6 Jun 2015, 15:25:23 UTC

http://www.agner.org/optimize/blog/read.php?i=6

Excellent summary and two good examples, thanks.


The very brief summary for consumers is that of befuddlement by the claimed 'x2' for the Marketing numbers.

For the developers, there is a lot of hard work needed to optimise before being able to gain any useful performance boost. Whatever number of 'virtual' CPUs you might have still rely on the reality of the real-world physical CPU cores!


All good useful stuff. Shame about what I see as the Marketing corruption behind it all.

Anyone for the higher (Marketing?) GHz of Intel's Hyper Pipelined Technology and their "Netburst" Marketing?...


Happy faster crunchin',
Martin
ID: 1688558 · Report as offensive
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1688567 - Posted: 6 Jun 2015, 15:48:24 UTC - in response to Message 1688558.  

Many, many thanks Martin.
NetBurst was total nonsense... with its ridiculously double-clocked (x2 core frequency) ALUs and lack of a fundamental piece of hardware: a barrel shifter.
Faster PIIIs (e.g. Tualatin) did have this integrated (the barrel shifter dates back to the 80386 era).

Marco.
ID: 1688567 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1688575 - Posted: 6 Jun 2015, 16:18:27 UTC - in response to Message 1688567.  
Last modified: 6 Jun 2015, 16:25:42 UTC

... NetBurst was total nonsense... with its ridiculously double-clocked (x2 core frequency) ALUs and lack of a fundamental piece of hardware: a barrel shifter.
Faster PIIIs (e.g. Tualatin) did have this integrated (the barrel shifter dates back to the 80386 era).

The barrel shifter omission will have been one of the design compromises for the higher GHz requiring deep ("Hyper!") pipelining. Barrel shift logic is quite a long logic path and so to pipeline that up 'NetBurst'-style would likely have been too much of a design compromise and cost.


We really do need three or more good sized equally matched competitors in this game. Monopolies are always overly costly...


Here's hoping AMD and ARM (or some others) can yet even up the CPU/GPU games!

Happy cool fast crunchin',
Martin


ps:

Instead of 'NetBurst'-style, perhaps we should dance to "Gangnam Style" Yeah!!! ;-)

(There's just got to be some good viral Marketing in that... :-( )
ID: 1688575 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1688782 - Posted: 7 Jun 2015, 1:55:06 UTC - in response to Message 1687875.  

I know we have been through this before, but I never really paid much attention because at the time I didn't have a chip where I could do this.

So the question: is there any advantage to enabling hyperthreading and using all the virtual cores, or is it better to turn hyperthreading off and work only with the physical cores?

Does Hyperthreading impede GPU crunching in any way?

Thanks,

Zalster

I did a test back in 2014 running stock apps on my i7 920 Vista machine. I ran 4 cores and then turned HT on. Then I ran 4 cores with my GTS 250, then HT on with my GTS 250, and then 7 cores with my GTS 250. Here is my summary at the end of the test. Running with HT on with a GPU does help RAC, with stock apps anyway. I haven't had the guts to do it with Lunatics yet. Maybe after the WOW event I might try it. Of course, your mileage may vary.


Message 1565148 30 Aug. 2014
This stock MB-only test is now done. NNT is set, and I will go back to running Lunatics.

I started this test with an average RAC of 5,450 running Lunatics doing MB and AP work.
I went to stock MB only with HT on, no GPU, using 8 cores. RAC ended up at an average of 2,017.

Turned off HT, ran four cores, no GPU: average RAC at end was 1,834.

HT off and running GPU: average RAC 5,043.
Freed one core, HT off and running GPU: average RAC 5,083.

HT on, 8 cores and GPU running: average RAC 5,326.

Freed one core and running GPU: average RAC 5,442. End of test.

I came into this test with a lot of preconceived notions about what would happen.
The only one I had right was that not running a GPU will hurt RAC a lot.
For my machine, the i7 920 with my GTS 250, it's debatable whether freeing up a core with HT on or off makes a difference. Higher-end cards, that's something else.
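To put those averages side by side, here is a quick sketch that works out the percentage change between configurations. The RAC figures are the ones listed above; the dictionary keys and the helper name are just labels made up for this example, and since RAC is a slow-moving average the small differences should be read with caution.

```python
# Average RAC at the end of each test stage, as reported above.
rac = {
    "HT off, 4 cores, no GPU": 1834,
    "HT on, 8 cores, no GPU": 2017,
    "HT off, GPU": 5043,
    "HT off, GPU, one core freed": 5083,
    "HT on, 8 cores, GPU": 5326,
    "HT on, GPU, one core freed": 5442,
}

def pct_gain(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0

comparisons = [
    ("HT off, 4 cores, no GPU", "HT on, 8 cores, no GPU"),
    ("HT off, GPU", "HT on, 8 cores, GPU"),
    ("HT off, GPU", "HT off, GPU, one core freed"),
    ("HT on, 8 cores, GPU", "HT on, GPU, one core freed"),
]

for old, new in comparisons:
    print(f"{old} -> {new}: {pct_gain(rac[old], rac[new]):+.1f}%")
```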

Old James
ID: 1688782 · Report as offensive
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1689238 - Posted: 8 Jun 2015, 18:28:09 UTC - in response to Message 1687875.  
Last modified: 8 Jun 2015, 18:58:25 UTC

Based on my read @ NVidia-Maxwell FAQ, Hyperthreading ON *Activates Maxwell's Unified Memory Feature- which provides *better I/O - throughput between GPU/CPU/RAM.
=
Suggestion: Turn Hyperthreading ON in BIOS (*I turn Speedstep OFF(its in the Power area) and TurboBoost ON too)
=
But still set 8 Core config at BOINC and SETI Computer Prefs

et Voila!

8 Free Cores(Threads) for your GPUs to Feast on(*without "Freeing a Core" the other way)
=
Just sayin.... ;)

ID: 1689238 · Report as offensive
Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1689271 - Posted: 8 Jun 2015, 21:13:25 UTC

It is like cutting a cake into more and more pieces and expecting the sum of the pieces to be more than one cake.

Under very special circumstances, there might be a marginal improvement when using hyper-threading, but in general, it does not make things better.
ID: 1689271 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1689274 - Posted: 8 Jun 2015, 21:18:26 UTC - in response to Message 1689271.  

Yeah, I'm not planning on using the Hyperthread. It's doing just fine without it

Zalster
ID: 1689274 · Report as offensive
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1689367 - Posted: 9 Jun 2015, 3:30:39 UTC - in response to Message 1689274.  

PS - *Could be doing AVX 2.0 and Intel openCL 2.0 APs
with that bad boy i7-5960X CPU

ID: 1689367 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1689369 - Posted: 9 Jun 2015, 3:41:52 UTC - in response to Message 1689274.  

Yeah, I'm not planning on using the Hyperthread. It's doing just fine without it

Zalster

Another consideration is power usage when using HT. My i7-860 showed an 11.1% increase in power consumption & a 27.7% increase in work output when running HT on & 8 tasks.
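Those two percentages can be combined into a work-per-watt figure. This is only a back-of-the-envelope sketch using the numbers above, assuming both were measured against the same HT-off baseline; the function name is made up for the example.

```python
def work_per_watt_gain(power_increase_pct, output_increase_pct):
    """Percentage change in work done per unit of power."""
    power_ratio = 1.0 + power_increase_pct / 100.0
    output_ratio = 1.0 + output_increase_pct / 100.0
    return (output_ratio / power_ratio - 1.0) * 100.0

gain = work_per_watt_gain(power_increase_pct=11.1, output_increase_pct=27.7)
print(f"work per watt with HT on: {gain:+.1f}%")
```

So on those figures HT comes out ahead on efficiency as well as on raw output.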
ID: 1689369 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.