Hyper Threading question

Message boards : Number crunching : Hyper Threading question

Treasurer

Joined: 13 Dec 05
Posts: 109
Credit: 1,569,762
RAC: 0
Germany
Message 1128617 - Posted: 17 Jul 2011, 6:05:46 UTC
Last modified: 17 Jul 2011, 6:08:31 UTC

Sorry if the question sounds stupid, but the last time I had an Intel CPU it was a Pentium III. Yesterday a friend of mine, for whom I set up a new computer, gave me his old one for free. It's a Pentium 4 with HT, one of the later P4s I guess. With HT on, BOINC shows 2 cores with 2500 MIPS each. With HT off, it shows 1 core with 2500 MIPS.
Now the stupid question: is this how HT works, one more virtual core out of nowhere? I always thought HT "splits" the core into 2 virtual cores, each with half the performance of the actual core.
ID: 1128617
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1128620 - Posted: 17 Jul 2011, 6:28:45 UTC

I was using a P4 with HT for crunching a few years ago. I believe the results are similar for the i7s and the like. With HT enabled there will be a slow-down per task, but somehow it doesn't end up being half the speed per core. You end up doing more work per 24 hours with HT enabled than with it disabled, but not by a large margin.

Do some trial-runs and see what happens.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1128620
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1128626 - Posted: 17 Jul 2011, 6:49:35 UTC - in response to Message 1128617.  

I always thought HT "splits" the core in 2 virtual cores with half the performance of the actual core.

It depends on the application. HT can give a single physical core almost the performance of 2 physical cores; sometimes it's as low as 1.1 cores. Generally it's around 1.3-1.6 cores.
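To put these figures next to the "half the performance" idea from the opening post, here is a minimal Python sketch. It assumes the two threads share the core evenly; `per_thread_speed` is a hypothetical helper for illustration, not part of BOINC.

```python
def per_thread_speed(effective_cores: float, threads: int = 2) -> float:
    """Speed of each logical thread relative to one thread running alone.

    effective_cores: total throughput of one HT core, as a multiple of the
    same core running a single thread (1.3 means 30% more work overall).
    Assumes the threads share the core evenly.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return effective_cores / threads


# The range quoted above: 1.1 (worst case), 1.3-1.6 (typical), 2.0 (best case)
for eff in (1.1, 1.3, 1.6, 2.0):
    print(f"{eff:.1f} effective cores -> each thread at {per_thread_speed(eff):.0%}")
```

At 1.3-1.6 effective cores, each virtual core runs at roughly 65-80% of a full core, so each task is slower than with HT off but the two together still get more done than one.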
The fact is the P4 was a failure as architectures go; HT was the only thing that allowed it to (almost) keep up with AMD's Athlons.
Grant
Darwin NT
ID: 1128626
BMH
Joined: 27 May 99
Posts: 419
Credit: 166,294,083
RAC: 125
United Kingdom
Message 1128673 - Posted: 17 Jul 2011, 10:04:29 UTC

"Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously. When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task." http://en.wikipedia.org/wiki/Hyper-threading

Performance-wise, I think Hyper Threading just adds a bit of head-room above a single-core processor, but it's not the same as a CPU with two physical cores.

Even some modern CPUs have something similar. Take the new Intel Core i7 2600K, for example: it has four physical cores but can handle eight threads.
Brian.
ID: 1128673
Ianab
Volunteer tester

Joined: 11 Jun 08
Posts: 732
Credit: 20,635,586
RAC: 5
New Zealand
Message 1128677 - Posted: 17 Jul 2011, 10:20:14 UTC

Hyper-threading works by putting 2 threads through one physical core. How does this help? In normal processing the CPU often has idle cycles where it is waiting for something else to happen: memory, disk, network, etc.

By running 2 threads you make use of a lot of those "idle" cycles.
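The "idle cycles" argument can be made concrete with a deliberately crude toy model. This is an assumption for illustration, not a description of how Intel's hardware actually schedules threads: each thread independently stalls for a fraction `s` of cycles, and the core does useful work whenever at least one thread is ready. `ht_speedup` is a hypothetical name.

```python
def ht_speedup(stall_fraction: float) -> float:
    """Toy model of throughput with two threads vs. one on the same core.

    Assumes each thread is independently stalled (waiting on memory, etc.)
    for a fraction s of cycles, that the core does useful work whenever at
    least one thread is ready, and that sharing the core changes nothing
    else. Real cores share caches and execution units between threads, so
    this is illustrative only.
    """
    s = stall_fraction
    if not 0.0 <= s < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    one_thread = 1.0 - s         # busy fraction with a single thread
    two_threads = 1.0 - s * s    # idle only when both threads stall at once
    return two_threads / one_thread  # simplifies to 1 + s


for s in (0.1, 0.3, 0.6):
    print(f"stalled {s:.0%} of the time -> about {ht_speedup(s):.2f}x")
```

In this model the gain is simply 1 + s, so a thread that waits 10-60% of the time gives 1.1x-1.6x. It ignores the two threads competing for caches and execution units, so real results can differ in either direction.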

Benchmarks can be fooled by this: they test the core at the same base speed, HT on or not. But back in the real world, your machine may do a work unit in 6 hours without HT, but with HT it can do 2 in 8 hours. That's 4 per day without HT, or 6 per day with it.
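The same arithmetic as a small Python sketch (the helper name is hypothetical). The numbers from your own trial runs with HT on and off can be plugged in the same way.

```python
def work_units_per_day(units: int, hours: float) -> float:
    """Work units completed per 24 hours, given `units` finished in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return units * 24.0 / hours


ht_off = work_units_per_day(1, 6.0)  # one work unit every 6 hours
ht_on = work_units_per_day(2, 8.0)   # two work units side by side in 8 hours
gain = ht_on / ht_off - 1.0
print(f"HT off: {ht_off:g}/day, HT on: {ht_on:g}/day, gain: {gain:.0%}")
# HT off: 4/day, HT on: 6/day, gain: 50%
```

Each work unit takes longer with HT (8 hours instead of 6), yet daily output rises by 50%.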

As the others have noted, some of the new Core i-series CPUs use a similar system, but with multiple cores, each one running 2 threads.

Ian
ID: 1128677
BMH
Joined: 27 May 99
Posts: 419
Credit: 166,294,083
RAC: 125
United Kingdom
Message 1128689 - Posted: 17 Jul 2011, 11:24:57 UTC

I have a P4-based system with HT enabled. Because the system has a GPU crunching, I tell BOINC to crunch on only half the CPU (as I do with all my dual-core CPUs). However, if this is half of the P4's total performance, it is not the same as using all of the physical core and leaving the hyper-threading part to supply the GPU.

I see with my Core i5 (which has 2 physical cores but 4 threads) that if I tell BOINC to crunch on only 50% of the CPU, it crunches at 25% on each of the 4 'processors' according to the Windows Task Manager. I wonder whether this is as efficient as telling BOINC to use only the two physical cores and leave the two remaining virtual cores free, or indeed whether that is already possible?
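BOINC, like Task Manager, counts logical processors, as the opening post shows. For checking how many of them are physical cores, here is a minimal Python sketch. It assumes a Linux machine (it reads `/proc/cpuinfo`, which Windows does not have), and `count_cpus` is a hypothetical helper, not a BOINC feature.

```python
import os
from typing import Tuple


def count_cpus(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical, physical) CPU counts from Linux /proc/cpuinfo text.

    Assumes the x86 layout: one blank-line-separated block per logical
    processor, with "physical id" (socket) and "core id" fields. Logical
    processors that share a (physical id, core id) pair are HT siblings.
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # skip blocks that don't describe a logical processor
        logical += 1
        # Without a "core id" field, count each logical processor as its own core.
        cores.add((fields.get("physical id", "0"),
                   fields.get("core id", "cpu" + fields["processor"])))
    return logical, len(cores)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print(f"{logical} logical processors on {physical} physical cores")
```

On a 2-core, 4-thread Core i5 under Linux this should report 4 logical processors on 2 physical cores.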
Brian.
ID: 1128689
archae86

Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 1128698 - Posted: 17 Jul 2011, 12:18:16 UTC

My primary host has Nehalem generation hyperthreading (mine is specifically a Westmere). In that case the performance of each virtual core depends heavily on what other tasks are running.

If nothing much else is going on, then a BOINC task (I run Einstein mostly, and my careful measurements were there) running on a single virtual core in hyper threaded state has very closely the same performance as the same BOINC task running on a single physical core with hyperthreading turned off. This differs from my recollection of behavior on a Gallatin--the Northwood derivative with large cache which was my sole personal exposure to Willamette-generation hyperthreading. My recollection is that I believed that just having hyperthreading turned on gravely lowered the single-core performance, even if all other virtual cores were idle. But I can't recheck, as my Gallatin is long since scrapped.

On the other hand, as is usually the case for me, the per core performance when all eight cores are busy with BOINC tasks is considerably lower, though the total system throughput is nevertheless appreciably higher.

A couple of months after I got my Westmere system up and running, I started this thread over at Einstein, which has quite a lot of data on this topic, not only my own but from others.
ID: 1128698
Treasurer

Joined: 13 Dec 05
Posts: 109
Credit: 1,569,762
RAC: 0
Germany
Message 1128704 - Posted: 17 Jul 2011, 12:57:39 UTC - in response to Message 1128620.  



Do some trial-runs and see what happens.


CPU-Z indicates this CPU is capable of SSE, SSE2 and SSE3. The Lunatics installer offers SSE3 and SSSE3 apps. Which one is better?
ID: 1128704
BMH
Joined: 27 May 99
Posts: 419
Credit: 166,294,083
RAC: 125
United Kingdom
Message 1128712 - Posted: 17 Jul 2011, 13:20:52 UTC - in response to Message 1128698.  

A couple of months after I got my Westmere system up and running, I started this thread over at Einstein, which has quite a lot of data on this topic, not only my own but from others.

Thanks for the link archae86, I found that an interesting read.

CPU-Z indicates this CPU is capable of SSE, SSE2 and SSE3. The Lunatics installer offers SSE3 and SSSE3 apps. Which one is better?

As Sten-Arne said, you have to go with SSE3: CPU-Z doesn't list SSSE3, so your CPU can't run the SSSE3 app. SSSE3 is a newer instruction set. I presume that means better, but I guess it just allows for specific tasks that make use of that instruction set when the CPU supports it.

http://en.wikipedia.org/wiki/SSE3
http://en.wikipedia.org/wiki/SSSE3
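The same check CPU-Z does can be scripted. This minimal Python sketch assumes Linux's `/proc/cpuinfo` format, where SSE3 is listed under its original name `pni` (Prescott New Instructions) and SSSE3 as `ssse3`. The helper names and the sample flags line are hypothetical.

```python
from typing import List, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first "flags" line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def runnable_builds(flags: Set[str]) -> List[str]:
    """Which of the two builds named in this thread the CPU can run.

    Linux reports SSE3 as "pni" and SSSE3 as "ssse3". The build labels are
    just the names used in this thread, not installer identifiers.
    """
    builds = []
    if "pni" in flags:
        builds.append("SSE3")
    if "ssse3" in flags:
        builds.append("SSSE3")
    return builds


# Hypothetical flags line for a Prescott-era Pentium 4 with HT
p4 = "flags\t\t: fpu sse sse2 ht pni\n"
print(runnable_builds(cpu_flags(p4)))  # ['SSE3']
```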
Brian.
ID: 1128712
Dave Mickey

Joined: 19 Oct 99
Posts: 178
Credit: 11,122,965
RAC: 0
United States
Message 1128738 - Posted: 17 Jul 2011, 14:42:42 UTC - in response to Message 1128712.  

Just because sometimes I feel like tinkering so that I can see for myself:

If I were to experiment with my P4 3.0 machine running without HT (so that I could observe the effect), does that require disabling HT somewhere in the BIOS?

And if I do that, will BOINC think that it is a different/new hostID instead of the hostID that it has now with HT?

Presuming that it does make a new hostID, will the merge hosts feature put it back so that all the history will be under one host ID?

(I'm actually considering doing this under einstein, BTW...)

thx


Dave

ID: 1128738
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1128796 - Posted: 17 Jul 2011, 16:10:10 UTC - in response to Message 1128738.  

On my Asus mobos, HT capability is turned on and off in the BIOS.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1128796
hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1128811 - Posted: 17 Jul 2011, 16:25:07 UTC
Last modified: 17 Jul 2011, 16:27:40 UTC

To get maximum performance out of HT with Nehalem, the key is to have a very fast memory subsystem (in particular, low latency), or at least to increase the UnCore frequency (3600 MHz is achievable with relatively low VTT voltage). Because of their high memory bandwidth and relatively low latency compared to earlier Intel chips, the Nehalems were the first chips to benefit much from HT.
On my i7-920 @ 4.2 GHz, with memory at 1600 MHz 7-7-7 and UnCore at 4000 MHz (this is the key part, as I wrote), the benefit from HT can reach 50% in total system task output.
Btw, HT can be tricky when trying to get maximum performance from GPUs, but that's another story.
ID: 1128811
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1128869 - Posted: 17 Jul 2011, 18:58:40 UTC - in response to Message 1128738.  

With HT on and off you may get slightly different benchmarks in BOINC. Different versions of BOINC will also give you different benchmarks; I did some tests with 6.10.58 a while back to answer a question someone had.
Back when the HT processors first came out, in the classic Seti@Home days, HT was found to give a benefit of 25-30% depending on the work received. Since the one you have is a newer-generation Prescott, there are some enhancements to HT that should make it a greater benefit.
As hbomber mentioned, memory speed has been found to help HT performance, at least on the newer Core ix processors. So, depending on your motherboard, you could try pushing the memory and FSB to their limits.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 1128869



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.