Hyper Threading question


Message boards : Number crunching : Hyper Threading question

Treasurer
Joined: 13 Dec 05
Posts: 109
Credit: 1,564,514
RAC: 0
Germany
Message 1128617 - Posted: 17 Jul 2011, 6:05:46 UTC
Last modified: 17 Jul 2011, 6:08:31 UTC

Sorry if the question sounds stupid, but the last time I had an Intel CPU it was a Pentium III. Yesterday a friend of mine, for whom I set up a new computer, gave me his old one for free. It's a Pentium 4 with HT, one of the later P4s I guess. With HT on, BOINC shows 2 cores with 2500 MIPS each. With HT off, it shows 1 core with 2500 MIPS.
Now the stupid question: is this how HT works, one more virtual core out of nowhere? I always thought HT "splits" the core into 2 virtual cores, each with half the performance of the actual core.
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2357
Credit: 8,948,968
RAC: 3,924
United States
Message 1128620 - Posted: 17 Jul 2011, 6:28:45 UTC

I was using a P4 with HT for crunching a few years ago. I believe the results are similar to those for the i7s and the like. With HT enabled, each core slows down, but it somehow ends up at more than half the speed per core. You end up doing more work per 24 hours with HT enabled than with it disabled, but not by a large margin.

Do some trial-runs and see what happens.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5948
Credit: 62,428,061
RAC: 39,264
Australia
Message 1128626 - Posted: 17 Jul 2011, 6:49:35 UTC - in response to Message 1128617.

I always thought HT "splits" the core in 2 virtual cores with half the performance of the actual core.

It depends on the application. HT can give a single physical core almost the performance of 2 physical cores; sometimes it's as low as 1.1 cores. Generally it's around 1.3-1.6 cores.
The fact is the P4 was a failure as architectures go; HT was the only thing that allowed it to (almost) keep up with AMD's Athlons.
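
To tie the "1.1 to almost 2 cores" figures back to the original "half the performance" guess, here is a rough Python sketch. It assumes both logical cores are busy and share the combined throughput evenly; the function name is made up for illustration.

```python
def per_thread_speed(throughput_factor: float, threads: int = 2) -> float:
    """Relative speed of each logical core when all of them are busy.

    throughput_factor is the combined output of the HT core, measured in
    "physical cores" (1.0 means no gain over running with HT off).
    Assumes the logical cores share that output evenly.
    """
    return throughput_factor / threads


# The factors quoted in the post above, plus the two extremes.
for factor in (1.0, 1.1, 1.3, 1.6, 2.0):
    speed = per_thread_speed(factor)
    print(f"HT worth {factor:.1f} cores -> each logical core at {speed:.0%} of full speed")
```

So "half the performance each" would only be the case where HT gains nothing at all (a factor of 1.0); any factor above that means each logical core runs at more than half speed.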
____________
Grant
Darwin NT.

BMH
Joined: 27 May 99
Posts: 333
Credit: 96,056,105
RAC: 24,320
United Kingdom
Message 1128673 - Posted: 17 Jul 2011, 10:04:29 UTC

"Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously. When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task." http://en.wikipedia.org/wiki/Hyper-threading

Performance-wise, I think Hyper Threading just adds a bit of head-room above a single-core processor, but it's not the same as a CPU with two physical cores.

Even some modern CPUs have something similar. Take the new Intel Core i7 2600K, for example: it has four physical cores but can handle eight threads.
____________
Brian.

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 678
Credit: 12,771,936
RAC: 1,983
New Zealand
Message 1128677 - Posted: 17 Jul 2011, 10:20:14 UTC

Hyper-threading works by putting 2 threads through one physical core. How does this help? In normal processing the CPU often has idle cycles where it is waiting for something else to happen: memory, disk, network, etc.

By running 2 threads you make use of a lot of those "idle" cycles.

Benchmarks can be fooled by this, since they test each core at the same base speed whether HT is on or not. But back in the real world, your machine may do a work unit in 6 hours without HT, while with HT you can do 2 in 8 hours. That's 4 per day without HT, or 6 per day with it.
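
The arithmetic behind those numbers, as a small Python sketch. The 6-hour and 8-hour figures are just the illustrative ones from this post, not measurements, and the function name is made up.

```python
HOURS_PER_DAY = 24


def work_units_per_day(hours_per_batch: float, tasks_in_parallel: int) -> float:
    """Work units finished per day when tasks_in_parallel tasks run side by
    side and each batch takes hours_per_batch hours of wall-clock time."""
    return tasks_in_parallel * HOURS_PER_DAY / hours_per_batch


ht_off = work_units_per_day(hours_per_batch=6, tasks_in_parallel=1)
ht_on = work_units_per_day(hours_per_batch=8, tasks_in_parallel=2)

print(f"HT off: {ht_off:g} per day")
print(f"HT on:  {ht_on:g} per day")
print(f"Throughput gain: {ht_on / ht_off:g}x")
```

Each individual work unit takes longer with HT on (8 hours instead of 6), but because two run at once the daily total still goes up.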

As the others have noted, some of the new Core i-series CPUs use a similar system, but with multiple cores, each one running 2 threads.

Ian

BMH
Joined: 27 May 99
Posts: 333
Credit: 96,056,105
RAC: 24,320
United Kingdom
Message 1128689 - Posted: 17 Jul 2011, 11:24:57 UTC

I have a P4-based system with HT enabled. Because the system has a GPU crunching, I tell BOINC to crunch on only half the CPU (as I do with all my dual-core CPUs). However, if that amounts to half the P4's total performance, it is not the same as using all of the physical core and leaving the hyper-threading part to feed the GPU.

I see with my Core i5 (which has 2 physical cores but 4 threads) that if I tell BOINC to crunch on only 50% of the CPU, it crunches at 25% on each of the 4 'processors' according to the Windows Task Manager. I wonder if this would be as efficient as being able to specify that BOINC can use only the two physical cores and leave the two remaining virtual cores free, or indeed if this is already possible?
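
A rough Python sketch of how a "use at most N% of the processors" setting might translate into a number of simultaneous CPU tasks. This is an assumption for illustration, not BOINC's actual code: it supposes the percentage is applied to the logical processor count the OS reports, rounded down, with a minimum of one task. The function name is made up.

```python
def tasks_allowed(logical_cpus: int, percent_of_cpus: float) -> int:
    """Number of CPU tasks run at once.

    Assumed rule (not taken from BOINC's source): apply the percentage to
    the logical CPU count, round down, and never go below one task.
    """
    return max(1, int(logical_cpus * percent_of_cpus / 100))


for name, logical in (("P4 with HT", 2), ("Core i5, 2 cores + HT", 4)):
    print(f"{name}: {tasks_allowed(logical, 50)} task(s) at 50%")
```

Under that assumption the setting only limits how many tasks run. Which logical processors they land on would be left to the operating system's scheduler, so two tasks on the i5 are not guaranteed to sit on two different physical cores.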
____________
Brian.

archae86
Joined: 31 Aug 99
Posts: 889
Credit: 1,572,794
RAC: 3
United States
Message 1128698 - Posted: 17 Jul 2011, 12:18:16 UTC

My primary host has Nehalem generation hyperthreading (mine is specifically a Westmere). In that case the performance of each virtual core depends heavily on what other tasks are running.

If nothing much else is going on, then a BOINC task (I run Einstein mostly, and my careful measurements were there) running on a single virtual core with hyperthreading enabled has very nearly the same performance as the same BOINC task running on a single physical core with hyperthreading turned off. This differs from my recollection of behavior on a Gallatin, the Northwood derivative with large cache, which was my sole personal exposure to Pentium 4-era (NetBurst) hyperthreading. As I recall, just having hyperthreading turned on gravely lowered the single-core performance, even if all other virtual cores were idle. But I can't recheck, as my Gallatin is long since scrapped.

On the other hand, when all eight virtual cores are busy with BOINC tasks, as is usually the case for me, the per-core performance is considerably lower, though the total system throughput is nevertheless appreciably higher.

A couple of months after I got my Westmere system up and running, I started this thread over at Einstein, which has quite a lot of data on this topic, not only my own but from others.
____________

Treasurer
Joined: 13 Dec 05
Posts: 109
Credit: 1,564,514
RAC: 0
Germany
Message 1128704 - Posted: 17 Jul 2011, 12:57:39 UTC - in response to Message 1128620.



Do some trial-runs and see what happens.


CPU-Z indicates this CPU is capable of SSE, SSE2 and SSE3. The Lunatics installer offers SSE3 and SSSE3 apps. Which one is better?
____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3761
Credit: 21,477,467
RAC: 14,959
Sweden
Message 1128709 - Posted: 17 Jul 2011, 13:10:36 UTC - in response to Message 1128704.



Do some trial-runs and see what happens.


CPU-Z indicates this CPU capable of SSE 1,2 and 3. The lunatics installer offers SSE3 and SSSE3 apps. Which one is better?


Notice the extra S in SSSE3. According to CPU-Z, your CPU is not capable of SSSE3, only SSE3.
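
For anyone checking this on Linux instead of with CPU-Z, here is a minimal Python sketch of the same decision. On Linux, /proc/cpuinfo lists SSE3 under the flag name `pni` and SSSE3 as `ssse3`. The function names and the returned labels are made up for illustration; this is not how the Lunatics installer decides.

```python
def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Return the CPU feature flags from the first 'flags' line (Linux only)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def pick_app_build(cpu_flags: set) -> str:
    """Pick the most capable build the CPU can actually run."""
    if "ssse3" in cpu_flags:
        return "SSSE3"
    if "pni" in cpu_flags or "sse3" in cpu_flags:  # Linux reports SSE3 as 'pni'
        return "SSE3"
    return "generic"


# Example flag set for a late Pentium 4: SSE, SSE2 and SSE3, but no SSSE3.
late_p4_flags = {"sse", "sse2", "pni", "ht"}
print(pick_app_build(late_p4_flags))
```

Picking a build the CPU does not support is not a matter of "better or worse": the app would hit instructions the processor cannot execute.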
____________

BMH
Joined: 27 May 99
Posts: 333
Credit: 96,056,105
RAC: 24,320
United Kingdom
Message 1128712 - Posted: 17 Jul 2011, 13:20:52 UTC - in response to Message 1128698.

A couple of months after I got my Westmere system up and running, I started this thread over at Einstein, which has quite a lot of data on this topic, not only my own but from others.

Thanks for the link archae86, I found that an interesting read.

CPU-Z indicates this CPU capable of SSE 1,2 and 3. The lunatics installer offers SSE3 and SSSE3 apps. Which one is better?

As Sten-Arne said, you have to go with SSE3. SSSE3 is a newer instruction set. I presume that means better, but I guess it only helps for specific tasks that make use of those instructions, and only when the CPU supports them.

http://en.wikipedia.org/wiki/SSE3
http://en.wikipedia.org/wiki/SSSE3
____________
Brian.

Dave Mickey
Joined: 19 Oct 99
Posts: 178
Credit: 10,817,373
RAC: 1,084
United States
Message 1128738 - Posted: 17 Jul 2011, 14:42:42 UTC - in response to Message 1128712.

Just because sometimes I feel like tinkering so that I can see for myself:

If I were to experiment with my P4 3.0 machine to run without HT (so that I could observe the effect), does that require disabling HT somewhere in the BIOS?

And if I do that, will that make BOINC think that it is a different/new host ID instead of the host ID that it is now with HT?

Presuming that it does make a new host ID, will the merge hosts feature put it back so that all the history will be under one host ID?

(I'm actually considering doing this under Einstein, BTW...)

thx


____________
Dave

hbomber
Volunteer tester
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1128811 - Posted: 17 Jul 2011, 16:25:07 UTC
Last modified: 17 Jul 2011, 16:27:40 UTC

To gain maximum performance out of HT with Nehalem, the key is to have a very fast memory subsystem (and in particular low latency) or, at least, to increase the UnCore frequency (3600 MHz is achievable with relatively low VTT voltage). Because of their high memory bandwidth and relatively low latency compared to earlier Intel chips, Nehalems were the first chips to benefit much from HT.
On my i7-920 @ 4.2 GHz, with 1600 MHz 7-7-7 memory and UnCore at 4000 MHz (this is the key part, as I wrote), the benefit from HT can reach 50% more total system task output.
Btw, HT can be tricky when trying to get maximum performance from GPUs, but that's another story.
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4664
Credit: 123,646,922
RAC: 97,540
United States
Message 1128869 - Posted: 17 Jul 2011, 18:58:40 UTC - in response to Message 1128738.

With HT on and off you may get slightly different benchmarks in BOINC. Different versions of BOINC will also give you different benchmarks. I did some tests with 6.10.58 a while back to answer a question someone had.
Back when the HT processors first came out, in the classic SETI@home days, HT was found to give you a benefit of 25-30%, depending on the work received. Since the one you have is a newer-generation Prescott, there are some enhancements to HT that should make it a greater benefit.
As hbomber mentioned, memory speed has been found to help HT performance, at least on the newer Core ix processors. So, depending on the motherboard you have, you could try pushing the memory and FSB to their limits.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Copyright © 2014 University of California