Intel P4 Hyperthreading

Fragmire

Joined: 3 Apr 99
Posts: 4
Credit: 2,645
RAC: 0
United States
Message 1653 - Posted: 25 Jun 2004, 3:16:58 UTC

BOINC doesn't seem to be hyperthreading aware. My computer's benchmark according to BOINC is halved in the integer performance category when hyperthreading is turned on. Also, the seti@home client only utilizes 50% of my CPU at any time. Is this by design?

Michael Brennecke
Joined: 2 Apr 04
Posts: 33
Credit: 205,887
RAC: 0
United States
Message 1661 - Posted: 25 Jun 2004, 3:22:15 UTC

My 2.8 GHz P4-HT machines do the same, but each instance of S@H uses 50% because they are working on 2 WUs at a time, so overall 100% is used.

My non-HT 2.4 GHz P4 does them faster, but only one at a time.

I haven't really looked at the benchmarks.

mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 1696 - Posted: 25 Jun 2004, 4:24:45 UTC - in response to Message 1653.  

> BOINC doesn't seem to be hyperthreading aware. My computer's benchmark
> according to BOINC is halved in the integer performance category when
> hyperthreading is turned on. Also, the seti@home client only utilizes 50% of
> my CPU at any time. Is this by design?
>
You are ABSOLUTELY correct, 50% of the CPU is being used. But since BOINC is not currently set up to work on HT machines, it is using 1/2 of your total CPU 100% of the time.
HT is NOT the same as dual CPUs! It is one CPU acting like 2. BOINC is telling you it found both CPUs but is only using 1 of them, hence the 50%.
HT awareness should come along in the future; more and more computers are doing this, so it will come!


Kevin Erickson
Joined: 16 Oct 99
Posts: 31
Credit: 52,969
RAC: 0
United States
Message 1698 - Posted: 25 Jun 2004, 4:25:14 UTC

My 2.6 GHz P4 HT uses both "processors" at the same time, so really only 50% on each WU. So it does two at a time. But you are right, it could use more of the HT technology to do the calculations faster.

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 1847 - Posted: 25 Jun 2004, 12:06:33 UTC

Guys,

What you see is what you get. You are seeing two work units being processed, and both of the "logical" CPUs are running full out. I have three 2.8 GHz non-HT machines, and each does a WU in about 3 hours.

My two HT machines do two WUs in about the same time. So HT is there, and is being fully exploited. What we are not seeing yet is the effect of optimized code. But when the dust settles, I am sure that we will find a site where versions of the software optimized for individual processors will appear ...

Then, and only then, will we likely see an improvement in processing speed.

Legacy
Joined: 10 Dec 99
Posts: 134
Credit: 1,778,571
RAC: 0
Singapore
Message 1868 - Posted: 25 Jun 2004, 12:50:29 UTC
Last modified: 25 Jun 2004, 12:52:52 UTC

Please do the benchmark again and READ carefully this time. It says
Benchmark results:
Number of CPUs: 2
1766 double precision MIPS (Whetstone) per CPU
2504 integer MIPS (Dhrystone) per CPU
Notice the words PER CPU
If you have 2 CPUs, multiply the results by 2 to get the total.
Simple Maths.
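
For anyone who wants to check that arithmetic, here is a minimal Python sketch using the per-CPU figures quoted above. The function name `total_mips` is made up for illustration and is not part of BOINC.

```python
def total_mips(per_cpu_mips: float, num_cpus: int) -> float:
    """Aggregate a per-CPU benchmark figure over all CPUs the client reports."""
    return per_cpu_mips * num_cpus


# Per-CPU figures from the benchmark output above, with "Number of CPUs: 2".
whetstone_total = total_mips(1766, 2)  # double precision MIPS
dhrystone_total = total_mips(2504, 2)  # integer MIPS
print(whetstone_total, dhrystone_total)
```

Whether the doubled figure is a fair measure of a hyperthreaded machine is a separate question; this only shows the per-CPU versus total bookkeeping.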

Jean-David Beyer
Joined: 10 Jun 99
Posts: 60
Credit: 1,301,105
RAC: 1
United States
Message 1894 - Posted: 25 Jun 2004, 13:59:50 UTC - in response to Message 1698.  

> My 2.6 GHz P4 HT uses both "processors" at the same time, so really only 50% on each WU.
> So it does two at a time. But you are right, it could use more of the HT
> technology to do the calculations faster.
>
I do not understand this observation. I have two Hyperthreaded processors; i.e., the OS (Linux) acts as though there were four processors in this machine. I told BOINC to use up to 4 processors and it is doing that: BOINC started four instances of setiathome and they are all running along flat out as shown below. Note that %cpu refers to % of one cpu, but this machine has "four".

USER PRI NI SIZE RSS SHARE %CPU %MEM CTIME COMMAND

boinc 39 19 15300 14M 1308 98.2 0.3 110:43 setiathome_3.08_i686-
boinc 39 19 15808 15M 1300 98.2 0.3 44:32 setiathome_3.08_i686-
boinc 39 19 16368 15M 1300 98.2 0.3 43:39 setiathome_3.08_i686-
boinc 39 19 16320 15M 1300 95.6 0.3 238:37 setiathome_3.08_i686-

I do not see how this machine would do setiathome faster were the code re-written for hyperthreading. Just run two instances of setiathome (one instance of BOINC is enough) and let the OS take care of it.

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1903 - Posted: 25 Jun 2004, 14:13:45 UTC

Windows (in its infinite wisdom) reports that on a dual processor machine each processor has a maximum of 50%. I have a couple of dual processor machines, and they crunch 2 WUs in the same time that an otherwise identical single processor machine processes 1 WU. However, M$oft reports each of the S@H tasks as taking 50% on the duals, and 100% on the single. I believe this is more of an OS display issue than a BOINC issue.
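
To illustrate the display difference, here is a small Python sketch. It assumes one tool reports a process's CPU use relative to a single CPU (as in the Linux listing earlier in the thread) while the other divides by the number of CPUs the OS sees; both function names are hypothetical.

```python
def percent_of_one_cpu(busy_seconds: float, wall_seconds: float) -> float:
    """CPU use relative to a single CPU."""
    return 100.0 * busy_seconds / wall_seconds


def percent_of_whole_machine(busy_seconds: float, wall_seconds: float,
                             num_cpus: int) -> float:
    """The same CPU use, expressed relative to all CPUs combined."""
    return percent_of_one_cpu(busy_seconds, wall_seconds) / num_cpus


# A task that keeps one CPU busy for a full minute on a 2-CPU machine:
print(percent_of_one_cpu(60, 60))           # relative to one CPU
print(percent_of_whole_machine(60, 60, 2))  # relative to the whole machine
```

The same task shows up as 100% under the first convention and 50% under the second, without any change in the work being done.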

Jean-David Beyer
Joined: 10 Jun 99
Posts: 60
Credit: 1,301,105
RAC: 1
United States
Message 2635 - Posted: 29 Jun 2004, 13:24:15 UTC - in response to Message 1903.  

> Windows (in its infinite wisdom) reports that on a dual processor machine
> each processor has a maximum of 50%. I have a couple of dual processor machines, and
> they crunch 2 WUs in the same time that an otherwise identical single processor
> machine processes 1 WU. However, M$oft reports each of the S@H tasks as
> taking 50% on the duals, and 100% on the single. I believe this is more of an
> OS display issue than a BOINC issue.
>
I did not write down the numbers, but I tried running 1, 2, 3, and 4 instances of setiathome on this machine with two hyperthreaded processors; i.e., it can run four threads simultaneously.

IIRC, one work unit took about 3 hours 20 minutes.
When I tried two work units at once, the time was about the same. I forget what happened at three work units, but when I tried four, they took about 4 hours each. I infer that a hyperthreaded processor is more than a single thread processor, but less than two. At least as Linux runs them.

Note that this experiment is not perfect, since it assumes all work units take the same amount of time, and they do not.
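
As a rough way to compare those runs, the timings can be turned into work units per day. This is only a sketch in Python: it uses the approximate times recalled above and assumes every work unit takes the same time, which the post itself says is not quite true. The function name `wus_per_day` is hypothetical.

```python
def wus_per_day(instances: int, hours_per_wu: float) -> float:
    """Work units finished per day if each instance needs hours_per_wu per WU."""
    return instances * 24.0 / hours_per_wu


one = wus_per_day(1, 3 + 20 / 60)   # 1 instance, about 3 h 20 min per WU
two = wus_per_day(2, 3 + 20 / 60)   # 2 instances, about the same time each
four = wus_per_day(4, 4.0)          # 4 instances, about 4 h each

for label, value in (("1", one), ("2", two), ("4", four)):
    print(f"{label} instance(s): {value:.1f} WUs/day")
print(f"gain from 2 to 4 instances: {four / two:.2f}x")
```

With these numbers, going from two to four instances gives roughly 1.67 times the throughput rather than 2 times, which fits the inference that a hyperthreaded processor is more than one single-thread processor but less than two.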

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 2644 - Posted: 29 Jun 2004, 13:36:34 UTC - in response to Message 2635.  

> I did not write down the numbers, but I tried running 1, 2, 3, and 4 instances
> of setiathome on this machine with two hyperthreaded processors; i.e., it can run four
> threads simultaneously.
>
> IIRC, one work unit took about 3 hours 20 minutes.
> When I tried two work units at once, the time was about the same. I forget
> what happened at three work units, but when I tried four, they took about 4
> hours each. I infer that a hyperthreaded processor is more than a single
> thread processor, but less than two. At least as Linux runs them.
>
> Note that this experiment is not perfect, since it assumes all work units take
> the same amount of time, and they do not.
>
An HT processor has several different functional areas. For example, logic, integer math, and floating point math all happen in separate areas of the chip. The idea is to allow a second process to use an area of the chip that the first process is not using. However, S@H is very heavily floating point, so the floating point area will be very heavily used and will be the bottleneck for getting work done. If you had two processes, one that was all integer and the other all floating point, they would share the chip much better.
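
A toy model in Python may make the bottleneck argument more concrete. It is an illustration only, not a description of how a real P4 schedules work: it assumes each thread needs a fixed amount of time on each unit, that two threads overlap perfectly except where they need the same unit, and that a thread cannot overlap with itself. The function name `ht_speedup` and the unit labels are made up.

```python
def ht_speedup(thread_a: dict, thread_b: dict) -> float:
    """Speedup of running two threads together versus one after the other.

    Each dict maps an execution unit name to the time the thread needs on it.
    """
    time_a = sum(thread_a.values())
    time_b = sum(thread_b.values())
    serial = time_a + time_b
    units = set(thread_a) | set(thread_b)
    # The busiest shared unit limits the pair; neither thread can finish
    # faster than its own total time either.
    busiest = max(thread_a.get(u, 0.0) + thread_b.get(u, 0.0) for u in units)
    together = max(busiest, time_a, time_b)
    return serial / together


all_fp = {"fp": 1.0}
all_int = {"int": 1.0}
mostly_fp = {"fp": 0.8, "int": 0.2}

print(round(ht_speedup(all_fp, all_fp), 2))        # two pure floating point threads
print(round(ht_speedup(all_int, all_fp), 2))       # one integer, one floating point
print(round(ht_speedup(mostly_fp, mostly_fp), 2))  # two mostly floating point threads
```

In this model two pure floating point threads gain nothing, an integer thread paired with a floating point thread runs twice as fast as doing them in sequence, and two mostly floating point threads land in between.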


Thierry Van Driessche
Volunteer tester
Joined: 20 Aug 02
Posts: 3083
Credit: 150,096
RAC: 0
Belgium
Message 2664 - Posted: 29 Jun 2004, 14:21:02 UTC - in response to Message 1868.  
Last modified: 20 Jul 2004, 13:33:33 UTC

> Please do the benchmark again and READ carefully this time. It says
> Benchmark results:
> Number of CPUs: 2
> 1766 double precision MIPS (Whetstone) per CPU
> 2504 integer MIPS (Dhrystone) per CPU
> Notice the words PER CPU
> If you have 2 CPUs, multiply the results by 2 to get the total.
> Simple Maths.

Huuum,

I do not fully agree with this. Looking at the results I got using the latest software, these are the numbers:
HT enabled
--- - 2004-06-25 12:28:35 - 1585 double precision MIPS (Whetstone) per CPU
--- - 2004-06-25 12:28:35 - 1876 integer MIPS (Dhrystone) per CPU
HT disabled
--- - 2004-06-26 10:37:59 - 1851 double precision MIPS (Whetstone) per CPU
--- - 2004-06-26 10:37:59 - 4001 integer MIPS (Dhrystone) per CPU
This means the numbers are not exactly double.

There is another point: CPU time. When I used an earlier version of the software, I found a factor of 1.3 to 1.4 in CPU time between using HT and not using it. With HT enabled, the CPU time per WU was 1.36 times longer than with HT disabled.
I have not tried to find out what the CPU time difference is with the current software.
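
To put those figures side by side, here is a short Python sketch. It assumes the client reports 2 CPUs with HT enabled and 1 CPU with HT disabled on this machine, and that two WUs run at once with HT enabled; the function name `combined_mips` is hypothetical.

```python
def combined_mips(per_cpu_mips: float, logical_cpus: int) -> float:
    """Total benchmark figure over all CPUs the client reports."""
    return per_cpu_mips * logical_cpus


# Per-CPU figures from the log lines above.
ht_on = {"Whetstone": 1585, "Dhrystone": 1876}   # assumed: 2 logical CPUs
ht_off = {"Whetstone": 1851, "Dhrystone": 4001}  # assumed: 1 CPU

ratios = {
    name: combined_mips(ht_on[name], 2) / combined_mips(ht_off[name], 1)
    for name in ht_on
}
for name, ratio in ratios.items():
    print(f"{name}: combined HT-on / HT-off = {ratio:.2f}")

# If each WU needs 1.36x the CPU time with HT on but two WUs run at once,
# throughput relative to HT off is 2 / 1.36.
throughput_ratio = 2 / 1.36
print(f"WU throughput ratio: {throughput_ratio:.2f}")
```

Under these assumptions the combined floating point figure is about 1.71 times the HT-off value, the combined integer figure about 0.94 times, and the WU throughput about 1.47 times.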


Greetings from Belgium.

enusbaum
Volunteer tester
Joined: 29 Apr 00
Posts: 15
Credit: 5,921,750
RAC: 0
United States
Message 2711 - Posted: 29 Jun 2004, 16:15:12 UTC

The point I think people are missing is that a HyperThreaded CPU is not TWO physical CPUs. Your maximum processing speed is the speed of the processor. There are a lot of cache control optimizations on the chip itself, so that's why you see a slight performance increase when you enable HT, but, for the most part, you only have ONE CPU.

So when you run two WUs on a 3 GHz HT CPU, each WU isn't getting 3 GHz. Each WU is (roughly) getting 1.5 GHz (with cache control, it might be 1.6, but this is just an example). Don't think that when you enable HT your computer gives each WU 3 GHz, thus becoming a 6 GHz processor, because that just isn't how it works.

This is the reason why your benchmarks are for the most part HALVED and why work units take almost twice as long on an HT CPU.

It's not code optimizations or anything of that nature which is causing BOINC to run 'slow' when HT is enabled; you just have two threads running that are each using 50% of your ONE CPU (each 50% being a 'Virtual CPU').

delete me
Joined: 3 Apr 99
Posts: 2
Credit: 67
RAC: 0
Switzerland
Message 2724 - Posted: 29 Jun 2004, 16:38:54 UTC

I did a test when the HT chips first came out about 2 years ago... I had a Dell server with 2 x 1.4 GHz Xeon MP CPUs (and 4 GB RAM). To measure the performance, I used the Seti classic CLI.

With HT turned off, each processor took about 4 hours per WU, so in a 12 hour period it would complete (2 CPUs x (12/4) =) 6 work units.

With HT turned on, I assigned a seti-cli to each virtual CPU. Each WU was processed in roughly 6 hours per vCPU. So every 6 hours, 4 WUs completed, giving a total of (4 CPUs x (12/6) =) 8 WUs per 12 hour period.

This is roughly a 33% increase in performance, which leads me to believe that HT doesn't just split the processor in two, but makes better use of the whole CPU, increasing the overall efficiency...

In this case, the sum of the parts is much greater than the whole...
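
The arithmetic above can be reproduced with a few lines of Python. This is just a sketch of the calculation in the post, using its round figures; the function name `wus_completed` is made up.

```python
def wus_completed(workers: int, hours_per_wu: float,
                  period_hours: float = 12.0) -> float:
    """Work units finished in period_hours by `workers` parallel clients."""
    return workers * period_hours / hours_per_wu


ht_off = wus_completed(2, 4.0)  # 2 physical CPUs, about 4 h per WU
ht_on = wus_completed(4, 6.0)   # 4 virtual CPUs, about 6 h per WU
gain = (ht_on - ht_off) / ht_off

print(ht_off, ht_on, f"{gain:.0%}")
```

Each WU takes longer with HT on, but twice as many run at once, so the total per 12 hours still rises from 6 to 8.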


Thierry Van Driessche
Volunteer tester
Joined: 20 Aug 02
Posts: 3083
Credit: 150,096
RAC: 0
Belgium
Message 2745 - Posted: 29 Jun 2004, 17:18:47 UTC - in response to Message 2724.  
Last modified: 20 Jul 2004, 13:17:06 UTC

> This is roughly a 33% increase in performance, which leads me to believe that
> HT doesn't just split the processor in two, but makes better use of
> the whole CPU, increasing the overall efficiency...

The 33% figure you found corresponds roughly to what Intel claims as the gain from using HT. They announce an increase of around 30%.

Greetings from Belgium.

ric
Volunteer tester
Joined: 16 Jun 03
Posts: 482
Credit: 666,047
RAC: 0
Switzerland
Message 2812 - Posted: 30 Jun 2004, 14:24:27 UTC - in response to Message 2644.  

In my case: 3 x 3.06 MHz, each with 512 MB DDR400 memory, no overclocking.


Seti classic, version 3.03

Each makes 14-15 (sometimes 16) WUs per day with HT on and 2 instances running.

I first started with HT on and 1 instance running: 11-12 WUs per day.
So I switched to 2 instances per HT CPU, and it runs fine.

The daily throughput for the HT CPUs went up.

Some months ago I made a test on an AMD 2400 (running at 2016 MHz),
which normally needs about 3h09 to complete a WU.

I started 10 instances of Seti classic and went to sleep,

each instance with "stop_after_send.txt".

It took a long time to complete.
The CPU time reported by setiQ was nearly the same
as normal, but the "effective" (wall-clock) duration was much longer.
The daily throughput for the AMD 2400 decreased.


I prefer one WU after the other (2 on HT) so I am able to return them soon.

The goal of the test was to see whether the system stays stable, which it did.


ahleong
Volunteer tester
Joined: 6 Apr 03
Posts: 16
Credit: 25,615
RAC: 0
Hong Kong
Message 2872 - Posted: 30 Jun 2004, 16:43:16 UTC

3.06 MHz
MHz?

SwissNic
Joined: 27 Nov 99
Posts: 78
Credit: 633,713
RAC: 0
United Kingdom
Message 2890 - Posted: 30 Jun 2004, 17:17:31 UTC - in response to Message 2872.  

> 3.06 MHz
> MHz?
>
>

Yeah - it was the first release of a P4 with HT enabled.
------------------------------------------------
Once you have ruled out the impossible, everything else, however improbable, is possible!
A.C. Doyle.

Jean-David Beyer
Joined: 10 Jun 99
Posts: 60
Credit: 1,301,105
RAC: 1
United States
Message 5229 - Posted: 8 Jul 2004, 10:22:38 UTC - in response to Message 2812.  

> In my case: 3 x 3.06 MHz, each with 512 MB DDR400 memory, no overclocking.
>
>
> Seti classic, version 3.03
>
> Each makes 14-15 (sometimes 16) WUs per day with HT on and 2 instances running.
>
> I first started with HT on and 1 instance running: 11-12 WUs per day.
> So I switched to 2 instances per HT CPU, and it runs fine.
>
> The daily throughput for the HT CPUs went up.
>
Running classic setiathome on Red Hat Enterprise Linux 3 ES, with two 3.06GHz XEON processors with 1 Megabyte L3 cache, running one instance, I got a little over 7 work units per day. With two instances I got around 15 work units per day. I forget what I got with three instances, but it was less than 22 work units per day.

With four instances of setiathome, I get 24 work units completed per day.

Since my OS thinks I have 4 processors (actually two hyperthreaded processors), it did not make sense to try more instances, but I would expect the law of diminishing returns to remain in force and to give even less benefit. I am a bit afraid that the throughput would actually drop below 24 work units per day with 5 instances, but I never made the test.
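
The diminishing returns can be read off the numbers above as the extra work units per day gained for each added instance. Here is a minimal Python sketch; it rounds "a little over 7" and "around 15" to 7 and 15, and the function name `marginal_gains` is hypothetical.

```python
def marginal_gains(points: dict) -> list:
    """Extra WUs/day per added instance between consecutive measurements."""
    items = sorted(points.items())
    return [
        ((n1, n2), (w2 - w1) / (n2 - n1))
        for (n1, w1), (n2, w2) in zip(items, items[1:])
    ]


# instances -> approximate WUs per day, rounded from the figures above
measured = {1: 7, 2: 15, 4: 24}

for (n1, n2), gain in marginal_gains(measured):
    print(f"{n1} -> {n2} instances: +{gain:.1f} WUs/day per added instance")
```

Going from 1 to 2 instances adds about 8 WUs per day, while each of the next two instances adds about 4.5, so a fifth instance would be expected to add little or nothing.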

> I prefer one WU after the other (2 on HT) so I am able to return them soon.

Since I doubt you meant a 3 MHz processor (more likely a 3 GHz processor), I doubt the ability to return work units soon is really important. I seem to recall that if you return results within a week it is good enough. You may even have more time than that if you are content to provide confirmation of results and not new results. But they can use confirmation too.

I set up my cache for up to 7 days work, since at the moment I frequently run out of work (no doubt because the server is not delivering enough work for me to do). When the servers are working better, perhaps I will reduce the cache to 5 days.
>
> The target of the test was to see, if the system keep stable, that it did
>
I do not see why the system would not be stable unless you are running Windows or something.

Guido_A_Waldenmeier_
Joined: 3 Apr 99
Posts: 482
Credit: 4,774
RAC: 0
Liechtenstein
Message 5231 - Posted: 8 Jul 2004, 10:29:21 UTC

In conquering space, two problems must be solved: gravity and paperwork. We could have dealt with gravity. Wernher von Braun

ric
Volunteer tester
Joined: 16 Jun 03
Posts: 482
Credit: 666,047
RAC: 0
Switzerland
Message 5285 - Posted: 8 Jul 2004, 14:43:07 UTC - in response to Message 5229.  

> > In my cases, 3 x 3.06 MHz, each 512 ddr400 memory, no oclock

Yes, just 2 "mistakes": it should be GHz, and it is 3.00 GHz (800 MHz front-side bus).


> Running classic setiathome on Red Hat Enterprise Linux 3 ES, with two 3.06GHz
> XEON processors with 1 Megabyte L3 cache,

The bigger the cache, the better.

==> But you have 2 real CPUs (2 x 2 = 4 virtual CPUs); a single HT CPU is 2 virtual CPUs.

> processors), it did not make sense to try more instances, but I would expect

A lot of things do not make sense but give a lot of pleasure.


Caches: in the profile I use the "subprofiles" Home, Work and School.

For the faster PCs a higher Q, for the slower PCs a lower Q.

> >
> > The target of the test was to see, if the system keep stable, that it
> did
> >
> I do not see why the system would not be stable unless you are running Windows
> or something.
>
I use W2K, XP and also some 98; they are really stable (but I do not install everything I can get...).

Also in the farm are 5 diskless stations (Linux Router Project); they are still running for seti 1 ... :-)

=>> I have now reduced the number crunching for seti 2, because I still have 0 total credit and 0 recent average credit (today, July 8th).
(I still run seti 1, so there is hope of completing the next expected milestone.)

The test with the 10 instances was made on a Windows 98 system...


While I am writing this, we have a blizzard; I hope the electric power will not go out...

friendly greetings

Richard