Message boards : Number crunching : Intel P4 Hyperthreading
Fragmire (Joined: 3 Apr 99, Posts: 4, Credit: 2,645, RAC: 0)
BOINC doesn't seem to be hyperthreading-aware. According to BOINC, my computer's integer benchmark is halved when hyperthreading is turned on. Also, the seti@home client only utilizes 50% of my CPU at any time. Is this by design?
Michael Brennecke (Joined: 2 Apr 04, Posts: 33, Credit: 205,887, RAC: 0)
My 2.8 GHz P4-HT machines do the same, but they use 50% for each instance of S@H since they work on 2 WUs at a time, so overall 100% is used. My non-HT 2.4 GHz P4 does them faster, but only one at a time. I haven't really looked at the benchmarks.
mikey (Joined: 17 Dec 99, Posts: 4215, Credit: 3,474,603, RAC: 0)
> BOINC doesn't seem to be hyperthreading-aware. According to BOINC, my
> computer's integer benchmark is halved when hyperthreading is turned on.
> Also, the seti@home client only utilizes 50% of my CPU at any time. Is this
> by design?

You are ABSOLUTELY correct, 50% of the CPU is being used. But since BOINC is not currently set up to work on HT machines, it is using half of your total CPU 100% of the time. HT is NOT the same as dual CPUs! It is one CPU acting like two. BOINC is telling you it found both CPUs but is only using one of them, hence the 50%. HT awareness should be along in the future; more and more computers are doing this, so it will come!
Kevin Erickson (Joined: 16 Oct 99, Posts: 31, Credit: 52,969, RAC: 0)
My 2.6 GHz P4 HT uses both "processors" at the same time, so really only 50% goes to each WU. So it does two at a time. But you are right, it could use more of the HT technology to do the calculations faster.
Paul D. Buck (Joined: 19 Jul 00, Posts: 3898, Credit: 1,158,042, RAC: 0)
Guys,

What you see is what you get. You are seeing two work units being processed, and both of the "logical" CPUs are running full out.

I have three 2.8 GHz non-HT machines, and they do a WU in about 3 hours. My two HT machines do two WUs in about the same time. So, HT is there, and is being fully exploited.

What we are not seeing yet is the effect of optimized code. When the dust settles, I am sure we will find a site where versions of the software optimized for individual processors will appear... Then, and only then, will we likely see an improvement in processing speed.
Legacy (Joined: 10 Dec 99, Posts: 134, Credit: 1,778,571, RAC: 0)
Please run the benchmark again and READ carefully this time. It says:

Benchmark results:
Number of CPUs: 2
1766 double precision MIPS (Whetstone) per CPU
2504 integer MIPS (Dhrystone) per CPU

Notice the words PER CPU. If you have 2 CPUs, multiply the results by 2 to get the total. Simple maths.
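The per-CPU arithmetic can be sketched as a trivial check. This is not BOINC's own code, just the multiplication the post describes, using the MIPS figures quoted above:

```python
# BOINC reports benchmark figures per logical CPU; the machine total
# is simply the per-CPU figure times the CPU count (numbers from the
# post above).
whetstone_per_cpu = 1766  # double precision MIPS (Whetstone) per CPU
dhrystone_per_cpu = 2504  # integer MIPS (Dhrystone) per CPU
num_cpus = 2              # logical CPUs reported with HT on

print(whetstone_per_cpu * num_cpus)  # 3532 total Whetstone MIPS
print(dhrystone_per_cpu * num_cpus)  # 5008 total Dhrystone MIPS
```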
Jean-David Beyer (Joined: 10 Jun 99, Posts: 60, Credit: 1,301,105, RAC: 1)
> My 2.6 GHz P4 HT uses both "processors" at the same time, so really only 50%
> goes to each WU. So it does two at a time. But you are right, it could use
> more of the HT technology to do the calculations faster.

I do not understand this observation. I have two hyperthreaded processors; i.e., the OS (Linux) acts as though there were four processors in this machine. I told BOINC to use up to 4 processors and it is doing that: BOINC started four instances of setiathome and they are all running along flat out, as shown below. Note that %CPU refers to % of one CPU, but this machine has "four".

USER   PRI  NI  SIZE   RSS  SHARE  %CPU  %MEM   CTIME  COMMAND
boinc   39  19  15300  14M   1308  98.2   0.3  110:43  setiathome_3.08_i686-
boinc   39  19  15808  15M   1300  98.2   0.3   44:32  setiathome_3.08_i686-
boinc   39  19  16368  15M   1300  98.2   0.3   43:39  setiathome_3.08_i686-
boinc   39  19  16320  15M   1300  95.6   0.3  238:37  setiathome_3.08_i686-

I do not see how this machine would do setiathome faster were the code re-written for hyperthreading. Just run two instances of setiathome (one instance of BOINC is enough) and let the OS take care of it.
John McLeod VII (Joined: 15 Jul 99, Posts: 24806, Credit: 790,712, RAC: 0)
Windows (in its infinite wisdom) reports that on a dual-processor machine each process has a maximum of 50%. I have a couple of dual-processor machines, and they crunch 2 WUs in the same time that an otherwise identical single-processor machine processes 1 WU. However, M$oft reports each of the S@H tasks as taking 50% on the duals, and 100% on the single. I believe this is more of an OS display issue than a BOINC issue.
Jean-David Beyer (Joined: 10 Jun 99, Posts: 60, Credit: 1,301,105, RAC: 1)
> Windows (in its infinite wisdom) reports that on a dual-processor machine
> each process has a maximum of 50%. I have a couple of dual-processor
> machines, and they crunch 2 WUs in the same time that an otherwise identical
> single-processor machine processes 1 WU. However, M$oft reports each of the
> S@H tasks as taking 50% on the duals, and 100% on the single. I believe this
> is more of an OS display issue than a BOINC issue.

I did not write down the numbers, but I tried running 1, 2, 3, and 4 instances of setiathome on this two-hyperthreaded-processor machine; i.e., it can run four threads simultaneously.

IIRC, one work unit took about 3 hours 20 minutes. When I tried two work units at once, the time was about the same. I forget what happened at three work units, but when I tried four, they took about 4 hours each. I infer that a hyperthreaded processor is more than a single-thread processor, but less than two. At least as Linux runs them.

Note that this experiment is not perfect, since it assumes all work units take the same amount of time, and they do not.
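The timings reported above imply how much extra work the hyperthreaded pairs deliver. A rough back-of-the-envelope sketch, using the approximate run times from the post (so treat the result as an estimate, not a measurement):

```python
# Approximate per-WU run times (hours) at each instance count, on a
# box with two HT processors (4 logical CPUs), as reported above.
hours_per_wu = {1: 10 / 3, 2: 10 / 3, 4: 4.0}  # 3h20m = 10/3 hours

for n_instances, hours in sorted(hours_per_wu.items()):
    wus_per_day = n_instances * 24 / hours
    print(f"{n_instances} instance(s): {wus_per_day:.1f} WUs/day")

# Going from 2 to 4 instances (one extra thread per physical CPU)
# raises daily throughput by (4*24/4.0) / (2*24/(10/3)) = 5/3, i.e.
# each HT processor behaves like ~1.67 CPUs: more than one, less
# than two, matching the inference above.
```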
John McLeod VII (Joined: 15 Jul 99, Posts: 24806, Credit: 790,712, RAC: 0)
> I did not write down the numbers, but I tried running 1, 2, 3, and 4
> instances of setiathome on this two-hyperthreaded-processor machine; i.e.,
> it can run four threads simultaneously.
>
> IIRC, one work unit took about 3 hours 20 minutes. When I tried two work
> units at once, the time was about the same. I forget what happened at three
> work units, but when I tried four, they took about 4 hours each. I infer
> that a hyperthreaded processor is more than a single-thread processor, but
> less than two. At least as Linux runs them.
>
> Note that this experiment is not perfect, since it assumes all work units
> take the same amount of time, and they do not.

An HT processor has several different areas. For example, logic, integer math, and floating-point math all happen in separate areas of the chip. The idea is to allow a second process to use an area of the chip that the first process is not using. However, S@H is very heavily floating-point, so the floating-point area will be very heavily used and will be the bottleneck for getting work done. If you had two processes, one all integer and the other all floating point, they would dovetail together better.
Thierry Van Driessche (Joined: 20 Aug 02, Posts: 3083, Credit: 150,096, RAC: 0)
> Notice the words PER CPU. If you have 2 CPUs, multiply the results by 2 to
> get the total. Simple maths.

Hmmm, I do not fully agree with this. Looking at the results I got using the latest software, these are the numbers:

HT enabled
2004-06-25 12:28:35 - 1585 double precision MIPS (Whetstone) per CPU
2004-06-25 12:28:35 - 1876 integer MIPS (Dhrystone) per CPU

HT disabled
2004-06-26 10:37:59 - 1851 double precision MIPS (Whetstone) per CPU
2004-06-26 10:37:59 - 4001 integer MIPS (Dhrystone) per CPU

This means the numbers are not exactly double.

There is another point: CPU time. When I used an earlier version of the software, I found a factor of 1.3 to 1.4 in CPU time with HT on versus off. With HT enabled, the CPU time was 1.36 times longer per WU than with HT disabled. I have not tried to find out the CPU time difference with the current software.

Greetings from Belgium.
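A longer CPU time per WU does not by itself mean lower output, because two WUs are in flight at once. A small sketch of the trade-off, taking the 1.36 slowdown factor measured above as given (the rest is just arithmetic):

```python
# With HT on, each WU takes ~1.36x the CPU time (measured figure from
# the post above), but two WUs run concurrently, one per logical CPU.
slowdown_per_wu = 1.36
concurrent_wus = 2

# Net machine throughput relative to HT off:
throughput_gain = concurrent_wus / slowdown_per_wu
print(f"net throughput: {throughput_gain:.2f}x")  # ~1.47x
```

So even with each WU 36% slower, the machine as a whole would return roughly 47% more WUs per day.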
enusbaum (Joined: 29 Apr 00, Posts: 15, Credit: 5,921,750, RAC: 0)
The point I think people are missing is that a hyperthreaded CPU is not TWO physical CPUs. Your maximum processing speed is the speed of the processor. There are a lot of cache-control optimizations on the chip itself, so that's why you see a slight performance increase when you enable HT, but for the most part you only have ONE CPU.

So when you run two WUs on a 3 GHz HT CPU, each WU isn't getting 3 GHz. Each WU is getting roughly 1.5 GHz (with cache control, it might be 1.6, but this is just an example). Don't think that when you enable HT your computer gives each WU 3 GHz, thus becoming a 6 GHz processor, because that just isn't how it works.

This is the reason why your benchmarks are for the most part HALVED and why work units take almost twice as long on an HT CPU. It's not code optimizations or anything of that nature causing BOINC to run 'slow' when HT is enabled; you just have two threads running that are each using 50% of your ONE CPU (each 50% being a 'virtual CPU').
delete me (Joined: 3 Apr 99, Posts: 2, Credit: 67, RAC: 0)
I did a test when the HT chips first came out, about 2 years ago. I had a Dell server with 2 x 1.4 GHz Xeon MP CPUs (and 4 GB RAM). To measure the performance, I used the SETI classic CLI.

With HT turned off, each processor took about 4 hours per WU, so in a 12-hour period it would complete (2 CPUs x (12/4) =) 6 work units.

With HT turned on, I assigned a seti-cli to each virtual CPU. Each WU was processed in roughly 6 hours per vCPU. So every 6 hours, 4 WUs completed, giving a total of (4 vCPUs x (12/6) =) 8 WUs per 12-hour period.

This is roughly a 33% increase in performance, which leads me to believe that HT doesn't just split the processor in two, but makes better use of the whole CPU, allowing HT to increase the overall efficiency... In this case, the sum of the parts is much greater than the whole.
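The dual-Xeon numbers above reduce to a simple throughput formula. A sketch of the same arithmetic, with the figures exactly as reported in the post:

```python
def wus_per_window(n_cpus, hours_per_wu, window_hours=12):
    # One SETI classic CLI per (virtual) CPU; each finishes
    # window_hours / hours_per_wu work units in the window.
    return n_cpus * (window_hours / hours_per_wu)

ht_off = wus_per_window(n_cpus=2, hours_per_wu=4)  # 6.0 WUs per 12 h
ht_on = wus_per_window(n_cpus=4, hours_per_wu=6)   # 8.0 WUs per 12 h

gain = ht_on / ht_off - 1
print(f"HT gain: {gain:.0%}")  # ~33%, the figure reported above
```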
Thierry Van Driessche (Joined: 20 Aug 02, Posts: 3083, Credit: 150,096, RAC: 0)
> This is roughly a 33% increase in performance, which leads me to believe
> that HT doesn't just split the processor in two, but makes better use of
> the whole CPU, allowing HT to increase the overall efficiency...

The 33% you found corresponds to what Intel roughly claims as the gain from HT: they announce an increase of around 30%.

Greetings from Belgium.
ric (Joined: 16 Jun 03, Posts: 482, Credit: 666,047, RAC: 0)
In my case, 3 x 3.06 MHz machines, each with 512 MB DDR400 memory, no overclock.

SETI classic, version 3.03. Each makes 14-15 (sometimes 16) WUs per day with HT on and 2 instances running.

I started first with HT on and 1 instance running: 11-12 WUs per day. So I switched to 2 instances per HT CPU, and it runs fine. The daily throughput for the HT CPUs went higher.

Some months ago, I made a test on an AMD 2400+ (running at 2016 MHz). Normally a WU needs 3h09 (+/-) to complete. I started 10 instances of SETI classic and went to sleep, each instance with "stop_after_send.txt". It took a long time to complete. The CPU time reported by SetiQ was nearly the same as normal, but the "effective" duration took much longer. The daily throughput for the AMD 2400+ decreased.

I prefer one WU after the other (2 on HT), so I am able to return it soon.

The target of the test was to see if the system keeps stable; that it did.
ahleong (Joined: 6 Apr 03, Posts: 16, Credit: 25,615, RAC: 0)
> 3.06 MHz

MHz?
SwissNic (Joined: 27 Nov 99, Posts: 78, Credit: 633,713, RAC: 0)
> 3.06 MHz
> MHz?

Yeah - it was the first release of a P4 with HT enabled.

------------------------------------------------
Once you have ruled out the impossible, everything else, however improbable, is possible! A.C. Doyle.
Jean-David Beyer (Joined: 10 Jun 99, Posts: 60, Credit: 1,301,105, RAC: 1)
> In my case, 3 x 3.06 MHz machines, each with 512 MB DDR400 memory, no
> overclock.
>
> SETI classic, version 3.03. Each makes 14-15 (sometimes 16) WUs per day
> with HT on and 2 instances running.
>
> I started first with HT on and 1 instance running: 11-12 WUs per day. So I
> switched to 2 instances per HT CPU, and it runs fine. The daily throughput
> for the HT CPUs went higher.

Running classic setiathome on Red Hat Enterprise Linux 3 ES, with two 3.06 GHz Xeon processors with 1 megabyte of L3 cache, running one instance I got a little over 7 work units per day. With two instances I got around 15 work units per day. I forget what I got with three instances, but it was less than 22 work units per day. With four instances of setiathome, I get 24 work units completed per day.

Since my OS thinks I have 4 processors (actually two hyperthreaded processors), it did not make sense to try more instances, but I would expect the law of diminishing returns to remain in force and give even less benefit. I am a bit afraid that with 5 instances the work units per day would actually decrease below 24, but I never made the test.

> I prefer one WU after the other (2 on HT), so I am able to return it soon.

Since I doubt you meant you had a 3 MHz processor (more likely a 3 GHz processor), I doubt the ability to return work units soon is really important. I seem to recall that if you return results within a week it is good enough. You may even have more time than that if you are content to provide confirmation of results and not new results. But they can use confirmation too. I set up my cache for up to 7 days of work, since at the moment I frequently run out of work (no doubt because the server is not delivering enough work for me to do). When the servers are working better, perhaps I will reduce the cache to 5 days.

> The target of the test was to see if the system keeps stable; that it did.

I do not see why the system would not be stable, unless you are running Windows or something.
Guido_A_Waldenmeier_ (Joined: 3 Apr 99, Posts: 482, Credit: 4,774, RAC: 0)
"In the conquest of space there are two problems to solve: gravity and the paperwork. We could have dealt with gravity." (Wernher von Braun)
ric (Joined: 16 Jun 03, Posts: 482, Credit: 666,047, RAC: 0)
> > In my case, 3 x 3.06 MHz machines, each with 512 MB DDR400 memory, no
> > overclock.

Yeah, just two "mistakes": it's GHz, and 3.00 GHz (800 MHz frontside bus).

> Running classic setiathome on Red Hat Enterprise Linux 3 ES, with two
> 3.06 GHz Xeon processors with 1 megabyte of L3 cache,

The higher the cache the better. But you have 2 real CPUs (2 x 2 = 4 virtual CPUs); HT alone gives 2 virtual CPUs.

> it did not make sense to try more instances

A lot of things do not make sense but give a lot of pleasure.

Caches: in the profile, I use the "subprofiles" Home, Work, and School. For the faster PCs a higher queue, for the slower PCs a lower queue.

> > The target of the test was to see if the system keeps stable; that it did.
>
> I do not see why the system would not be stable, unless you are running
> Windows or something.

I use W2K, XP, and also some 98 machines; they are really stable (but I do not install everything I can get...). Also in the farm, 5 diskless stations (Linux Router Project) are still running SETI 1... :-)

Now I have reduced the number crunching for SETI 2, since I still have 0 total credit and recent average credit (today, July 8th). (I still run SETI 1, so there is hope to complete the next expected milestone.)

The test with the 10 instances was made on a Windows 98 system.

While I'm writing this, we are having a blizzard; hoping the electric power will not fail...

Friendly greetings, Richard
©2025 University of California
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.