one cpu or two cpu's, that is the question

Message boards : Number crunching : one cpu or two cpu's, that is the question
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 99190 - Posted: 15 Apr 2005, 16:12:08 UTC

One thing I get comfort from (as I assume many new people should) is that any answer they get to a question should be trusted. Because, if the person answering the question makes even the slightest error, or even a perceived error, then other members of this forum will quickly correct them. So, a person who asks a question just has to wait 20 minutes after the first answer to be sure of its accuracy.

sorry to be off topic.

I know if I make a mistake, I'll be quickly corrected, and as long as I see the bigger picture (many others learning from someone else's posts) then I can try to bury my feelings about being corrected.

tony
ID: 99190 · Report as offensive
Profile Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 99193 - Posted: 15 Apr 2005, 16:19:05 UTC - in response to Message 99190.  


> sorry to be off topic.
>
> I know if I make a mistake, I'll be quickly corrected, and as long as I see
> the bigger picture (many others learning from someone else's posts) then I can
> try to bury my feelings about being corrected.
>
> tony
>

That's for sure!

I'll answer a question, then 20 minutes later someone else will say the same thing
but then add a bunch of unneeded info, which confuses the situation....

Hi Tony!




ID: 99193 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 99198 - Posted: 15 Apr 2005, 16:24:41 UTC

Sometimes the best way to speed something up is by leaving it alone.

A stripped-down Linux kernel is always fastest - more so when compiled for a particular CPU. And when you have the kernel and the CPU it's being run on trying to manage clock cycles... [shakes head] You only need one traffic cop - Two cause chaos.

Turn off HT if you're only crunching. Turn it on when you're shooting up aliens in a 3D game.
ID: 99198 · Report as offensive
fienna

Joined: 23 Aug 99
Posts: 14
Credit: 864,061
RAC: 0
United States
Message 99203 - Posted: 15 Apr 2005, 16:35:13 UTC

yes, disabling HT is an ok thing to do. for single threaded processes, it will ALWAYS be faster to not have HT enabled. for multi-threaded processes, it will always be faster to have HT, though you'll never get more than around a 40% increase in speed. because BOINC lets you run multiple processes, using HT does mean that you process work units faster. however, it looks like your benchmarks are cut in half, meaning that in effect, there's no difference in the amount of claimed credit.

Now this is where things get interesting. Let's say that i'm using 2 HT processors (so 4 logical processors total) and all of them are processing BOINC units. My claimed credit per unit will be about 1/2 as much as if i had HT disabled (i measured it and it looks like it's about 42% less per unit). however, i'll be producing roughly 1.8 to 1.9 times as many work units in the same amount of time, meaning that my total claimed credit is only reduced by about 10-20%. HOWEVER, because i'm crunching units SO MUCH FASTER than most of the people out there (xeons are good like that), about 80-90% of the time MY claimed credit is the one that is dropped. the next highest claimed credit (the one that is actually counted) is on AVERAGE 40% or so higher than mine, which offsets the loss of claimed credit earlier.

meaning that even though my claimed credit drops when i have HT enabled, i'm making up for it with the higher claims of others. so despite the loss of claimed credits when you have HT enabled, it's much better to use it cause you can do twice as many units and get relatively the same credit for each unit (effectively doubling your output.)
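For anyone who wants to check the arithmetic, here is a rough sketch that plugs the figures from this post into a few lines of Python. The function name `relative_total` is only for illustration, and the model is a simplification: it treats HT-off credit as 1.0 and applies the roughly 40% higher counted claim to every unit, even though the post says that only happens about 80-90% of the time.

```python
# Rough arithmetic for the HT-on vs HT-off credit comparison in this post.
# All figures are the poster's approximate numbers; HT off is the 1.0 baseline.

def relative_total(per_unit_ratio, unit_rate_ratio):
    """Total credit per unit time with HT on, relative to HT off (= 1.0)."""
    return per_unit_ratio * unit_rate_ratio

GRANTED_UPLIFT = 1.40  # the counted claim is ~40% higher than mine, on average

# 0.58 is the measured "42% less per unit"; 0.50 is the rounder "about 1/2".
for per_unit in (0.58, 0.50):
    for unit_rate in (1.8, 1.9):  # ~1.8x to 1.9x as many work units with HT on
        claimed = relative_total(per_unit, unit_rate)
        granted = relative_total(per_unit * GRANTED_UPLIFT, unit_rate)
        print(f"per-unit {per_unit:.2f}, {unit_rate}x units: "
              f"claimed {claimed:.2f}, granted {granted:.2f}")
```

With the measured 42% drop, the claimed total comes out roughly level with HT off; with the rounder "about 1/2" figure it lands 5-10% lower. Either way the granted total comes out ahead under these assumptions, which is the point being made above.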

This thinking more or less works with any HT enabled INTEL Pentium 4 or Xeon processor - there are a LOT of Athlons out there with inferior integer and FPU scores and people using old processors to run BOINC.



--
dead stars still burn
dead still stars burn
ID: 99203 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 99280 - Posted: 15 Apr 2005, 19:04:00 UTC
Last modified: 15 Apr 2005, 19:05:10 UTC

The use of HT in a single physical processor is, like all attempts to increase throughput, a compromise. As a simple case, let's look at using two physical processors in a system. With that you would think that you would get 2x performance. But now you have two processors using the same memory bandwidth, meaning that you have created a new throughput bottleneck.

Think of trying to fill the pool while watering the yard ... both tasks take longer because you have only so much capacity to pipe water through your house.

With HT turned ON, you get a boost in throughput at the expense of run time. As was stated below you get 40% to 60% more done in a unit time, but each of the tasks takes longer. If you compare the numbers for a 2.8 GHz single threaded processor against a HT at 3.0 GHz you will see that the 2.8 GHz machine is faster. But it only does one WU at a time. With HT on, you get two done at the same time (see the numbers). If you look at those numbers you can see that to "beat" the 2.8 GHz machine I had to use a 3.2 GHz chip with four times the cache memory.
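The throughput-versus-run-time trade-off can be put into a small sketch. This is only an illustration under stated assumptions: the 3.0 hour HT-off run time is a made-up example value, the function name `ht_tradeoff` is hypothetical, and the model assumes two tasks share one HT processor evenly.

```python
def ht_tradeoff(single_task_hours, throughput_gain):
    """Return (hours per task, tasks per hour) with two tasks on one HT processor.

    throughput_gain is the fractional gain in work done per unit time,
    e.g. 0.4 for the 40% figure quoted in this thread.
    """
    tasks_per_hour = (1 + throughput_gain) / single_task_hours
    # Two tasks run side by side, so each one takes twice the per-task interval.
    hours_per_task = 2 / tasks_per_hour
    return hours_per_task, tasks_per_hour

base_hours = 3.0  # hypothetical run time for one work unit with HT off
for gain in (0.4, 0.6):
    per_task, rate = ht_tradeoff(base_hours, gain)
    print(f"gain {gain:.0%}: {per_task:.2f} h per task, {rate:.2f} tasks/h "
          f"(HT off: {base_hours:.2f} h per task, {1 / base_hours:.2f} tasks/h)")
```

Each individual task takes longer with HT on, but more tasks finish per hour, which is the compromise described above.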

The up-and-coming processors are going to be increasing the parallelism with more physical processors and internal logical processors and / or additional special function accelerators (think Floating Point Units).

We may even see "designer" chips where they are constructed of different "blends" of accelerators so that you can pick the CPU that will do best for your workloads. For those of us with BOINC Farms we want lots of FPU and some integer, but may trade off on some of the other possible accelerators ...

[edit]
Forgot, for those interested there are "lessons" where I talk about the evolution of the CPU ... there are also small discussions of Logical vs. physical processors and FPU ... also see Floating Point ...
ID: 99280 · Report as offensive
Profile Steve Cressman
Volunteer tester
Joined: 6 Jun 02
Posts: 583
Credit: 65,644
RAC: 0
Canada
Message 99294 - Posted: 15 Apr 2005, 19:39:08 UTC - in response to Message 99203.  

> - there are a LOT of Athlons out there with inferior integer and FPU
> scores

Why are people so misinformed about the speed of AMD?

Measured floating point speed 1909.55 million ops/sec
Measured integer speed 3248.21 million ops/sec

What is inferior about those numbers?



98SE XP2500+ @ 2.1 GHz Boinc v5.8.8

And God said "Let there be light." But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.
ID: 99294 · Report as offensive
fienna

Joined: 23 Aug 99
Posts: 14
Credit: 864,061
RAC: 0
United States
Message 99304 - Posted: 15 Apr 2005, 20:07:53 UTC - in response to Message 99294.  

> > - there are a LOT of Athlons out there with inferior integer and FPU
> > scores
>
> Why are people so misinformed about the speed of AMD?
>
> Measured floating point speed 1909.55 million ops/sec
> Measured integer speed 3248.21 million ops/sec
>
> What is inferior about those numbers?

What is your typical Claimed Credit per work unit?

Also, there is a wide variety of Athlon processors available. I know that the 3000+ series and up are nice and fast, but still not as good as a Xeon processor.

BTW - for a great white paper about HT - go here: http://www.intel.com/business/bss/products/hyperthreading/server/ht_server.pdf
--
dead stars still burn
dead still stars burn
ID: 99304 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 99315 - Posted: 15 Apr 2005, 20:27:38 UTC - in response to Message 99294.  

> Measured floating point speed 1909.55 million ops/sec
> Measured integer speed 3248.21 million ops/sec
>
> What is inferior about those numbers?

Nothing at all ... they are fine numbers. They, like all benchmarks, are meaningless. The key question is "how fast is the processor at doing my actual workload?"

The Dell dual-Xeon I just bought impresses me in that it is doing 4 work units at the same time my other machines are doing only 2 ... though I wrote down the benchmark numbers they are useless ... what is interesting is that it does a SETI@Home WU in 2.5 to 3.0 hours ...
ID: 99315 · Report as offensive
fienna

Joined: 23 Aug 99
Posts: 14
Credit: 864,061
RAC: 0
United States
Message 99316 - Posted: 15 Apr 2005, 20:35:22 UTC - in response to Message 99315.  


> The Dell dual-Xeon I just bought impresses me in that it is doing 4 work units
> at the same time my other machines are doing only 2 ... though I wrote down
> the benchmark numbers they are useless ... what is interesting is that it does
> a SETI@Home WU in 2.5 to 3.0 hours ...

yeah, but the sad thing is that your credit for those wu's depends in part on those meaningless benchmarks because of the new system.
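To show why the benchmarks matter for credit, here is a minimal sketch. It assumes the benchmark-based credit definition as I understand it from that era: 100 credits for one day of CPU time on a reference machine scoring 1,000 million ops/sec on both the floating point and integer benchmarks. The constants, the function name `claimed_credit`, and the 3 hour CPU time are assumptions for illustration, not taken from the BOINC source; the benchmark figures are the ones posted earlier in this thread.

```python
SECONDS_PER_DAY = 86_400
REFERENCE_MOPS = 1_000.0           # assumed reference score on each benchmark
CREDITS_PER_REFERENCE_DAY = 100.0  # assumed credits per reference CPU-day

def claimed_credit(cpu_seconds, fp_mops, int_mops):
    """Claimed credit from CPU time and the two benchmark scores (million ops/sec)."""
    average_mops = (fp_mops + int_mops) / 2
    cpu_days = cpu_seconds / SECONDS_PER_DAY
    return cpu_days * CREDITS_PER_REFERENCE_DAY * average_mops / REFERENCE_MOPS

cpu_seconds = 3 * 3600  # hypothetical: one work unit taking 3 hours of CPU time

full = claimed_credit(cpu_seconds, 1909.55, 3248.21)          # scores quoted above
halved = claimed_credit(cpu_seconds, 1909.55 / 2, 3248.21 / 2)  # benchmarks cut in half
print(f"claimed with full benchmarks:   {full:.2f}")
print(f"claimed with halved benchmarks: {halved:.2f}")
```

Under these assumptions the claim scales directly with the benchmark scores, so halving the benchmarks halves the claim for the same CPU time, regardless of how fast the real work gets done.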
--
dead stars still burn
dead still stars burn
ID: 99316 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13739
Credit: 208,696,464
RAC: 304
Australia
Message 99354 - Posted: 15 Apr 2005, 22:53:34 UTC - in response to Message 99315.  

> > Measured floating point speed 1909.55 million ops/sec
> > Measured integer speed 3248.21 million ops/sec
> >
> > What is inferior about those numbers?
>
> Nothing at all ... they are fine numbers. They, like all benchmarks, are
> meaningless. The key question is "how fast is the processor to do my
> actual workload?"

Yep, Whetstone & Dhrystone have no relationship to actual computing performance - especially the Whetstone.

This chart is the perfect example of that. If you were to buy a CPU based on those figures you'd think that any 2.4GHz Intel CPU or faster would be better than the best that AMD has to offer for the desktop. You'd be very disappointed, as in actual applications the AMD is generally faster than the fastest Intel Desktop CPU, and by a significant amount in some of them.
Grant
Darwin NT
ID: 99354 · Report as offensive
Profile Kajunfisher
Volunteer tester
Joined: 29 Mar 05
Posts: 1407
Credit: 126,476
RAC: 0
United States
Message 99365 - Posted: 15 Apr 2005, 23:18:36 UTC

AuthenticAMD mobile AMD Athlon(tm) XP 2000+ w/1GB RAM

Measured floating point speed 1131.93 million ops/sec (today)

Measured integer speed 2595.49 million ops/sec (today)

It may not be the fastest out there (that's for sure) but I'm content with it for now

As I said before, I have seen these benchmark stats change, up & down

*Question
Do all the BOINC projects use the same benchmark program? I'm sure the answer is in Paul's FAQ, just happened to think about it in the middle of my post...


.

No matter where you go, there you are...
ID: 99365 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 99382 - Posted: 16 Apr 2005, 0:10:57 UTC - in response to Message 99280.  

> The use of HT in a single physical processor is like all attempts to increase
> throughput, a compromise. In a simple case let's look at using two physical
> processors in a system. With that you would think that you would get 2x
> performance. But now you have two processors using the same memory bandwidth,
> meaning that you have created a new throughput bottleneck.

... and you get the additional overhead of trying to manage tasks running on two CPUs.

Back in the day (when I had dual and quad Burroughs B-Series mainframes at my disposal) it was generally accepted that each additional processor added something like 0.8, so dual mainframes were 1.8 times the power of a single machine, three came out around 2.6, etc.
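That rule of thumb is easy to write down. A small sketch, with the caveat that the function name `effective_power` is made up here and the 0.8 factor is the rough figure from this post rather than a measured constant:

```python
def effective_power(n_processors, per_extra=0.8):
    """Relative power of n processors if each extra one adds `per_extra` of a single CPU."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    return 1 + per_extra * (n_processors - 1)

for n in range(1, 5):
    print(f"{n} processor(s): about {effective_power(n):.1f}x a single machine")
```

This is a purely linear model, so it says nothing about configurations where coordination overhead grows faster than the processor count.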
ID: 99382 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 99640 - Posted: 16 Apr 2005, 15:07:58 UTC - in response to Message 99365.  

> *Question
> Do all the BOINC projects use the same benchmark program? I'm sure the answer
> is in Paul's FAQ, just happened to think about it in the middle of my post...

Yes. The benchmark is part of the BOINC Daemon.

There is code that will allow a project to tilt the scoring if they desire, but as far as I know, no project has chosen to do so.

@Ned,

Yeah, 1.8 to 1.9 ... though for the iAPX 432 that held good until about the 3rd or 4th processor and then total performance dropped way low because the processors were talking to each other so much they had no time left to work. :)
ID: 99640 · Report as offensive
Profile Kajunfisher
Volunteer tester
Joined: 29 Mar 05
Posts: 1407
Credit: 126,476
RAC: 0
United States
Message 99661 - Posted: 16 Apr 2005, 16:04:22 UTC - in response to Message 99640.  

> > *Question
> > Do all the BOINC projects use the same benchmark program? I'm sure the answer
> > is in Paul's FAQ, just happened to think about it in the middle of my post...
>
> Yes. The benchmark is part of the BOINC Daemon.
>
> There is code that will allow a project to tilt the scoring if they desire,
> but as far as I know, no project has chosen to do so.
>
could that possibly mean "in their favor"? what i mean is, that the WU they send is predetermined (larger/slower/more cpu intensive) to maybe avoid server overloads, etc.? (i'm way out on a limb here i think)


.
No matter where you go, there you are...
ID: 99661 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 99709 - Posted: 16 Apr 2005, 18:00:24 UTC - in response to Message 99661.  

> could that possibly mean "in their favor"? what i mean is, that the WU they
> send is predetermined (larger/slower/more cpu intensive) to maybe avoid server
> overloads, etc.? (i'm way out on a limb here i think)

Not to tilt it in any direction, but to "balance" the amount of floating point vs. integer work in the science application.
ID: 99709 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.