Cache is King, Pentium-M CPU is the fastest!!!

Profile jimmyhua

Send message
Joined: 16 Apr 05
Posts: 97
Credit: 369,588
RAC: 0
Guam
Message 103847 - Posted: 25 Apr 2005, 23:14:26 UTC
Last modified: 25 Apr 2005, 23:15:32 UTC

Well, I read in another thread that someone's laptop beat out most of their desktops most of the time. I got curious and tried out my own laptops.

WELL, That was an understatement. Here are my fastest processors:

1. Pentium 4 Extreme with 1 MB L2 cache. This is a dual core P4 3.2 GHz with 1 MB of cache shared between the two processors.

2. AMD Sempron XP 2500+. This is really an Athlon 1.7 GHz on Socket A, with 512K cache.

3. Pentium M 1.6 GHz with 1 MB L2 cache.

4. Duron 800 MHz with 256K cache.

*** Results:

Pentium M is the fastest at crunching WUs, with a turnaround time of between 2 and 2.5 hours.

AMD Sempron XP 2500+ is the 2nd fastest, with a turnaround time of between 4 and 6 hours.

Pentium 4 (running both processors) takes 4 hours (only did 1 WU so far).

Duron 800 takes 8+ hours.

-----

How can a 1.6 GHz processor trounce a 3.2 GHz processor of the same architecture? It doesn't make any sense, unless you realize that the speed at which these CPU-bound processes execute is limited by millions and millions of cache misses.

After some deep thought: look at how much cache each machine has and assume that more cache means faster.

It all makes sense:

The 256K cache Duron should be the slowest.

512K cache on the Sempron
512K cache per processor on the P4 (1 MB shared between 2 P4s)

These should be about the same speed, and they are. (Well, the P4s are slightly faster due to a faster memory bus.)

The 1 MB cache Pentium-M should be the fastest. And it is!!!

-----------

To test my theory, I went into the preferences of BOINC, and made seti only run one process of itself on the multi-cored machine.

This essentially made it a single P4 processor with 1 MB cache (instead of a dual cored machine with a shared 1 MB cache).

Guess what? Turnaround time for a single WU is now 2 hours, just a little faster than the Pentium-M processor (again due to the faster memory architecture).

So, if you want a server farm, make sure you buy CPUs that have 1 MB cache or more.

I believe that the seti program is so big, that you'd need about 4MB cache before you start to get diminishing returns by getting a larger cache. Too bad, I haven't seen a processor on the market that has THAT MUCH CACHE!!!

Jimmy

ID: 103847 · Report as offensive
Profile Prognatus

Send message
Joined: 6 Jul 99
Posts: 1600
Credit: 391,546
RAC: 0
Norway
Message 103853 - Posted: 25 Apr 2005, 23:24:41 UTC
Last modified: 25 Apr 2005, 23:25:10 UTC

Interesting, Jimmy! I had a hunch about this.

> I haven't seen a processor on the market that has THAT MUCH CACHE!!!

Intel Xeon has a couple of models, I believe (3rd level cache).

ID: 103853 · Report as offensive
Profile jimmyhua

Send message
Joined: 16 Apr 05
Posts: 97
Credit: 369,588
RAC: 0
Guam
Message 103858 - Posted: 25 Apr 2005, 23:35:30 UTC

Even more bizarre are the credit ratings.

CPUs with TONS OF CACHE and CPUs with a little cache actually end up benchmarking about the same, so the CPU benchmarks are more dependent on MHz (the benchmark must be a small loop, so a small cache works fine).

The Pentium-M has low benchmarks, but spits out the WUs faster than anyone else, and claims very very very little credit for them.

So if you want the most credits, get lots of machines with little cache ;-).

I think BOINC should keep track of WU's completed in addition to credit ratings, as I believe it is more accurate in the long run.

Either that, or make a benchmark off of crunching a standardized WU (cache variances will now be accounted for).

Jimmy



ID: 103858 · Report as offensive
Profile KPX
Volunteer tester

Send message
Joined: 20 Oct 02
Posts: 14
Credit: 4,072,542
RAC: 0
Czech Republic
Message 103860 - Posted: 25 Apr 2005, 23:46:33 UTC - in response to Message 103858.  

I knew that processors with larger cache were faster in Seti Classic, so no surprise they are good for BOINC as well. But getting less credit for it? Bad idea, and I agree that should be changed.

However, regarding the credit rating of various processors: it would be interesting to know which BOINC application is the best at giving you more credit for less computing time. Has any research been done here?

KPX
ID: 103860 · Report as offensive
Profile Prognatus

Send message
Joined: 6 Jul 99
Posts: 1600
Credit: 391,546
RAC: 0
Norway
Message 103862 - Posted: 25 Apr 2005, 23:50:20 UTC
Last modified: 26 Apr 2005, 0:07:06 UTC

> The CPU benchmarks are more dependent on MHz

Yeah, but the benchmark doesn't say anything about how much credit you'll get. As I understand it, it's just an estimate of how long it'll take to finish a WU. Besides, the benchmark differs, depending on which version of BOINC you're using. v4.19 yields higher numbers than 4.25 (or later), due to a "bug". (Actually a design flaw caused the compiler to optimize the benchmark loop too much...)

> So if you want the most credits, get lots of machines with little cache ;-).

Nah, I'd like a big bad stack of double core 4400+ Opteron's - or Xeon's, when they're available. ;)

About credits: you should check out the settings in your account and try to optimize the values. Here are some of my General Preferences:

Switch between applications every 60 minutes
Write to disk at most every 900 seconds
Use no more than 90% of total virtual memory
Connect to network about every 10 days

ID: 103862 · Report as offensive
Profile jimmyhua

Send message
Joined: 16 Apr 05
Posts: 97
Credit: 369,588
RAC: 0
Guam
Message 103863 - Posted: 26 Apr 2005, 0:04:50 UTC - in response to Message 103860.  

> I knew that processors with larger cache were faster in Seti Classic, so no
> surprise they are good for BOINC as well. But getting less credit for it? Bad
> idea, and I agree that should be changed.

Yeah, but double?

With double the cache you get double the speed! That is insane.

I used to be a part of the distributed.net community, and back then double the cache meant 20% faster. And generally speaking, the prices on the processors also reflect that statement.

Here I am now, looking for a Socket A processor with 1 MB cache (on eBay), which would literally double the speed at which my Sempron is crunching.

Not to mention, my laptop is crunching twice as fast as anything out there, yet only consuming a grand total of 35 watts of POWER!!!! It's the ideal seti farm machine!!!


Jimmy

ID: 103863 · Report as offensive
shady

Send message
Joined: 2 Feb 03
Posts: 40
Credit: 2,640,527
RAC: 0
United Kingdom
Message 103871 - Posted: 26 Apr 2005, 0:33:24 UTC - in response to Message 103847.  


> -----
>
> How can a 1.6 Ghz processor trounce a 3.2 Ghz processor of the same
> architecture? It doesn't make any sense. Unless you realize the speed at
> which these CPU bound processes execute is limited due to millions and
> millions of cache misses.

>
> Jimmy
>
>

What makes the largest difference is that the Pentium-M is NOT the same architecture as the Pentium 4; the Pentium-M is much closer to a Pentium 3 than it is to a Pentium 4.

I have a Pentium M running at 2.66 GHz that is doing BOINC SETI WUs in an average of 1 hr 25 mins. Whilst it does not claim much credit, its claimed value is normally the lowest, so I will actually get more credit than it claims.

Shady






ID: 103871 · Report as offensive
Matthew Williams

Send message
Joined: 7 May 99
Posts: 3
Credit: 148,519
RAC: 0
United States
Message 103897 - Posted: 26 Apr 2005, 2:29:48 UTC

My Athlon XP-M (desktop PC) is running at about 2.2 GHz. It usually takes 2.5 hours on SETI Classic and the new BOINC. I don't think the Sempron is as fast as a normal Athlon XP. We recently set up about 50 or so Dell Latitude D600s at work once I realized that the Pentium M 1.6 GHz CPU would do Classic work units in about 2 hours flat. Today I set up 10 or so IBM T40s with 1.5 GHz Pentium Ms to run BOINC. I am eager to see how they fare in this new client.

I think one of the main reasons that the Pentium M is so fast is that SETI was originally optimised and built to run on the P2/P3 architecture. The large PIII M cache does help, but the 400 MHz bus coupled with DDR probably helps it even more.
ID: 103897 · Report as offensive
Profile jimmyhua

Send message
Joined: 16 Apr 05
Posts: 97
Credit: 369,588
RAC: 0
Guam
Message 103957 - Posted: 26 Apr 2005, 5:15:57 UTC
Last modified: 26 Apr 2005, 5:17:05 UTC

I guess the only thing that will convince you it is the cache is if I had a Pentium 3 with 1 MB cache and a Pentium 3 with 2 MB cache at the same GHz (running off the same exact motherboard), and showed that the second literally runs twice as fast.

I think I am in luck! They actually sell such beasts!!! Too bad it's not worth my time to pony up the money to make a point like this. I just share my data, and my conclusions/opinions based on the data.

I believe the cache helps A LOT MORE (for SETI applications) than a faster bus. Let's face it: my laptop with the Pentium-M is running off a substandard PC100 SDRAM bus. It isn't even interleaved. If my laptop with strangled memory bandwidth is matching my P4 3.2 GHz machine with DDR2-400, how can you tell me that memory bandwidth matters more than cache?

If that were the case, the P4 with the modern memory bus should be leaving the Pentium M in the dust. Sure, a faster memory bus definitely helps, but it isn't the main factor that is slowing us down here.

Although the architecture of the Pentium-M is quite different from a Pentium 4, it is a lot closer to a Pentium 4 than it is to a Pentium 3.

Just my opinion.

After doing some calculations. I think the Pentium-M's are probably running as fast as they can go and 1MB cache is optimal. However, the Pentium 4s and Athlons need 4MB to really shine.


Jimmy

ID: 103957 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 103983 - Posted: 26 Apr 2005, 7:49:19 UTC

Though the execution times may not be as fast, on many of the Intel processors you also have to take into account that they are doing more than one Work Unit at a time.

As an example, my Xeon does two WU per physical processor and in the case of SETI@Home has an average time of 2:20 (161 WUs processed). But I am doing two at once.
ID: 103983 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 103999 - Posted: 26 Apr 2005, 8:50:55 UTC - in response to Message 103957.  

> I guess the only thing that will convince you it is the cache, is if I had a
> Pentium 3 with 1 MB cache with same Ghz, and a Pentium 3 with 2 MB cache with
> same Ghz (running off the same exact Motherboard), and show that it literally
> runs twice as fast.

It would be interesting to see, but I wouldn't expect it to run twice as fast.

The P4 benefits from large caches as it has very deep pipelines & a misprediction results in a huge penalty. A large cache helps reduce that penalty.
AMD's K7 series CPUs don't benefit from large caches due to their much shorter pipelines.

The Pentium M, while having more in common with the P3 than the P4, is still quite a different beast altogether. There have been many optimisations done to help improve its performance at lower clock speeds & hence lower power usage.
Grant
Darwin NT
ID: 103999 · Report as offensive
shady

Send message
Joined: 2 Feb 03
Posts: 40
Credit: 2,640,527
RAC: 0
United Kingdom
Message 104029 - Posted: 26 Apr 2005, 12:43:32 UTC

If twice the cache equalled twice the speed, then a Pentium M with 2 MB would be twice the speed of a Pentium M with 1 MB of cache, and that is simply not the case.

In your example, a 1.6 with 1 MB of L2 is taking up to 2.5 hrs, and my 2.66 with 2 MB of L2 "still" takes up to 1.5 hrs, and that's with a processor FSB of 165 with memory running at 220 (read 440 in "marketing" language).

The P4s with 1 MB of cache also do not run twice as fast as the P4s with 512 KB of cache, and clock for clock a Pentium M with 1 MB is faster at SETI than a Pentium 4 with 1 MB.
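
A quick sanity check of the Pentium M comparison, as a rough Python sketch. It uses only the clock speeds and upper-bound times quoted in this post; the `speedup` helper is an illustrative name, not anything from BOINC, and the "cache doubles speed" prediction is the hypothesis under test, not a measured figure.

```python
# Figures quoted in the thread (upper-bound turnaround times, in hours).
slow_clock_ghz, slow_hours = 1.6, 2.5    # Pentium M with 1 MB L2
fast_clock_ghz, fast_hours = 2.66, 1.5   # Pentium M with 2 MB L2

def speedup(slow_time, fast_time):
    """How many times faster the second machine finishes a work unit."""
    return slow_time / fast_time

clock_ratio = fast_clock_ghz / slow_clock_ghz
observed = speedup(slow_hours, fast_hours)

# Hypothesis under test: doubling the cache doubles speed on top of the clock gain.
predicted_if_cache_doubles_speed = 2 * clock_ratio

print(f"clock ratio:      {clock_ratio:.2f}x")
print(f"observed speedup: {observed:.2f}x")
print(f"predicted under the cache hypothesis: {predicted_if_cache_doubles_speed:.2f}x")
```

On these numbers the observed speedup sits almost exactly on the clock ratio, which is what you would expect if the second megabyte of cache added little.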

Shady








ID: 104029 · Report as offensive
Profile SunRedRX7
Avatar

Send message
Joined: 9 May 03
Posts: 50
Credit: 11,180,795
RAC: 18
United States
Message 104031 - Posted: 26 Apr 2005, 12:55:54 UTC - in response to Message 103863.  


> With double the cache you get double the speed! That is insane.

It's not just doubling the cache that's doing it, it's the architecture changes made in the Pentium M.

If you want to really test this theory, find some WU times from an Athlon XP with a Barton core (512K cache) vs a Thoroughbred core (256K cache). You can find them with the same MHz as well. Here are some reviews comparing the two.
http://www.digital-daily.com/motherboard/amd-barton/
http://www.overclockers.com.au/article.php?id=171618&P=3

If someone has an Athlon XP SEMPRON 2600+ @ 1.83 GHz (166x11) with 256K cache that they want to compare to my Athlon XP BARTON 2500+ at ~1.83 GHz (stock 166x11), please do.

My Bartons results are here
http://setiweb.ssl.berkeley.edu/results.php?hostid=689664
Looks like average WU time around 11,000 seconds.

I do have some benchmarks of an XP 1700 running 166x10 (1.66 GHz). Please note the errors on the 2nd page are due to a conflict between FREE-AV and Climate Prediction, not machine instability.
http://setiweb.ssl.berkeley.edu/results.php?hostid=31043
Average times look to be around 12,000 seconds. So it takes a 1,000 second hit for a 0.17 GHz deficit and 256K less cache, or less than 10%.
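
To put rough numbers on that comparison, here is a small Python sketch. It uses only the approximate averages and clock speeds quoted above; `percent_slower` is a made-up helper name, and the inputs are eyeballed averages rather than careful measurements.

```python
# Approximate figures quoted in the post above (eyeballed averages).
barton_seconds, barton_ghz = 11_000, 1.83   # Athlon XP Barton 2500+, 512K L2
xp1700_seconds, xp1700_ghz = 12_000, 1.66   # Athlon XP 1700, 256K L2

def percent_slower(fast_seconds, slow_seconds):
    """Extra time the slower machine needs, as a percentage of the faster one."""
    return (slow_seconds - fast_seconds) / fast_seconds * 100

time_hit = xp1700_seconds - barton_seconds
time_ratio = xp1700_seconds / barton_seconds   # how much longer the XP 1700 takes
clock_ratio = barton_ghz / xp1700_ghz          # how much faster the Barton is clocked

slower = percent_slower(barton_seconds, xp1700_seconds)
print(f"time hit: {time_hit} s ({slower:.1f}% slower)")
print(f"time ratio {time_ratio:.3f} vs clock ratio {clock_ratio:.3f}")
```

If the clock ratio alone is at least as large as the time ratio, the extra 256K of cache is not buying much on these chips, which is the point being made here.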

Someone should do a comparison of some Pentium 4 chips at the same MHz with varying cache sizes (Prescott vs Northwood vs Willamette).


BOINC WIKI
Overclockers.com's Forum
ID: 104031 · Report as offensive
Dan the Man

Send message
Joined: 15 Nov 00
Posts: 28
Credit: 453,086
RAC: 0
Germany
Message 104052 - Posted: 26 Apr 2005, 14:11:53 UTC

Don't generalize:
I've observed that the same CPU creates different WU finishing
times on other systems. So it even seems to depend on system
components, configuration and so on...

Example:
P4 3.06 HT (Northwood) on Gigabyte MB,
Chipset SIS 655: 4 hrs (remember HT = 2 WUs in 4 Hrs)
same CPU on FSC D-1528 i845 Chipset: 4:16 hrs up to 5:10 hrs (per 2 WUs)

another Example:
Athlon XP 2400+ Thoroughbred on ECS K7VZA,
Chipset KT133A, SD-RAM: up to 4:30 hrs
same CPU on ECS K7VTA3 Chipset KT333 DDR-RAM: 3:05 hrs
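
For anyone who wants to compare these h:mm timings directly, a minimal Python sketch follows. The `to_minutes` helper is an illustrative name, and the only inputs are the Athlon XP 2400+ times listed above.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' turnaround time such as '4:30' into minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)

# Same Athlon XP 2400+ on two boards, times as listed above.
sdr_minutes = to_minutes("4:30")   # KT133A chipset, SD-RAM (worst case)
ddr_minutes = to_minutes("3:05")   # KT333 chipset, DDR-RAM

board_speedup = sdr_minutes / ddr_minutes
print(f"SD-RAM: {sdr_minutes} min, DDR-RAM: {ddr_minutes} min")
print(f"DDR board is about {board_speedup:.2f}x faster with the same CPU")
```

Since the CPU and its cache are identical in both runs, whatever ratio comes out is down to the board and memory alone.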

Cheers, Daniel


ID: 104052 · Report as offensive
Profile jimmyhua

Send message
Joined: 16 Apr 05
Posts: 97
Credit: 369,588
RAC: 0
Guam
Message 104073 - Posted: 26 Apr 2005, 15:21:43 UTC - in response to Message 104031.  

> If someone has an Athlon XP SEMPRON 2600+ @ 1.83ghz(166x11) with 256k cache
> they want to compare to my Athlon XP BARTON 2500+ at ~1.83ghz(stock 166x11)
>
> My Bartons results are here
> http://setiweb.ssl.berkeley.edu/results.php?hostid=689664
> Looks like average WU time around 11,000 seconds.

Yup, 10-11K seconds.

I have a Sempron 2500+ @ 1.75 GHz. Should be ~5% slower.

Results are:

15-17K seconds.

You're right yours isn't twice as fast as I predicted it would be. For it to be twice as fast, you'd be going at 7-8K seconds... Or I'd be going 20-22K seconds.
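
The back-of-envelope check above can be written out as a short Python sketch. It takes the two time ranges from this post as given; `scale` is a hypothetical helper, and the result is only as good as those rough ranges.

```python
# Time ranges quoted in this post, in seconds per work unit.
barton_range = (10_000, 11_000)    # Barton 2500+
sempron_range = (15_000, 17_000)   # Sempron 2500+

def scale(time_range, factor):
    """Scale both ends of a (low, high) time range by the same factor."""
    return tuple(t * factor for t in time_range)

# What "twice as fast" would have required, from either side.
barton_if_twice_as_fast = scale(sempron_range, 0.5)   # (7500.0, 8500.0)
sempron_if_half_as_fast = scale(barton_range, 2)      # (20000, 22000)

# What the quoted ranges actually allow.
smallest_ratio = sempron_range[0] / barton_range[1]
largest_ratio = sempron_range[1] / barton_range[0]

print("Barton would need:", barton_if_twice_as_fast)
print("Sempron would need:", sempron_if_half_as_fast)
print(f"observed ratio between {smallest_ratio:.2f}x and {largest_ratio:.2f}x")
```

Even the most generous pairing of the two ranges stays well short of 2x.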

I also realized there's no such thing as a Sempron 2500+ (running on Socket A) with 512K L2 cache :-). Mine really has only 256KB and it throws a wrench in my first post.

WELL, it looks like I did some overgeneralizing. It looks like only the P4 may benefit A LOT from a lot more L2 cache.

Also, it looks like some very big differences can be achieved with a different memory architecture with the Athlon chips.

Athlon XP 2400+
"Chipset KT133A, SD-RAM: up to 4:30 hrs
same CPU on ECS K7VTA3 Chipset KT333 DDR-RAM: 3:05 hrs"

Is your Barton chip running DDR or standard SDRAM? I am running off DDR. If you're running SDRAM, this may account for it, and my "Double your cache and double your speed" generalization may actually still hold for Athlon.

Shady said, "The p4's with 1MB of cache also do not run twice as fast as the p4's with 512kb of cache and clock for clock a pentium M with 1mb is faster at seti than a pentium 4 with 1mb."

Actually I think the difference between 1 MB and 512KB on a P4 is close to double especially when you get to the higher 3+ GHz range. Still have to test this though (on strictly single core setups).

Let me word it another way.

Is there anything else out there (single CPU, single thread) that will beat the Pentium-M 1.6 GHz's 7-9K seconds performance? (Yeah, I know, some of you are lucky to have a Pentium-M 1.8 GHz and it runs faster.)

I have been turned off dual core setups by my experience. Looks like a waste to me. I can run 1 thread on my dual core P4 at 2 hours per WU, and it runs cooler too. Or I can run 2 threads, and it takes 4 hours per WU, but it does two at the same time. It also runs hot. (Half a dozen, or 6 eggs, which is more?) I attribute this problem to not enough cache on my P4.
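
The "half a dozen or 6 eggs" point can be checked with a tiny Python sketch. It assumes the two turnaround times quoted above hold steadily over a whole day, and `work_units_per_day` is just an illustrative helper name.

```python
def work_units_per_day(hours_per_wu, concurrent_wus):
    """Work units finished in 24 hours when several run side by side."""
    return concurrent_wus * 24 / hours_per_wu

one_thread = work_units_per_day(hours_per_wu=2, concurrent_wus=1)
two_threads = work_units_per_day(hours_per_wu=4, concurrent_wus=2)

print(f"1 thread:  {one_thread:.0f} WUs per day")
print(f"2 threads: {two_threads:.0f} WUs per day")
```

Both configurations come out to the same daily throughput on these figures, so the second thread only adds heat.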


Jimmy



ID: 104073 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 104080 - Posted: 26 Apr 2005, 15:34:17 UTC - in response to Message 104073.  

> I have been turned off to Dual Core setups due to my experience. Looks like a
> waste to me. I can run 1 thread on my Dual Cored P4, 2 hours per WU, runs
> cooler too. Or, I can run 2 threads, and it takes 4 hours per WU, but it does
> twice at the same time. Also runs hot. (Half a dozen, or 6 eggs, which is
> more?). I attribute this problem to not enough cache on my P4.

In general, with Intel HT you actually do get more done when running the system with it on. It is not a huge gain, only in the 60% area, but that is still more done per unit time ...

ID: 104080 · Report as offensive
Dan the Man

Send message
Joined: 15 Nov 00
Posts: 28
Credit: 453,086
RAC: 0
Germany
Message 104091 - Posted: 26 Apr 2005, 16:07:13 UTC

@Jimmy

"Is your Barton chip running DDR or standard SDRAM? I am running off DDR. If you're running SDRAM, this may account for it, and my "Double your cache and double your speed" generalization may actually still hold for Athlon."

Actually it's a Thoroughbred B core. 256 KB L2 cache.
It ran both on DDR- and SDR-RAM. (3:05 vs. 4:30)
Think you're right, that might account for the difference.
ID: 104091 · Report as offensive
Metod, S56RKO
Volunteer tester

Send message
Joined: 27 Sep 02
Posts: 309
Credit: 113,221,277
RAC: 9
Slovenia
Message 104138 - Posted: 26 Apr 2005, 17:51:28 UTC - in response to Message 104073.  

> Is there anything else out there (single CPU, single thread) that will beat
> Pentium-M 1.6 Ghz 7-9K seconds performance? (Yeah i know, some of you are
> lucky to have a Pentium-M 1.8 Ghz and it runs faster)

There is: Pentium-M 2GHz :)
Metod ...
ID: 104138 · Report as offensive
Profile jimmyhua

Send message
Joined: 16 Apr 05
Posts: 97
Credit: 369,588
RAC: 0
Guam
Message 104252 - Posted: 26 Apr 2005, 23:24:28 UTC - in response to Message 104138.  

> > Is there anything else out there (single CPU, single thread) that will beat
> > Pentium-M 1.6 Ghz 7-9K seconds performance? (Yeah i know, some of you are
> > lucky to have a Pentium-M 1.8 Ghz and it runs faster)
>
> There is: Pentium-M 2GHz
> (http://setiweb.ssl.berkeley.edu/show_host_detail.php?hostid=128155) :)

Well, that proves it! The Pentium-M doesn't need any more cache. A simple increase in GHz increases the speed quite linearly!!!

Dang. Don't they sell Pentium-M setups in desktops? heh.

Jimmy
ID: 104252 · Report as offensive
Profile Prognatus

Send message
Joined: 6 Jul 99
Posts: 1600
Credit: 391,546
RAC: 0
Norway
Message 104261 - Posted: 26 Apr 2005, 23:38:17 UTC
Last modified: 27 Apr 2005, 0:04:25 UTC

> Don't they sell Pentium-M setups in desktops?

Yes, more and more do just that. Also take a look here.

UPDATED: Here are a couple of links to Pentium-M desktop motherboards:

AOpen i855GMEm-LFS
DFI 855GME-MGF

ID: 104261 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.