checking for an AMD AstroPulse

Message boards : Number crunching : checking for an AMD AstroPulse
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1217136 - Posted: 12 Apr 2012, 4:11:19 UTC

Some of you Astropulsers might find this interesting (and, then again, maybe not). I have a question; please don't let the "tome" length keep you from skimming this.

What follows is a link to some pending Astropulse work from several of my computers, all running AMD processors: one with a Phenom II, one with an FX-8120, and one with an old Athlon 64 X2.

http://setiathome.berkeley.edu/results.php?userid=48101&offset=0&show_names=0&state=3&appid=12

We all know that WUs can vary quite a bit in crunching time, but between those at the above link and some Valid results I didn't link to, a few trends seem to be developing.

The Phenom II seems to be the best (that's relative; I know your i7s and even i5s will whip it), and with a non-AVX version of Lunatics Astropulse running, the trend is for the FX-8120 to take 20% longer than the Phenom II to do WUs.

Maybe that's not surprising since the FX 8120 is "sharing" FPUs.

What surprises me is that the Athlon 64 x 2 is taking twice as long as the FX 8120 to do a work unit.

You'd think (just looking on the surface) that the FX processor is hamstrung compared to the Phenom II by sharing a floating point unit. But what's the Athlon 64's excuse for being half as fast as even the FX?
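Putting the observed ratios together (FX-8120 about 1.2x the Phenom II's time per WU, Athlon 64 X2 about 2x the FX-8120's), per-task time and whole-chip throughput tell slightly different stories. A rough sketch, assuming one AP task per core on every chip (6, 8 and 2 cores respectively) and ignoring any extra shared-FPU slowdown when all eight FX cores are loaded:

```python
# Relative time per AP task, normalised to the Phenom II X6 1100T
# (ratios as observed above: FX ~1.2x the Phenom, Athlon ~2x the FX).
relative_time = {
    "Phenom II X6 1100T": 1.0,
    "FX-8120": 1.2,
    "Athlon 64 X2 5600+": 1.2 * 2.0,
}

# Assumption: one AP task per core, all cores busy.
cores = {
    "Phenom II X6 1100T": 6,
    "FX-8120": 8,
    "Athlon 64 X2 5600+": 2,
}


def relative_throughput(chip):
    """Whole-chip AP output, in units of one Phenom II core."""
    return cores[chip] / relative_time[chip]


for chip in relative_time:
    print(f"{chip}: {relative_time[chip]:.1f}x time per task, "
          f"{relative_throughput(chip):.2f} Phenom-core equivalents")
```

On these assumptions the FX-8120's extra cores roughly make up for its slower per-task time, while the Athlon's slow cores and low core count leave it well behind.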

Athlon 64:

Build features: Non-graphics FFTW USE_INCREASED_PRECISION USE_SSE x86
CPUID: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+

Cache: L1=64K L2=1024K

The FX 8120:

Build features: Non-graphics FFTW USE_CONVERSION_OPT USE_SSE x86
CPUID: AMD FX(tm)-8120 Eight-Core Processor

Cache: L1=64K L2=2048K
Features used: MMX SSE


The Phenom 1100T shows:

Build features: Non-graphics FFTW USE_INCREASED_PRECISION USE_SSE x86
CPUID: AMD Phenom(tm) II X6 1100T Processor

Cache: L1=64K L2=512K



I think all of these are rev. 555 but I could be wrong.

Still, as I look through all these AP WUs, one thing stands out like a sore thumb: if you aren't using Lunatics' optimized AP applications, you are wasting electricity and time. Look at this, for instance:

http://setiathome.berkeley.edu/workunit.php?wuid=966483290


It also looks like if you ARE running Lunatics' optimized applications and you aren't running them on an Intel processor, you are STILL wasting electricity and time. Look at this result:

http://setiathome.berkeley.edu/workunit.php?wuid=966199216

Am I over-generalizing, or might I be better off letting a couple of my old Intel machines (currently crunching nothing) crunch AP under Lunatics' optimized apps, and just stop crunching AP on my AMDs altogether?

I'm not at all sure that I really want to set my old P4 1.8GHz or old Dual-Core E5200 2.50GHz machines to crunch again (I took them out of crunching service specifically because they were no longer "efficient crunchers per watt").

But if these old Intel computers would crunch AP so much more efficiently than the AMD (getting just as much work done as a newer, faster, AMD processor per unit of time), I might want to hook them back up as AP-only crunchers and stop the AMD processors from crunching APs at all.

The P4 1.8GHz is so old it isn't good for much else, but I *really* don't want the additional heat with summer coming on unless there is some "work-per-watt" efficiency to gain.

Is it really not possible to get an AMD processor to crunch almost as fast as an Intel with optimized apps, or is it that there just is no interest in really optimizing for AMD?


Thoughts, opinions, clarifications, corrections??? All are welcomed and appreciated.
ID: 1217136
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1217152 - Posted: 12 Apr 2012, 4:58:08 UTC
Last modified: 12 Apr 2012, 5:06:02 UTC

I can't speak for Raistmer's r555 build, but do believe it's similar enough to my own r557 build that I can make some very general comments that should mostly apply.

Firstly, when dealing with Astropulse in particular, you are dealing with large datasets. The original stock code handled these datasets relatively inefficiently, so during processing they tend to thrash caches and memory subsystems quite heavily compared with ideal code. Improvements over time by Raistmer, Joe Segur, and myself have so far been fairly generic, that is, not 'overly' targeted at particular Intel or AMD CPUs, though even these fairly generic high-level optimisations still work better on some CPUs than others.

In addition, while the main AP codebases on Windows are usually built with Microsoft Visual Studio, the FFT library portions are, in r555's case, FFTW-project-supplied DLLs built with GCC, or in r557's case, statically linked and built with the Intel compiler. My point is that we're using a mixture of compilation methods and tools, which takes compiler technology out of the broader equation and places the performance focus on how the work 'fits' the hardware, along with the relative immaturity of the AP codebase with respect to the targeted microarchitectural optimisations and library selection you describe at the end.

When comparing CPU microarchitectures, a very general yardstick is sheer 'transistor budget'. Especially with later processor designs, gigahertz tends to mean less as caches grow in size and 'cleverness', for example more complex hardware prefetchers.

Very generally, that says that as CPUs get more complex, they remove some of the need for elaborate low-level (instruction-level) optimisations, putting the ball squarely in the court of high-level algorithms and memory handling.

So I would argue that the Athlon 64 there is very much an older design that predates the more recent moves toward energy efficiency. While someone may be inspired to make builds that go, say, 25% faster on these processors, your suggestions are about efficiency, and the points are quite valid IMO. I've recently pretty much retired my old P4, apart from testing, for similar reasons.

Optimisation is (in part) about efficiency rather than performance alone. Newer architectures are designed to be more efficient, and the focus of 'most' coders I know tends to be on what's on hand and/or the path of least resistance to maximum gain, i.e. the best use of limited time and resources for the biggest overall gain.

What constitutes 'biggest overall gain' even varies from developer to developer. Obviously, general improvements sent back to the project to improve stock have the greatest potential to improve efficiency, and by their nature, and given the project's resources, these improvements need to be fairly generally applicable. What tends to happen is that targeted improvements filter through third-party optimised applications first and then work their way back to stock. Joe Segur's AVX implementation in the V7 multibeam beta is a good exception: it refines proven existing methods in a targeted way, and third-party targeted development needs to take note of that approach's effectiveness for wider benefit.

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1217152
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1217157 - Posted: 12 Apr 2012, 5:10:08 UTC - in response to Message 1217136.  

Well, only one of your links works now, but your E5200 should be about the same as, if not a little faster than, my old E6300 @ 2.33GHz, which averages around 52000 seconds for APs.

I wouldn't bother with that old P4 at all myself (I got rid of all my P4s a few years back, as my Athlon X2s did much better than them, and even those got retired about 2 years ago).

I'll throw in these numbers for my other two as well: my Q6600 @ 3GHz averages around 41000 seconds and my 2500K @ 3.4GHz averages around 20500 seconds.
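For comparing whole machines rather than single tasks, average AP time converts to tasks per day. A minimal sketch using the averages above, assuming every core runs an AP task continuously (in practice other work and GPU feeding take some CPU time):

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(cores, seconds_per_task):
    """AP tasks per day, assuming every core runs one task continuously."""
    return cores * SECONDS_PER_DAY / seconds_per_task


# (cores, average seconds per AP task) from the figures above
hosts = {
    "E6300 @ 2.33GHz": (2, 52_000),
    "Q6600 @ 3GHz": (4, 41_000),
    "i5-2500K @ 3.4GHz": (4, 20_500),
}

for name, (n_cores, secs) in hosts.items():
    print(f"{name}: {tasks_per_day(n_cores, secs):.1f} AP tasks/day")
```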

Cheers.
ID: 1217157
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1217173 - Posted: 12 Apr 2012, 5:45:11 UTC - in response to Message 1217157.  

Can I throw in 5k to 6k seconds for an AMD GPU?

ID: 1217173
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1217176 - Posted: 12 Apr 2012, 6:21:34 UTC - in response to Message 1217152.  



Optimisation is (in part) about efficiency more than performance alone, newer architectures are designed to be more efficient, and the focus of interest for 'most' coders I know will tend to be on side of what's on-hand and/or the path of least resistance for maximum gain... i.e. best use of limited time & resources for the biggest overall gain.


Jason


I very much appreciate your taking the time to comment and explain. I believe that I understand and even agree that the effort made has to be applied to the greatest macro good.

Let me make this idiotic comment, though.

Looking at this:

Athlon 64 x 2
Measured floating point speed 2594.59 million ops/sec
Measured integer speed 6326.36 million ops/sec


AMD FX-8120 Eight-Core Processor
Measured floating point speed 2347.01 million ops/sec
Measured integer speed 7654.02 million ops/sec


AMD Phenom II X6 1100T Processor
Measured floating point speed 2722.39 million ops/sec
Measured integer speed 8237.25 million ops/sec

I don't see that the "power" to do the calculations themselves is much different. That leads me to conclude that the newer instruction sets are making more difference than increases in the "raw processing power" and even the memory subsystems.

The Athlon *is* on DDR2 PC-10600 RAM running at 1333MHz, while the others are on DDR3 RAM at about 1600MHz, since I haven't pushed it.

The reason I bother reporting that is that the old Athlon's numbers (above) aren't all that terrible to my uneducated eye. "MMX, SSE, SSE2, SSE3, x86-64, 3DNow!" seems like a fairly "generic" list of instructions. The task reports "FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3".
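The feature list a task prints is just a set of flag names, so checking whether a host reports a given extension (AVX, say) is a simple lookup. A sketch assuming the space-separated format quoted above, with combined entries like CMOV/CCMP split on the slash; other builds may print the list differently:

```python
import re


def reported_features(feature_string):
    """Split a task's reported CPU feature list into a set of flag names.

    Entries such as "CMOV/CCMP" are split on the slash as well as on spaces.
    """
    return {flag for flag in re.split(r"[\s/]+", feature_string.strip()) if flag}


athlon = ("FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX "
          "FXSAVE/FXRSTOR SSE SSE2 HT SSE3")

flags = reported_features(athlon)
for ext in ("SSE2", "SSE3", "AVX"):
    print(ext, "yes" if ext in flags else "no")
```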

I have NO idea what all that stuff means (and please don't tell me because I would try to understand your explanation and that might be dangerous for me - I could blow a fuse or fuse a synapse or something).

It just makes it very tough on us "casual users" when we have to keep up not only with the "power" of our processors, but also with whether the programs we run are fully optimized for our hardware.

Apparently I'm going to have to back off processing AP tasks on these AMD CPUs. I just can't justify using three times the processor time that's necessary. Maybe I'll buy an i5 and motherboard to replace this old one and let it crunch AP only.

OR - would my better bet (cheaper, more efficient solution) be to let this old processor and motherboard combination feed a reasonable ATI video card for crunching AP-only?

Again, your thoughts, please.

I appreciate your input.
ID: 1217176
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1217177 - Posted: 12 Apr 2012, 6:26:59 UTC - in response to Message 1217173.  
Last modified: 12 Apr 2012, 6:30:48 UTC

Can I throw in 5k to 6k seconds for an AMD GPU?


Yeah, that's what I was just asking. That may be the absolutely best bet for me - just crunch AP on an ATI/AMD GPU.

Soooo, how far do I have to go up the ATI GPU ladder to get decent AP WU numbers?

EDIT: If I could get to those kinds of times, I could crunch just as much AP work on one ATI card as I could on ten AMD CPUs and maybe two or four Intel CPUs. That appeals to me.
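Whether one card really replaces a stack of CPU cores comes down to comparing per-task rates. A rough sketch, assuming the 5k-6k seconds quoted for the AMD GPU is per task with one task on the card at a time (taking 5,500 s as a midpoint), and using Wiggo's per-core CPU averages from earlier in the thread; `gpu_tasks_at_once` is a hypothetical parameter for cards running several tasks at once:

```python
def cpu_cores_matched_by_gpu(cpu_seconds_per_task, gpu_seconds_per_task,
                             gpu_tasks_at_once=1):
    """How many CPU cores (one AP task each) one GPU keeps pace with.

    gpu_seconds_per_task is the wall time of a single task while
    gpu_tasks_at_once tasks share the card.
    """
    gpu_rate = gpu_tasks_at_once / gpu_seconds_per_task  # tasks per second
    cpu_core_rate = 1 / cpu_seconds_per_task             # tasks per second per core
    return gpu_rate / cpu_core_rate


GPU_SECONDS = 5_500  # assumed midpoint of the quoted 5k-6k seconds

for core, cpu_secs in [("E6300 core", 52_000),
                       ("Q6600 core", 41_000),
                       ("i5-2500K core", 20_500)]:
    ratio = cpu_cores_matched_by_gpu(cpu_secs, GPU_SECONDS)
    print(f"one GPU ~ {ratio:.1f} x {core}")
```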
ID: 1217177
TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1217182 - Posted: 12 Apr 2012, 7:25:53 UTC - in response to Message 1217176.  



Hello.

I run Lunatics AP tasks only on my ATI card, and it is very efficient.
http://setiathome.berkeley.edu/results.php?hostid=6265988
I run 2 tasks at the same time on it.

I have no clue if it is more cost-effective than running the optimized Lunatics CPU (AVX) AP version, though.

Maybe someone has done a calculation here??

It is only an "old" ATI 5850 I use.
The 6950/70 is better, I think.

//TRuEQ

TRuEQ & TuVaLu
ID: 1217182
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1217185 - Posted: 12 Apr 2012, 7:31:06 UTC - in response to Message 1217136.  

I'm sorry I completely forgot about you...

For everyone else: tbret was asked to run a non-AVX version for live run-time comparisons, as there was some indication that AVX may actually fare worse on 'Bulldozer' (maybe because of the shared FPU).

http://setiathome.berkeley.edu/results.php?userid=48101&offset=0&show_names=0&state=3&appid=12


Only you can use that link; everybody else has to go through host IDs:

I confess I am utterly lost when it comes to CPU names.
I wouldn't know for sure which of those hosts has AVX and which hasn't.

One host is running non-AVX r548:

http://setiathome.berkeley.edu/results.php?hostid=6568834&offset=0&show_names=0&state=0&appid=12

These hosts are running r555:

http://setiathome.berkeley.edu/results.php?hostid=6011644&offset=0&show_names=0&state=0&appid=12
http://setiathome.berkeley.edu/results.php?hostid=5829212&offset=0&show_names=0&state=0&appid=12
http://setiathome.berkeley.edu/results.php?hostid=6607030&offset=0&show_names=0&state=0&appid=12

You can probably get a bit more out of them if you switch to r557 (via the installer).

As for the experiment on the AVX host, I'd like to gather some more results (maybe a week's worth) and then look at r557 performance again.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1217185
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1217189 - Posted: 12 Apr 2012, 7:39:50 UTC - in response to Message 1217182.  



I like this idea. Thanks for that model number and those links to your times. Yes, it is much less expensive for me to find a 5850 or equivalent than to build a new Intel-based computer, and your times are lower than any CPU numbers I've seen.


ID: 1217189
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1217193 - Posted: 12 Apr 2012, 7:47:52 UTC - in response to Message 1217185.  
Last modified: 12 Apr 2012, 7:51:29 UTC

I'm sorry I completely forgot about you...


you can probably get a bit extra out of them if you switch to r557 (via installer)

As for the experiment on the AVX host, I'd like to gather some more results (maybe a week's worth) and then look at r557 performance again.


You forgot about me? You forgot about me?! Oh, that just makes me so very sad. I'm seriously not-feeling the love here.

I suppose I can let it run a while. Are you going to watch it? I'm about to "disappear" for a while.

I'll be even easier to forget, then.

The FX 8120 is the "bulldozer" and the only one of my computers capable of AVX, I believe.

Sorry for the dumb-linking.

EDIT: Huh? Okay, you want me to run the AVX-capable CPU on the non-AVX version for about a week, then you want me to update that with 557 and we'll see what happens. Have I got that right?

Do you really want me to do something else to the others?
ID: 1217193
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1217198 - Posted: 12 Apr 2012, 8:17:08 UTC - in response to Message 1217193.  


EDIT: Huh? Okay, you want me to run the AVX-capable CPU on the non-AVX version for about a week, then you want me to update that with 557 and we'll see what happens. Have I got that right?


That would be the plan, yes. That way we'll get host-specific live comparison times between builds.
Thank you for your effort; it's really appreciated. That is, I appreciate it; I don't know about the others.
You are basically providing data I can use as a basis for recommendations in future releases.
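A host-specific comparison between builds amounts to comparing the same host's run times under each build. A minimal sketch, assuming you have collected the CPU times of completed AP tasks for each build (e.g. r548 and r557) from the host's results page; `percent_faster` is a hypothetical helper name. It uses medians because AP run times vary from WU to WU (blanking, for one), so a single unusually long task shouldn't swing the result:

```python
from statistics import median


def percent_faster(old_runtimes, new_runtimes):
    """Percent reduction in median run time going from the old build to the new.

    A positive result means the new build is faster on this host.
    """
    old_med = median(old_runtimes)
    new_med = median(new_runtimes)
    return 100.0 * (old_med - new_med) / old_med
```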

Do you really want me to do something else to the others?


You don't have to - but r557 is probably faster by a few %. Just suggesting how you might improve performance a bit.

I'm not the Pope. I don't speak Ex Cathedra!
ID: 1217198
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1217208 - Posted: 12 Apr 2012, 9:29:42 UTC - in response to Message 1217176.  
Last modified: 12 Apr 2012, 9:42:04 UTC

Tbret:

To answer your original question about why the Athlon 64 does only half as well as the Phenom/FX-8120:

When AMD released the Phenoms, they beefed up the FPU part of the CPU with double-wide execution units, so, e.g., the Phenom could do 1 SSE3 operation per clock, where the Athlon 64 could only do 1 every 2 clocks (as far as I remember, they now had 2x128-bit FPUs, where the Athlons had 2x64-bit units that had to be combined to do higher-level SSE math). They also changed the schedulers, decoders and so on for higher efficiency.

That, together with the L3 cache, which is probably good for SETI work, made the Phenom a great deal faster than the A64 on optimized code at the same clock speed.
But on most other code it was only marginally faster.
That's probably why BOINC's built-in benchmark shows little difference: it's not using the 128-bit FPUs at all.
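The point about datapath width can be put as a toy model: if a 128-bit packed SSE operation has to go through a 64-bit datapath in two halves, it costs twice as many passes as on a 128-bit datapath, while a scalar operation costs the same on both. This sketch deliberately ignores latency, scheduling and everything else; it only illustrates why a mostly scalar benchmark (as suggested above for BOINC's) can show near-identical FP speed while SSE-heavy AP code runs about twice as fast:

```python
def passes_per_op(op_width_bits, datapath_width_bits):
    """Trips through an FP unit needed for one operation (toy model).

    An operation wider than the datapath is split into datapath-sized
    pieces, each taking one pass.
    """
    return max(1, -(-op_width_bits // datapath_width_bits))  # ceiling division


for name, datapath in [("Athlon 64 (K8), 64-bit", 64),
                       ("Phenom (K10), 128-bit", 128)]:
    scalar = passes_per_op(64, datapath)   # one double-precision value
    packed = passes_per_op(128, datapath)  # one full 128-bit SSE register
    print(f"{name}: scalar {scalar} pass, packed {packed} pass(es)")
```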

The Phenom was actually a bigger update than it was given credit for, but AMD can blame themselves for arriving much too late, and at low frequencies (Core 2 was out and beat it badly, and i7/i5 was not far away). Then the launch was spoiled by the bug in Phenom I (the TLB erratum), which, although probably nobody actually experienced it, hurt the chip's reputation even more.

My FX8150 running @ 4.3GHz is doing AP in ~41k seconds, which is not impressive when an i5 @ 3.4GHz does it in 26k.
ID: 1217208
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1217243 - Posted: 12 Apr 2012, 12:34:00 UTC - in response to Message 1217208.  

My FX8150 running @ 4.3GHz is doing AP in ~41k seconds, which is not impressive when an i5 @ 3.4Ghz does it in 26k

That i5 can't be running optimised apps then, or it was a heavily blanked work unit.

Cheers.
ID: 1217243
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1217292 - Posted: 12 Apr 2012, 15:20:54 UTC

My FX6100 running at stock 3.3GHz does an AP in ~40k seconds using r557. Due to various benchmarking and testing, I have discovered that if I run more than three tasks (half the total number of cores) then the times increase to the low 60k range. The shared FPU thing really hurts when you run several tasks that need the FPU simultaneously.
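Per-task time and whole-chip throughput can pull in opposite directions here. A sketch using the numbers above, bracketing "low 60k" as 60,000 to 65,000 seconds and assuming the three spare cores do nothing AP-related in the three-task case (if they run other work, the comparison changes); power draw is ignored:

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks, seconds_per_task):
    """Chip-wide AP output when `concurrent_tasks` run side by side."""
    return concurrent_tasks * SECONDS_PER_DAY / seconds_per_task


three = tasks_per_day(3, 40_000)
for six_secs in (60_000, 65_000):  # "low 60k range", bracketed
    six = tasks_per_day(6, six_secs)
    print(f"3 tasks @ 40,000 s: {three:.2f}/day | "
          f"6 tasks @ {six_secs:,} s: {six:.2f}/day")
```

On these numbers, six concurrent tasks still finish more APs per day (about 8.0 to 8.6 versus about 6.5), even though each one takes roughly 1.5 to 1.6 times as long.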

However, my previous setup was a 90nm Opteron (Santa Rosa) at 3.0GHz, and it was averaging 93k seconds. So the shrink from 90nm to 32nm, plus architecture improvements, plus 300MHz, came to just over a 50% increase in productivity.

I did notice a small issue, which I mentioned to both Josef and Jason last year: for some reason, in a 2P setup with 90nm Opterons (I don't know if it affected anything else the same way), an occasional AP would run 25-50% longer than normal. Any casual mention of that in public threads immediately got people saying "that task had high blanking," when in fact it had either zero or <5% blanking. When I saw one of those tasks, I would save the WU and run it stand-alone while all the other cores were idle, and it would run at the expected normal speed.

Point is, as Jason alluded to, cache thrashing is detrimental to any task: the less you thrash, the faster and more efficiently it will run. That being said, Intel CPUs have for a long time had much more L2 cache than AMD's, so they can keep more data in L2, and for longer, than AMD can. That's part of the reason why Intel does so much better on this project than AMD. Other architectural differences also play a role, but high-speed, low-latency on-chip cache is very important.
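The thrashing effect, including why a task can run normally stand-alone but slowly next to others, can be illustrated with a tiny simulation. This sketch models a fully associative LRU cache (real CPU caches are set-associative with their own replacement policies, so it is a simplification), with sizes in abstract cache lines rather than real L2 sizes. One task whose working set fits gets almost all hits; two interleaved tasks whose combined working set exceeds the cache miss on every access:

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[addr] = True


def miss_rate(capacity, working_set, tasks, passes=10):
    """Interleave `tasks` tasks, each looping over its own working set."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for i in range(working_set):
            for t in range(tasks):
                cache.access((t, i))  # each task touches only its own lines
    return cache.misses / (cache.hits + cache.misses)


print(f"1 task:  {miss_rate(1024, 600, tasks=1):.0%} misses")
print(f"2 tasks: {miss_rate(1024, 600, tasks=2):.0%} misses")
```

With these parameters the single task settles at a 10% miss rate (only its first pass misses), while the two interleaved tasks, needing 1,200 lines between them, miss on every access.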
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1217292
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1217296 - Posted: 12 Apr 2012, 15:38:29 UTC - in response to Message 1217208.  



When AMD released the Phenoms, they beefed up the FPU part of the CPU, so it got double-wide execution units, so e.g. the Phenom could now do 1 SSE3 operation pr clock, where the Athlon64 could only do 1 every 2 clocks (As far as I remember, they now had 2x128bit FPU's, where they had 2x64bit that had to be combined to do higher level SSE math on Athlon's). They also changed the schedulers and decoders and so on for higher efficiency.



Thank you for that understandable explanation.

That was the information I was looking for in chip comparisons but couldn't find in the wee hours of the morning. I was looking with my eyes at half-mast. It's especially useful to know that the BOINC benchmark doesn't reflect the weaknesses.

That might be something for BOINC-folks to look into. (more useful benchmarks)

Looking at its history of Valid CPU work against others, I can see that the Athlon 64 x 2 is weak compared to...well, just about everything else. The APs super-exposed the weaknesses with their extended run-times.

I guess it also exposes something else: No computer that's ever run on that CPU has ever "seemed" laggy or slow or awful, even unzipping large files. We (casual users) must rarely tap the calculating power we have in the new CPUs.

I suppose I'll take that CPU out of the crunching mix. (as with the old P4 and Dual Core; for me, it's an efficiency issue) Now at least I have a basic understanding of "why."

Thanks again.
ID: 1217296
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1217301 - Posted: 12 Apr 2012, 15:52:49 UTC - in response to Message 1217177.  

Soooo, how far do I have to go up the ATI GPU ladder to get decent AP WU numbers?

Those times were on an HD7750; I had to replace my HD5830 because its fan died again.

ID: 1217301
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1217340 - Posted: 12 Apr 2012, 17:38:06 UTC - in response to Message 1217243.  
Last modified: 12 Apr 2012, 17:39:39 UTC

Wiggo:

I can see your i5-2500K at ~3.3GHz does them in 20-21k.

My 8150 does 8 of them at once, at roughly double the time for each, so it should end up with roughly the same throughput (the 2500K is a 4-thread CPU, right?).

Sadly, my 8150 needs 1GHz more clock speed to do it, and probably a fair amount of extra power too, so the results are still not too impressive.

I'm not dissatisfied with it as such, but AMD has a long way to go to match Intel, and with probably 1/10 the research power, it is unlikely they will ever catch up again.
ID: 1217340
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1217441 - Posted: 12 Apr 2012, 23:16:20 UTC - in response to Message 1217340.  

I have all power-saving options and Turbo Boost turned off, so my 2500K runs at a fixed 3.4GHz (a 100MHz overclock). But also remember that while my 2500K (4 cores) and your 8150 (8 cores) have similar work output, your 8150 is a 125W part while mine is only a 95W part. To take this even further, under full load my rig would be using about 140W while yours would be closer to 230W (power consumption figures taken from several online review sites).

So in the end the real difference is in our power bills.
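The power-bill point can be made concrete by dividing throughput by power. A rough sketch, assuming one task per core, the run times quoted in this thread (20,500 s for the 2500K, ~41,000 s for Karsten's 8150) and the review-based full-load system draws above (~140 W and ~230 W); real figures vary with clocks, load and the rest of the system:

```python
SECONDS_PER_DAY = 86_400


def tasks_per_kwh(cores, seconds_per_task, system_watts):
    """AP tasks completed per kWh at the wall, one task per core."""
    tasks_per_day = cores * SECONDS_PER_DAY / seconds_per_task
    kwh_per_day = system_watts * 24 / 1000
    return tasks_per_day / kwh_per_day


i5 = tasks_per_kwh(4, 20_500, 140)  # i5-2500K figures
fx = tasks_per_kwh(8, 41_000, 230)  # FX-8150 run time, review-based wattage
print(f"i5-2500K: {i5:.2f} tasks/kWh, FX-8150: {fx:.2f} tasks/kWh, "
      f"ratio {i5 / fx:.2f}")
```

Because both machines produce about the same number of tasks per day, the efficiency ratio ends up being simply the ratio of the two power draws (230/140, about 1.64).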

Cheers.
ID: 1217441
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1217445 - Posted: 12 Apr 2012, 23:35:12 UTC - in response to Message 1217441.  

I think I mentioned that in my post, by saying my 8150 probably used a fair amount more power to make the same amount of calculations.

There's no denying Intel's latest processors are very efficient in every important respect. They definitely learned their lesson from the P4.
ID: 1217445
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1217447 - Posted: 12 Apr 2012, 23:50:30 UTC - in response to Message 1217445.  

I've toyed a few times with the idea of building another AMD setup since my old Athlon II X4 (which my 2500K replaced), but so far I just can't justify it, though that's not to say I won't at some point in the future.

Those P4s were good at keeping rooms warm in winter, though; now I use video cards to do the same job and get much more work done at the same time. :D

Cheers.
ID: 1217447

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.