The Attack of the Killer 58.7s


Message boards : Number crunching : The Attack of the Killer 58.7s

N/A
Volunteer tester
Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 483295 - Posted: 16 Dec 2006, 8:34:26 UTC - in response to Message 483162.

I never got 0.7 WUs... 58 WUs, but never 0.7

(read: What're y'talkin' about?)

____________

Profile MikeSW17
Volunteer tester
Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 483313 - Posted: 16 Dec 2006, 9:35:10 UTC
Last modified: 16 Dec 2006, 9:36:45 UTC

I think he's talking about lots of high-noise WUs that are taking 58.7 seconds.
These can take less time to process than to download another, resulting in an empty cache, an idle machine and only .01 credits for what does get done.

It's to be expected though as Berkeley are currently loading tapes not previously run due to suspicion they may be noisy.
/edit
Not that I've seen any significant number of these.
____________

Profile Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 483340 - Posted: 16 Dec 2006, 11:48:26 UTC - in response to Message 483315.
Last modified: 16 Dec 2006, 11:49:47 UTC

I think he's talking about lots of high-noise WUs that are taking 58.7 seconds.
These can take less time to process than to download another, resulting in an empty cache, an idle machine and only .01 credits for what does get done.

It's to be expected though as Berkeley are currently loading tapes not previously run due to suspicion they may be noisy.
/edit
Not that I've seen any significant number of these.


Not quite....
They are a breed of WU that, on my quad cruncher, take about 1h 33m to crunch for 58.7 credits instead of about 1h 7m for about 62 credits for the more common type of WU. Had about 1 1/2 days worth of them in my cache. Killed my RAC and drove up my pending credits. (Must take forever to crunch on a slower rig). I think I may have chewed up the majority of them (knock on silicon) and can get back to business. Oh well, ya' gotta crunch wot you got.

OooofDAaaaa My cache is totally filled with the things. I hope there is still potential in these, and not just house cleaning.
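
To put the figures above on a common scale, here is a rough sketch of the credits-per-hour arithmetic. The function name `credits_per_hour` is made up for illustration, and the inputs are only the approximate times and credits quoted above for one quad cruncher.

```python
def credits_per_hour(credits: float, hours: int, minutes: int) -> float:
    """Return claimed credit divided by elapsed crunch time in hours."""
    elapsed_hours = hours + minutes / 60
    return credits / elapsed_hours


# Figures quoted in the post: 1h 33m for 58.7 credits, 1h 7m for about 62.
slow = credits_per_hour(58.7, 1, 33)   # the 58.7-credit WUs
usual = credits_per_hour(62.0, 1, 7)   # the more common type of WU

print(f"58.7s: {slow:.1f} cr/h")
print(f"usual: {usual:.1f} cr/h")
print(f"ratio: {slow / usual:.2f}")
```

On those numbers the 58.7s earn roughly 38 credits per hour against roughly 56 for the usual WUs, so about two-thirds of the normal rate on that machine.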

____________
When we finally figure it all out, all the rules will change and we can start all over again.

Profile Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 140
Credit: 16,679,355
RAC: 3,087
Denmark
Message 483397 - Posted: 16 Dec 2006, 15:08:52 UTC - in response to Message 483162.

My trusty old computers have encountered a lot of these too.

But even though they take longer than other (normal) WUs in the 58-point area, they still crunch much faster than 62-point WUs.

My computers are all AMD; perhaps something in these WUs is hurting your Core 2 Duo's performance.
____________

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8492
Credit: 4,191,075
RAC: 1,804
United Kingdom
Message 483445 - Posted: 16 Dec 2006, 16:05:59 UTC - in response to Message 483397.

My trusty old computers have encountered a lot of these too.

But even though they take longer than other (normal) WUs in the 58-point area, they still crunch much faster than 62-point WUs.

My computers are all AMD; perhaps something in these WUs is hurting your Core 2 Duo's performance.

Best guess is that it's the old Intel problem of the FSB bottleneck... In general, AMD's "hypertransport"/on-chip system RAM interface seems to give a more reasonable system balance.

(And before the Intel evangelists get too upset: Note that it is all a question of programming and application as to what architecture gives the best performance balance. The variety of s@h WU ARs appear to swing the balance across both AMD and Intel. This is where the optimisers can optimise the cache constraints!)


Happy crunchin',
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,497,640
RAC: 48,393
United Kingdom
Message 483457 - Posted: 16 Dec 2006, 16:19:01 UTC

Like Karsten, I can't see that the 58.69 credit WUs can be classed as 'killers' for Xeon processors, either. Here's the latest version of my credits per hour chart:


(click to show detail)

The 58.69s are the tail to the left - getting slightly lower as the AR gets smaller (down to my lowest value of AR=0.002868), but not nearly as drastic as the high ARs on the right - they really kill this machine (blue and green dots). Unfortunately I never got any ultra low ARs when I had the 6300s in for testing, so I can't show a direct comparison.

Since I've been capturing the data on the Xeons, you might like this graph, which has come up almost as an accidental by-product:


(click to show detail)

I hadn't realised that everything below about AR=0.072 seems to claim the dreaded 58.69, even though the times get longer as the AR decreases. The spike in the middle - the rare high-credit WUs - is from about AR=0.088 to AR=0.108

(PS I want more like that little spot in the middle - 76.37 credits for AR=0.205782. Best per hour of all, but I've only seen one in over 1,400 recorded results!)
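
For anyone sorting their own results the same way, the angle-range (AR) bands described above can be written down as a small lookup. This is only a sketch: the thresholds are the approximate values read off the chart in this post ("about AR=0.072", "about AR=0.088 to AR=0.108"), the name `describe_ar` is hypothetical, and anything outside those two bands is deliberately left unlabelled because the post does not characterise it.

```python
# Approximate band edges as quoted in the post; not official project values.
LOW_AR_CEILING = 0.072          # below this, WUs seem to claim 58.69
SPIKE_RANGE = (0.088, 0.108)    # the rare high-credit WUs


def describe_ar(angle_range: float) -> str:
    """Label a work unit by the AR bands described in the post."""
    if angle_range < LOW_AR_CEILING:
        return "claims about 58.69 credits"
    low, high = SPIKE_RANGE
    if low <= angle_range <= high:
        return "high-credit spike"
    return "not characterised here"


for ar in (0.002868, 0.1, 0.205782):
    print(f"AR={ar}: {describe_ar(ar)}")
```

Note that the one-off AR=0.205782 result falls outside both bands, which fits with it being a single spot rather than a trend.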

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8492
Credit: 4,191,075
RAC: 1,804
United Kingdom
Message 483464 - Posted: 16 Dec 2006, 16:44:05 UTC - in response to Message 483457.

Like Karsten, I can't see that the 58.69 credit WUs can be classed as 'killers' for Xeon processors, either. Here's the latest version of my credits per hour chart:


(click to show detail)...

Can you recapture those at a higher resolution please?

There's too many dots to see per screen pixel!

Thanks,
Martin

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,497,640
RAC: 48,393
United Kingdom
Message 483469 - Posted: 16 Dec 2006, 16:59:53 UTC - in response to Message 483464.

Like Karsten, I can't see that the 58.69 credit WUs can be classed as 'killers' for Xeon processors, either. Here's the latest version of my credits per hour chart:


(click to show detail)...

Can you recapture those at a higher resolution please?

There's too many dots to see per screen pixel!

Thanks,
Martin

Sorry, 1560 x 948 pixels is about the best I can do on a 20" LCD.....

They should come up in all their gory details if you just hot-click through to ImageShack, but if that doesn't work you could try credits per hour and credit per angle range

(let's hope they work - I'm only just getting to grips with image hosting)

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8492
Credit: 4,191,075
RAC: 1,804
United Kingdom
Message 483476 - Posted: 16 Dec 2006, 17:19:48 UTC - in response to Message 483469.


(click to show detail)...

Can you recapture those at a higher resolution please?

Sorry, 1560 x 948 pixels is about the best I can do on a 20" LCD.....

That res is fine if the "png" actually showed that. The above link appears as something more like "icon sized"!

They should come up in all their gory details if you just hot-click through to ImageShack, but if that doesn't work you could try credits per hour and credit per angle range

(let's hope they work - I'm only just getting to grips with image hosting)

Those links work much better, thanks. (Shame about all the attempted "pop-ups" :-(

So why is the E6300 claiming so much more credit?

And the credit/hour suggests that either some system choke limits are getting hit or the FLOPS counts are not as accurate as they should be.

Interesting.

Happy crunchin',
Martin

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,497,640
RAC: 48,393
United Kingdom
Message 483481 - Posted: 16 Dec 2006, 17:33:38 UTC - in response to Message 483476.


(click to show detail)...

Can you recapture those at a higher resolution please?

Sorry, 1560 x 948 pixels is about the best I can do on a 20" LCD.....

That res is fine if the "png" actually showed that. The above link appears as something more like "icon sized"!

Well, it's supposed to be 'thumbnail sized' to be kind to dial-up users, and to avoid stretching the thread on smaller monitors.

They should come up in all their gory details if you just hot-click through to ImageShack, but if that doesn't work you could try credits per hour and credit per angle range

(let's hope they work - I'm only just getting to grips with image hosting)

Those links work much better, thanks. (Shame about all the attempted "pop-ups" :-(

So why is the E6300 claiming so much more credit?


That's the $64,000 question! The clock speeds (processor and FSB) are the same, and the 5320s have a bigger L2 cache. The best guess at the moment seems to be the memory latency of the fully-buffered DIMMs used on Xeon motherboards, coupled with the fact that I'm not running interleaved dual-channel memory - I've only got two DIMMs, where I should have four. (More memory is on order, so we can check that out in the next edition). Any further suggestions are welcome in this thread.

And the credit/hour suggests that either some system choke limits are getting hit or the FLOPS counts are not as accurate as they should be.

Another suggestion is that the FLOP counts are accurate, but not all FLOPs are equal. There's some commentary in this thread.

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4301
Credit: 1,070,204
RAC: 1,104
United States
Message 483546 - Posted: 16 Dec 2006, 19:34:18 UTC - in response to Message 483481.


(click to show detail)...

Can you recapture those at a higher resolution please?

Sorry, 1560 x 948 pixels is about the best I can do on a 20" LCD.....

That res is fine if the "png" actually showed that. The above link appears as something more like "icon sized"!

Well, it's supposed to be 'thumbnail sized' to be kind to dial-up users, and to avoid stretching the thread on smaller monitors.
...


The forum setting "Show images as links" loses the URL of the full image, so "(click to show detail)..." merely gets the thumbnail displayed on an imageshack page. If convenient, making that phrase a link to the full image would help those using that setting.
Joe

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,497,640
RAC: 48,393
United Kingdom
Message 483556 - Posted: 16 Dec 2006, 19:43:35 UTC - in response to Message 483546.

The forum setting "Show images as links" loses the URL of the full image, so "(click to show detail)..." merely gets the thumbnail displayed on an imageshack page. If convenient, making that phrase a link to the full image would help those using that setting.
Joe

Thanks Joe, I'll try and remember that for next time. Too late to edit these, unfortunately.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 31,013,626
RAC: 20,733
United States
Message 483635 - Posted: 16 Dec 2006, 21:20:52 UTC - in response to Message 483481.

That's the $64,000 question! The clock speeds (processor and FSB) are the same, and the 5320s have a bigger L2 cache. The best guess at the moment seems to be the memory latency of the fully-buffered DIMMs used on Xeon motherboards, coupled with the fact that I'm not running interleaved dual-channel memory - I've only got two DIMMs, where I should have four. (More memory is on order, so we can check that out in the next edition). Any further suggestions are welcome in this thread.


Well, I think it's more than just the lack of interleaved dual-channel memory, as even AnandTech was having low throughput with the FB-DIMMs, and the problem could be attributed to many things, including (but not limited to): available BIOS settings (differs between manufacturers), immature MCH (this is Intel's first foray into FB-DIMM whereas they've had some time to work with DDR2), immature chipset drivers, etc.

The interleaved dual-channel memory should have a theoretical 32GB/s throughput, and AnandTech wasn't even getting 4GB/s (whereas the Core 2 with DDR2 was getting around 6-10GB/s). Sustained throughput should be at least half the theoretical throughput, so I feel there's something wrong somewhere (more than just memory latency).

At least adding the extra RAM should help balance that part of the equation, but there's still things out of user control that can still affect performance (such as what I've stated above). I am definitely interested in finding out the results - so please keep us posted Mr. Haselgrove!
____________

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8492
Credit: 4,191,075
RAC: 1,804
United Kingdom
Message 483637 - Posted: 16 Dec 2006, 21:25:38 UTC - in response to Message 483546.
Last modified: 16 Dec 2006, 21:26:27 UTC

(click to show detail)...

Can you recapture those at a higher resolution please?
Sorry, 1560 x 948 pixels is about the best I can do on a 20" LCD.....
That res is fine if the "png" actually showed that. The above link appears as something more like "icon sized"!
Well, it's supposed to be 'thumbnail sized' to be kind to dial-up users, and to avoid stretching the thread on smaller monitors.
...
The forum setting "Show images as links" loses the URL of the full image, so "(click to show detail)..." merely gets the thumbnail displayed on an imageshack page. If convenient, making that phrase a link to the full image would help those using that setting.

Well spotted. That's exactly it.

I've long had the "Show images as links" set so as to speed up displaying a thread. (Too many slow sites for showing whatever stats graphics :-( )

Regards,
Martin

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 31,013,626
RAC: 20,733
United States
Message 483741 - Posted: 16 Dec 2006, 23:48:32 UTC

Hmmm... I was wrong about the theoretical throughput of the interleaved dual-channel FB-DIMMs. After doing some research, I've discovered it's supposed to be a little over 21GB/s, which should still offer a sustained throughput of a little over 10GB/s - but still nowhere near as low as the 4GB/s AnandTech was getting.
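
One way to arrive at a figure a little over 21GB/s is the usual peak-bandwidth arithmetic: channels x transfers per second x bytes per transfer. The sketch below assumes 667 MT/s FB-DIMMs, an 8-byte (64-bit) data path per channel and four channels. None of those values are stated in this post, and the function name is hypothetical, so treat it as a back-of-envelope check rather than a spec.

```python
def peak_bandwidth_gb_s(
    channels: int,
    mega_transfers_per_s: float = 667.0,  # assumed DDR2-667 class modules
    bytes_per_transfer: int = 8,          # assumed 64-bit data path per channel
) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


quad = peak_bandwidth_gb_s(4)   # assumed four-channel configuration
sustained_guess = quad / 2      # the "at least half" rule of thumb from the thread

print(f"theoretical: {quad:.1f} GB/s")
print(f"half of that: {sustained_guess:.1f} GB/s")
```

With those assumptions the peak comes out at about 21.3GB/s and half of it at about 10.7GB/s, which is in line with the corrected figures above.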
____________

Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 483857 - Posted: 17 Dec 2006, 1:46:32 UTC
Last modified: 17 Dec 2006, 1:48:00 UTC

Hi Ozzfan,

I just ran a memory benchmark on a Dual Woodcrest 5150 system. To my knowledge, it is running a quad-channel 667MHz FB-DIMM memory configuration (that is, unless by quad-channel they mean 4 modules per socket).

5843 MB/s Int
5862 MB/s Float

is what I got. So yeah, 4 GB/s is a little on the low side, but reality isn't that much better...it's really holding the Xeons back in comparison with other Core 2 based CPUs.

SiSoft Sandra is stating 55% bandwidth efficiency (close to the half theoretical figure you mentioned), so since that tallies up to ~11GB/s it may not be quad-channel after all (and I'll have to pester a Dell rep).

Regards,
Simon.
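
The "~11GB/s" tally above can be reproduced by dividing the measured figures by the reported efficiency. This is just a sketch of that arithmetic: the per-channel peak of 667 MT/s x 8 bytes is an assumption, the names are hypothetical, and it says nothing about how Sandra actually derives its efficiency number.

```python
# Assumed per-channel peak for 667 MT/s modules with an 8-byte data path.
PER_CHANNEL_MB_S = 667 * 8  # 5336 MB/s


def implied_peak_mb_s(measured_mb_s: float, efficiency: float) -> float:
    """Peak bandwidth implied by a measured rate and a reported efficiency."""
    return measured_mb_s / efficiency


# Benchmark results and the 55% efficiency quoted in the post.
for label, measured in (("Int", 5843), ("Float", 5862)):
    peak = implied_peak_mb_s(measured, 0.55)
    channels = peak / PER_CHANNEL_MB_S
    print(f"{label}: implied peak {peak:.0f} MB/s, about {channels:.1f} channels")
```

Both results imply a peak of roughly 10.6GB/s, or about two channels' worth under the assumption above, which is why the numbers look more like a dual-channel than a quad-channel configuration.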
____________
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information

Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2245
Credit: 18,825,077
RAC: 851
Germany
Message 483864 - Posted: 17 Dec 2006, 1:56:49 UTC - in response to Message 483857.

Hi Ozzfan,

I just ran a memory benchmark on a Dual Woodcrest 5150 system. To my knowledge, it is running a quad-channel 667MHz FB-DIMM memory configuration (that is, unless by quad-channel they mean 4 modules per socket).

5843 MB/s Int
5862 MB/s Float

is what I got. So yeah, 4 GB/s is a little on the low side, but reality isn't that much better...it's really holding the Xeons back in comparison with other Core 2 based CPUs.

SiSoft Sandra is stating 55% bandwidth efficiency (close to the half theoretical figure you mentioned), so since that tallies up to ~11GB/s it may not be quad-channel after all (and I'll have to pester a Dell rep).

Regards,
Simon.


Hi Simon,

do you know which chipset your Dell uses?

The 5000V only has 2 FBDimm channels, the 5000P and 5000X have 4.

Regards Hans
____________

Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 483875 - Posted: 17 Dec 2006, 2:09:33 UTC
Last modified: 17 Dec 2006, 2:13:30 UTC

Hi Hans,

just checked, chipset is a 5000X, configured with 4x1 GB FB-DIMMs (Memory set at 5/5/5/15/20 @ 667MHz) and 2 5150 Xeons.

HTH,
Simon.

Copyright © 2014 University of California