Crunch3r's Generic SSE2 app is faster than Core 2 specific SSSE3 app

Message boards : Number crunching : Crunch3r's Generic SSE2 app is faster than Core 2 specific SSSE3 app
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 624351 - Posted: 22 Aug 2007, 14:57:24 UTC

Hi,

Have you noticed that Crunch3r's generic SSE2 app actually crunches much faster than the Intel Core 2-specific SSSE3 application?

This workunit was crunched by two hosts with the same E6600 processor. The other host's processor is clocked a bit faster, but that alone does not account for the 19.5 per cent difference in calculation time: the clock difference is only 12.5 per cent.

http://setiathome.berkeley.edu//workunit.php?wuid=149420297
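To see how much of the 19.5 per cent gap clock speed could explain, here is a rough Python sketch. It assumes CPU time scales inversely with clock frequency (only approximately true, since memory and cache latency do not scale with the core clock) and that the 19.5 per cent is the slower host's time relative to the faster one. The 2400 vs 2700 MHz clocks are an illustrative assumption that reproduces the quoted 12.5 per cent, and `residual_difference` is a hypothetical helper name.

```python
def residual_difference(time_diff_pct: float, clock_diff_pct: float) -> float:
    """Per-clock speed difference (per cent) left after removing the clock effect.

    Assumes CPU time is inversely proportional to clock frequency.
    """
    time_ratio = 1 + time_diff_pct / 100    # slower host's time / faster host's time
    clock_ratio = 1 + clock_diff_pct / 100  # faster clock / slower clock
    return (time_ratio / clock_ratio - 1) * 100


# Hypothetical clocks: stock E6600 (2400 MHz) vs. an overclocked one (2700 MHz).
clock_diff = (2700 / 2400 - 1) * 100
print(f"clock difference:       {clock_diff:.1f} %")
print(f"unexplained difference: {residual_difference(19.5, clock_diff):.1f} %")
```

Under these assumptions roughly 6 per cent of the gap is left over per clock cycle, which is what clock speed alone would not explain.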
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 624352 - Posted: 22 Aug 2007, 15:17:11 UTC - in response to Message 624351.  

This may be a situation where a particular AR (angle range) crunches a bit faster with one app than with another. You would have to check WUs with other ARs on that host to see whether the speed difference stays the same.
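Checking other ARs amounts to grouping results by angle range before comparing apps. A minimal Python sketch, assuming you have copied (app, AR, CPU seconds) triples from your results pages by hand; the 0.1-degree bin width and the name `mean_time_by_ar` are arbitrary choices for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def mean_time_by_ar(results, bin_width=0.1):
    """Group (app, angle_range, cpu_seconds) records into AR bins.

    Returns {bin_start: {app: mean_cpu_seconds}}, sorted by bin, so that
    the same AR bin can be compared across apps.
    """
    table = defaultdict(lambda: defaultdict(list))
    for app, ar, cpu_seconds in results:
        # round() before floor() guards against float error such as 0.3 / 0.1 = 2.999...
        index = math.floor(round(ar / bin_width, 9))
        bin_start = round(index * bin_width, 6)
        table[bin_start][app].append(cpu_seconds)
    return {
        b: {app: mean(times) for app, times in apps.items()}
        for b, apps in sorted(table.items())
    }
```

Only bins that contain results from both apps say anything about which app is faster at that AR.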

"Time is simply the mechanism that keeps everything from happening all at once."

Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 624355 - Posted: 22 Aug 2007, 15:36:44 UTC

Agreed, it's hard to draw any conclusion from just one result in isolation. There could be other factors at work that account for the difference as well. I seem to recall reports of measurable speed differences between SSE2 and SSE3 builds for ARs near the analysis breakpoints.

In any event, both of these hosts are running 2.2B, which is now deprecated; they should be upgraded to 2.4.

Alinator
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 651791 - Posted: 30 Sep 2007, 19:49:11 UTC - in response to Message 624355.  

Hi,
I updated the clients to Crunch3r's 2.4V build for Core 2 Duo.
Here are the results:

A)
Enhanced client 2.2B SSE2 for AMD generic 32 bit before multiplier change
Average daily credit yield: 2000 credit

B)
Enhanced client 2.2B SSE2 for AMD generic 32 bit after multiplier change.
This means that 2.2B overclaims, but the granted credit is lower because the other clients claim according to the new multiplier. However, this client is still the fastest cruncher: it finishes a WU more quickly than 2.4V.
Average daily credit yield: 1700 credit

C)
Enhanced client 2.4V SSSE3 for Core2Duo 32 bit (after multiplier change)
Average daily credit yield: 1400 credit

On the E6600, most WUs finished within 4000 seconds with the 2.2B client. With 2.4V a WU takes 6000 seconds on average; only occasionally does 2.4V finish a very narrow angle WU in under 4000 seconds. The standard 0.4 degree WU takes around 6000 seconds.

If the 2.2B and 2.4V clients produce the same results, but 2.2B is quicker on average, why change the client?
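The reported figures can be put side by side with a little arithmetic. A Python sketch, assuming each core crunches one WU at a time around the clock and treating the quoted 4000 s and 6000 s as typical per-WU times (the post says "most WUs" and "on average", so these are approximations); the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def wus_per_core_day(seconds_per_wu: float) -> float:
    """WUs one core can finish in a day if it crunches non-stop."""
    return SECONDS_PER_DAY / seconds_per_wu


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in per cent."""
    return (new / old - 1) * 100


time_22b, time_24v = 4000, 6000                   # typical seconds per WU on the E6600
daily_credit = {"A": 2000, "B": 1700, "C": 1400}  # average daily credit, cases A to C

print(f"WUs per core-day: 2.2B {wus_per_core_day(time_22b):.1f}, "
      f"2.4V {wus_per_core_day(time_24v):.1f}")
print(f"2.4V time per WU vs 2.2B: {pct_change(time_22b, time_24v):+.0f} %")
print(f"daily credit, B -> C: {pct_change(daily_credit['B'], daily_credit['C']):+.1f} %")
```

A 50 per cent longer time per WU would on its own cut throughput by a third (1/1.5), while daily credit fell by about 17.6 per cent from B to C. Since granted credit also depends on the other hosts' claims, the credit figures are not a pure speed measurement.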


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 652111 - Posted: 1 Oct 2007, 2:50:14 UTC - in response to Message 651791.  

If you were using the xW "generic" 2.2B, perhaps you should try the xW build in 2.4V_Windows_x32_SSE2_AMD.zip from Crunch3r's page. I find it difficult to believe that xW is fastest on a Core 2 Duo, though each system does react differently. That said, several Core 2 users have found that the SSSE3 xT builds aren't the fastest on their systems, even though xT is the Intel compiler setting for Core 2; those users have usually found the SSE3 Prescott build (xP) faster.
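Joe's advice boils down to trying every build the CPU can run and keeping the fastest. A Python sketch of that procedure, assuming the Intel compiler target codes require the minimum instruction sets listed in `REQUIRED_FEATURE` (xW: Pentium 4/SSE2, xB: Pentium M/SSE2, xP: Prescott/SSE3, xT: Core 2/SSSE3) and that you time each build yourself on the same reference WU. The helper names are hypothetical; the "Features:" line format is the one the apps print in their result output.

```python
# Assumed minimum instruction set for each Intel compiler target code.
REQUIRED_FEATURE = {"xW": "SSE2", "xB": "SSE2", "xP": "SSE3", "xT": "SSSE3"}


def parse_features(line: str) -> set[str]:
    """Parse a 'Features: MMX SSE SSE2 ...' line from an app's result output."""
    prefix = "Features:"
    if not line.startswith(prefix):
        raise ValueError("not a Features line")
    return set(line[len(prefix):].split())


def runnable_builds(features: set[str]) -> list[str]:
    """Builds whose required instruction set the CPU reports."""
    return sorted(b for b, f in REQUIRED_FEATURE.items() if f in features)


def fastest_build(cpu_seconds: dict[str, float]) -> str:
    """Build with the lowest CPU time on the same reference WU."""
    return min(cpu_seconds, key=cpu_seconds.get)
```

Because timings differ by AR, a fair comparison runs every candidate build on the same WU, ideally on a few WUs with different ARs.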

I'll just request that you do change to a 2.4 application version. There's no way to force it, but I prefer to hope that my work won't be used in a fashion which will cause dissension.
Joe
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 653232 - Posted: 3 Oct 2007, 4:55:44 UTC - in response to Message 652111.  

Pay attention to the next version; my source code is in the mail :)

Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 653247 - Posted: 3 Oct 2007, 6:12:58 UTC - in response to Message 653232.  

A source code comparison between 2.2B and 2.4V would be interesting.

I installed the 2.4V SSE2 AMD generic build on the other machine. Here are some results:

Result ID 622880656

Version: Windows SSSE3 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4V|xT|FFT:IPP_SSSE3|Ben-Joe
CPUID: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Speed: 2 x 2700 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3 SSSE3

Work Unit Info
WU Credit multi. is: 2.85
WU True angle range: 0.431783

Spikes: 3, Pulses: 0, Triplets: 0, Gaussians: 1, Flops: 15838023561766
Claimed credit 52.2673222202279
Granted credit 52.2673222202279

CPU calculation time: 5,711.72 sec

CPU time per credit ratio = 5711.72/52.2673222202279 = 109.28 sec/CR


-----------
Result ID 623000484

Version: Windows SSE2 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4V|xB|FFT:IPP_SSE2|Ben-Joe
CPUID: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Speed: 2 x 2700 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3

Work Unit Info
WU Credit multi. is: 2.85
WU True angle range: 0.432131

Spikes: 0, Pulses: 0, Triplets: 0, Gaussians: 0, Flops: 15823838513207

Claimed credit 52.2131228938152
Granted credit 52.2130577652316

CPU calculation time: 5,653.88 sec

CPU time per credit ratio = 5653.88/52.2130577652316 = 108.28 sec/CR


I still need to analyse the flops per credit ratio and to get comparative data from the 2.2B client. Obviously I would like to minimize the CPU time spent per credit. After that comes a deeper analysis of where the differences lie: CPU time per WU angle, CPU time per flop, or flops spent per WU angle. That remains for a follow-up post.
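The ratios mentioned can be computed directly from the two results posted here. A Python sketch; the `Result` class is a hypothetical container, the figures are copied from the two results, and "flops" means the operation count the app reports, not a hardware measurement.

```python
from dataclasses import dataclass


@dataclass
class Result:
    result_id: int
    revision: str
    angle_range: float     # WU true angle range, degrees
    flops: float           # operation count reported by the app
    granted_credit: float
    cpu_seconds: float

    def seconds_per_credit(self) -> float:
        return self.cpu_seconds / self.granted_credit

    def flops_per_credit(self) -> float:
        return self.flops / self.granted_credit

    def flops_per_second(self) -> float:
        return self.flops / self.cpu_seconds

    def seconds_per_degree(self) -> float:
        return self.cpu_seconds / self.angle_range


results = [
    Result(622880656, "R-2.4V|xT|FFT:IPP_SSSE3", 0.431783,
           15838023561766, 52.2673222202279, 5711.72),
    Result(623000484, "R-2.4V|xB|FFT:IPP_SSE2", 0.432131,
           15823838513207, 52.2130577652316, 5653.88),
]

for r in results:
    print(f"{r.result_id} {r.revision}: "
          f"{r.seconds_per_credit():.2f} s/credit, "
          f"{r.flops_per_credit():.4g} flops/credit, "
          f"{r.flops_per_second():.4g} flops/s, "
          f"{r.seconds_per_degree():.0f} s/degree")
```

Both WUs have nearly the same AR, so the per-degree figure is only a consistency check here; CPU time does not scale in proportion to AR across the whole range.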






 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.