Crunch3r's Generic SSE2 app is faster than Core 2 specific SSSE3 app

Mark Lybeck (Joined: 9 Aug 99, Posts: 245, Credit: 216,677,290, RAC: 173)
Message 624351 - Posted: 22 Aug 2007, 14:57:24 UTC

Hi,

Have you noticed that Crunch3r's generic SSE2 app actually calculates much faster than the Intel Core 2 specific SSSE3 application? This workunit was calculated on two hosts with the same E6600 processor. The other processor is clocked a bit faster, but that alone does not account for the 19.5 per cent difference in calculation time. The clock difference is merely 12.5 per cent.

http://setiathome.berkeley.edu//workunit.php?wuid=149420297
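As a rough check of that argument, here is a minimal Python sketch. It assumes run time scales exactly as 1/clock, which is an idealization, and the function name is purely illustrative. Under that assumption a 12.5 per cent clock advantage would shorten the run time by only about 11 per cent, so it cannot explain a 19.5 per cent gap on its own.

```python
def expected_time_saving(clock_ratio: float) -> float:
    """Fractional run-time reduction if run time scaled exactly as 1/clock.

    clock_ratio is the faster clock divided by the slower clock,
    e.g. 2700 MHz / 2400 MHz = 1.125 for a 12.5 per cent overclock.
    """
    return 1.0 - 1.0 / clock_ratio


# Assumption: the 12.5 per cent figure from the post is the clock ratio 1.125.
saving = expected_time_saving(1.125)
print(f"Expected saving from clock alone: {saving:.1%}")  # 11.1%
```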

kittyman, Volunteer tester (Joined: 9 Jul 00, Posts: 51562, Credit: 1,018,363,574, RAC: 1,004)
Message 624352 - Posted: 22 Aug 2007, 15:17:11 UTC - in response to Message 624351.

This may be a situation where one specific angle range (AR) crunches a bit faster with one app than with another. You would have to check WUs with other ARs on that host to see if the speed difference is still the same.

"Time is simply the mechanism that keeps everything from happening all at once."

Alinator, Volunteer tester (Joined: 19 Apr 05, Posts: 4178, Credit: 4,647,982, RAC: 0)
Message 624355 - Posted: 22 Aug 2007, 15:36:44 UTC

Agreed, it's hard to draw any conclusion from just one result in isolation. There could be other factors at work as well to account for the difference. I seem to recall reports of measurable differences in speed between SSE2 and SSE3 for ARs near the analysis breakpoints.

In any event, both these hosts are running 2.2B, which is now deprecated and should be retired and upgraded to 2.4.

Alinator

Mark Lybeck (Joined: 9 Aug 99, Posts: 245, Credit: 216,677,290, RAC: 173)
Message 651791 - Posted: 30 Sep 2007, 19:49:11 UTC - in response to Message 624355.

Hi,

I updated the clients to Crunch3r's 2.4V for Core 2 Duo. Here are the results:

A) Enhanced client 2.2B SSE2 for AMD generic 32 bit, before the multiplier change.
Average daily credit yield: 2000 credits

B) Enhanced client 2.2B SSE2 for AMD generic 32 bit, after the multiplier change. 2.2B now overclaims, but the granted credit is lower because the other clients claim according to the new multiplier. This client is still the fastest cruncher, however: it calculates a WU more quickly than 2.4V.
Average daily credit yield: 1700 credits

C) Enhanced client 2.4V SSSE3 for Core 2 Duo 32 bit (after the multiplier change).
Average daily credit yield: 1400 credits

With the E6600, most WUs finished within 4000 seconds with the 2.2B client. With 2.4V a WU takes 6000 seconds on average. Only on rare occasions does 2.4V calculate a very narrow angle WU in less than 4000 seconds. The standard 0.4 degree WU takes around 6000 seconds.

If the 2.2B and 2.4V clients produce the same result and 2.2B is on average quicker, then why change the client?
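To put the 4000 and 6000 second figures from that post side by side, here is a minimal Python sketch. It assumes a fixed run time per WU and one WU per core at a time; the function name is illustrative and not part of any SETI@home tool.

```python
SECONDS_PER_DAY = 86_400


def workunits_per_day(seconds_per_wu: float, cores: int = 1) -> float:
    """WUs finished per day, assuming each core crunches one WU at a time."""
    return cores * SECONDS_PER_DAY / seconds_per_wu


# Typical run times quoted in the post (assumed constant for this sketch).
for label, seconds in (("2.2B", 4000), ("2.4V", 6000)):
    per_core = workunits_per_day(seconds)
    print(f"{label}: {per_core:.1f} WUs per core per day")
```

That works out to 21.6 WUs per core per day at 4000 seconds and 14.4 at 6000 seconds, a ratio of 1.5. The daily credit figures of 1700 and 1400 differ by a smaller ratio, so the two sets of numbers may not cover the same mix of WUs.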

Josef W. Segur, Volunteer developer, Volunteer tester (Joined: 30 Oct 99, Posts: 4504, Credit: 1,414,761, RAC: 0)
Message 652111 - Posted: 1 Oct 2007, 2:50:14 UTC - in response to Message 651791.

If you were using the xW "generic" 2.2B, perhaps you should try the xW build in 2.4V_Windows_x32_SSE2_AMD.zip from Crunch3r's page. I find it difficult to believe that xW is fastest on a Core 2 Duo, though each system does react differently.

However, several Core 2 users have found that the xT builds of SSSE3 aren't the fastest on their systems, even though that's the Intel compiler setting for Core 2. Those users have usually found the SSE3 Prescott build (xP) better.

I'll just request that you do change to a 2.4 application version. There's no way to force it, but I prefer to hope that my work won't be used in a fashion which will cause dissension.

Joe

Francois Piednoel (Joined: 14 Jun 00, Posts: 898, Credit: 5,969,361, RAC: 0)
Message 653232 - Posted: 3 Oct 2007, 4:55:44 UTC - in response to Message 652111.

Pay attention to the next version, my source code is in the mail :)

Mark Lybeck (Joined: 9 Aug 99, Posts: 245, Credit: 216,677,290, RAC: 173)
Message 653247 - Posted: 3 Oct 2007, 6:12:58 UTC - in response to Message 653232.

A source code comparison between 2.2B and 2.4V would be interesting.

I installed the 2.4V SSE2 AMD generic build on the other machine. Here are some "results":

Result ID 622880656
Version: Windows SSSE3 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4V|xT|FFT:IPP_SSSE3|Ben-Joe
CPUID: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Speed: 2 x 2700 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3 SSSE3
Work Unit Info
WU Credit multi. is: 2.85
WU True angle range: 0.431783
Spikes: 3, Pulses: 0, Triplets: 0, Gaussians: 1, Flops: 15838023561766
Claimed credit: 52.2673222202279
Granted credit: 52.2673222202279
CPU calculation time: 5,711.72 sec
CPU time per credit ratio = 5711.72/52.2673222202279 = 109.28 sec/CR

-----------

Result ID 623000484
Version: Windows SSE2 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4V|xB|FFT:IPP_SSE2|Ben-Joe
CPUID: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Speed: 2 x 2700 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3
Work Unit Info
WU Credit multi. is: 2.85
WU True angle range: 0.432131
Spikes: 0, Pulses: 0, Triplets: 0, Gaussians: 0, Flops: 15823838513207
Claimed credit: 52.2131228938152
Granted credit: 52.2130577652316
CPU calculation time: 5,653.88 sec
CPU time per credit ratio = 5653.88/52.2130577652316 = 108.28 sec/CR

I still need to analyse the flops-per-credit ratio and get comparative data from the 2.2B client. Obviously I would like to minimize the CPU time spent per credit.

After that comes a deeper analysis of where the differences lie: CPU time per WU angle range, CPU time per flop, or flops spent per WU angle range. That remains to be analysed in a follow-up post.
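The per-credit ratios in that post can be recomputed with a short Python sketch. The figures are copied from the two results quoted there; the dictionary labels and function names are illustrative only, and the flops-based ratios are included merely as the kind of follow-up comparison the post mentions.

```python
# Figures copied from the two quoted results; labels are illustrative.
RESULTS = {
    "2.4V xT (SSSE3)": {
        "cpu_s": 5711.72,
        "credit": 52.2673222202279,
        "flops": 15838023561766,
    },
    "2.4V xB (SSE2)": {
        "cpu_s": 5653.88,
        "credit": 52.2130577652316,
        "flops": 15823838513207,
    },
}


def seconds_per_credit(cpu_s: float, credit: float) -> float:
    """CPU seconds spent per granted credit (lower is better)."""
    return cpu_s / credit


def flops_per_credit(flops: int, credit: float) -> float:
    """Counted floating point operations per granted credit."""
    return flops / credit


def flops_per_second(flops: int, cpu_s: float) -> float:
    """Counted floating point operations per CPU second."""
    return flops / cpu_s


for label, r in RESULTS.items():
    print(
        f"{label}: "
        f"{seconds_per_credit(r['cpu_s'], r['credit']):.2f} sec/CR, "
        f"{flops_per_credit(r['flops'], r['credit']):.4e} flops/CR, "
        f"{flops_per_second(r['flops'], r['cpu_s']):.4e} flops/sec"
    )
```

On these two results the sketch gives about 109.28 and 108.28 seconds per credit, a difference of under one per cent, so a single pair of WUs at this angle range does not separate the two builds.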

