Crunch3r's Generic SSE2 app is faster than Core 2 specific SSSE3 app

Mark Lybeck (Joined: 9 Aug 99; Posts: 245; Credit: 216,677,290; RAC: 173)

Hi, have you noticed that Crunch3r's generic SSE2 app actually crunches much faster than the Core 2-specific SSSE3 app? Both results for this workunit were computed on E6600 processors. The other host's processor is clocked a bit higher, but that alone does not account for the 19.5 per cent difference in CPU time: the clock difference is only 12.5 per cent. http://setiathome.berkeley.edu//workunit.php?wuid=149420297
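
One way to separate clock speed from app speed is to divide the CPU-time ratio by the clock ratio. The sketch below is a rough check under stated assumptions: CPU time is taken to scale linearly with 1/clock (memory and cache effects ignored), "19.5 per cent" is read as the slower result taking 1.195 times as long, and 2700/2400 MHz is only an example pair that gives exactly the 12.5 per cent clock ratio. The function name is illustrative.

```python
def unexplained_speedup(time_ratio: float, clock_ratio: float) -> float:
    """Return the part of a CPU-time ratio that clock speed does not explain.

    time_ratio:  slower host's CPU time / faster host's CPU time
    clock_ratio: faster host's clock / slower host's clock
    Assumes CPU time scales as 1 / clock, which is a simplification.
    """
    if time_ratio <= 0 or clock_ratio <= 0:
        raise ValueError("ratios must be positive")
    return time_ratio / clock_ratio


# Figures from the post: 19.5 % longer CPU time, 12.5 % higher clock.
residual = unexplained_speedup(1.195, 2700 / 2400)
print(f"residual speed difference: {(residual - 1) * 100:.1f} %")
```

With these inputs about 6 per cent of the gap is left over after accounting for clock speed; that remainder could come from the app, the AR, or other differences between the hosts.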

kittyman (Joined: 9 Jul 00; Posts: 51562; Credit: 1,018,363,574; RAC: 1,004)

Hi, this may be a case where one particular angle range (AR) crunches a bit faster with one app than with the other. You would have to check WUs of other ARs on that host to see whether the speed difference holds.

"Time is simply the mechanism that keeps everything from happening all at once."
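
To follow this suggestion systematically, one could bin a host's results by AR and compare each app's mean CPU time within the same bin. A minimal sketch, assuming the results have been copied by hand into (app, angle range, CPU seconds) tuples; the function names, the tuple layout and the 0.1-degree bin width are hypothetical choices, not something the project provides.

```python
import math
from collections import defaultdict
from statistics import mean


def ar_bin(angle_range: float, width: float = 0.1) -> float:
    """Lower edge of the AR bin; the epsilon guards against float error (0.3 / 0.1)."""
    return round(math.floor(angle_range / width + 1e-9) * width, 6)


def speed_ratio_by_ar(results, app_a: str, app_b: str, width: float = 0.1) -> dict:
    """Mean CPU time of app_b divided by that of app_a, per AR bin.

    results: iterable of (app_name, angle_range, cpu_seconds) tuples.
    Bins where only one app has results are skipped, because they
    cannot be compared like for like.
    """
    times = defaultdict(list)
    for app, ar, cpu_s in results:
        times[(app, ar_bin(ar, width))].append(cpu_s)

    ratios = {}
    for (app, b), secs in times.items():
        if app == app_a and (app_b, b) in times:
            ratios[b] = mean(times[(app_b, b)]) / mean(secs)
    return dict(sorted(ratios.items()))
```

A ratio above 1 in a bin means app_b was slower there. If the ratio is roughly constant across bins, the app itself is the likely cause; if it varies from bin to bin, the AR matters.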

Alinator (Joined: 19 Apr 05; Posts: 4178; Credit: 4,647,982; RAC: 0)

Agreed, it's hard to draw any conclusion from just one result in isolation. There could be other factors at work as well to account for the difference. I seem to recall reports of measurable differences in speed between SSE2 and SSE3 for ARs near the analysis breakpoints. In any event, both of these hosts are running 2.2B, which is now deprecated and should be retired and upgraded to 2.4.

Alinator

Mark Lybeck (Joined: 9 Aug 99; Posts: 245; Credit: 216,677,290; RAC: 173)

> Agreed, it's hard to draw any conclusion from just one result in isolation. There could be other factors at work as well to account for the difference. I seem to recall reports of measurable differences in speed between SSE2 and SSE3 for ARs near the analysis breakpoints.

Hi, I have updated the clients to Crunch3r's 2.4V for Core 2 Duo. Here are the results:

A) Enhanced client 2.2B, SSE2 AMD generic 32-bit, before the multiplier change: average daily credit yield 2000.

B) Enhanced client 2.2B, SSE2 AMD generic 32-bit, after the multiplier change: average daily credit yield 1700. This means 2.2B now overclaims, but the granted credit is lower because the other clients claim according to the new multiplier. Even so, this client is still the fastest cruncher and finishes WUs quicker than 2.4V.

C) Enhanced client 2.4V, SSSE3 for Core 2 Duo 32-bit, after the multiplier change: average daily credit yield 1400.

With the 2.2B client, the E6600 finished most WUs within 4000 seconds. With 2.4V, a WU takes 6000 seconds on average. Only on rare occasions does 2.4V finish a very narrow-angle WU in under 4000 seconds; the standard 0.4-degree WU takes around 6000 seconds. If the 2.2B and 2.4V clients produce the same results and 2.2B is quicker on average, why change the client?
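
The A/B/C figures can be cross-checked with simple throughput arithmetic. This sketch assumes both cores of the dual-core E6600 crunch around the clock at the per-WU times quoted in the post, which is an idealisation; the constant and function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def wus_per_day(seconds_per_wu: float, cores: int = 2) -> float:
    """WUs finished per day if every core crunches continuously (idealised)."""
    return cores * SECONDS_PER_DAY / seconds_per_wu


# Typical E6600 figures from the post: ~4000 s/WU (2.2B) vs ~6000 s/WU (2.4V).
old, new = wus_per_day(4000), wus_per_day(6000)
print(f"2.2B: {old:.1f} WU/day, 2.4V: {new:.1f} WU/day, ratio {old / new:.2f}")

# Observed average daily credit from the post, case B vs case C.
print(f"credit ratio: {1700 / 1400:.2f}")
```

The time ratio (1.5) is larger than the credit ratio (about 1.21). The two need not match: 4000 s is an upper bound for most 2.2B WUs while 6000 s is a 2.4V average, and in case B the granted credit is pulled down by the other clients' claims.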

Josef W. Segur (Joined: 30 Oct 99; Posts: 4504; Credit: 1,414,761; RAC: 0)

If you were using the xW "generic" 2.2B, perhaps you should try the xW build in 2.4V_Windows_x32_SSE2_AMD.zip from Crunch3r's page. I find it difficult to believe that xW is fastest on a Core 2 Duo, though each system does react differently. However, several Core 2 users have found that the xT (SSSE3) builds aren't the fastest on their systems, even though xT is the Intel compiler setting for Core 2; those users have usually found the SSE3 Prescott build (xP) better.

I'll simply ask that you do change to a 2.4 application version. There's no way to force it, but I prefer to hope that my work won't be used in a way that causes dissension.

Joe
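
Joe's advice amounts to benchmarking every build the CPU can run rather than trusting the compiler target. A small helper could read the "Features:" line from a result's output and list the compatible build codes. The mapping below (xW and xB: SSE2, xP: SSE3, xT: SSSE3) is an assumption based on the legacy Intel compiler -Qx processor options and should be checked before relying on it. Also note that the Features line may list only what a given build checks for: Result 623000484 later in this thread omits SSSE3 on the same Core 2 6600.

```python
# Minimum instruction set assumed for each build code (legacy Intel
# compiler -Qx targets; this mapping is an assumption, verify it).
BUILD_REQUIRES = {
    "xW": "SSE2",   # Pentium 4 target
    "xB": "SSE2",   # Pentium M target
    "xP": "SSE3",   # Prescott target
    "xT": "SSSE3",  # Core 2 target
}


def compatible_builds(features_line: str) -> list[str]:
    """Build codes whose required instruction set appears in a 'Features:' line."""
    _, _, flags = features_line.partition(":")
    available = set(flags.split())
    return sorted(b for b, req in BUILD_REQUIRES.items() if req in available)


print(compatible_builds("Features: MMX SSE SSE2 SSE3 SSSE3"))  # ['xB', 'xP', 'xT', 'xW']
```

Each build in the returned list would then be timed on WUs of similar AR, since on some Core 2 systems xP has reportedly beaten xT.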

Joined: 14 Jun 00; Posts: 898; Credit: 5,969,361; RAC: 0

Pay attention to the next version; my source code is in the mail :)

Mark Lybeck (Joined: 9 Aug 99; Posts: 245; Credit: 216,677,290; RAC: 173)

A source code comparison between 2.2B and 2.4V would be interesting.

I installed the 2.4V SSE2 AMD generic build on the other machine. Here are some results:

| Field | Result 622880656 | Result 623000484 |
|---|---|---|
| Version | Windows SSSE3 32-bit based on S@H V5.15 'Noo? No - Ni!' | Windows SSE2 32-bit based on S@H V5.15 'Noo? No - Ni!' |
| Revision | R-2.4V\|xT\|FFT:IPP_SSSE3\|Ben-Joe | R-2.4V\|xB\|FFT:IPP_SSE2\|Ben-Joe |
| CPUID | Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz | Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz |
| Speed | 2 x 2700 MHz | 2 x 2700 MHz |
| Cache | L1=64K L2=4096K | L1=64K L2=4096K |
| Features | MMX SSE SSE2 SSE3 SSSE3 | MMX SSE SSE2 SSE3 |
| WU credit multiplier | 2.85 | 2.85 |
| WU true angle range | 0.431783 | 0.432131 |
| Spikes / Pulses / Triplets / Gaussians | 3 / 0 / 0 / 1 | 0 / 0 / 0 / 0 |
| Flops | 15838023561766 | 15823838513207 |
| Claimed credit | 52.2673222202279 | 52.2131228938152 |
| Granted credit | 52.2673222202279 | 52.2130577652316 |
| CPU time | 5,711.72 s | 5,653.88 s |
| CPU time per credit | 5711.72 / 52.2673222202279 = 109.28 sec/CR | 5653.88 / 52.2130577652316 = 108.28 sec/CR |

I still need to analyse the flops-per-credit ratio and to gather comparative data from the 2.2B client. Obviously I would like to minimize the CPU time spent per credit. A deeper analysis would then look at where the differences lie: CPU time per WU angle range, CPU time per flop, or flops spent per WU angle range. That remains for a follow-up post.
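
The ratios still to be analysed can be computed directly from the two results above. A minimal sketch; the `Result` class and its field names are illustrative, and the flop counts are the ones the app reports for the WU, not a measured hardware rate.

```python
from dataclasses import dataclass


@dataclass
class Result:
    result_id: int
    build: str
    angle_range: float  # kept so results can later be grouped by AR
    flops: float        # flop count reported by the app
    granted_credit: float
    cpu_seconds: float

    def seconds_per_credit(self) -> float:
        return self.cpu_seconds / self.granted_credit

    def flops_per_second(self) -> float:
        return self.flops / self.cpu_seconds

    def flops_per_credit(self) -> float:
        return self.flops / self.granted_credit


# Values copied from the two results in the post.
results = [
    Result(622880656, "xT SSSE3", 0.431783, 15838023561766, 52.2673222202279, 5711.72),
    Result(623000484, "xB SSE2", 0.432131, 15823838513207, 52.2130577652316, 5653.88),
]

for r in results:
    print(f"{r.result_id} {r.build}: {r.seconds_per_credit():.2f} s/CR, "
          f"{r.flops_per_second() / 1e9:.3f} GFLOP/s, "
          f"{r.flops_per_credit():.4g} flops/CR")
```

This reproduces the 109.28 and 108.28 sec/CR figures; 2.2B results could be added to the same list once that comparative data is collected.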
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.