Optimised applications question

Message boards : Number crunching : Optimised applications question

Chidge
Joined: 25 Jul 05
Posts: 5
Credit: 16,119,741
RAC: 0
United Kingdom
Message 664466 - Posted: 22 Oct 2007, 16:27:04 UTC

Evening all :)

I have an Intel E2140 overclocked to 2.66 GHz on which I am experimenting with optimised apps.

The CPU supports SSSE3, so I tried the appropriate app from Crunch3r's site. Below is a sample of the CPU times this produced (CPU seconds - claimed credit):
2.4V_Windows_x32_SSSE3
11,250.72 - 54.09
11,071.31 - 54.07
10,679.73 - 54.08
10,846.38 - 54.07

This didn't seem as fast as I expected, so I've now tried the SSE3 app. CPU times below:
2.4V_Windows_x32_SSE3
9,155.36 - 54.26
9,779.33 - 54.07

I'm a bit confused now why the SSE3 optimised app seems to be faster than the SSSE3 app. I'm wondering if these times are normal at this clock speed for a core duo and/or if I should be using a different optimised app.
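A quick way to put the sample above on a common footing is to average each app's CPU times and divide by claimed credit. The sketch below uses only the figures posted above; with two SSE3 results and four SSSE3 results it is an illustration of the arithmetic, not a benchmark.

```python
from statistics import mean

# (CPU seconds, claimed credit) pairs copied from the post above
ssse3 = [(11250.72, 54.09), (11071.31, 54.07), (10679.73, 54.08), (10846.38, 54.07)]
sse3 = [(9155.36, 54.26), (9779.33, 54.07)]

def summarize(results):
    """Return (mean CPU seconds, mean CPU seconds per credit) for one app."""
    times = [t for t, _ in results]
    per_credit = [t / c for t, c in results]
    return mean(times), mean(per_credit)

ssse3_time, ssse3_rate = summarize(ssse3)
sse3_time, sse3_rate = summarize(sse3)

# Fraction of CPU time saved by the SSE3 build relative to the SSSE3 build
saving = (ssse3_time - sse3_time) / ssse3_time

print(f"SSSE3: {ssse3_time:.0f} s avg, {ssse3_rate:.1f} s/credit")
print(f"SSE3:  {sse3_time:.0f} s avg, {sse3_rate:.1f} s/credit")
print(f"SSE3 used {saving:.1%} less CPU time on this small sample")
```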

Any insights would be appreciated :)

chidge
dblEagle
Joined: 3 Apr 99
Posts: 136
Credit: 45,641
RAC: 0
United States
Message 664475 - Posted: 22 Oct 2007, 16:53:59 UTC - in response to Message 664466.  

Did you consider the size of the WU's you were testing with?
Chidge
Joined: 25 Jul 05
Posts: 5
Credit: 16,119,741
RAC: 0
United Kingdom
Message 664481 - Posted: 22 Oct 2007, 17:07:41 UTC

I looked at work units which all had the same claimed credit (54.xx)

Is there another way to judge the size of the WUs I can't see?
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 664489 - Posted: 22 Oct 2007, 17:25:39 UTC - in response to Message 664466.  

You are not alone in noticing this effect. See here.
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 664500 - Posted: 22 Oct 2007, 17:50:07 UTC - in response to Message 664466.  
Last modified: 22 Oct 2007, 18:07:03 UTC

I'm wondering if these times are normal at this clock speed for a core duo and/or if I should be using a different optimised app.

There is quite a lot of variation in the CPU time required for a given credit claim. CPU time also varies with Angle Range (AR), and you will generally get a better answer to your questions by comparing results at the same angle range.

As it happens I run a Core 2 Duo E6600 at 3.006 GHz.

Here are recent times for results very close to your credit claims:
your "9,779.33 - 54.07" result had an angle range of .409029 and a result ID of 20mr07ad.14970.3753.7.6.231_1

my 21mr07ab.25902.14387.8.6.146_0 result had an angle range of .408954, and required 5475.53 CPU seconds to claim 54.08 credits.

my 21mr07ab.32527.6207.15.6.159_1 result had an angle range of .409162 and required 5503.31 CPU seconds to claim 54.06 credits.

If your host were getting as much work done per clock as mine, scaling by the clock frequency ratio (3.006/2.66) would suggest CPU times on the order of 6215 seconds, instead of the 9000+ seconds you report.
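The clock-scaling estimate above can be reproduced as follows. This is a minimal sketch that assumes both CPUs do the same work per clock and treats any result within an angle-range tolerance of 0.001 as comparable; that tolerance and the helper name `scaled_estimates` are illustrative choices, not part of any SETI tool.

```python
# Reference results from the E6600 at 3.006 GHz: (angle range, CPU seconds)
reference = [(0.408954, 5475.53), (0.409162, 5503.31)]
REF_GHZ = 3.006
TARGET_GHZ = 2.66       # the overclocked E2140
TARGET_AR = 0.409029    # AR of the 9,779.33 s result
AR_TOLERANCE = 0.001    # assumption: how close an AR must be to count as comparable

def scaled_estimates(results, target_ar, tol, ref_ghz, target_ghz):
    """Scale reference CPU times to another clock, assuming equal work per clock."""
    ratio = ref_ghz / target_ghz
    return [secs * ratio for ar, secs in results if abs(ar - target_ar) <= tol]

estimates = scaled_estimates(reference, TARGET_AR, AR_TOLERANCE, REF_GHZ, TARGET_GHZ)
for est in estimates:
    print(f"expected about {est:.0f} s (observed: 9779 s)")
```

Both scaled figures land in the low 6,200s, which is where the "order of 6215" estimate comes from.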

Some possible reasons:
1. Your CPU has a smaller cache. Older versions of SETI very clearly benefited from cache growth from 256K to 1M. I don't know if there is clear data on how the current SETI (and optimized versions) respond to cache size changes in the 1M to 4M range, but I would not be surprised if there is a considerable effect.

2. you had bad luck in work units downloaded. At the same credit, and even at the same angle range, there is quite a bit of natural variation in the actual computational work required. At least part of this is because there is optional extra computation expended on candidate signals (and also on noise).

Here is a graph of CPU time vs. angle range for my two 3.006 GHz Core 2 Duo hosts, both of which are currently running



The data on this graph were gathered using a VBA application prepared by Fred_W for reviewing the performance of new application releases by Francois. Most of the points represent results computed in the last two weeks. All the results shown here were processed by the 2.4 SSE3 app from Crunch3r's site.

[edit]in correcting typos I notice that Fred_W has also replied to your question [/edit]

Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 664503 - Posted: 22 Oct 2007, 17:54:49 UTC - in response to Message 664489.  
Last modified: 22 Oct 2007, 17:55:36 UTC

I have an E6600 that is running Crunch3r's SSSE3 app. To test your theory, I just switched over to his SSE3 version. I have 10+ WUs that are running at exactly the same angle range, 0.010551. I'll compare the ones I already ran with SSSE3 at that angle range with the ones that will run with SSE3. I have one running now, but it was switched mid-stream, so I'll take that one out of the mix when it reports.
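The test described above (same angle range, one result discarded because the app changed mid-run) amounts to filtering and grouping. A sketch follows; the record layout, the `mean_time_by_app` helper and all CPU times are hypothetical placeholders, since the actual results were only posted as a chart.

```python
from statistics import mean

# Hypothetical records: (result name, app, angle range, CPU seconds, switched_mid_run)
# The CPU times are placeholders, not real results.
results = [
    ("wu_a", "SSSE3", 0.010551, 7400.0, False),
    ("wu_b", "SSSE3", 0.010551, 7450.0, False),
    ("wu_c", "SSE3", 0.010551, 7300.0, True),   # app changed part-way: excluded
    ("wu_d", "SSE3", 0.010551, 7200.0, False),
    ("wu_e", "SSE3", 0.420000, 9000.0, False),  # different AR: excluded
]

def mean_time_by_app(records, angle_range, tol=1e-6):
    """Average CPU time per app, using only clean results at one angle range."""
    groups = {}
    for _name, app, ar, secs, switched in records:
        if switched or abs(ar - angle_range) > tol:
            continue
        groups.setdefault(app, []).append(secs)
    return {app: mean(times) for app, times in groups.items()}

print(mean_time_by_app(results, 0.010551))
```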

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 664611 - Posted: 22 Oct 2007, 21:38:04 UTC

I did a speed comparison of a number of applications for 32-bit Windows, Core 2 processor last month:


(direct link)

(further discussion in this thread)

At the time, I concentrated on apps specifically designated for Core 2, but I'd seen the speculation that SSE3 generic was at least as good. So, as it happens, I set host 3755243 back to work today with Crunch3r's 2.4V_Windows_x32_SSE3 (the host has been AWOL, doing a long timing run for Einstein, but I brought it back at lunchtime).

Give me a few days, and I'll plot the SSE3 results over the earlier Core 2 figures.
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 664649 - Posted: 22 Oct 2007, 22:19:59 UTC - in response to Message 664611.  

My first one reporting back with SSE3 was similar to the SSSE3 group:



Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 664704 - Posted: 22 Oct 2007, 23:53:50 UTC - in response to Message 664611.  

@Richard

You have a PM (and, hopefully, an eMail).
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 664765 - Posted: 23 Oct 2007, 1:30:46 UTC

It's interesting to note that 2 of the first 3 WU's under SSE3 are faster than any of the 12 WU's with SSSE3.



Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 664773 - Posted: 23 Oct 2007, 1:38:24 UTC - in response to Message 664765.  

It's interesting to note that 2 of the first 3 WU's under SSE3 are faster than any of the 12 WU's with SSSE3.


200-250 seconds is around a 3% variance. Nothing to write home about... Are you sure you did not change clock speeds or memory timings between the two sets?
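One way to judge whether a ~3% gap matters is to set it against the run-to-run spread of a single app at the same credit. The sketch below borrows Chidge's four SSSE3 times from the first post for the spread; the 8,000 s result size is an assumption implied by "200-250 seconds is around a 3% variance", not a posted figure.

```python
from statistics import mean, stdev

def relative_gap(a, b):
    """Gap between two mean CPU times, as a fraction of the larger one."""
    return abs(a - b) / max(a, b)

def coefficient_of_variation(times):
    """Run-to-run spread (sample standard deviation / mean) for one app."""
    return stdev(times) / mean(times)

# Chidge's four SSSE3 results at ~54.08 credit, used only to show natural spread
ssse3_times = [11250.72, 11071.31, 10679.73, 10846.38]
spread = coefficient_of_variation(ssse3_times)

# Assumed: a 250 s gap on a result of roughly 8,000 s
gap = relative_gap(8000.0, 7750.0)

print(f"gap {gap:.1%} vs. run-to-run spread {spread:.1%}")
```

With these numbers the spread comes out at roughly 2.3%, about the same size as the gap, so a handful of results per app cannot cleanly separate the two.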
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 664793 - Posted: 23 Oct 2007, 2:08:53 UTC - in response to Message 664773.  
Last modified: 23 Oct 2007, 2:17:44 UTC

It's interesting to note that 2 of the first 3 WU's under SSE3 are faster than any of the 12 WU's with SSSE3.


200-250 seconds is around a 3% variance. Nothing to write home about... Are you sure you did not change clock speeds or memory timings between the two sets?


Since the adjustment a couple of months ago reducing the cobblestones, I'll take whatever I can get back.

Not being a programmer, I would think that SSSE3 should be faster than SSE3, not the other way around.

Also, I didn't change anything between the two sets.

Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 664816 - Posted: 23 Oct 2007, 2:55:49 UTC - in response to Message 664793.  


Not being a programmer, I would think that SSSE3 should be faster than SSE3, not the other way around.


Newer is not always better... The extra instructions would need to be faster. It could be that the application is not as optimized as the SSE3 app, given that SSE3 is more mature?

I'd take a gander at the thread where Francois is doing some code changes. Most of the reading there is over my head, considering I never did get involved with low-level stuff, but it may give some indication of the expectations vs. reality...

Brian
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 664897 - Posted: 23 Oct 2007, 7:26:12 UTC - in response to Message 664793.  

Well....I did some comparisons between the x64 SSE3 and SSSE3, and although any difference was veeerry small, I thought the SSE3 was doing a teeny bit better, and that's what I am running right now. It depends on the exact AR, and even all WUs with the same AR do not all take the same crunch time.
The only definitive test would be to make a copy of a set of WUs with varying ARs and crunch exactly the same set of WUs with both apps to see which one did best (without reporting the test work to Seti).
One app may do better on some ARs, while the other may do better on other ARs. I seem to recall that even with the older Chicken apps doing linefeed work, there were some AR WUs that the stock app would actually beat the Chicken app, but on most others, the Chicken app won hands down.
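The offline test suggested above pairs each work unit with its own timing under both apps, so work-unit-to-work-unit variation cancels out. A minimal sketch, assuming you have already crunched copies of the same WUs with two apps (called A and B here); the timings are invented placeholders chosen to show one app winning at some ARs and losing at others.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical paired timings: the SAME work unit crunched offline by both apps.
# (angle range, seconds with app A, seconds with app B); placeholder values.
paired = [
    (0.41, 6200.0, 6150.0),
    (0.41, 6300.0, 6260.0),
    (2.10, 2400.0, 2450.0),
    (0.05, 9800.0, 9900.0),
]

def mean_ratio_by_ar(pairs):
    """Mean of (B time / A time) per angle range; below 1.0 means B was faster."""
    ratios = defaultdict(list)
    for ar, a_secs, b_secs in pairs:
        ratios[ar].append(b_secs / a_secs)
    return {ar: mean(r) for ar, r in sorted(ratios.items())}

for ar, ratio in mean_ratio_by_ar(paired).items():
    print(f"AR {ar}: B/A = {ratio:.3f}")
```

Using a per-WU ratio rather than raw seconds keeps long low-AR units from swamping short high-AR ones.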
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Chidge
Joined: 25 Jul 05
Posts: 5
Credit: 16,119,741
RAC: 0
United Kingdom
Message 664977 - Posted: 23 Oct 2007, 9:19:07 UTC

Some really interesting posts in this thread - thanks :)

I've now tried the SSE2 app and that is seemingly returning very slightly faster results so far.
So in my very limited set of WUs, SSE2 > SSE3 > SSSE3 (fastest first).

Will look at the results properly once I've got more of the same angle range.
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 664992 - Posted: 23 Oct 2007, 10:24:14 UTC - in response to Message 664977.  

To get the absolute maximum out of your computer, you need to check all ARs and plot them, similar to these graphs by Astro. There can be a few surprises, as overall performance is affected by the CPU, the size of the L2 cache, memory bandwidth and memory. It is also worth noting that you might get fantastic results at one or two ARs, which leads you to think that is the best app, only to find that these ARs are rare and you have decreased the performance for the common AR units.
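The point about rare ARs can be made concrete by weighting each AR's average time by how often that AR turns up. The AR bins, shares and timings below are hypothetical; in this made-up case app Y wins at the two rare ARs, but app X still finishes the average work unit faster.

```python
# Hypothetical per-AR average CPU seconds for two apps, plus the share of work
# units arriving at each AR. All numbers are illustrative placeholders.
ar_share = {"low": 0.10, "mid": 0.80, "high": 0.10}
app_x = {"low": 9000.0, "mid": 6000.0, "high": 2000.0}
app_y = {"low": 8000.0, "mid": 6300.0, "high": 1800.0}

def expected_seconds(times_by_ar, share_by_ar):
    """Average CPU time per work unit, weighted by how often each AR occurs."""
    return sum(times_by_ar[ar] * share for ar, share in share_by_ar.items())

x = expected_seconds(app_x, ar_share)
y = expected_seconds(app_y, ar_share)
print(f"X: {x:.0f} s/WU, Y: {y:.0f} s/WU")
```

In practice the shares would come from counting the ARs your host actually receives over a few weeks.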
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 665088 - Posted: 23 Oct 2007, 12:55:31 UTC - in response to Message 664897.  

I agree that performance varies by AR and that I'm just testing one AR that I happened to have quite a few of at the time I read the posting about SSE3 appearing to perform better than SSSE3. My results are therefore only one piece of the pie, but interesting nonetheless.

Having said that, this morning I updated for the 4 WU's that ran last night. The SSE3 still shows a slight improvement on SSSE3 on this particular AR.

If ASTRO, or someone else, is going to do a comparison of the two through various ARs I'll leave this be after I have 4 more report so I can have 12 for each. Unfortunately I don't think I can make the time to go into more depth with various angles but if ASTRO, or someone else, isn't able to, I'll try to make a go at it.

Here's the update with the 4 additional WU's.



Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 665094 - Posted: 23 Oct 2007, 13:03:44 UTC

I see/hear you.

I've deleted all the data I had for others and stopped collecting that data (too much time and too many variables: OC rates, app changes, etc.), so... you're on your own, unless someone else wants to.
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 666968 - Posted: 26 Oct 2007, 13:54:54 UTC
Last modified: 26 Oct 2007, 13:56:34 UTC



@ Richard Haselgrove

Are there any results from your test run of the different apps?

...because of your post here.


What would be the best optimised app for a Core 2 CPU?


Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 667086 - Posted: 26 Oct 2007, 17:59:26 UTC - in response to Message 666968.  

Are there any results from your test run of the different apps?

...because of your post here.

What would be the best optimised app for a Core 2 CPU?

Yes, I think I've run it for long enough now - got about 130 results for SSE2, and about 160 for SSE3, to add to my earlier charts.


(direct link)

- which I think confirms what people have been saying: the current best app for Core 2s is Crunch3r's 2.4V SSE3.

For new readers: these timings have been taken from two identical Q6600 boxes, running Windows XP at stock speed with 2 GB of RAM. The earlier tests (pink, turquoise and yellow) were all applications designed for Core 2: the last, and seemingly best, version (brown) is the generic one for any Intel P4 or above with SSE3 capability.

[Note for you-know-who: I've stopped logging for the time being, but I'll keep the data - if you'd like to see any vectorised SSSE3 timings plotted over the top, you know where to send the PM]
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.