Version 3.4 of Faster SETI cruncher for Linux

Message boards : Number crunching : Version 3.4 of Faster SETI cruncher for Linux

michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 189331 - Posted: 15 Nov 2005, 21:47:07 UTC

Part II: To HT or not to HT: that is the question.

Test machine: single P4 3.2GHz Prescott with HT (sse, sse2 and pni support), 2GB RAM.

Result: time spent on a reference workunit in minutes.

2 processes in parallel:
setiathome_SSE3-naparst-r3.4 55
setiathome_SSE2-naparst-r3.4 56

1 process alone:
setiathome_SSE3-naparst-r3.4 29
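
Timings like these could be collected with a small harness that starts N copies of a client at once and measures the wall-clock time until all of them exit. This is only a sketch under assumptions: `time_parallel` is a hypothetical helper, the command below is a placeholder rather than a real client invocation, and a real run would need each copy in its own directory with its own reference workunit.

```python
import subprocess
import sys
import time


def time_parallel(cmd, copies):
    """Start `copies` instances of cmd at once; return wall-clock minutes until all exit."""
    start = time.monotonic()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for proc in procs:
        proc.wait()
    return (time.monotonic() - start) / 60.0


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real client binary here.
    placeholder = [sys.executable, "-c", "pass"]
    for copies in (1, 2):
        minutes = time_parallel(placeholder, copies)
        print(f"{copies} process(es) in parallel: {minutes:.2f} min")
```

Wall-clock time is used rather than CPU time because, with HT, two processes share one physical core and it is the elapsed time per pair of workunits that matters.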

So the answer is: it depends.
I recommend leaving "Number of processors" at 2 in the Boinc preferences.

Harold indicated that FSB is the bottleneck. For the dual-Xeon, the shared FSB seems to be much more of a limitation compared to a single-P4.

ID: 189331
Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 189380 - Posted: 16 Nov 2005, 0:27:10 UTC - in response to Message 189331.  


Harold indicated that FSB is the bottleneck. For the dual-Xeon, the shared FSB seems to be much more of a limitation compared to a single-P4.


Yes, because you now have four simultaneous processes sharing that bandwidth instead of two.

Interesting that even the single HT processor running two processes gains very little: 2 WUs in HT mode in 56 mins versus 2 single WUs in non-HT mode in 2 x 29 = 58 mins.

It would appear that hyperthreading actually gains you very little (~3%).
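
The ~3% figure can be checked with a few lines of arithmetic. A minimal Python sketch using only the timings posted above (the variable names are illustrative, not from any client):

```python
# Reference-workunit timings reported in this thread (minutes).
ht_pair_minutes = 56     # two workunits side by side with HT
single_minutes = 29      # one workunit running alone

serial_pair_minutes = 2 * single_minutes  # 58 minutes for two workunits, one after another

# Fraction of wall-clock time saved by running the pair with HT.
time_saved = 1 - ht_pair_minutes / serial_pair_minutes

# Equivalent gain in workunits completed per hour.
throughput_gain = serial_pair_minutes / ht_pair_minutes - 1

print(f"time saved: {time_saved:.1%}, throughput gain: {throughput_gain:.1%}")
```

Depending on whether you count time saved or extra throughput, the gain comes out slightly differently, but both round to roughly 3%.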

Ned

ID: 189380
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 189386 - Posted: 16 Nov 2005, 1:10:59 UTC - in response to Message 189380.  


Indeed. Also note that Boinc benchmarking on HT computers is nearly meaningless. The Boinc benchmark uses one process with symmetric multi-processing across the specified number of processors, while the Seti application runs as a single process with a single thread.

I found that running single-P4/HT with 2 "Boinc CPUs" produces better claimed credit for a nearly equal (within 3%) amount of work. A workunit claims about 15-18 credit with Harold's ICC-for-P4 boinc.

Running dual-Xeon/HT with 4 "Boinc CPUs" is so much slower that it is not worth the credit. A work unit typically claims about 10 credit when running with 2 "Boinc CPUs". Same boinc client.



ID: 189386
JavaPersona
Volunteer tester
Joined: 4 Jun 99
Posts: 112
Credit: 471,529
RAC: 0
United States
Message 189563 - Posted: 16 Nov 2005, 17:23:26 UTC - in response to Message 189380.  



I do not dispute that 3% is very little. But taken in the context of machine(s) crunching 24/7, it adds up. If I offer you $100 today or $103, it may not seem like a big difference. But $100 every day versus $103 every day, and the propositions do not compare.
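
To put a number on "it adds up", here is a rough sketch that projects the single-P4 timings from earlier in the thread over a year of 24/7 crunching. It assumes every workunit takes as long as the reference workunit, which real workunits will not; `workunits_per_year` is an illustrative helper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def workunits_per_year(workunits, minutes):
    """Workunits finished in a year if `workunits` complete every `minutes`."""
    return MINUTES_PER_YEAR * workunits / minutes


with_ht = workunits_per_year(2, 56)     # two at a time, 56 minutes per pair
without_ht = workunits_per_year(1, 29)  # one at a time, 29 minutes each
extra = with_ht - without_ht

print(f"with HT: {with_ht:.0f}, without HT: {without_ht:.0f}, extra per year: {extra:.0f}")
```

Under those assumptions the difference is several hundred additional workunits per machine per year.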
ID: 189563
HDL
Volunteer tester
Joined: 20 Apr 05
Posts: 27
Credit: 11,577,352
RAC: 0
United Kingdom
Message 190151 - Posted: 18 Nov 2005, 8:45:34 UTC

Which is the best SETI client for a Pentium with HT? Hans Dorn, using the TMR 4th Nov client, can crunch two work units in around 2100s (35 mins at 3.2GHz). From what I saw below, Harold's client crunches 2 work units in 56 mins.
ID: 190151
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 190163 - Posted: 18 Nov 2005, 10:23:14 UTC - in response to Message 190151.  
Last modified: 18 Nov 2005, 10:23:31 UTC


Whoops. That's just me being too lazy to fix the credits. The client I'm using is similar to Harold's.

I compiled my own version.

Regards Hans
ID: 190163
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 190164 - Posted: 18 Nov 2005, 10:23:46 UTC - in response to Message 190151.  


Harold's 3.4 client is the fastest setiathome app for P4.



ID: 190164
HDL
Volunteer tester
Joined: 20 Apr 05
Posts: 27
Credit: 11,577,352
RAC: 0
United Kingdom
Message 190169 - Posted: 18 Nov 2005, 10:39:43 UTC - in response to Message 190164.  


Then why is there such a big difference between Hans' work unit time and that of Michael37? 35 mins vs 56 mins?

Is Hans' 3.2GHz computer overclocked?
ID: 190169
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 190180 - Posted: 18 Nov 2005, 11:30:39 UTC - in response to Message 190169.  
Last modified: 18 Nov 2005, 11:41:49 UTC


Hi,

I'm not sure which host you're referring to :o)

I just used Andy's Page to get some averages:

P4 3.2 (1MB L2) : 3200s
P4 3.2 (2MB L2) : 2550s
P4D 3.2@3.6 : 1810s

Pentium-M 2.0@2.1 : 2100s


Regards Hans

P.S: The P4D absolutely rocks. It overclocks remarkably well, too :o)


ID: 190180
HDL
Volunteer tester
Joined: 20 Apr 05
Posts: 27
Credit: 11,577,352
RAC: 0
United Kingdom
Message 190189 - Posted: 18 Nov 2005, 11:54:48 UTC - in response to Message 190180.  
Last modified: 18 Nov 2005, 11:59:53 UTC


The P4 3.2 (2MB L2) : 2550s is about 42.5 mins (for 2 work units).
The P4 3.2 (1MB L2) : 3200s is about 53.3 mins (for 2 work units).

That is still better than Michael37's 56 mins (by about 5%). How do you achieve that? I hope the figures you give are averaged over many units, so that they are comparable to the reference unit.
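
For anyone who wants to redo the conversion, a small Python sketch that turns Hans's averages into minutes and compares them with the 56 min result. Treating each average as the time for a pair of work units is the assumption made in this post, not something the averages themselves state; `to_minutes` is an illustrative helper.

```python
# Hans's averages from Andy's page, in seconds.
averages_s = {
    "P4 3.2 (1MB L2)": 3200,
    "P4 3.2 (2MB L2)": 2550,
}

reference_minutes = 56  # Michael37's time for 2 work units with HT


def to_minutes(seconds):
    """Convert seconds to minutes."""
    return seconds / 60


for host, seconds in averages_s.items():
    minutes = to_minutes(seconds)
    faster = (reference_minutes - minutes) / reference_minutes
    print(f"{host}: {minutes:.1f} min, {faster:.1%} faster than {reference_minutes} min")
```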

P4D is excellent. I have a P4D XE 3.2GHz overclocked to 3.9GHz: 47 mins for 4 work units. RAC is 2354 at the moment.
I am thinking of buying a P4D 3.0. It is 1/3 the price of the P4D XE 3.2, and I can overclock it to around 3.6GHz. That will be 60 mins for 4 work units (according to your figure above), so the performance/price ratio is much better.

Best regards

ID: 190189
S.L.Chia
Volunteer tester
Joined: 3 Apr 99
Posts: 98
Credit: 2,667,122
RAC: 1
Malaysia
Message 190197 - Posted: 18 Nov 2005, 12:20:41 UTC
Last modified: 18 Nov 2005, 12:21:32 UTC

My P4 3.0 Prescott overclocked to 3.35 has achieved 58 min for 2 WUs...
It seems a bit slow...
ID: 190197
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 190335 - Posted: 18 Nov 2005, 21:00:59 UTC

To continue my topic, the next question is: Eh? What is Hyperthreading?

Test machine: dual AMD Opteron(tm) Processor 242 1.6GHz, 1MB L2 cache, 1GB RAM

Result: time spent on a reference workunit in minutes.

Kernel: 2.4.21-37.ELsmp 32-bit for athlon
2 processes in parallel:
AthlonXP_SSE_FFTW3_Caching_V2.09s w/wisdom 68
setiathome_SSE2-naparst-r3.4 58

Kernel: 2.6.9-22.0.1.ELsmp 64-bit for Opteron (not the default "Generic x86_64")
2 processes in parallel:
AthlonXP_SSE_FFTW3_Caching_V2.09s w/wisdom 65
setiathome_SSE2-naparst-r3.4 59

Has anyone made a FFTW/SSE2 version of the client? I would love to benchmark it too.

Since we are at it, a couple of boinc benchmarks:

version 4.72
boinc_amd64_immoral (naparst) 2177/6002
boinc_naparst_p4 2101/4061
Boinc_4.72_AthlonXP 1799/3319

version 4.19
ned's AMD64 64-bit 1452/5524
ned's AMD64 32-bit 1699/4641



ID: 190335
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 190357 - Posted: 18 Nov 2005, 22:12:51 UTC - in response to Message 190335.  


To answer your question about FFTW/SSE2:

It doesn't work because we are using FFTW3 in single precision (SSE); SSE2 is double precision, and we don't need that ;)

P.S.
You should also take a look at my page for boinc clients. I guess they are the fastest on Linux (except the immoral amd64).




ID: 190357
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 190376 - Posted: 18 Nov 2005, 23:34:22 UTC - in response to Message 190357.  


To answer your question about FFTW/SSE2:

It doesn't work because we are using FFTW3 in single precision (SSE) and SSE2 is double precision and we don't need that ;)

I see. Well, the AthlonXP FFTW/SSE build was about 15% slower.
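
The "about 15%" can be read two ways, depending on which build is taken as the baseline. A quick sketch using the dual-Opteron timings posted earlier in the thread (function names are illustrative):

```python
# (FFTW/SSE build minutes, naparst SSE2 build minutes) on the dual Opteron.
timings = {
    "2.4.21 kernel, 32-bit": (68, 58),
    "2.6.9 kernel, 64-bit": (65, 59),
}


def slower_by(slow, fast):
    """Extra time of the slow build, relative to the fast build."""
    return slow / fast - 1


def faster_by(slow, fast):
    """Time saved by the fast build, relative to the slow build."""
    return 1 - fast / slow


for kernel, (fftw, naparst) in timings.items():
    print(f"{kernel}: FFTW build {slower_by(fftw, naparst):.1%} slower, "
          f"naparst build {faster_by(fftw, naparst):.1%} faster")
```

On the 32-bit kernel the gap is near 15% either way; on the 64-bit kernel it narrows to roughly 10%.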


P.S.
You should also take a look at my page for boinc clients i guess they are the fastest on linux (except the immoral amd64)

For boinc v4 clients, Harold's ICC/IPP build beat the GCC AthlonXP one. I bet SSE2 definitely helps with Whetstone.

I'll try boinc v5 clients next.

ID: 190376
Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 190379 - Posted: 18 Nov 2005, 23:45:41 UTC - in response to Message 190335.  


Has anyone made a FFTW/SSE2 version of the client? I would love to benchmark it too.



I built a 32-bit SSE2 enabled seti client using FFTW3 for athlon64 based on Harold's source tree and benchmarked it on one of Harold's AMD X2 systems. It was slower than Harold's SSE2 clients.

Ned

ID: 190379
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 190468 - Posted: 19 Nov 2005, 2:39:31 UTC

@Crunch3r

First of all, thanks for putting together optimized boinc and seti for Itanium. Even though I think that Itanium was Intel's biggest mistake in recent years, I have one system with Itanium 2 running RHEL4.

I tried using your builds for Itanium, and they work. Again, thank you. However, the seti application looks quite dated.

Two questions: Was it built with ICC or GCC? Considering I can't even build my own boinc on EM64T with icc, I won't even try building it on IA64. Do you want to try building Harold's client on Itanium? Do you want to give me some pointers?

michael37

ID: 190468
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 190568 - Posted: 19 Nov 2005, 8:54:09 UTC - in response to Message 190468.  


First of all, the ia64 client was built with gcc, and sure, it is a bit outdated. If you want a new client I can build one for you.

P.S.

It would be interesting if you could post processing times on your Itanium (your computers are hidden so I can't look myself).



ID: 190568
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 190642 - Posted: 19 Nov 2005, 13:48:03 UTC - in response to Message 190568.  


First of all the ia64 was built with gcc and sure it is a bit outdated. If you want a new client i can build one for you.

That would be nice. Is there a reason why you haven't tried icc and ipp? Both are available for Itaniums.


Would be interesting if you could post processing times on your itanium ( your computers are hidden so i can't look myself)

Here it is. The average time is 9,757s.

ID: 190642
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 190661 - Posted: 19 Nov 2005, 15:24:23 UTC - in response to Message 190642.  
Last modified: 19 Nov 2005, 15:25:35 UTC



I think I can cut processing times at least by half with a new client :-)
My Alpha ev67@600MHz went down from 30000 sec. per WU to 3h30min running the 3.5 client!

That should be possible with the Itanium too.
I guess 2h per WU or less is possible. I'll post a link to the new client when I've built it.
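
As a rough plausibility check on those numbers, here is a sketch that works out the Alpha speedup and what the Itanium would need to reach 2h per WU. It uses only figures from this thread; whether the Itanium sees a similar speedup is a guess, and `hms_to_seconds` is an illustrative helper.

```python
def hms_to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


alpha_before_s = 30000
alpha_after_s = hms_to_seconds(hours=3, minutes=30)
alpha_speedup = alpha_before_s / alpha_after_s

itanium_now_s = 9757                      # average reported for the current gcc build
itanium_target_s = hms_to_seconds(hours=2)
required_speedup = itanium_now_s / itanium_target_s

print(f"Alpha speedup: {alpha_speedup:.2f}x")
print(f"Itanium needs {required_speedup:.2f}x to reach 2h per WU")
print(f"Halving the Itanium time would give {itanium_now_s / 2 / 60:.0f} min per WU")
```

So the 2h target needs a smaller speedup than the Alpha already showed, and halving the current time would land well under it.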





ID: 190661
rattelschneck
Joined: 14 Apr 01
Posts: 435
Credit: 842,179
RAC: 0
Germany
Message 190675 - Posted: 19 Nov 2005, 16:42:13 UTC


@Crunch3r

Hi!

I just thought this morning it wouldn't be a bad idea to run some tests of your optimized seti clients against the reference work unit on my Athlon XP 3200+.
So, I tested your clients V2.07s, V2.08s and V2.09s on this host. The tests finished some minutes ago, but the result is somewhat confusing to me.

Here are the results:
AthlonXP_SSE_FFTW3_Caching_V2.07s: 5808.850943
AthlonXP_SSE_FFTW3_Caching_V2.08s: 5508.440612
AthlonXP_SSE_FFTW3_Caching_V2.09s: 8814.406029

Do you know why 2.09 is slower than 2.08 or 2.07 (on my host)?
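
To show the size of the gap, a short sketch that normalises the three results against the fastest one. The unit is assumed to be seconds, since the post does not state it, and the dictionary keys are just shorthand for the client names.

```python
# Reference work unit results from the list above (unit assumed to be seconds).
results_s = {
    "V2.07s": 5808.850943,
    "V2.08s": 5508.440612,
    "V2.09s": 8814.406029,
}

best = min(results_s, key=results_s.get)

for version, seconds in results_s.items():
    ratio = seconds / results_s[best]
    print(f"{version}: {seconds:.0f} s, {ratio:.2f}x the time of {best}")
```

V2.09s takes roughly 1.6 times as long as V2.08s on this host, which is far more than the difference between V2.07s and V2.08s.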

The Hardware is an Athlon XP 3200+ , no overclocking, 512 MB RAM
The OS is Debian 3.1 kernel 2.6.8-2-386


Would be nice if you could give me some enlightenment ;-)

Regards
rattelschneck

ID: 190675