KWSN Windows optimized science apps - Share your results and problems!

Message boards : Number crunching : KWSN Windows optimized science apps - Share your results and problems!

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65763
Credit: 55,293,173
RAC: 49
United States
Message 368787 - Posted: 15 Jul 2006, 23:29:01 UTC - in response to Message 368626.  
Last modified: 15 Jul 2006, 23:33:06 UTC

Oh no errors.....


If you use his new version, it will also tell you your true CPU speed.


I use CPU-Z.

Me too, it's just a nice feature for others who compare...


I don't. CPU-Z is nice and all, but I use CrystalCPUID and it's really useful (vcore adjustment from Windows XP). Not to mention CrystalCPUID 4.7.5.299 comes up almost as soon as I click on it, while CPU-Z tends to take half a minute or more to fully launch.

Oh, and no errors here, outside of the normal noisy WUs that everybody gets, overclocked or not.

You got a link for CrystalCPUID? Is it shareware, freeware, or something to buy? I'd like to check it out. Thanks.


It's freeware, and here's the link: CrystalCPUID. Click on the dot where it says Download, and then click on an image like this one, as it has a download link in it:

There is also a native XP x64 version for those who use XP x64 like I do. Note that the version with the image is for 32-bit XP, and the XP x64 version is only for the XP x64 OS.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 368787
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 368849 - Posted: 16 Jul 2006, 0:14:31 UTC - in response to Message 368097.  

Most early Prescotts are quicker with SSE2 than SSE3. That yours is also quicker using generic SSE2 than P4-specific SSE2 is interesting, usually it's the other way around.

How do we know if we have an "earlier" Prescott?
me@rescam.org
ID: 368849
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 368923 - Posted: 16 Jul 2006, 2:00:32 UTC - in response to Message 368849.  

Due to the magic of a small program called Iccpatch, that I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system.

I'm seeing substantial gains in speed, and the SSE3 files run the fastest. You can check my claim by looking at some results.

http://setiathome.berkeley.edu/result.php?resultid=354471318
http://setiathome.berkeley.edu/result.php?resultid=354433231
http://setiathome.berkeley.edu/result.php?resultid=354371121

If you click the link by Computer ID, you will see that this is an Athlon64 X2 3800+ overclocked to 2400 MHz, running the P4 SSE3 client, with very good results. It is also much faster than the generic SSE2 client.

I have run the automated test suite (KWSN CPU test and benchmark) from the homepage that has the optimized clients available, and with the modified executables put into the right directory, I'm seeing these results:

*** Cut start ***
Starting tests. This will take a few minutes, please be patient!

Testing setiathome-5.15-kwsn-mmx.exe...
setiathome-5.15-kwsn-mmx.exe ran for 114 seconds

Testing setiathome-5.15-kwsn-sse.exe...
setiathome-5.15-kwsn-sse.exe ran for 60 seconds

Testing setiathome-5.15-kwsn-sse2.exe...
setiathome-5.15-kwsn-sse2.exe ran for 59 seconds

Testing setiathome-5.15-kwsn-sse2-p4.exe...
setiathome-5.15-kwsn-sse2-p4.exe ran for 51 seconds

Testing setiathome-5.15-kwsn-sse2-pm.exe...
setiathome-5.15-kwsn-sse2-pm.exe ran for 55 seconds

Testing setiathome-5.15-kwsn-sse3-p4.exe...
setiathome-5.15-kwsn-sse3-p4.exe ran for 50 seconds

Finished with test run!
*** Cut End ***
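For a quick comparison, the timings in the log above can be normalized against the MMX build. This is a small Python sketch; the figures are copied straight from the log (a single run per executable), and the helper names are just illustrative.

```python
# Run times in seconds, copied from the KWSN benchmark log above
timings = {
    "setiathome-5.15-kwsn-mmx.exe": 114,
    "setiathome-5.15-kwsn-sse.exe": 60,
    "setiathome-5.15-kwsn-sse2.exe": 59,
    "setiathome-5.15-kwsn-sse2-p4.exe": 51,
    "setiathome-5.15-kwsn-sse2-pm.exe": 55,
    "setiathome-5.15-kwsn-sse3-p4.exe": 50,
}

def speedups(times, baseline):
    """How many times faster each build is than the baseline build."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}

def time_saved(times, reference, candidate):
    """Fraction of run time saved by `candidate` relative to `reference`."""
    return (times[reference] - times[candidate]) / times[reference]

for name, factor in sorted(speedups(timings, "setiathome-5.15-kwsn-mmx.exe").items(),
                           key=lambda kv: kv[1]):
    print(f"{name:36s} {factor:4.2f}x")

saved = time_saved(timings, "setiathome-5.15-kwsn-sse2.exe", "setiathome-5.15-kwsn-sse3-p4.exe")
print(f"patched P4 SSE3 vs generic SSE2: {saved:.1%} less time")
```

On these numbers the patched P4 SSE3 build needs about 15% less time than the generic SSE2 build and runs about 2.3x as fast as the MMX build.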

The Iccpatch program removes a GenuineIntel check in code produced by the latest Intel compilers. It's well documented at this page:

http://www.swallowtail.org/naughty-intel.html

I downloaded the Iccpatch program from that page, but today there are only some scripts available.

If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, and not only the Intel based machines.
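To illustrate the kind of check Iccpatch removes, here is a deliberately simplified Python model of CPU dispatch. It is hypothetical: the function names and path labels are invented for illustration, and this is not Intel's actual dispatcher code. It only shows why a dispatcher gated on the "GenuineIntel" vendor string keeps an SSE3-capable Athlon64 X2 off the fast paths (depending on the build, the non-Intel branch either falls back to generic code or refuses to run, as the "P4-only" builds do), while one that checks feature flags alone picks the SSE3 path.

```python
def pick_by_features(features):
    """Choose a code path purely from the CPU's feature flags."""
    if "sse3" in features:
        return "sse3-path"
    if "sse2" in features:
        return "sse2-path"
    return "generic-path"

def dispatch_vendor_gated(vendor, features, strict=False):
    """Hypothetical model of a dispatcher that trusts feature flags only on Intel CPUs."""
    if vendor == "GenuineIntel":
        return pick_by_features(features)
    # Non-Intel CPU: strict ("only"-style) builds refuse, others use the slow fallback
    return "refuse-to-run" if strict else "generic-path"

def dispatch_patched(vendor, features):
    """Hypothetical model after the vendor check is removed: only the flags matter."""
    return pick_by_features(features)

athlon64_x2 = ("AuthenticAMD", {"mmx", "sse", "sse2", "sse3"})
print(dispatch_vendor_gated(*athlon64_x2))               # generic-path
print(dispatch_vendor_gated(*athlon64_x2, strict=True))  # refuse-to-run
print(dispatch_patched(*athlon64_x2))                    # sse3-path
```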
ID: 368923
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 368942 - Posted: 16 Jul 2006, 2:35:56 UTC - in response to Message 368923.  

Due to the magic of a small program called Iccpatch, that I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system.

[...]

If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, and not only the Intel based machines.


It'd be interesting if this works for Simon... My AMD would love to go faster...

Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 368942
Marshall
Joined: 17 May 99
Posts: 5
Credit: 97,879
RAC: 0
United States
Message 369082 - Posted: 16 Jul 2006, 5:40:14 UTC

Hi Simon,
Thanks for the new optimized builds!!

I don't have exact "proof" from a same-workunit comparison, but on my Prescott machine the C3R SSE3 application appears to be a small percentage faster than the KWSN version, maybe 10%. I could spend a lot of CPU time with real workunits proving this, but it wouldn't make any difference in terms of how to go forward with even better code. The C3R code is dead and we need to move forward with new code.

This small difference is not THAT important. The biggest difference between the version immediately preceding Seti-enhanced and Seti-enhanced itself is that it now makes little difference whether I run both of my HT cores or not. This observation is independent of KWSN vs. C3R compilation of Seti-enhanced.

Before, if I ran SETI on both HT threads, each would run maybe 25-30% slower, but at the end of the day I would get a lot more work done because I was running 2 work units at once. This doesn't appear to be true anymore.

I don't see any clear advantage to running both cores anymore. For example, if I run one thread, a WU might take 23,000 seconds. If I run two threads at once, it takes about 45,000 seconds for both to complete, which is not much different from running them one after the other on a single thread.
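Marshall's figures can be turned into a rough throughput comparison. This sketch uses his approximate numbers (about 23,000 s for one WU alone, about 45,000 s for two running together) and, for the older app, his estimate that each of two concurrent WUs ran 25-30% slower; the single-WU time for the older app cancels out of the ratio, so reusing 23,000 s there is just a convenience.

```python
def ht_throughput_gain(single_seconds, paired_seconds, jobs=2):
    """Throughput of running `jobs` WUs concurrently, relative to running them back to back.

    single_seconds: time for one WU running alone
    paired_seconds: time for all `jobs` WUs to finish when run concurrently
    """
    serial_seconds = jobs * single_seconds
    return serial_seconds / paired_seconds

# Seti-enhanced, per Marshall's estimate: the second HT thread adds very little
new = ht_throughput_gain(23_000, 45_000)
# Older app: each of two concurrent WUs about 30% slower than running alone
old = ht_throughput_gain(23_000, 23_000 * 1.30)
print(f"seti-enhanced: {new:.2f}x, older app: {old:.2f}x")
# seti-enhanced: 1.02x, older app: 1.54x
```

Under these assumptions the second HT thread went from roughly 54% more throughput (60% at the 25% slowdown end) to about 2%, which fits the suspicion that the new code is limited by memory rather than by the execution units.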

I suspect that my "relatively wimpy 3 GHz Prescott" machine is now memory bound. If that is indeed true, then a further, fairly dramatic performance increase is probably possible by reorganizing the code to take better advantage of the local processor cache. This is of course a far greater effort than simply compiling existing code with a better compiler/libraries.

However the results of such a cache optimization could be huge when magnified over a large number of machines.




ID: 369082
littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 369103 - Posted: 16 Jul 2006, 6:13:32 UTC

I tried posting this a few hours ago, but the new forum software seems to be hanging on to my posts until I make another post!
After only a couple of days using Simon's new optimised app, I am seeing WU crunch times down to as little as 60% of the time taken with the standard app.
Well done, mate!
ID: 369103
Natsuo Tsuji
Joined: 18 May 02
Posts: 24
Credit: 1,519,328
RAC: 0
Japan
Message 369117 - Posted: 16 Jul 2006, 6:33:09 UTC

Simon, I don't know whether this post is off topic for this thread or not, but please remember to make a FreeBSD optimized client.
ID: 369117
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369155 - Posted: 16 Jul 2006, 7:49:52 UTC - in response to Message 368923.  
Last modified: 16 Jul 2006, 8:08:16 UTC

Due to the magic of a small program called Iccpatch, that I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system.

I'm seeing substantial gains in speed, and the SSE3 files run the fastest. You can check my claim by looking at some results.

[...]
The Iccpatch program removes a GenuineIntel check in code produced by the latest Intel compilers. It's well documented at this page:

http://www.swallowtail.org/naughty-intel.html

I downloaded the Iccpatch program from that page, but today there are only some scripts available.

If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, and not only the Intel based machines.

This is certainly a neat trick (I knew there was a way to do this), but it's also against the restrictions Intel sent me when I bought ICC/IPP licenses.

I quote:
B. You may NOT: (i) use or copy the Materials except as provided in this Agreement; (ii) rent or lease the Materials to any third party; (iii) assign this Agreement or transfer the Materials without the express written consent of Intel; (iv) modify, adapt, or translate the Materials in whole or in part except as provided in this Agreement;

D. DISTRIBUTION: Distribution of the Materials is also subject to the following limitations: You (i) shall be solely responsible to your customers for any update or support obligation or other liability which may arise from the distribution, (ii) shall not make any statement that your product is "certified", or that its performance is guaranteed, by Intel, (iii) shall not use Intel's name or trademarks to market your product without written permission, (iv) shall prohibit disassembly and reverse engineering, (v) shall not publish reviews of Materials designated as beta without written permission by Intel, and (vi) shall indemnify, hold harmless, and defend Intel and its suppliers from and against any claims or lawsuits, including attorney's fees, that arise or result from your distribution of any product.

9. TERMINATION OF THIS LICENSE: This Agreement becomes effective on the date you accept this Agreement and will continue until terminated as provided for in this Agreement. If you are using the Materials under the control of a time-limited license, for example an Evaluation License, this Agreement terminates without notice on the last day of the time period, which is controlled by the license key code for the Materials. Intel may terminate this license at any time if you are in breach of any of its terms and conditions. Upon termination, you will immediately return to Intel or destroy the Materials and all copies thereof.


So by doing that and offering it for download, I'd be opening myself up for legal retribution and license nullification from Intel. Guess what, I won't do it :o)

That's not to say I'll keep you from doing it, but no, you will not be getting a binary like that from me directly, sorry. Maybe I'll insert a link somewhere with a hint, but that's as far as I can go, and needs a disclaimer anyway.

Interesting results, though :o) I knew those routines weren't Intel-specific...

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369155
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369166 - Posted: 16 Jul 2006, 8:12:28 UTC - in response to Message 368849.  

Most early Prescotts are quicker with SSE2 than SSE3. That yours is also quicker using generic SSE2 than P4-specific SSE2 is interesting, usually it's the other way around.

How do we know if we have an "earlier" Prescott?

Usually when CPU-Z tells you it's on Socket 478 instead of LGA775 :o)

HTH,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369166
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369190 - Posted: 16 Jul 2006, 9:34:58 UTC

Hm,

regarding ICCPatch, I wonder whether there is not a legal way to do it.

The license specifically states "you will prohibit disassembly and reverse engineering".

Patching an executable is not disassembling it, neither is it reverse-engineering it.

What it is doing is performing a "bit transplant", basically. It wasn't even necessary to reverse-engineer Intel products to achieve this; instead, it was done by using a debugger and analyzing the assembly.

So in this regard, I'll be emailing someone at Intel I'm in contact with and asking them whether this will nullify my license. I really don't want to go on the assumption it's okay.

However, I have tested the generic SSE2 vs. the P4-only patched SSE2 executable on an AMD machine, and boy, was it quick. 69 seconds gen. SSE2, 59 P4-SSE2 patched. Yowzer...
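For reference, the two AMD timings above convert to a time saving and a throughput gain as follows (plain arithmetic on Simon's figures from a single test run):

```latex
\frac{69 - 59}{69} = \frac{10}{69} \approx 0.145 \;\; (\text{about } 14.5\% \text{ less run time}),
\qquad
\frac{69}{59} \approx 1.17 \;\; (\text{about } 17\% \text{ more work per unit time})
```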

Definitely worth investigating. I'll also be checking something I'm wondering about myself regarding speed differences on AMDs...*grin*

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369190
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 369202 - Posted: 16 Jul 2006, 10:00:55 UTC - in response to Message 369190.  

Hm,

regarding ICCPatch, I wonder whether there is not a legal way to do it.

The license specifically states "you will prohibit disassembly and reverse engineering".

Patching an executable is not disassembling it, neither is it reverse-engineering it.


I once tried to make a patch for the icc and IPP libraries so they would not detect the CPU brand, but after looking at the EULA, I abandoned it. To make a patch, you have to disassemble the library, and that's the sticking point. Technically it's easy, but you have to disassemble the code. You may want to ask Intel at Premier Support or somewhere.
Luckiest in the world. WMD = Weapon of Mass Distraction.
Click this table.
ID: 369202
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369211 - Posted: 16 Jul 2006, 10:15:53 UTC
Last modified: 16 Jul 2006, 10:33:29 UTC

Thanks Tetsuji,

I read that thread. Still, I'm not sure patching an executable falls under the license in the EULA. They specifically state you must not change their libraries, but short of disassembling the executable (debugging is not disassembling, is it?), I don't see any regulations about patching executables themselves.

Anyway, I have written to the author of the patch, and I will write to a contact at Intel next before I do anything - always better to be safe than to be sorry.

Regards,
Simon.

P.S. Er, and I just found out someone else did that for his binaries. Tsk tsk :o)
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369211
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369240 - Posted: 16 Jul 2006, 11:45:56 UTC - in response to Message 369117.  
Last modified: 16 Jul 2006, 12:25:14 UTC

Simon, I don't know whether this post is off topic for this thread or not, but please remember to make a FreeBSD optimized client.

Thanks for reminding me :o)

Tetsuji Maverick Rai already said that ICC was available as a FreeBSD port. I wonder whether I can just run my static Linux executable (which also includes IPP) on FreeBSD (with brandelf) or whether I have to compile a native executable (and if both work, which is quicker).

When I find some time, I'll test it out - downloading some installation ISOs now.
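brandelf works by setting the OS/ABI byte (e_ident[EI_OSABI], offset 7) in an ELF header, which FreeBSD uses when deciding which ABI to run a binary under. Linux binaries are often left marked as SYSV (0), which is why branding them may be needed. This Python sketch reads that byte so you can check a binary before and after running brandelf; only a few OS/ABI values from the ELF specification are mapped, and the helper names are illustrative.

```python
EI_OSABI = 7                     # index of the OS/ABI byte within e_ident
E_IDENT_SIZE = 16                # e_ident occupies the first 16 bytes of the file
OSABI_NAMES = {0: "SYSV", 3: "Linux", 9: "FreeBSD"}

def elf_osabi(header: bytes) -> str:
    """Return the OS/ABI brand stored in an ELF header's e_ident array."""
    if len(header) < E_IDENT_SIZE or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    value = header[EI_OSABI]
    return OSABI_NAMES.get(value, f"unknown ({value})")

def elf_osabi_of_file(path: str) -> str:
    """Read just the e_ident bytes from a file on disk and report its brand."""
    with open(path, "rb") as f:
        return elf_osabi(f.read(E_IDENT_SIZE))

# Example: a synthetic e_ident (64-bit, little-endian, version 1) branded Linux
demo = b"\x7fELF" + bytes([2, 1, 1, 3]) + bytes(8)
print(elf_osabi(demo))  # Linux
```

To check a real binary, call `elf_osabi_of_file()` with its path. A correct brand only gets the binary past the ABI selection; it does not guarantee the Linux compatibility layer can run everything the executable does.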

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369240
gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 369258 - Posted: 16 Jul 2006, 12:41:21 UTC
Last modified: 16 Jul 2006, 12:49:54 UTC

Simon - First, many thanks for the incredible amount of work you're doing here. This definitely goes way “above and beyond”!

I downloaded the Benchmark Test Set last night and am in the process of Benching four of my machines, and have a quick question: Have you updated the bench applications in that package to match your latest and greatest release? (1.3 I guess.) Or would the upgrade to 1.3 from your original release not affect execution speed?

Thanks again!

D'Oh!! I just read on your web page that the package includes the r1.3 apps. NEVER MIND
ID: 369258
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369305 - Posted: 16 Jul 2006, 14:13:34 UTC - in response to Message 368354.  

Oh no errors

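Result ID | WU ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
354214050 | 84866230 | 14 Jul 2006 22:09:50 UTC | 15 Jul 2006 12:38:31 UTC | Over | Success | Done | 3,485.72 | 13.12 | 0.00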

Computer ID 2493813
Report deadline 29 Jul 2006 12:58:54 UTC
CPU time 3485.71875
stderr out <core_client_version>5.4.9</core_client_version>
[...]

I've just downloaded the WU that errored for you from the server and will try crunching it offline with my Athlon64 3500+ using the same version (xW/gen. SSE2). Let's see what happens :o)

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369305
Natsuo Tsuji
Joined: 18 May 02
Posts: 24
Credit: 1,519,328
RAC: 0
Japan
Message 369308 - Posted: 16 Jul 2006, 14:20:27 UTC - in response to Message 369240.  
Last modified: 16 Jul 2006, 14:21:05 UTC

Thanks for reminding me :o)

Tetsuji Maverick Rai already said that ICC was available as a FreeBSD port. I wonder whether I can just run my static Linux executable (which also includes IPP) on FreeBSD (with brandelf) or whether I have to compile a native executable (and if both work, which is quicker).

When I find some time, I'll test it out - downloading some installation ISOs now.

Regards,
Simon.

I've tried both your SSE and SSE2 Linux workers on FreeBSD with brandelf, but neither of them worked.
So, please compile a native executable.
ID: 369308
Al
Volunteer tester
Joined: 4 Oct 99
Posts: 5832
Credit: 401,935
RAC: 0
Serbia
Message 369321 - Posted: 16 Jul 2006, 14:32:57 UTC - in response to Message 369305.  

Oh no errors

[...]

I've just downloaded the WU that errored for you from the server and will try crunching it offline with my Athlon64 3500+ using the same version (xW/gen. SSE2). Let's see what happens :o)

Super :)
I can't wait to see the result.
Scorpions - Wind Of Change
ID: 369321
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 369335 - Posted: 16 Jul 2006, 14:50:54 UTC - in response to Message 369082.  

I suspect that my "relatively wimpy 3 GHz Prescott" machine is now memory bound. If that is indeed true, then a further, fairly dramatic performance increase is probably possible by reorganizing the code to take better advantage of the local processor cache. This is of course a far greater effort than simply compiling existing code with a better compiler/libraries.

However the results of such a cache optimization could be huge when magnified over a large number of machines.

Alex Kan added a transposition of data arrays in his Mac G4 and G5 optimized versions which is more cache friendly, and that code will be in the stock 5.17 versions. Some test builds I did with DevC++/MinGW seemed to give about a 10% speed improvement. But Tetsuji said someplace that it didn't seem to work when compiled with VC++/ICC and I think Simon has had the same lack of improvement.

I do hope that those with good ideas on how to improve the program will get the sources and contribute the needed time and effort.
Joe
ID: 369335
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 369337 - Posted: 16 Jul 2006, 14:55:26 UTC

Josef is correct - in fact, the 5.17 builds seemed slightly slower using ICC/IPP.

Well, in my effort to reproduce Dr.'s error, BOINC somehow managed to stay running in the background even though I had exited the manager (and am running single user mode on that machine), but I didn't notice it. So when I thought I was cancelling the science app doing my benchmark because I had the wrong WU, I actually cancelled two WUs on BOINC. Bah :o) No computation error, but that's what it shows up as.

Anyway, I'll be running that WU now.

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 369337
Alex Kan
Volunteer developer
Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 369465 - Posted: 16 Jul 2006, 17:04:10 UTC - in response to Message 369335.  
Last modified: 16 Jul 2006, 17:07:14 UTC

Alex Kan added a transposition of data arrays in his Mac G4 and G5 optimized versions which is more cache friendly, and that code will be in the stock 5.17 versions. Some test builds I did with DevC++/MinGW seemed to give about a 10% speed improvement. But Tetsuji said someplace that it didn't seem to work when compiled with VC++/ICC and I think Simon has had the same lack of improvement.

From looking at the seti_cvs mailing list archives, it seems that until June 27, transpose functions were being called with the wrong dimensions, causing the transposed PoT array to contain garbage. I still haven't tested 5.17 myself, so someone else will have to verify that the code in CVS behaves correctly now.

In general, without a fast matrix transpose implementation, the speed gains from the improved cache performance of a transposed PoT array may be negated by the amount of time it actually takes to transpose the array. My clients use a general out-of-place matrix transpose provided by Apple with its vDSP performance libraries. Since most other optimized clients use IPP, it may be worth looking into calling ippmTranspose instead of Eric's provided functions.
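As a rough illustration of what an out-of-place transpose does, and of how swapped dimensions can silently corrupt the result, here is a minimal Python sketch on a row-major flat array. It is not Eric's CVS code, Apple's vDSP routine, or IPP; the function name and default block size are assumptions. Working tile by tile is the usual way to keep both the reads and the writes cache-resident in compiled code; in pure Python it only shows the access pattern, not the speed.

```python
def transpose_blocked(src, rows, cols, block=32):
    """Out-of-place transpose of a row-major rows x cols matrix stored in a flat list.

    Visits the matrix in block x block tiles so that, in a compiled language,
    the source and destination rows touched by one tile stay in cache.
    Returns a flat row-major list holding the cols x rows result.
    """
    if len(src) != rows * cols:
        raise ValueError("src length does not match rows * cols")
    dst = [0] * (rows * cols)
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    dst[c * rows + r] = src[r * cols + c]
    return dst

# A 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] in row-major order
m = [1, 2, 3, 4, 5, 6]
print(transpose_blocked(m, 2, 3, block=2))  # [1, 4, 2, 5, 3, 6]
# Dimensions passed the wrong way round: no error, but the data is scrambled
print(transpose_blocked(m, 3, 2, block=2))  # [1, 3, 5, 2, 4, 6]
```

Because rows * cols is the same either way, the length check cannot catch a dimension mix-up, which is one way a transposed array can end up full of wrong values without any visible failure.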
ID: 369465

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.