KWSN Windows optimized science apps - Share your results and problems!

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65763 Credit: 55,293,173 RAC: 49	Message 368787 - Posted: 15 Jul 2006, 23:29:01 UTC - in response to Message 368626. Last modified: 15 Jul 2006, 23:33:06 UTC Oh no errors..... If you use his new version it will also tell you your true CPU speed. I use CPU-Z. Me too, it's just a nice feature for others who compare... I don't. CPU-Z is nice and all, but I use CrystalCPUID and it's really useful (vcore adjustment from within Windows XP). Not to mention that CrystalCPUID 4.7.5.299 comes up almost as soon as I click on it, while CPU-Z tends to take half a minute or more to fully launch. Oh, and no errors here outside of the normal noisy WUs that everybody gets, overclocked or not. You got a link for CrystalCPUID? Is it shareware, freeware, or something to buy? I'd like to check it out. Thanks. It's freeware and here's the link: CrystalCPUID. Click on the dot where it says Download, and then click on the image that carries the download link. There is also a native XP x64 version for those who use XP x64 like I do. Note that the version with the image is for 32-bit XP, and the XP x64 version is only for the XP x64 OS. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 368787 ·

Misfit Volunteer tester Send message Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0	Message 368849 - Posted: 16 Jul 2006, 0:14:31 UTC - in response to Message 368097. Most early Prescotts are quicker with SSE2 than SSE3. That yours is also quicker using generic SSE2 than P4-specific SSE2 is interesting, usually it's the other way around. How do we know if we have an "earlier" Prescott? me@rescam.org ID: 368849 ·

Karsten Vinding Volunteer tester Send message Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11	Message 368923 - Posted: 16 Jul 2006, 2:00:32 UTC - in response to Message 368849. Due to the magic of a small program called Iccpatch, which I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system. I'm seeing substantial gains in speed, and the SSE3 files run the fastest. You can check my claim by looking at some results. http://setiathome.berkeley.edu/result.php?resultid=354471318 http://setiathome.berkeley.edu/result.php?resultid=354433231 http://setiathome.berkeley.edu/result.php?resultid=354371121 If you click the link by Computer ID, you will see that this is an Athlon64 X2 3800+ overclocked to 2400 MHz, running the P4 SSE3 client, with very good results. It is also much faster than the generic SSE2 client. I have run the automated test suite (KWSN CPU test and benchmark) from the homepage that has the optimized clients available, and with the modified executables put into the right directory, I'm seeing these results:
* Cut start *
Starting tests. This will take a few minutes, please be patient!
Testing setiathome-5.15-kwsn-mmx.exe... setiathome-5.15-kwsn-mmx.exe ran for 114 seconds
Testing setiathome-5.15-kwsn-sse.exe... setiathome-5.15-kwsn-sse.exe ran for 60 seconds
Testing setiathome-5.15-kwsn-sse2.exe... setiathome-5.15-kwsn-sse2.exe ran for 59 seconds
Testing setiathome-5.15-kwsn-sse2-p4.exe... setiathome-5.15-kwsn-sse2-p4.exe ran for 51 seconds
Testing setiathome-5.15-kwsn-sse2-pm.exe... setiathome-5.15-kwsn-sse2-pm.exe ran for 55 seconds
Testing setiathome-5.15-kwsn-sse3-p4.exe... setiathome-5.15-kwsn-sse3-p4.exe ran for 50 seconds
Finished with test run!
* Cut End *
The Iccpatch program removes a GenuineIntel check in code produced by the latest Intel compilers. It's well documented at this page: http://www.swallowtail.org/naughty-intel.html I downloaded the Iccpatch program from that page, but today there are only some scripts available. If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, not only the Intel-based machines. ID: 368923 ·
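To put the benchmark figures above on a common scale, here is a minimal Python sketch that parses the quoted "ran for N seconds" lines and reports each build's run-time reduction against the generic SSE2 build. The function names `parse_timings` and `percent_faster` are made up for illustration and are not part of the KWSN test suite; the timings are copied from the post.

```python
import re

# Benchmark lines as quoted in the post above (run times in seconds).
BENCH_LOG = """\
setiathome-5.15-kwsn-mmx.exe ran for 114 seconds
setiathome-5.15-kwsn-sse.exe ran for 60 seconds
setiathome-5.15-kwsn-sse2.exe ran for 59 seconds
setiathome-5.15-kwsn-sse2-p4.exe ran for 51 seconds
setiathome-5.15-kwsn-sse2-pm.exe ran for 55 seconds
setiathome-5.15-kwsn-sse3-p4.exe ran for 50 seconds
"""


def parse_timings(log):
    """Return {executable name: seconds} for every 'ran for N seconds' line."""
    pattern = re.compile(r"^(\S+\.exe) ran for (\d+) seconds$", re.MULTILINE)
    return {name: int(secs) for name, secs in pattern.findall(log)}


def percent_faster(baseline, candidate):
    """Run-time reduction of candidate relative to baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


timings = parse_timings(BENCH_LOG)
baseline = timings["setiathome-5.15-kwsn-sse2.exe"]  # generic SSE2 build

# Fastest build first; negative values mean slower than generic SSE2.
for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
    gain = percent_faster(baseline, secs)
    print(f"{name:36s} {secs:4d} s  {gain:+6.1f}% vs generic SSE2")
```

On these numbers the patched P4 SSE3 build finishes the test workunit in about 15% less time than generic SSE2 on this Athlon64. This is a single short test run, so the percentages are indicative only.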

hiamps Volunteer tester Send message Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0	Message 368942 - Posted: 16 Jul 2006, 2:35:56 UTC - in response to Message 368923. Due to the magic of a small program called Iccpatch, which I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system. I'm seeing substantial gains in speed, and the SSE3 files run the fastest. You can check my claim by looking at some results. http://setiathome.berkeley.edu/result.php?resultid=354471318 http://setiathome.berkeley.edu/result.php?resultid=354433231 http://setiathome.berkeley.edu/result.php?resultid=354371121 If you click the link by Computer ID, you will see that this is an Athlon64 X2 3800+ overclocked to 2400 MHz, running the P4 SSE3 client, with very good results. It is also much faster than the generic SSE2 client. I have run the automated test suite (KWSN CPU test and benchmark) from the homepage that has the optimized clients available, and with the modified executables put into the right directory, I'm seeing these results:
* Cut start *
Starting tests. This will take a few minutes, please be patient!
Testing setiathome-5.15-kwsn-mmx.exe... setiathome-5.15-kwsn-mmx.exe ran for 114 seconds
Testing setiathome-5.15-kwsn-sse.exe... setiathome-5.15-kwsn-sse.exe ran for 60 seconds
Testing setiathome-5.15-kwsn-sse2.exe... setiathome-5.15-kwsn-sse2.exe ran for 59 seconds
Testing setiathome-5.15-kwsn-sse2-p4.exe... setiathome-5.15-kwsn-sse2-p4.exe ran for 51 seconds
Testing setiathome-5.15-kwsn-sse2-pm.exe... setiathome-5.15-kwsn-sse2-pm.exe ran for 55 seconds
Testing setiathome-5.15-kwsn-sse3-p4.exe... setiathome-5.15-kwsn-sse3-p4.exe ran for 50 seconds
Finished with test run!
* Cut End *
The Iccpatch program removes a GenuineIntel check in code produced by the latest Intel compilers. It's well documented at this page: http://www.swallowtail.org/naughty-intel.html I downloaded the Iccpatch program from that page, but today there are only some scripts available. If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, not only the Intel-based machines. It would be interesting if this works for Simon... My AMD would love to go faster... Official Abuser of Boinc Buttons... And no good credit hound! ID: 368942 ·

Marshall Send message Joined: 17 May 99 Posts: 5 Credit: 97,879 RAC: 0	Message 369082 - Posted: 16 Jul 2006, 5:40:14 UTC Hi Simon, Thanks for the new optimized builds!! I don't have exact "proof" from a comparison on the same workunit, but the C3R SSE3 application appears to be a small percentage faster than the KWSN version, maybe 10% on my Prescott machine. I could spend a lot of CPU time with real workunits proving this, but it wouldn't make any difference in terms of how to go forward with even better code. The C3R code is dead and we need to move forward with new code. This small difference is not THAT important. The biggest change between the version immediately preceding Seti-enhanced and Seti-enhanced itself is that it now makes little difference whether I run both of my HT cores or not. This observation is independent of KWSN vs C3R compilation of Seti-enhanced. Before, if I ran SETI on both core threads, both would run maybe 25-30% slower, but at the end of the day I would get a lot more work done because I was running 2 workunits at once. This doesn't appear to be true anymore. I don't see any clear advantage to running both cores anymore. For example, if I run one thread, maybe it takes 23,000 seconds. If I run two threads at once, maybe it takes about 45,000 seconds for both to complete - not much difference as opposed to running a single thread serially. I suspect that my "relatively wimpy 3 GHz Prescott" machine is memory bound now. If that is indeed true, then a further fairly dramatic performance increase is probably possible by reorganizing the code to take better advantage of the local processor cache. This is of course a far greater effort than simply compiling existing code with a better compiler and libraries. However, the results of such a cache optimization could be huge when magnified over a large number of machines. ID: 369082 ·
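The Hyper-Threading arithmetic in the post above can be checked with a short Python sketch. The helper `throughput_gain` is a made-up name for illustration, and the sketch assumes that "25-30% slower" means each workunit takes 25-30% longer when two run together; the 23,000 s and 45,000 s figures are the rough estimates given in the post.

```python
def throughput_gain(single_time, paired_time):
    """Workunits per second with two threads, divided by the one-thread rate.

    single_time: seconds to finish one workunit running alone.
    paired_time: seconds to finish two workunits running side by side.
    """
    return (2.0 / paired_time) / (1.0 / single_time)


# Seti-enhanced, using the rough figures from the post.
print(f"Seti-enhanced: {throughput_gain(23_000, 45_000):.3f}x")

# Earlier application, assuming each paired workunit takes 25-30% longer.
for slowdown in (0.25, 0.30):
    gain = throughput_gain(1.0, 1.0 + slowdown)
    print(f"Earlier app, {slowdown:.0%} slower per thread: {gain:.2f}x")
```

Under these assumptions the second thread used to add roughly 54-60% more throughput, while with Seti-enhanced the gain drops to about 2%, which fits the suspicion that the two threads are now competing for memory bandwidth or cache rather than for execution units.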

littlegreenmanfrommars Volunteer tester Send message Joined: 28 Jan 06 Posts: 1410 Credit: 934,158 RAC: 0	Message 369103 - Posted: 16 Jul 2006, 6:13:32 UTC I tried posting this a few hours ago, but the new forum software seems to be hanging on to my posts until I make another post! After only a couple of days using Simon's new optimised app, I am seeing WU crunch times down to as little as 60% of the time taken with the standard app. Well done, mate! ID: 369103 ·

Natsuo Tsuji Send message Joined: 18 May 02 Posts: 24 Credit: 1,519,328 RAC: 0	Message 369117 - Posted: 16 Jul 2006, 6:33:09 UTC Simon, I don't know whether this post is off topic for this thread or not, but please remember to make a FreeBSD optimized client. ID: 369117 ·

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369155 - Posted: 16 Jul 2006, 7:49:52 UTC - in response to Message 368923. Last modified: 16 Jul 2006, 8:08:16 UTC Due to the magic of a small program called Iccpatch, which I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system. I'm seeing substantial gains in speed, and the SSE3 files run the fastest. You can check my claim by looking at some results. [...] The Iccpatch program removes a GenuineIntel check in code produced by the latest Intel compilers. It's well documented at this page: http://www.swallowtail.org/naughty-intel.html I downloaded the Iccpatch program from that page, but today there are only some scripts available. If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, not only the Intel-based machines. This is certainly a neat trick (I knew there was a way to do this), but it's also against the restrictions Intel sent me when I bought ICC/IPP licenses. I quote: B. You may NOT: (i) use or copy the Materials except as provided in this Agreement; (ii) rent or lease the Materials to any third party; (iii) assign this Agreement or transfer the Materials without the express written consent of Intel; (iv) modify, adapt, or translate the Materials in whole or in part except as provided in this Agreement; D. DISTRIBUTION: Distribution of the Materials is also subject to the following limitations: You (i) shall be solely responsible to your customers for any update or support obligation or other liability which may arise from the distribution, (ii) shall not make any statement that your product is "certified", or that its performance is guaranteed, by Intel, (iii) shall not use Intel's name or trademarks to market your product without written permission, (iv) shall prohibit disassembly and reverse engineering, (v) shall not publish reviews of Materials designated as beta without written permission by Intel, and (vi) shall indemnify, hold harmless, and defend Intel and its suppliers from and against any claims or lawsuits, including attorney's fees, that arise or result from your distribution of any product. 9. TERMINATION OF THIS LICENSE: This Agreement becomes effective on the date you accept this Agreement and will continue until terminated as provided for in this Agreement. If you are using the Materials under the control of a time-limited license, for example an Evaluation License, this Agreement terminates without notice on the last day of the time period, which is controlled by the license key code for the Materials. Intel may terminate this license at any time if you are in breach of any of its terms and conditions. Upon termination, you will immediately return to Intel or destroy the Materials and all copies thereof. So by doing that and offering it for download, I'd be opening myself up to legal retribution and license nullification from Intel. Guess what, I won't do it :o) That's not to say I'll keep you from doing it, but no, you will not be getting a binary like that from me directly, sorry. Maybe I'll insert a link somewhere with a hint, but that's as far as I can go, and it needs a disclaimer anyway. Interesting results, though :o) I knew those routines weren't Intel-specific... Regards, Simon. Donate to SETI@Home via PayPal! 
Optimized SETI@Home apps + Information ID: 369155 ·
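As a rough illustration of why a vendor check matters here, the following Python sketch contrasts a vendor-gated dispatcher with one that looks only at the instruction sets the CPU reports. It is a toy model built on assumptions: the function names and the path labels are invented, and it does not reproduce the actual logic emitted by the Intel compiler or changed by Iccpatch. Only the vendor strings "GenuineIntel" and "AuthenticAMD" are real CPUID values.

```python
# Code paths in order of preference, fastest first (labels are illustrative).
PREFERRED_PATHS = ("sse3", "sse2", "sse", "mmx")


def pick_path_by_features(features):
    """Choose the best code path from the instruction sets the CPU reports."""
    for isa in PREFERRED_PATHS:
        if isa in features:
            return isa
    return "generic"


def pick_path_vendor_gated(vendor, features):
    """Toy model of a vendor check: fast paths are offered to one vendor only."""
    if vendor != "GenuineIntel":
        return "generic"
    return pick_path_by_features(features)


# Hypothetical feature set for an SSE3-capable AMD processor.
athlon64 = {"mmx", "sse", "sse2", "sse3"}

print(pick_path_vendor_gated("AuthenticAMD", athlon64))  # generic
print(pick_path_by_features(athlon64))                   # sse3
```

In this model the gated dispatcher sends the AMD chip down the slow path even though it advertises every required instruction set, which matches the behaviour reported in the posts above for the unpatched P4 and PM builds.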

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369166 - Posted: 16 Jul 2006, 8:12:28 UTC - in response to Message 368849. Most early Prescotts are quicker with SSE2 than SSE3. That yours is also quicker using generic SSE2 than P4-specific SSE2 is interesting, usually it's the other way around. How do we know if we have an "earlier" Prescott? Usually when CPU-Z tells you it's on Socket 478 instead of LGA775 :o) HTH, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information ID: 369166 ·

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369190 - Posted: 16 Jul 2006, 9:34:58 UTC Hm, regarding ICCPatch, I wonder whether there is not a legal way to do it. The license specifically states "you will prohibit disassembly and reverse engineering". Patching an executable is not disassembling it, neither is it reverse-engineering it. What it is doing is performing a "bit transplant", basically. It wasn't even necessary to reverse-engineer Intel products to achieve this, but instead was realized by using a debugger and analyzing assembly. So in this regard, I'll be emailing someone at Intel I'm in contact with and asking them whether this will nullify my license. I really don't want to go on the assumption it's okay. However, I have tested the generic SSE2 vs. the P4-only patched SSE2 executable on an AMD machine, and boy, was it quick. 69 seconds gen. SSE2, 59 P4-SSE2 patched. Yowzer... Definitely worth investigating. I'll also be checking something I'm wondering about myself regarding speed differences on AMDs...grin Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information ID: 369190 ·

Tetsuji Maverick Rai Volunteer tester Send message Joined: 25 Apr 99 Posts: 518 Credit: 90,863 RAC: 0	Message 369202 - Posted: 16 Jul 2006, 10:00:55 UTC - in response to Message 369190. Hm, regarding ICCPatch, I wonder whether there is not a legal way to do it. The license specifically states "you will prohibit disassembly and reverse engineering". Patching an executable is not disassembling it, neither is it reverse-engineering it. I once tried to make a patch so that the ICC and IPP libraries would not detect the CPU brand, but after looking at the EULA I abandoned it. To make a patch you have to disassemble the library, and that is the sticking point. Technically it's easy, but you still have to disassemble the code. You may want to ask Intel at Premier Support or somewhere similar. Luckiest in the world. WMD = Weapon of Mass Distraction. Click this table. ID: 369202 ·

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369211 - Posted: 16 Jul 2006, 10:15:53 UTC Last modified: 16 Jul 2006, 10:33:29 UTC Thanks Tetsuji, I read that thread. Still, I'm not sure patching an executable falls under the license in the EULA - they specifically state you must not change their libraries, but short of disassembling the executable (debugging is not disassembling, is it?), I don't see any regulations about patching executables themselves. Anyway, I have written to the author of the patch, and I will write to a contact at Intel next before I do anything - always better to be safe than sorry. Regards, Simon. P.S. Er, and I just found out someone else did that for his binaries. Tsk tsk :o) Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information ID: 369211 ·

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369240 - Posted: 16 Jul 2006, 11:45:56 UTC - in response to Message 369117. Last modified: 16 Jul 2006, 12:25:14 UTC Simon, I don't know whether this post is off topic for this thread or not, but please remember to make a FreeBSD optimized client. Thanks for reminding me :o) Tetsuji Maverick Rai already said that ICC was available as a FreeBSD port - I wonder whether I can just use my static Linux executable (which also includes IPP) on FreeBSD (with brandelf) or whether I have to compile a native executable (and if I can do both, which is quicker). When I find some time, I'll test it out - downloading some installation ISOs now. Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information ID: 369240 ·

gomeyer Volunteer tester Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0	Message 369258 - Posted: 16 Jul 2006, 12:41:21 UTC Last modified: 16 Jul 2006, 12:49:54 UTC Simon - First, many thanks for the incredible amount of work you're doing here. This definitely goes way “above and beyond”! I downloaded the Benchmark Test Set last night and am in the process of benching four of my machines, and have a quick question: Have you updated the bench applications in that package to match your latest and greatest release? (1.3 I guess.) Or would the upgrade to 1.3 from your original release not affect execution speed? Thanks again! D'Oh!! I just read on your web page that the package includes the r1.3 apps. NEVER MIND ID: 369258 ·

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369305 - Posted: 16 Jul 2006, 14:13:34 UTC - in response to Message 368354. Oh no errors 354214050 84866230 14 Jul 2006 22:09:50 UTC 15 Jul 2006 12:38:31 UTC Over Success Done 3,485.72 13.12 0.00 Computer ID 2493813 Report deadline 29 Jul 2006 12:58:54 UTC CPU time 3485.71875 stderr out <core_client_version>5.4.9</core_client_version> [...] I've just downloaded the WU that errored for you from the server and will try crunching it offline with my Athlon64 3500+ using the same version (xW/gen. SSE2). Let's see what happens :o) Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information ID: 369305 ·

Natsuo Tsuji Send message Joined: 18 May 02 Posts: 24 Credit: 1,519,328 RAC: 0	Message 369308 - Posted: 16 Jul 2006, 14:20:27 UTC - in response to Message 369240. Last modified: 16 Jul 2006, 14:21:05 UTC Thanks for reminding me :o) Tetsuji Maverick Rai already said that ICC was available as a FreeBSD port - I wonder whether I can just use my static Linux executable (which also includes IPP) on FreeBSD (with brandelf) or whether I have to compile a native executable (and if I can do both, which is quicker). When I find some time, I'll test it out - downloading some installation ISOs now. Regards, Simon. I've tried both your SSE and SSE2 Linux workers on FreeBSD with brandelf, but neither of them worked. So, please compile a native executable. ID: 369308 ·

Al Volunteer tester Send message Joined: 4 Oct 99 Posts: 5832 Credit: 401,935 RAC: 0	Message 369321 - Posted: 16 Jul 2006, 14:32:57 UTC - in response to Message 369305. Oh no errors 354214050 84866230 14 Jul 2006 22:09:50 UTC 15 Jul 2006 12:38:31 UTC Over Success Done 3,485.72 13.12 0.00 Computer ID 2493813 Report deadline 29 Jul 2006 12:58:54 UTC CPU time 3485.71875 stderr out <core_client_version>5.4.9</core_client_version> [...] I've just downloaded the WU that errored for you from the server and will try crunching it offline with my Athlon64 3500+ using the same version (xW/gen. SSE2). Let's see what happens :o) Regards, Simon. Super :) I can't wait to see the result. Scorpions - Wind Of Change ID: 369321 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 369335 - Posted: 16 Jul 2006, 14:50:54 UTC - in response to Message 369082. I suspect that my "relatively wimpy 3 GHz Prescott" machine is memory bound now. If that is indeed true, then a further fairly dramatic performance increase is probably possible by reorganizing the code to take better advantage of the local processor cache. This is of course a far greater effort than simply compiling existing code with a better compiler and libraries. However, the results of such a cache optimization could be huge when magnified over a large number of machines. Alex Kan added a transposition of data arrays in his Mac G4 and G5 optimized versions which is more cache friendly, and that code will be in the stock 5.17 versions. Some test builds I did with DevC++/MinGW seemed to give about a 10% speed improvement. But Tetsuji said someplace that it didn't seem to work when compiled with VC++/ICC, and I think Simon has seen the same lack of improvement. I do hope that those with good ideas on how to improve the program will get the sources and contribute the needed time and effort. Joe ID: 369335 ·

KWSN - Chicken of Angnor Volunteer developer Volunteer tester Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0	Message 369337 - Posted: 16 Jul 2006, 14:55:26 UTC Josef is correct - in fact, the 5.17 builds seemed slightly slower using ICC/IPP. Well, in my effort to reproduce Dr.'s error, BOINC somehow managed to stay running in the background even though I had exited the manager (and am running single user mode on that machine), but I didn't notice it. So when I thought I was cancelling the science app doing my benchmark because I had the wrong WU, I actually cancelled two WUs on BOINC. Bah :o) No computation error, but that's what it shows up as. Anyway, I'll be running that WU now. Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information ID: 369337 ·

Alex Kan Volunteer developer Send message Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0	Message 369465 - Posted: 16 Jul 2006, 17:04:10 UTC - in response to Message 369335. Last modified: 16 Jul 2006, 17:07:14 UTC Alex Kan added a transposition of data arrays in his Mac G4 and G5 optimized versions which is more cache friendly, and that code will be in the stock 5.17 versions. Some test builds I did with DevC++/MinGW seemed to give about a 10% speed improvement. But Tetsuji said someplace that it didn't seem to work when compiled with VC++/ICC, and I think Simon has seen the same lack of improvement. From looking at the seti_cvs mailing list archives, it seems that until June 27, the transpose functions were being called with the wrong dimensions, causing the transposed PoT array to contain garbage. I still haven't tested 5.17 myself, so someone else will have to verify that the code in CVS behaves correctly now. In general, without a fast matrix transpose implementation, the speed gains from the improved cache performance of a transposed PoT array may be negated by the time it takes to transpose the array in the first place. My clients use a general out-of-place matrix transpose provided by Apple with its vDSP performance libraries. Since most other optimized clients use IPP, it may be worth looking into calling ippmTranspose instead of Eric's provided functions. ID: 369465 ·
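To make the transpose discussion above concrete, here is a minimal Python sketch of a blocked out-of-place transpose on a flat row-major buffer. It is not the SETI@home code, Eric's functions, or the vDSP or IPP routines; `transpose_blocked` and its `block` parameter are names made up for illustration, and pure Python will not show the cache benefit. It only demonstrates the tile-by-tile access pattern and how passing swapped dimensions yields a wrong result without raising an error.

```python
def transpose_blocked(src, rows, cols, block=32):
    """Out-of-place transpose of a row-major rows x cols buffer.

    Walks the matrix in block x block tiles so that, in a compiled
    implementation, reads and writes both stay within a small working set.
    Returns a new row-major cols x rows list.
    """
    if len(src) != rows * cols:
        raise ValueError("rows * cols does not match the buffer length")
    dst = [0] * (rows * cols)
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Copy one tile, clipping at the matrix edges.
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    dst[c * rows + r] = src[r * cols + c]
    return dst


data = list(range(6))  # a 2 x 3 matrix: [[0, 1, 2], [3, 4, 5]]

good = transpose_blocked(data, 2, 3, block=2)
print(good)  # [0, 3, 1, 4, 2, 5], i.e. [[0, 3], [1, 4], [2, 5]]

# Swapped dimensions: same buffer length, so nothing fails, but the
# result is not the transpose of the 2 x 3 matrix.
bad = transpose_blocked(data, 3, 2, block=2)
print(bad)  # [0, 2, 4, 1, 3, 5]
```

The length check catches only gross mistakes. Because swapped `rows` and `cols` describe a buffer of the same size, the kind of dimension mix-up mentioned in the post goes through silently and leaves scrambled data behind.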
