KWSN Windows optimized science apps - Share your results and problems!
Author | Message |
---|---|
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65763 Credit: 55,293,173 RAC: 49 |
Oh no errors..... It's freeware, and here's the link: CrystalCPUID. Click on the dot where it says Download, then click on the image that carries the download link. There is also a native XP x64 version for those who use XP x64 like I do. Note that the version with the image is for 32-bit XP, and the XP x64 version is only for the XP x64 OS. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Misfit Send message Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0 |
Most early Prescotts are quicker with SSE2 than SSE3. That yours is also quicker using generic SSE2 than P4-specific SSE2 is interesting, usually it's the other way around. How do we know if we have an "earlier" Prescott? me@rescam.org |
Karsten Vinding Send message Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
Due to the magic of a small program called Iccpatch, that I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system. I'm seeing substantial gains in speed, and the SSE3 files run the fastest. You can check my claim by looking at some results. http://setiathome.berkeley.edu/result.php?resultid=354471318 http://setiathome.berkeley.edu/result.php?resultid=354433231 http://setiathome.berkeley.edu/result.php?resultid=354371121 If you click the link by Computer ID, you will see that this is an Athlon64 X2 3800+ overclocked to 2400 MHz, running the P4 SSE3 client, with very good results. Also it is much faster than the generic SSE2 client. I have run the automated test suite (KWSN CPU test and benchmark) from the homepage that has the optimized clients available, and with the modified executables put into the right directory, I'm seeing these results: *** Cut start *** Starting tests. This will take a few minutes, please be patient! Testing setiathome-5.15-kwsn-mmx.exe... setiathome-5.15-kwsn-mmx.exe ran for 114 seconds Testing setiathome-5.15-kwsn-sse.exe... setiathome-5.15-kwsn-sse.exe ran for 60 seconds Testing setiathome-5.15-kwsn-sse2.exe... setiathome-5.15-kwsn-sse2.exe ran for 59 seconds Testing setiathome-5.15-kwsn-sse2-p4.exe... setiathome-5.15-kwsn-sse2-p4.exe ran for 51 seconds Testing setiathome-5.15-kwsn-sse2-pm.exe... setiathome-5.15-kwsn-sse2-pm.exe ran for 55 seconds Testing setiathome-5.15-kwsn-sse3-p4.exe... setiathome-5.15-kwsn-sse3-p4.exe ran for 50 seconds Finished with test run! *** Cut End *** The Iccpatch program removes a GenuineIntel check in code produced by the latest Intel compilers. It's well documented at this page: http://www.swallowtail.org/naughty-intel.html I downloaded the Iccpatch program from that page, but today there are only some scripts available. 
If the developer of these optimized clients were to incorporate the patches described on this page, _all_ SSE2 and SSE3 capable machines could see further improvements, and not only the Intel based machines. |
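The benchmark figures quoted in this post can be turned into relative speedups with a few lines of arithmetic. This is only a sketch: the run times are copied exactly as posted, the generic SSE2 build is taken as the baseline, and the variable names are made up for illustration.

```python
# Run times in seconds, copied from the benchmark output in the post
# (Athlon64 X2 3800+ with the patched executables).
run_times = {
    "mmx": 114,
    "sse": 60,
    "sse2": 59,
    "sse2-p4": 51,
    "sse2-pm": 55,
    "sse3-p4": 50,
}

baseline = run_times["sse2"]  # generic SSE2 build as the reference

# A factor above 1.0 means faster than the generic SSE2 build.
speedups = {name: baseline / secs for name, secs in run_times.items()}

for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {run_times[name]:4d} s  x{factor:.2f}")

fastest = min(run_times, key=run_times.get)
```

On these numbers the patched SSE3 build runs at about 1.18 times the speed of generic SSE2 (59 s against 50 s). The test run is short, so the ratio on full workunits may differ.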
hiamps Send message Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0 |
Due to the magic of a small program called Iccpatch, that I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system. Be interesting if this works for simon...My AMD would love to go faster... Official Abuser of Boinc Buttons... And no good credit hound! |
Marshall Send message Joined: 17 May 99 Posts: 5 Credit: 97,879 RAC: 0 |
Hi Simon, Thanks for the new optimized builds!! I don't have exact "proof" with an exact same workunit comparison, but the C3R SSE3 application appears to be a small percentage faster, maybe 10% on my Prescott machine than the KWSN version. I could spend a lot of CPU time with real workunits proving this, but it wouldn't make any difference in terms of how to go forward with even better code. The C3R code is dead and we need to move forward with new code. This small difference is not THAT important. The biggest thing between the version immediately preceding Seti-enhanced and Seti-enhanced is that it makes little difference if I run both of my HT cores or not. This observation is independent of KWSN vs C3R compilation of seti-enhanced. Before if I ran SETI on both core threads, both would run maybe 25-30% slower, but at the end of the day, I would get a lot more work done because I was running 2 work units at once. This doesn't appear to be true anymore. I don't see any clear advantage to running both cores anymore. For example, if I run one thread, maybe it takes 23,000 seconds. If I run two threads at once, maybe it takes about 45,000 seconds for both to complete - not much difference as opposed to running a single thread serially. I suspect that my "relative wimpy 3 GHz Prescott" machine is memory bound now. If that is indeed true, then a further fairly dramatic performance increase is probably possible by re-organizing the code to take better advantage of the local processor cache. This is of course a far greater effort than simply compiling existing code with a better compiler/libraries. However the results of such a cache optimization could be huge when magnified over a large number of machines. |
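The Hyper-Threading comparison in this post comes down to throughput arithmetic. A rough sketch using only the approximate figures given there (23,000 s for one workunit alone, 45,000 s for two run together, and the earlier 25-30% per-thread slowdown); the function and variable names are illustrative.

```python
def throughput(work_units, wall_seconds):
    """Work units completed per hour of wall-clock time."""
    return work_units * 3600 / wall_seconds

# Seti-enhanced, figures from the post: one WU alone takes about 23,000 s;
# two WUs run together on both logical CPUs finish in about 45,000 s.
serial = throughput(1, 23_000)
both_threads = throughput(2, 45_000)
gain_now = both_threads / serial - 1

# Before seti-enhanced: each WU ran 25-30% slower with both threads busy,
# but two were in flight at once.
gain_before_low = 2 / 1.30 - 1
gain_before_high = 2 / 1.25 - 1

print(f"gain now:    {gain_now:.1%}")
print(f"gain before: {gain_before_low:.0%} to {gain_before_high:.0%}")
```

With these inputs the benefit of running both threads drops from roughly 54-60% to about 2%, which matches the impression that there is no clear advantage anymore.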
littlegreenmanfrommars Send message Joined: 28 Jan 06 Posts: 1410 Credit: 934,158 RAC: 0 |
I tried posting this a few hours ago, but the new forum software seems to be hanging on to my posts until I make another post! After only a couple of days using Simon's new optimised app, I am seeing WU crunch times down to as little as 60% of the time taken with the standard app. Well done, mate! |
Natsuo Tsuji Send message Joined: 18 May 02 Posts: 24 Credit: 1,519,328 RAC: 0 |
Simon, I don't know whether this post is off topic for this thread or not, but please remember to make a FreeBSD optimized client. |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Due to the magic of a small program called Iccpatch, that I once downloaded, I have made _all_ the P4 and PM "only" versions work on my Athlon64 system. This is certainly a neat trick (I knew there was a way to do this), but it's also against the restrictions Intel sent me when I bought ICC/IPP licenses. I quote: B. You may NOT: (i) use or copy the Materials except as provided in this Agreement; (ii) rent or lease the Materials to any third party; (iii) assign this Agreement or transfer the Materials without the express written consent of Intel; (iv) modify, adapt, or translate the Materials in whole or in part except as provided in this Agreement; D. DISTRIBUTION: Distribution of the Materials is also subject to the following limitations: You (i) shall be solely responsible to your customers for any update or support obligation or other liability which may arise from the distribution, (ii) shall not make any statement that your product is "certified", or that its performance is guaranteed, by Intel, (iii) shall not use Intel's name or trademarks to market your product without written permission, (iv) shall prohibit disassembly and reverse engineering, (v) shall not publish reviews of Materials designated as beta without written permission by Intel, and (vi) shall indemnify, hold harmless, and defend Intel and its suppliers from and against any claims or lawsuits, including attorney's fees, that arise or result from your distribution of any product. 9. TERMINATION OF THIS LICENSE: This Agreement becomes effective on the date you accept this Agreement and will continue until terminated as provided for in this Agreement. If you are using the Materials under the control of a time-limited license, for example an Evaluation License, this Agreement terminates without notice on the last day of the time period, which is controlled by the license key code for the Materials. Intel may terminate this license at any time if you are in breach of any of its terms and conditions. 
Upon termination, you will immediately return to Intel or destroy the Materials and all copies thereof. So by doing that and offering it for download, I'd be opening myself up for legal retribution and license nullification from Intel. Guess what, I won't do it :o) That's not to say I'll keep you from doing it, but no, you will not be getting a binary like that from me directly, sorry. Maybe I'll insert a link somewhere with a hint, but that's as far as I can go, and needs a disclaimer anyway. Interesting results, though :o) I knew those routines weren't Intel-specific... Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Most early Prescotts are quicker with SSE2 than SSE3. That yours is also quicker using generic SSE2 than P4-specific SSE2 is interesting, usually it's the other way around. Usually when CPU-Z tells you it's on Socket 478 instead of LGA775 :o) HTH, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Hm, regarding ICCPatch, I wonder whether there is not a legal way to do it. The license specifically states "you will prohibit disassembly and reverse engineering". Patching an executable is not disassembling it, neither is it reverse-engineering it. What it is doing is performing a "bit transplant", basically. It wasn't even necessary to reverse-engineer Intel products to achieve this, but instead was realized by using a debugger and analyzing assembly. So in this regard, I'll be emailing someone at Intel I'm in contact with and asking them whether this will nullify my license. I really don't want to go on the assumption it's okay. However, I have tested the generic SSE2 vs. the P4-only patched SSE2 executable on an AMD machine, and boy, was it quick. 69 seconds gen. SSE2, 59 P4-SSE2 patched. Yowzer... Definitely worth investigating. I'll also be checking something I'm wondering about myself regarding speed differences on AMDs...*grin* Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
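To picture what such a patch changes, here is a toy model in Python of two ways a runtime dispatcher could pick a code path: by vendor string or by feature flags. It is not Intel's code and does not describe the actual compiler output. `CpuInfo`, `pick_by_vendor`, and `pick_by_features` are hypothetical names, the feature sets are assumed for illustration, and the only detail taken from the thread is that the check involves the string GenuineIntel.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CpuInfo:
    vendor: str  # CPUID vendor string
    features: frozenset = field(default_factory=frozenset)


def pick_by_features(cpu: CpuInfo) -> str:
    """Use a fast path whenever the CPU reports the instruction set."""
    for isa in ("sse3", "sse2", "sse"):
        if isa in cpu.features:
            return isa
    return "generic"


def pick_by_vendor(cpu: CpuInfo) -> str:
    """Use a fast path only when the vendor string matches."""
    if cpu.vendor != "GenuineIntel":
        return "generic"
    return pick_by_features(cpu)


# Hypothetical machines; the feature sets are assumptions for this sketch.
amd_cpu = CpuInfo("AuthenticAMD", frozenset({"mmx", "sse", "sse2", "sse3"}))
intel_cpu = CpuInfo("GenuineIntel", frozenset({"mmx", "sse", "sse2", "sse3"}))

print(pick_by_vendor(amd_cpu), pick_by_features(amd_cpu))
print(pick_by_vendor(intel_cpu), pick_by_features(intel_cpu))
```

In this model the AMD machine falls back to the generic path under the vendor test even though it reports the same features, which is the behaviour the thread describes the patch as removing.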
Tetsuji Maverick Rai Send message Joined: 25 Apr 99 Posts: 518 Credit: 90,863 RAC: 0 |
Hm, I once tried to make a patch so that the ICC and IPP libraries would not detect the CPU brand, but after looking at the EULA, I abandoned the idea. In order to make a patch, you have to disassemble the library, and that's the bottleneck. Technically, it's easy, but you have to disassemble the code. You may want to ask Intel at Premier Support or somewhere. Luckiest in the world. WMD = Weapon of Mass Distraction. Click this table. |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Thanks Tetsuji, I read that thread. Still, I'm not sure patching an executable falls under the license in the EULA - they specifically state you must not change their libraries, but short of disassembling the executable (debugging is not disassembling, is it?), I don't see any regulations about patching executables themselves. Anyway, I have written to the author of the patch, and I will write to a contact at Intel next before I do anything - always better to be safe than to be sorry. Regards, Simon. P.S. Er, and I just found out someone else did that for his binaries. Tsk tsk :o) Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Simon, I don't know whether this post is off topic for this thread or not, but please remember to make a FreeBSD optimized client. Thanks for reminding me :o) Tetsuji Maverick Rai already said that ICC was available as a FreeBSD port - I wonder whether I can just use my static Linux executable (which also includes IPP) to work on FreeBSD (with brandelf) or whether I have to compile a native executable (and if I can do both, which is quicker). When I find some time, I'll test it out - downloading some installation ISOs now. Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
Simon - First, many thanks for the incredible amount of work you're doing here. This definitely goes way “above and beyond”! I downloaded the Benchmark Test Set last night and am in the process of Benching four of my machines, and have a quick question: Have you updated the bench applications in that package to match your latest and greatest release? (1.3 I guess.) Or would the upgrade to 1.3 from your original release not affect execution speed? Thanks again! D'Oh!! I just read on your web page that the package includes the r1.3 apps. NEVER MIND |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Oh no errors I've just downloaded the WU that errored for you from the server and will try crunching it offline with my Athlon64 3500+ using the same version (xW/gen. SSE2). Let's see what happens :o) Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
Natsuo Tsuji Send message Joined: 18 May 02 Posts: 24 Credit: 1,519,328 RAC: 0 |
Thanks for reminding me :o) I've tried both your SSE and SSE2 Linux worker on FreeBSD with brandelf, but neither of them worked. So, please compile a native executable. |
Al Send message Joined: 4 Oct 99 Posts: 5832 Credit: 401,935 RAC: 0 |
Oh no errors Super:) I can't wait to see the result. Scorpions - Wind Of Change |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I suspect that my "relative wimpy 3 GHz Prescott" machine is memory bound now. If that is indeed true, then a further fairly dramatic performance increase is probably possible by re-organizing the code to take better advantage of the local processor cache. This is of course a far greater effort than simply compiling existing code with a better compiler/libraries. Alex Kan added a transposition of data arrays in his Mac G4 and G5 optimized versions which is more cache friendly, and that code will be in the stock 5.17 versions. Some test builds I did with DevC++/MinGW seemed to give about a 10% speed improvement. But Tetsuji said someplace that it didn't seem to work when compiled with VC++/ICC and I think Simon has had the same lack of improvement. I do hope that those with good ideas on how to improve the program will get the sources and contribute the needed time and effort. Joe |
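Why a transposed array can be more cache friendly shows up in the memory offsets alone. The sketch below uses a hypothetical 4 x 8 row-major array, not the real SETI data layout, and `flat_index` is an illustrative helper.

```python
def flat_index(row, col, n_cols):
    """Offset of element (row, col) in a row-major flat array."""
    return row * n_cols + col


rows, cols = 4, 8  # hypothetical small array

# Walking down column 2 of the original layout: offsets jump by `cols`.
column_walk = [flat_index(r, 2, cols) for r in range(rows)]

# The same data after transposition (now cols x rows) is row 2, contiguous.
transposed_walk = [flat_index(2, r, rows) for r in range(rows)]

print(column_walk)      # strided access
print(transposed_walk)  # consecutive access
```

The strided walk touches a different cache line on every step once the rows are long enough, while the transposed walk reads consecutive elements.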
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Josef is correct - in fact, the 5.17 builds seemed slightly slower using ICC/IPP. Well, in my effort to reproduce Dr.'s error, BOINC somehow managed to stay running in the background even though I had exited the manager (and am running single user mode on that machine), but I didn't notice it. So when I thought I was cancelling the science app doing my benchmark because I had the wrong WU, I actually cancelled two WUs on BOINC. Bah :o) No computation error, but that's what it shows up as. Anyway, I'll be running that WU now. Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
Alex Kan Send message Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
Alex Kan added a transposition of data arrays in his Mac G4 and G5 optimized versions which is more cache friendly, and that code will be in the stock 5.17 versions. Some test builds I did with DevC++/MinGW seemed to give about a 10% speed improvement. But Tetsuji said someplace that it didn't seem to work when compiled with VC++/ICC and I think Simon has had the same lack of improvement. From looking at the seti_cvs mailing list archives, it seems that until June 27, transpose functions were being called with the wrong dimensions, causing the transposed PoT array to contain garbage. I still haven't tested 5.17 myself, so someone else will have to verify that the code in CVS behaves correctly now. In general, without a fast matrix transpose implementation, the speed gains from the improved cache performance of a transposed PoT array may be negated by the amount of time it actually takes to transpose the array. My clients use a general out-of-place matrix transpose provided by Apple with its vDSP performance libraries. Since most other optimized clients use IPP, it may be worth looking into calling ippmTranspose instead of Eric's provided functions. |
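As an illustration of an out-of-place transpose, and of how passing the dimensions the wrong way round silently scrambles the result, here is a minimal tiled version in Python. It is not the vDSP routine, the IPP routine, or the code in CVS; `transpose_blocked` and its `block` parameter are hypothetical, and in pure Python the tiling shows the loop structure only, not the speed benefit.

```python
def transpose_blocked(src, n_rows, n_cols, block=32):
    """Out-of-place transpose of a row-major flat list, done tile by tile."""
    if len(src) != n_rows * n_cols:
        raise ValueError("dimensions do not match the buffer length")
    dst = [0] * len(src)
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            for r in range(r0, min(r0 + block, n_rows)):
                for c in range(c0, min(c0 + block, n_cols)):
                    # Element (r, c) of src becomes (c, r) of dst,
                    # whose rows are n_rows elements long.
                    dst[c * n_rows + r] = src[r * n_cols + c]
    return dst


src = list(range(6))                    # 2 rows x 3 cols: [[0, 1, 2], [3, 4, 5]]
good = transpose_blocked(src, 2, 3)     # 3 rows x 2 cols
swapped = transpose_blocked(src, 3, 2)  # dimensions passed the wrong way round

print(good)
print(swapped)
```

The swapped call raises no error because the buffer length still matches, yet the output is not the transpose of the 2 x 3 input, so a mistake of this kind can go unnoticed until the results are validated.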
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.