Mac Seti Enhanced Optimized
Author | Message |
---|---|
Tetsuji Maverick Rai Joined: 25 Apr 99 Posts: 518 Credit: 90,863 RAC: 0 |
"If you don't use graphics (or can live without them), go to the optimized apps links thread and snag the newest version for yet another speedup! Thanks to Simon Zadra of KWSN for pointing out that SETI runs slower when compiled with graphics, regardless of whether or not they're actually used." Hi Alex, I tried your transposed PoT in the CVS repository, but it generates a wrong result (I emailed Eric about this). Will you fix it? Eric wrote that he didn't have time to try it. Luckiest in the world. WMD = Weapon of Mass Distraction. |
KWSN - Chicken of Angnor Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Hi Alex, my MacMini G4 1.42 GHz is slowly levelling out at 200 RAC. Amazing work. It's up to 3x quicker than a 3.0 GHz P4 using an unoptimized client, and only about 25-35% slower than a Pentium-D 2.66 GHz using my optimized client (although that machine is running single-channel RAM until tomorrow). It's also consistently pulling about 25% more RAC than an Athlon XP 2400+ (2 GHz) running an SSE-optimized client. You've done amazing things :o) I believe the speedup vs. the standard Mac app is larger than with the PC ones. Is there data on how much RAC a G5 can pull per core/total? Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
Alex Kan Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
"You've done amazing things :o) I believe the speedup vs. the standard Mac app is larger than with the PC ones." I have timing numbers from the DP G5 1.8 I've been testing on, but I figured I'd hold off on posting them until you update your Pentium D numbers to take into account the second stick of RAM, since I imagine that'll give you a sizable speed gain. "Is there data on how much RAC a G5 can pull per core/total?" Look at the G5s in the Top 10: most of them are running some variety of v6, and all of them are running one of my clients, judging from the work unit times. I think halimedia's Quad is still leveling off in terms of RAC, but even it probably won't end up at more than 2300 or so. (That Woodcrest at the top is something else entirely.) |
KWSN - Chicken of Angnor Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Niiiiice numbers. About what I expected, around 7000 seconds for a general 60 cobblestone WU and about 7500-8000 for a VLAR. I still have to test the P-D with dual channel, and I do expect a sizable speedup too. Still, my first test running a Linux cruncher in a VM on Windows has yielded by far the quickest runs for me so far - I'll be testing that hypothesis (that a Linux cruncher in a VM is quicker than either a native Win32 cruncher or a native Linux one) further on the P-D as well as my Athlon64 3500+. There, I was in the 6800-7300 second range for 60-63 credit units as well (with the A64). Nowhere close to it on Windows; I haven't tried native Linux on that machine yet. And yeah, that Woodcrest is out of this world. However, this Monday Intel started selling them! So for anyone who's interested, you can grab one now. Just be prepared to look around for suitable platforms, as there are not too many right now. Their prices (same as Core 2 Duo) will stay stable for the remainder of 2006, so you might as well grab one now ;o) Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
Alex Kan Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
I've posted the source code to v6. Sorry about the delay, if anyone was looking to get their hands on it. Tetsuji, this code (and every release since v3) contains a working transposed-PoT implementation. I looked through a recent tarball and didn't see anything obviously wrong with the parts using the transposed PoT matrix, but I haven't checked to make sure that the matrix transposes work as advertised. Correctness ought to be easy to verify, since you should be able to compare the PoT arrays generated with and without the transposed code. However, I'll be out of the country for a few weeks and won't be able to put in fixes myself. |
Alex Kan Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
Before I leave for a while and forget about them, here are those benchmark numbers on a Dual 1.8 GHz G5 from a while back. (I had to manually adjust the chirp limits to 1/2.5, since IIRC in the package I downloaded, they were set to 4/10.) testWU-1: stock 5.13 (-nographics) 21:34.26 (1294.26s), v6 (-nographics) 7:46.93 (466.93s), a 63.9% time reduction. testWU-2: stock 5.13 (-nographics) 18:06.75 (1086.75s), v6 (-nographics) 8:58.35 (538.35s), a 50.5% time reduction. testWU-4: stock 5.13 (-nographics) 7:33.24 (453.24s), v6 (-nographics) 3:20.61 (200.61s), a 55.7% time reduction. I was going to add this as an edit, but an hour has already gone by since my last post. :P |
Alex Kan Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
Reviving an old thread to post some news... An initial version of my optimized clients for Intel Macs is almost done. I currently don't have access to the Intel Compiler or IPP libraries, so you'll have to settle for what I can do without touching the FFTs. This means that your performance on those shorter (12-13 credit) WUs might not stack up against Windows/Linux clients. Longer WUs should be just fine. ;) Keep your eyes peeled for an update to the optimized apps thread! |
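
For anyone who wants to re-check the time reductions in Alex's Dual 1.8 GHz G5 benchmark post, here is a minimal Python sketch. The timings are copied from that post; the helper names (`to_seconds`, `time_reduction`, `summarize`) are made up for illustration and are not part of any SETI@home or BOINC tool.

```python
# Wall-clock timings as posted for the Dual 1.8 GHz G5 (both runs with -nographics).
# Each entry is (stock 5.13, optimized v6) in "M:SS.ss" form.
TIMINGS = {
    "testWU-1": ("21:34.26", "7:46.93"),
    "testWU-2": ("18:06.75", "8:58.35"),
    "testWU-4": ("7:33.24", "3:20.61"),
}


def to_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' stamp into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def time_reduction(stock_s: float, optimized_s: float) -> float:
    """Percentage of the stock run time that the optimized client saves."""
    return (1.0 - optimized_s / stock_s) * 100.0


def summarize(timings=TIMINGS):
    """Return (name, stock seconds, v6 seconds, reduction in percent) per work unit."""
    rows = []
    for name, (stock, optimized) in timings.items():
        stock_s = to_seconds(stock)
        optimized_s = to_seconds(optimized)
        reduction = round(time_reduction(stock_s, optimized_s), 1)
        rows.append((name, round(stock_s, 2), round(optimized_s, 2), reduction))
    return rows


if __name__ == "__main__":
    for name, stock_s, optimized_s, reduction in summarize():
        print(f"{name}: {stock_s}s -> {optimized_s}s ({reduction}% time reduction)")
```

Note that a time reduction is not the same thing as a speedup factor: a run that takes roughly half the time is about 2x as fast, so the two figures should not be compared directly.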
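
The correctness check Alex suggests for the transposed PoT (compare the PoT arrays generated with and without the transposed code) could be sketched roughly as below. This is only an illustration under assumptions: it assumes the power data sits in a flat, time-major array (one row per FFT) and that the transposed layout stores each frequency bin's power-over-time series contiguously. None of the function names come from the SETI@home source, and the test data is synthetic.

```python
def make_power(num_ffts, fft_len):
    """Synthetic stand-in for power spectra, flat and time-major:
    element t * fft_len + f is the power in bin f of FFT number t."""
    return [float((t * 31 + f * 17) % 97)
            for t in range(num_ffts) for f in range(fft_len)]


def pot_strided(flat, num_ffts, fft_len, bin_index):
    """Power over time for one bin, read with a stride from the original layout."""
    return [flat[t * fft_len + bin_index] for t in range(num_ffts)]


def transpose(flat, num_ffts, fft_len):
    """Reference transpose into a bin-major layout (each bin's PoT is contiguous)."""
    out = [0.0] * (num_ffts * fft_len)
    for t in range(num_ffts):
        for f in range(fft_len):
            out[f * num_ffts + t] = flat[t * fft_len + f]
    return out


def pot_contiguous(transposed, num_ffts, bin_index):
    """Power over time for one bin, read as a contiguous slice of the transposed layout."""
    start = bin_index * num_ffts
    return transposed[start:start + num_ffts]


def first_mismatch(flat, transposed, num_ffts, fft_len):
    """Return the first bin whose two PoT arrays differ, or None if all agree."""
    for bin_index in range(fft_len):
        expected = pot_strided(flat, num_ffts, fft_len, bin_index)
        actual = pot_contiguous(transposed, num_ffts, bin_index)
        if expected != actual:
            return bin_index
    return None


if __name__ == "__main__":
    num_ffts, fft_len = 8, 4
    flat = make_power(num_ffts, fft_len)
    transposed = transpose(flat, num_ffts, fft_len)
    print("first mismatching bin:", first_mismatch(flat, transposed, num_ffts, fft_len))
```

In a real check, `transposed` would come from the optimized transpose routine under test rather than from the reference `transpose` here, and exact equality is only appropriate if the transpose merely moves values without recomputing them.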
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.