Windows port of Alex v8 code
Author | Message |
---|---|
archae86 Joined: 31 Aug 99 Posts: 909 Credit: 1,582,816 RAC: 0 |
I also just started running v0.2 on this hostid=4251730 q6600 @ 2520MHz... I've added that one to the extract, and am calling it Whale2 in my graph legend. Here is a first look: As you suggested, there is a much more extensive Chicken reference available for Whale2 than for Whale1. Improvement of V0.2 compared to V0.1 is also clear in this set. Also, I noticed that there was a result out past Angle Range of 10, so revised the x axis scale limit for this one. I also updated the Whale 1 graph, which should show in updated form in Message 731461 in this thread (though you may have to hit your browser refresh button to see the updated copy). |
Crunch3r Joined: 15 Apr 99 Posts: 1546 Credit: 3,438,823 RAC: 0 |
I don't mind at all. It's your effort of porting the code (a thing that I preached many times over at lunatics.at ...), and finally someone is heading in the right direction. I can only thank you for doing so. You decide what to do with it, or whether you want to release it or not. It's all up to you what to do next. I offer my help to solve the legal issues. That's all, no pressure at all. If you feel ready for a public release or whatever, just tell me and we'll solve that legal stuff. HTH EDIT: Btw, if you decide to release your code, I'll port it to Linux as well, like I did with the 2.2B and 2.4 code from lunatics.at ... because no one was able to do so, so I had to do it on my own ... once again :( However, it's all up to you, JD. Join BOINC United now! |
The Gas Giant Joined: 22 Nov 01 Posts: 1904 Credit: 2,646,654 RAC: 0 |
Hey Crunch3r, thanks for all your efforts with code optimisation and the encouragement you are giving others. Well done! Live long and BOINC! Paul (S@H1 8888) And proud of it! |
John Clark Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
I would like to thank JD for the encouraging port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and his offer to help resolve the legal bits should JD want to move towards a public release. I hope that your Q6600 makes it to the top 20 quickly (6 weeks or so), and that you do decide on a public release. Dreaming aside, my Penny currently produces an RAC of ~5,600+ on the KWSN_2.4V_MB_SSSE3 client. What might its RAC top out at with JD's port? It's good to be back amongst friends and colleagues |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Don't forget what Jason Gee is doing... Also, though I get zero benefit from any of this since I don't have a system that can take advantage of the changes, I'd like to thank everyone involved for having this nice, calm, friendly, and productive thread as opposed to past endeavors on the same subject... Brian |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
And how, my friend, do you think you will net zero benefit from his efforts? "Freedom is just Chaos, with better lighting." Alan Dean Foster |
John Clark Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
I would thank Brian for the reminder of the other port Jason is doing. Thanks. I have just been in Top Computers to see where the Q6600s are residing and what their RACs are currently. I see the top Q6600 is at 52 in the list, with an RAC of 4,750. There is a storm of Q6600s after this with RACs varying between 4,000 and 4,500. Let us say the mean RAC for the current crop of Q6600s is 4,500; then JD's estimate of an RAC topping out at 6,500 points to how fast the AK code port could potentially be. Assuming a straight relationship might apply to a Penny, the top-out RAC could be as high as 8,000. Dreaming again. My, how efficient that code must be. It's good to be back amongst friends and colleagues |
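The back-of-envelope scaling in the post above can be written out as a small sketch. It assumes RAC grows by the same ratio on every CPU, which is the "straight relationship" guess from the post and not a measured result. The function name `scale_rac` is made up for illustration, and the 5,600 figure is the Penny RAC quoted earlier in the thread.

```python
def scale_rac(current_rac, ref_before, ref_after):
    """Scale a host's RAC by the ratio seen (or estimated) on a reference host."""
    return current_rac * (ref_after / ref_before)


# Figures quoted in the thread (estimates, not measurements).
q6600_now = 4500    # assumed mean RAC of the current crop of Q6600s
q6600_port = 6500   # JD's estimated top-out RAC with the AK port
penny_now = 5600    # Penny RAC on the KWSN_2.4V_MB_SSSE3 client

penny_port = scale_rac(penny_now, q6600_now, q6600_port)
print(round(penny_port))  # lands a little above the 8,000 guessed in the post
```

The ratio 6,500 / 4,500 is about 1.44, so any host's projected RAC under this assumption is simply its current RAC times 1.44.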
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Pure speculation is ~8900 if performance scales linearly. If my runs with KnaBench are indicative, I'm seeing just under a 40% decrease in runtimes overall, with some ARs closer to a 60% decrease. I'm guesstimating RAC should rise by a factor of ~1.6 for my Q6600 against SSSE3 R2.4V. Am I optimistic? Yes, very!!! I've still got a couple of tweaks to try, but will be running them on a yet undisclosed host, my desktop Q6600. It's only got 16 days to make its move. Running Vista, so it's not as fast as those XP boxes. LOL... Of course, your performance will depend on whether Crunch3r throttles back the code or not ;-) JK Regards, JDWhale [Edit] I could not have gotten this code where it is today without Jason pointing out where my first attempt was hanging. That hint persuaded me to pursue getting the VECTORIZED_GAUSSIAN part of the code working and to find the workaround for the Mac corr() function with Intel "ipps" calls. Also, it was a discussion on the Lunatics forum that disclosed a bug in another part of the code. Jason had run into the same brick walls that I hit and his pointers are much appreciated.[/edit] |
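The step from "just under 40% decrease in runtimes" to "factor ~1.6" follows from throughput being the reciprocal of runtime. A minimal sketch of that arithmetic, assuming RAC is proportional to tasks completed per day and credit per task is unchanged (both are assumptions, and `throughput_factor` is a hypothetical helper name):

```python
def throughput_factor(runtime_decrease):
    """Relative throughput after runtimes shrink by the given fraction.

    A task that took time t now takes t * (1 - runtime_decrease),
    so tasks per unit time rise by 1 / (1 - runtime_decrease).
    """
    if not 0 <= runtime_decrease < 1:
        raise ValueError("runtime_decrease must be in [0, 1)")
    return 1.0 / (1.0 - runtime_decrease)


# 37.5% stands in for "just under 40%"; 60% is the best-case AR figure quoted.
for decrease in (0.375, 0.40, 0.60):
    print(f"{decrease:.1%} shorter runtime -> x{throughput_factor(decrease):.2f} throughput")

# Applying the ~1.6 factor to the 5,600 RAC quoted for the Penny:
print(round(5600 * throughput_factor(0.375)))
```

A 37.5% runtime decrease gives exactly 1.6, and 1.6 times 5,600 is 8,960, which is in line with the ~8900 speculation above.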
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
archae86 has been doing an excellent job presenting JD's results and I have been busy helping others out, but I couldn't resist presenting my alternative view on this excellent piece of work: Direct Link Purely by chance, my numbering for Whale1 and Whale2 is exactly the same. Keep up the good work JD :) F. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65763 Credit: 55,293,173 RAC: 49 |
Nice pics and info too. Now I know I need better cooling. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3... |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Hey.. I also have a quad for testing the new app.. How will it be.. Will the SSE3 build (if one gets compiled) also be faster than the SSSE3 one, like it is now with Rev. 2.4V? An MMX version would be nice too, for my old AMD K7.. |
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Until now, I have only focused on building for Core2. I am now building for SSE3 and will try it on my P4D. [Edit] First few comparisons test positive... 25-35% speedup vs. KWSN_2.4_SSE3-Intel-P4_MB. I'll go live on the P4D with the ported AK app [/edit] |
Logan Joined: 26 Jan 07 Posts: 743 Credit: 918,353 RAC: 0 |
Hi, JD. Crunch3r will give you all you need to make your V8 version legal. I don't understand why you do not want to leave your code to him... All crunchers know Crunch3r... And he has the licences that can help all of us do better for the project.... This reminds me of a movie... 'The Hitchhiker's Guide to the Galaxy'... 'And thanks for the fish and so long....' A kid with a pencil would be as important as that... haha, and dangerous... hehehe.... Logan. BOINC FAQ Service (Ahora, también disponible en Español/Now available in Spanish) |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I know wtf means......"where's the fish"...... The kitties have long known that quote.........'twas from kitty Twain. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Thanks... This thread did seem to indicate that the efforts were indeed aimed at SSSE3 and higher, so I'm glad that the rest of the user base might see some benefit as well. The critical builds for me specifically (really me and a lot of folks, considering a large number of systems are pre-Prescott P4) are the Intel SSE2 and generic (AMD) SSE2. If there are differences between those KWSN versions, then it will help all pre-Prescott P4s and all pre-Venice/San Diego K8-based Athlons. For Venice and San Diego K8s and newer, SSE3 will work...although there may not be much of an improvement, if any, over SSE2 (which is what was found with the KWSN apps)... |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
My ports have builds supporting P4s and AMDs with SSE3. Sorry, this is the lowest we can go at this stage without some pretty extensive effort (though we are looking at the possibility of SSE2 builds later; my Northwood and Coppermine are out in the cold also, well, in the cupboard anyway..). Most interestingly, AMDs with SSE3, including a Phenom, have been shown to run the AMD build and a patched Intel-only build at identical speeds so far. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Well, my AMD has SSE3 support, but I digress... The SSE2 build would help the project immensely if it has any significant performance improvement. Consider the request for more users to help out. If that does not happen, the bulk of the systems in use are SSE2-class systems. I'm not trying to be a "task master" here, carrying a whip. I also simply do not have the skillset to dive in and help, not without taking some serious time to just get up to speed. I'm just trying to be an advocate for the bulk of the user base...so that it helps the project more than just targeting "newer" systems would... |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
No worries. We'll see what happens down the road. I agree that SSE2 machines dominate according to BoincStats, and have mentioned so before on Lunatics. Just be aware that this generation of machines is sufficiently architecturally different (IMO) that key code would not only have to be converted from SSE3+ to SSE2, but would also likely need its memory access and cache usage patterns completely reorganised/rewritten. It could turn out that removing the 'SSE3+ AK goodness' and introducing SSE2/P4 code renders the thing the same speed as or slower than 2.4V, though I won't rule out such a build until further experiments confirm or reject this hypothesis. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65763 Credit: 55,293,173 RAC: 49 |
This code in SSE3, if it's as fast as its Intel equivalent, would be fantastic news for those with Opteron CPUs (PC4 has an Opteron 165 in it that ran 24/7 @ 2.60GHz). My PC4 is shut down because it performs so poorly on SSE2, compared to the Intel quads or even the Intel duals that I've had, as to be not worth running. But with this new code it could be worth maybe putting back online eventually, as in after the 1st of the year, maybe. :D The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |