Windows port of Alex v8 code

Message boards : Number crunching : Windows port of Alex v8 code

archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 731507 - Posted: 28 Mar 2008, 21:06:11 UTC - in response to Message 731481.  
Last modified: 28 Mar 2008, 21:07:29 UTC

I also just started running v0.2 on this hostid=4251730 q6600 @ 2520MHz...


I've added that one to the extract, and am calling it Whale2 in my graph legend.

Here is a first look:



As you suggested, there is a much more extensive Chicken reference available for Whale2 than for Whale1.

Improvement of V0.2 compared to V0.1 is also clear in this set.

Also, I noticed that there was a result out past an Angle Range of 10, so I revised the x-axis scale limit for this one. I also updated the Whale1 graph, which should show in updated form in Message 731461 in this thread (though you may have to hit your browser refresh button to see the updated copy).
ID: 731507
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 731513 - Posted: 28 Mar 2008, 21:38:37 UTC - in response to Message 731506.  
Last modified: 28 Mar 2008, 21:54:51 UTC


Thanks Crunch3r, do you mind if I hold off 'till my Q6600 makes it into the top 20 before I hand over the ported source? I figure it's got a chance to reach RAC ~6500 running at 3200MHz with the AK codes.


I don't mind at all. It's your effort porting the code (something I preached many times over at lunatics.at ...), and finally someone is heading in the right direction. I can only thank you for doing so.

You decide what to do with it or if you wanna release it or not.. it's all up to you what to do next.


I offer my help to solve the legal issues. That's all, no pressure at all.
If you feel ready for a public release or whatever, just tell me and we'll solve that legal stuff.

HTH

EDIT
Btw, if you decide to release your code, I'll port it to Linux as well, like I did with the 2.2B and 2.4 code from lunatics.at ... Cuz no one was able to do so... so I had to do it on my own ... once again :(

However, it's all up to you JD.

Join BOINC United now!
ID: 731513
The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 731521 - Posted: 28 Mar 2008, 22:40:36 UTC

Hey Crunch3r, thanks for all your efforts with code optimisation and the encouragement you are giving others. Well done!

Live long and BOINC!

Paul
(S@H1 8888)
And proud of it!
ID: 731521
John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 731532 - Posted: 28 Mar 2008, 23:16:42 UTC
Last modified: 28 Mar 2008, 23:18:25 UTC

I would like to thank JD for the encouraging Port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and offer to help resolve the legal bits should JD want to move towards a public release.

I hope that your Q6600 makes it to the top 20 quickly (6 weeks or so), and that you do decide on a public release.

Dreaming aside

My Penny currently produces an RAC of ~5,600+ on the KWSN_2.4V_MB_SSSE3 client. What might its RAC top out at with JD's port?
It's good to be back amongst friends and colleagues



ID: 731532
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731537 - Posted: 28 Mar 2008, 23:28:47 UTC - in response to Message 731532.  

I would like to thank JD for the encouraging Port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and offer to help resolve the legal bits should JD want to move towards a public release.


Don't forget what Jason Gee is doing...

Also, though I get zero benefit from any of this since I don't have a system that can take advantage of the changes, I'd like to thank everyone involved for having this nice, calm, friendly, and productive thread as opposed to past endeavors on the same subject...

Brian
ID: 731537
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 731538 - Posted: 28 Mar 2008, 23:30:51 UTC - in response to Message 731537.  

I would like to thank JD for the encouraging Port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and offer to help resolve the legal bits should JD want to move towards a public release.


Don't forget what Jason Gee is doing...

Also, though I get zero benefit from any of this since I don't have a system that can take advantage of the changes, I'd like to thank everyone involved for having this nice, calm, friendly, and productive thread as opposed to past endeavors on the same subject...

Brian

And how, my friend, do you think you will net zero benefit from his efforts?

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 731538
John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 731539 - Posted: 28 Mar 2008, 23:35:25 UTC - in response to Message 731537.  
Last modified: 28 Mar 2008, 23:36:22 UTC

I would like to thank JD for the encouraging Port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and offer to help resolve the legal bits should JD want to move towards a public release.


Don't forget what Jason Gee is doing...

Also, though I get zero benefit from any of this since I don't have a system that can take advantage of the changes, I'd like to thank everyone involved for having this nice, calm, friendly, and productive thread as opposed to past endeavors on the same subject...

Brian


I would like to thank Brian for the reminder of the other port Jason is doing.

Thanks




Just been in Top Computers to see where the Q6600s are residing and what their RACs are currently.

I see the top Q6600 is at 52 in the list, with an RAC of 4,750. There is a storm of Q6600s after this with RACs varying between 4,000 and 4,500.

Let us say the mean RAC for the current crop of Q6600s is 4,500; then JD's estimate of an RAC topping out at 6,500 points to how fast the AK code port could potentially be.

Assuming a straight relationship might apply to a Penny, then the top-out RAC could be as high as 8,000. Dreaming again.
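
The back-of-envelope estimate above can be written out as a short Python sketch. It assumes, as the post does, that RAC scales by the same ratio on every host, which is pure speculation. The figures are the ones quoted in this thread, and the function name `scaled_rac` is made up for illustration.

```python
def scaled_rac(current_rac, baseline_rac, projected_rac):
    """Scale a host's RAC by the ratio seen (or hoped for) on a reference host.

    Assumes a straight, linear relationship between hosts. That is a guess,
    not something measured in this thread.
    """
    return current_rac * (projected_rac / baseline_rac)


# Figures quoted in the thread; none of them are measurements of the port.
q6600_mean = 4500      # rough mean RAC of the current crop of Q6600s
q6600_estimate = 6500  # JD's hoped-for RAC with the AK port
penny_now = 5600       # Penny on the KWSN_2.4V_MB_SSSE3 client

print(round(scaled_rac(penny_now, q6600_mean, q6600_estimate)))
```

With these inputs the result comes out a little above 8,000, which is where the "as high as 8,000" guess comes from.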

My, how efficient that code must be.
It's good to be back amongst friends and colleagues



ID: 731539
JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 731543 - Posted: 28 Mar 2008, 23:53:41 UTC - in response to Message 731532.  
Last modified: 29 Mar 2008, 0:21:10 UTC

I would like to thank JD for the encouraging Port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and offer to help resolve the legal bits should JD want to move towards a public release.

I hope that your Q6600 makes it to the top 20 quickly (6 weeks or so), and that you do decide on a public release.

Dreaming aside

My Penny currently produces an RAC of ~5,600+ on the KWSN_2.4V_MB_SSSE3 client. What might its RAC top out at with JD's port?


Pure speculation is ~8900 if performance scales linearly.

If my runs with KnaBench are indicative, I'm seeing just under a 40% decrease in runtimes overall, with some ARs closer to a 60% decrease. I'm guesstimating RAC should rise by a factor of ~1.6 for my Q6600 against SSSE3 R2.4V.
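
A rough sketch of how a runtime decrease turns into an RAC factor, assuming RAC is proportional to the number of work units finished per day and that the mix of ARs stays the same. Both are assumptions, and `throughput_factor` is just an illustrative name.

```python
def throughput_factor(runtime_decrease):
    """Convert a fractional runtime decrease into a throughput multiplier.

    A task that takes (1 - d) of its old time lets the host finish
    1 / (1 - d) times as many tasks in the same wall-clock time.
    """
    if not 0 <= runtime_decrease < 1:
        raise ValueError("runtime_decrease must be in [0, 1)")
    return 1.0 / (1.0 - runtime_decrease)


# "Just under 40%" overall, with some ARs closer to 60%.
for decrease in (0.375, 0.40, 0.60):
    factor = throughput_factor(decrease)
    print(f"{decrease:.1%} shorter runtime -> x{factor:.2f} throughput")
```

A 37.5% decrease gives a factor of 1.6, and 5,600 x 1.6 is about 8,960, which is where the ~8900 figure sits. The ARs with a 60% decrease would run 2.5 times as fast on their own, but they are only part of the mix.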

Am I optimistic? Yes, very!!!

I've still got a couple of tweaks to try, but will be running them on an as yet undisclosed host, my desktop Q6600. It's only got 16 days to make its move. Running Vista, so it's not as fast as those XP boxes. LOL...

Of course, your performance will depend on whether Crunch3r throttles back the code or not ;-) JK

Regards,
JDWhale



[Edit] I could not have gotten this code where it is today without Jason pointing out where my first attempt was hanging. That hint persuaded me to pursue getting the VECTORIZED_GAUSSIAN part of the code working and to find a workaround for the Mac corr() function using Intel "ipps" calls. Also, it was a discussion on the Lunatics forum that disclosed a bug in another part of the code. Jason had run into the same brick walls that I hit and his pointers are much appreciated.[/edit]
ID: 731543
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 731552 - Posted: 29 Mar 2008, 0:24:49 UTC

archae86 has been doing an excellent job presenting JD's results and I have been busy helping others out, but I couldn't resist presenting my alternative view on this excellent piece of work:


Direct Link

Purely by chance, my numbering for Whale1 and Whale2 is exactly the same.

Keep up the good work JD :)

F.
ID: 731552
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65763
Credit: 55,293,173
RAC: 49
United States
Message 731558 - Posted: 29 Mar 2008, 0:47:18 UTC - in response to Message 731552.  

archae86 has been doing an excellent job presenting JD's results and I have been busy helping others out, but I couldn't resist presenting my alternative view on this excellent piece of work:


Direct Link

Purely by chance, my numbering for Whale1 and Whale2 is exactly the same.

Keep up the good work JD :)

F.

Nice pics and info too. Now I know I need better cooling.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 731558
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731565 - Posted: 29 Mar 2008, 1:14:35 UTC - in response to Message 731538.  
Last modified: 29 Mar 2008, 1:16:03 UTC

I would like to thank JD for the encouraging Port of AK's code, now under local test. Further, I would like to thank Crunch3r for his encouragement to JD and offer to help resolve the legal bits should JD want to move towards a public release.


Don't forget what Jason Gee is doing...

Also, though I get zero benefit from any of this since I don't have a system that can take advantage of the changes, I'd like to thank everyone involved for having this nice, calm, friendly, and productive thread as opposed to past endeavors on the same subject...

Brian

And how, my friend, do you think you will net zero benefit from his efforts?


Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3...
ID: 731565
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 731579 - Posted: 29 Mar 2008, 1:25:10 UTC
Last modified: 29 Mar 2008, 1:27:39 UTC

Hey.. I have a quad for testing the new app also..


How will it be..
Will the SSE3 build (if one is compiled) be faster than SSSE3 too, as it is now with Rev. 2.4V?


An MMX version would be nice also, for my old AMD K7..
ID: 731579
JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 731583 - Posted: 29 Mar 2008, 1:33:19 UTC - in response to Message 731565.  
Last modified: 29 Mar 2008, 2:16:55 UTC


Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3...


Until now I have focused only on building for Core2; I am now building for SSE3 and will try it on my P4D.

[Edit] First few comparisons test positive... 25-35% speedup vs. KWSN_2.4_SSE3-Intel-P4_MB
I'll go live on the P4D with the ported AK app [/edit]
ID: 731583
Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 731594 - Posted: 29 Mar 2008, 1:53:22 UTC - in response to Message 731583.  
Last modified: 29 Mar 2008, 2:16:30 UTC


Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3...


Until now, I only focused building for Core2, I am building now for SSE3 and will try on my P4D.

Hi, JD. Crunch3r will give you all you need to make your V8 version legal. I don't understand why you don't want to leave your code with him... All crunchers know Crunch3r... And he has the licences that can help all of us do better for the project....

This reminds me of a movie... 'The Hitchhiker's Guide to the Galaxy'... 'And thanks for the fish and so long....'

A kid with a pencil would be must important like that... haha, and dangerous... hehehe....
Logan.

BOINC FAQ Service (Ahora, también disponible en Español/Now available in Spanish)
ID: 731594
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 731600 - Posted: 29 Mar 2008, 2:12:06 UTC - in response to Message 731594.  


Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3...


Until now, I only focused building for Core2, I am building now for SSE3 and will try on my P4D.

Hi, JD. Crunch3r will give you all you need to make your V8 version legal. I don't understand why you don't want to leave your code with him... All crunchers know Crunch3r... And he has the licences that can help all of us do better for the project....

This reminds me of a movie... 'The Hitchhiker's Guide to the Galaxy'... 'And thanks for the fish and so long....'

I know what wtf means......"where's the fish"......
The kitties have long known that quote.........'twas from kitty Twain.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 731600
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731610 - Posted: 29 Mar 2008, 2:45:24 UTC - in response to Message 731583.  


Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3...


Until now, I only focused building for Core2, I am building now for SSE3 and will try on my P4D.

[Edit] First few comparisons test positive... 25-35% speedup vs. KWSN_2.4_SSE3-Intel-P4_MB
I'll go live on the P4D with the ported AK app [/edit]


Thanks... This thread did seem to indicate that the efforts were indeed aimed at SSSE3 and higher, so I'm glad that the rest of the user base might see some benefit as well.

The critical builds for me specifically (really me and a lot of folks, considering a large number of systems are pre-Prescott P4) are the Intel SSE2 and generic (AMD) SSE2. If there are differences between those KWSN versions, then it will help all pre-Prescott P4 and all pre-Venice/San Diego K8-based Athlons. For Venice and San Diego K8s and newer, SSE3 will work...although there may not be much of an improvement, if any, over SSE2 (which is what was found with the KWSN apps)...
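
To make the "which build can my box run" question concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `best_build` helper, the feature names, and the assumption that only SSE3 and SSSE3 builds of the port exist for now. The feature sets simply follow the CPU families named above (pre-Prescott P4 stops at SSE2, Venice/San Diego K8 adds SSE3, Core2 adds SSSE3).

```python
# SIMD levels discussed in the thread, ordered from oldest to newest.
SIMD_LEVELS = ["sse", "sse2", "sse3", "ssse3"]


def best_build(cpu_features, available_builds):
    """Return the highest SIMD build level the CPU can run, or None.

    cpu_features: set of SIMD feature names the CPU supports.
    available_builds: set of SIMD levels for which a build exists.
    """
    for level in reversed(SIMD_LEVELS):
        if level in cpu_features and level in available_builds:
            return level
    return None


# Hypothetical hosts for illustration only.
hosts = {
    "pre-Prescott P4": {"sse", "sse2"},
    "Venice/San Diego K8": {"sse", "sse2", "sse3"},
    "Core2 (e.g. Q6600)": {"sse", "sse2", "sse3", "ssse3"},
}

builds = {"sse3", "ssse3"}  # assumed: no SSE2 build of the port yet

for name, features in hosts.items():
    print(name, "->", best_build(features, builds))
```

Under that assumption the pre-Prescott P4 gets nothing, which is the gap an SSE2 build would close. Note the helper picks the highest level the CPU supports, which is not necessarily the fastest; only benchmarking would settle that.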
ID: 731610
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731636 - Posted: 29 Mar 2008, 4:23:59 UTC - in response to Message 731610.  
Last modified: 29 Mar 2008, 4:31:59 UTC


Are the optimizations being brought over going to be SSE, SSE2, or SSE3? If they are any of those, then I am mistaken. If they are higher SIMD levels than those, I get no direct benefit for my systems, which support only SSE, SSE2, and SSE3...


Until now, I only focused building for Core2, I am building now for SSE3 and will try on my P4D.

[Edit] First few comparisons test positive... 25-35% speedup vs. KWSN_2.4_SSE3-Intel-P4_MB
I'll go live on the P4D with the ported AK app [/edit]


Thanks... This thread did seem to indicate that the efforts were indeed aimed at SSSE3 and higher, so I'm glad that the rest of the user base might see some benefit as well.

The critical builds for me specifically (really me and a lot of folks, considering a large number of systems are pre-Prescott P4) are the Intel SSE2 and generic (AMD) SSE2. If there are differences between those KWSN versions, then it will help all pre-Prescott P4 and all pre-Venice/San Diego K8-based Athlons. For Venice and San Diego K8s and newer, SSE3 will work...although there may not be much of an improvement, if any, over SSE2 (which is what was found with the KWSN apps)...


My ports have builds supporting P4s and AMDs with SSE3. Sorry, this is the lowest we can go at this stage without some pretty extensive effort (though we are looking at the possibility of SSE2 builds later; my Northwood and Coppermine are out in the cold also, well, in the cupboard anyway..). Most interestingly, AMDs with SSE3, including a Phenom, have been shown to run the AMD build vs a patched Intel-only build at identical speeds so far.

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731636
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731639 - Posted: 29 Mar 2008, 5:02:50 UTC - in response to Message 731636.  


My ports have builds supporting P4s and AMDs with SSE3. Sorry, this is the lowest we can go at this stage without some pretty extensive effort (though we are looking at the possibility of SSE2 builds later


Well, my AMD has SSE3 support, but I digress... The SSE2 build would help the project immensely if it has any significant performance improvement. Consider the request for more users to help out. If that does not happen, the bulk of the systems in use are SSE2-class systems.

I'm not trying to be a "task master" here, carrying a whip. I also simply do not have the skillset to dive in and help, not without taking some serious time to just get up to speed. I'm just trying to be an advocate for the bulk of the user base...so that it helps the project more than just targeting "newer" systems would...
ID: 731639
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731642 - Posted: 29 Mar 2008, 5:15:15 UTC - in response to Message 731639.  


My ports have builds supporting P4s and AMDs with SSE3. Sorry, this is the lowest we can go at this stage without some pretty extensive effort (though we are looking at the possibility of SSE2 builds later


Well, my AMD has SSE3 support, but I digress... The SSE2 build would help the project immensely if it has any significant performance improvement. Consider the request for more users to help out. If that does not happen, the bulk of the systems in use are SSE2-class systems.

I'm not trying to be a "task master" here, carrying a whip. I also simply do not have the skillset to dive in and help, not without taking some serious time to just get up to speed. I'm just trying to be an advocate for the bulk of the user base...so that it helps the project more than just targeting "newer" systems would...

No worries, we'll see what happens down the road. I agree that SSE2 machines dominate according to BoincStats, and have mentioned so before on Lunatics. Just be aware that this generation of machines is sufficiently architecturally different (IMO) that key code would not only have to be converted from SSE3+ to SSE2, but would also likely need its memory access and cache usage patterns completely reorganised/rewritten.

It could turn out that removing the 'SSE3+ AK goodness' and introducing SSE2/P4 code renders the thing the same speed as or slower than 2.4V, though I won't rule out such a build until further experiments confirm or reject this hypothesis.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731642
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65763
Credit: 55,293,173
RAC: 49
United States
Message 731649 - Posted: 29 Mar 2008, 5:24:30 UTC

This code in SSE3, if it's as fast as its Intel equivalent, would be fantastic news for those with Opteron CPUs (PC4 has an Opteron 165 in it that ran 24/7 @ 2.60GHz). My PC4 is shut down because it performs so poorly on SSE2, compared to the Intel quads or even the Intel duals that I've had, that it's not worth running. But with this new code it could be worth putting back online eventually, maybe after the 1st of the year. :D
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 731649