Windows port of Alex v8 code

Message boards : Number crunching : Windows port of Alex v8 code

Profile JDWhale
Volunteer tester
Avatar

Send message
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 731651 - Posted: 29 Mar 2008, 5:33:28 UTC - in response to Message 731639.  

I'm not trying to be a "task master" here, carrying a whip. I also simply do not have the skillset to dive in and help, not without taking some serious time to just get up to speed. I'm just trying to be an advocate for the bulk of the user base...so that it helps the project more than just targeting "newer" systems would...


The question that comes to mind is "What percentage of that bulk run optimized applications?" If the percentage is not high then there is little impact by providing an optimized app, further diminished if the performance gains are not great. Sorry if that sounds cruel, I do not have that intent, just being a realist.

I tried to build for SSE2, but the errors are great in number and the code looks like it has already been modified once at many of those locations. It might be fairly simple to make it work, but I regret that I have no desire to learn the routines/intrinsics. Coupled with the fact that my Intel Evaluation Licenses expire in a little over 2 weeks, there are other things I'd rather pursue, possibly running VTune to identify hotspots/bottlenecks, or substituting more IPP routines to increase readability/maintainability as long as performance does not suffer.
ID: 731651 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731654 - Posted: 29 Mar 2008, 5:38:47 UTC - in response to Message 731649.  
Last modified: 29 Mar 2008, 5:42:36 UTC

This code in SSE3, if it's as fast as its Intel equivalent, would be fantastic news for those with Opteron CPUs (PC4 has an Opteron 165 in it that ran 24/7 @ 2.60GHz). My PC4 is shut down because of this, as it performs so poorly on SSE2 compared to the Intel Quads or even the Intel Duals that I've had as to be not worth running. But with this new code it could be worth maybe putting back online eventually, as in after the 1st of the year, maybe. :D


From what I've seen of others' tests (I don't personally own any AMD since my Athlon died years ago), around 1.07 to 1.49 times the output of 2.4V SSE2A, depending on angle range, averaging about 1.2 times. Not as huge a leap as the SSSE3-capable machines (1.3 to 1.5x 2.4V SSSE3), but a leap nonetheless.
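
To put those output ratios in run-time terms: if a build produces r times the output of the baseline, each work unit needs 1 - 1/r less CPU time. A minimal sketch of that arithmetic, using the ratios quoted above (the function name is made up for illustration and is not part of any SETI@home tool):

```python
def time_saved_fraction(speedup):
    """Fraction of CPU time saved when output rises by a factor of `speedup`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Ratios quoted in the post: low end, rough average, high end.
for ratio in (1.07, 1.2, 1.49):
    saved = time_saved_fraction(ratio)
    print(f"{ratio:.2f}x output -> {saved:.1%} less CPU time per WU")
```

So the "about 1.2 times" average works out to roughly one sixth less CPU time per work unit, which is easier to compare against the per-WU times people post here.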

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731654 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731662 - Posted: 29 Mar 2008, 6:00:35 UTC - in response to Message 731651.  

I'm not trying to be a "task master" here, carrying a whip. I also simply do not have the skillset to dive in and help, not without taking some serious time to just get up to speed. I'm just trying to be an advocate for the bulk of the user base...so that it helps the project more than just targeting "newer" systems would...


The question that comes to mind is "What percentage of that bulk run optimized applications?" If the percentage is not high then there is little impact by providing an optimized app, further diminished if the performance gains are not great. Sorry if that sounds cruel, I do not have that intent, just being a realist.

I tried to build for SSE2, but the errors are great in number and the code looks like it has already been modified once at many of those locations. It might be fairly simple to make it work, but I regret that I have no desire to learn the routines/intrinsics. Coupled with the fact that my Intel Evaluation Licenses expire in a little over 2 weeks, there are other things I'd rather pursue, possibly running VTune to identify hotspots/bottlenecks, or substituting more IPP routines to increase readability/maintainability as long as performance does not suffer.


I agree with what you're saying, but certain other folks may recognize where I'm going with the battery of questions I'm asking you (you and Jason) and Joe... Perhaps long-term I'd be able to help in it, but instead of asking more people to join, another way to get more work done is to find a way to make as many of the existing hosts more efficient. The project did that with the partially optimized stock application, but full optimization was left to the anonymous platform due to "lack of resources". It would be great if full optimization could eventually be delivered to the larger participant populace by default, rather than by choice.

Yes, yes, and it does have to do with real cross-project parity as well...since some projects have optimizations and some don't, and between those that do, you have full vs. partial, with the partial providing a loophole via what you all are doing, which is fantastic work, but it goes against actual "parity" because the everyday user doesn't know about it. When they find out about it and start using it, it skews the comparison used to justify "equality", and credit reductions are demanded or claims of bad citizenship are thrown about, etc, etc, etc...

So, there are benefits to having "anonymous platform" only be used for an "OS platform", not for both "OS platform" and "optimization level".

It will be a long mountain to climb, for sure, but if it were easy, we'd already have it....

IMO, YMMV, etc, etc, etc....

ID: 731662 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731673 - Posted: 29 Mar 2008, 6:21:31 UTC - in response to Message 731662.  
Last modified: 29 Mar 2008, 6:23:30 UTC

... Yes, yes, and it does have to do with real cross-project parity as well...since some projects have optimizations and some don't, and between those that do, you have full vs. partial, with the partial providing a loophole via what you all are doing, which is fantastic work, but it goes against actual "parity" because the everyday user doesn't know about it. When they find out about it and start using it, it skews the comparison used to justify "equality", and credit reductions are demanded or claims of bad citizenship are thrown about, etc, etc, etc...

So, there are benefits to having "anonymous platform" only be used for an "OS platform", not for both "OS platform" and "optimization level".

It will be a long mountain to climb, for sure, but if it were easy, we'd already have it....

IMO, YMMV, etc, etc, etc....


This belongs in a different thread. I had disparaging remarks made directly at me for spending time trying to introduce (further) optimisations into the stock v6 code; if there is now better support for this then I'd reconsider going back to that next. No-one defended my stance then, so I have all but abandoned my part in that project (for now).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731673 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731676 - Posted: 29 Mar 2008, 6:29:58 UTC - in response to Message 731673.  

... Yes, yes, and it does have to do with real cross-project parity as well...since some projects have optimizations and some don't, and between those that do, you have full vs. partial, with the partial providing a loophole via what you all are doing, which is fantastic work, but it goes against actual "parity" because the everyday user doesn't know about it. When they find out about it and start using it, it skews the comparison used to justify "equality", and credit reductions are demanded or claims of bad citizenship are thrown about, etc, etc, etc...

So, there are benefits to having "anonymous platform" only be used for an "OS platform", not for both "OS platform" and "optimization level".

It will be a long mountain to climb, for sure, but if it were easy, we'd already have it....

IMO, YMMV, etc, etc, etc....


This belongs in a different thread. I had disparaging remarks made directly at me for spending time trying to introduce (further) optimisations into the stock v6 code; if there is now better support for this then I'd reconsider going back to that next. No-one defended my stance then, so I have all but abandoned my part in that project (for now).


I'm sorry you experienced that. I was not aware that had happened. I don't think there is any more support for it, other than me. Nobody else has really said they believe as I do either publicly or in PM...until you... Maybe Joe if I stretch and interpret things a certain way...but I think he's not really taking a position, only stating fact and answering questions...

Anyway, carry on... ;-)
ID: 731676 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731680 - Posted: 29 Mar 2008, 6:46:32 UTC - in response to Message 731676.  
Last modified: 29 Mar 2008, 7:22:38 UTC

I'm sorry you experienced that. I was not aware that had happened. I don't think there is any more support for it, other than me. Nobody else has really said they believe as I do either publicly or in PM...until you... Maybe Joe if I stretch and interpret things a certain way...but I think he's not really taking a position, only stating fact and answering questions...

Anyway, carry on... ;-)
Will do. For what it's worth, I'm quite over it. AFAIK Joe's position has long been one of 'The biggest overall efficiency gains for the project will be made by incorporating small improvements back into the stock application' ... where mine is a slight extension on that: 'improvements must first be found elsewhere before they can be incorporated'. For example, stock code optimisations currently include a sizeable proportion of algorithmic improvements from earlier Chicken/AK/Joe & Ben's improvements, using a so-called 'dispatch wrapper' mechanism...
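
For anyone unfamiliar with the idea, a 'dispatch wrapper' in the general sense keeps several implementations of the same routine and picks one at run time based on what the host CPU supports. The sketch below only illustrates that general pattern; it is not the stock application's code, and every name in it (`power_scalar`, `power_batched`, `CANDIDATES`, `dispatch`, the feature strings) is hypothetical:

```python
def power_scalar(samples):
    """Reference path: plain loop, usable on any host."""
    total = 0.0
    for re, im in samples:
        total += re * re + im * im
    return total


def power_batched(samples):
    """Stand-in for a specialised path: same result, different code path."""
    return float(sum(re * re + im * im for re, im in samples))


# Ordered from most to least specialised. An empty requirement set
# means the routine can run everywhere, so it acts as the fallback.
CANDIDATES = [
    (frozenset({"ssse3"}), power_batched),
    (frozenset(), power_scalar),
]


def dispatch(host_features):
    """Return the first candidate whose required features the host reports."""
    for required, func in CANDIDATES:
        if required <= host_features:  # subset test
            return func
    raise RuntimeError("no usable implementation")


if __name__ == "__main__":
    compute = dispatch({"sse", "sse2"})  # assumed feature set for an older host
    print(compute.__name__, compute([(3.0, 4.0), (1.0, 0.0)]))
```

The point of the pattern is that a single build can carry both old and new routines, so an improvement found in an optimised app can be added as one more candidate without breaking hosts that lack the newer instructions.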

Back on topic of porting AK v8: I see no reason *some* newer SSE3 and SSSE3 improvements in this current AK version 8 technical leap shouldn't gradually filter back through SSE2 variants and to stock, but expect the process to be a long one involving 'shoehorns and crowbars'. Without the existence of the mechanisms to allow these new builds, and the tireless efforts of those involved (well before my time), IMO it is unlikely that improvements much beyond basic SSE would have made it into stock...

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731680 · Report as offensive
Profile [AF>france>pas-de-calais]symaski62
Volunteer tester

Send message
Joined: 12 Aug 05
Posts: 258
Credit: 100,548
RAC: 0
France
Message 731718 - Posted: 29 Mar 2008, 9:44:47 UTC

CPU type: GenuineIntel
Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz [x86 Family 6 Model 15 Stepping 13]
Number of CPUs: 2
Operating System: Microsoft Windows Vista
Home Edition, Service Pack 1, (06.00.6001.00)
Memory: 1022.64 MB
Cache: 976.56 KB
Measured floating point speed: 1702.56 million ops/sec
Measured integer speed: 3777.03 million ops/sec



Name: 13fe08ac.8009.12342.10.7.58
CPU time: 11,066.09 sec

/\\
||

VS
||
\\/


CPU type: GenuineIntel
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [x86 Family 6 Model 15 Stepping 11]
Number of CPUs: 4
Operating System: Microsoft Windows XP
Professional Edition, Service Pack 2, (05.01.2600.00)
Memory: 3069.5 MB
Cache: 488.28 KB
Measured floating point speed: 2313.62 million ops/sec
Measured integer speed: 5000.1 million ops/sec


Name: 13fe08ac.8009.12342.10.7.58
CPU time: 7,543.28 sec

:)
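
A rough way to read the two results above is to divide the CPU times and then scale by the nominal clock speeds. This is only a sketch of the arithmetic: the times and clocks are the ones posted, but the post does not say which application build each host ran, and the two hosts differ in cache and OS, so the per-clock figure is indicative at best.

```python
# CPU times for the same work unit, as posted above.
e2160_seconds = 11066.09  # Pentium Dual E2160 @ 1.80GHz
q6600_seconds = 7543.28   # Core 2 Quad Q6600 @ 2.40GHz

# How many times longer the E2160 took on this work unit.
time_ratio = e2160_seconds / q6600_seconds

# Nominal clock advantage of the Q6600 (one core is used per work unit).
clock_ratio = 2.40 / 1.80

# What remains of the gap once the clock difference is divided out.
per_clock_ratio = time_ratio / clock_ratio

print(f"time ratio      : {time_ratio:.3f}")
print(f"clock ratio     : {clock_ratio:.3f}")
print(f"per-clock ratio : {per_clock_ratio:.3f}")
```

Most of the gap is explained by clock speed alone; whatever is left over could come from the application build, cache size or memory, and a single work unit cannot separate those.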
SETI@Home Informational message -9 result_overflow
I have a general disability of 80%, so contributing to the community and expressing myself takes a lot of effort; thank you for being understanding.
ID: 731718 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 731725 - Posted: 29 Mar 2008, 10:08:36 UTC - in response to Message 731680.  

I'm sorry you experienced that. I was not aware that had happened. I don't think there is any more support for it, other than me. Nobody else has really said they believe as I do either publicly or in PM...until you... Maybe Joe if I stretch and interpret things a certain way...but I think he's not really taking a position, only stating fact and answering questions...

Anyway, carry on... ;-)
Will do. For what it's worth, I'm quite over it. AFAIK Joe's position has long been one of 'The biggest overall efficiency gains for the project will be made by incorporating small improvements back into the stock application' ... where mine is a slight extension on that: 'improvements must first be found elsewhere before they can be incorporated'. For example, stock code optimisations currently include a sizeable proportion of algorithmic improvements from earlier Chicken/AK/Joe & Ben's improvements, using a so-called 'dispatch wrapper' mechanism...

Back on topic of porting AK v8: I see no reason *some* newer SSE3 and SSSE3 improvements in this current AK version 8 technical leap shouldn't gradually filter back through SSE2 variants and to stock, but expect the process to be a long one involving 'shoehorns and crowbars'. Without the existence of the mechanisms to allow these new builds, and the tireless efforts of those involved (well before my time), IMO it is unlikely that improvements much beyond basic SSE would have made it into stock...

Jason

Jason......
Your efforts are lost on a few.
They do not understand......
Your coding will benefit the whole project when released........
And some day parts of it may be embedded into the stock app......
That's the way it went with the Chicken...........
Just rock on..............
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 731725 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731727 - Posted: 29 Mar 2008, 10:19:02 UTC - in response to Message 731725.  
Last modified: 29 Mar 2008, 10:22:07 UTC


Jason......
Your efforts are lost on a few.
They do not understand......
Your coding will benefit the whole project when released........
And some day parts of it may be embedded into the stock app......
That's the way it went with the Chicken...........
Just rock on..............


Let's just make it clear now that it's Alex's code, ported with lots of help from Alex, and Joe and others as well. [Including problems found and resolved that JDWhale mentions earlier]

I'm guessing that Alex will soon be making a v9 release so we can start the cycle again :D ["The Wheel it Turns, Forever, Round and Round", Kai, last of the Brunnen-G, the Lexx]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731727 · Report as offensive
Profile SATAN
Avatar

Send message
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 731735 - Posted: 29 Mar 2008, 10:39:46 UTC

Well JD/Jason a big congrats to the pair of you.

As for Alex, I hope something is coming. The increase from running in Quad Channel mode is only around 8%; not bad, I know, but I was hoping for slightly more and would have been happy with 10-12%.
ID: 731735 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731741 - Posted: 29 Mar 2008, 11:07:50 UTC - in response to Message 731735.  

Well JD/Jason a big congrats to the pair of you.

As for Alex, I hope something is coming. The increase from running in Quad Channel mode is only around 8%; not bad, I know, but I was hoping for slightly more and would have been happy with 10-12%.

Hmmm, from the code I've been seeing, the memory access patterns would be really efficient. I'd take a guess, without having run extensive profiles yet, that a Mac running v8 code might be CPU bound with Quad Channel memory... next CPU upgrade would fix that, wouldn't it? (Does Apple do that?)

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731741 · Report as offensive
Profile SATAN
Avatar

Send message
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 731747 - Posted: 29 Mar 2008, 11:16:57 UTC

Jason, the next Mac Pro update will probably be to Nehalem come next January, so we are stuck with what we've got. I might take out the stock 2GB and see what happens. Will do a little more searching to try and find the most efficient crunching method.
ID: 731747 · Report as offensive
_heinz
Volunteer tester

Send message
Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 731762 - Posted: 29 Mar 2008, 12:56:23 UTC

Hi all,

http://lunatics.kwsn.net/ is down again.

Does anybody know what happened?

heinz
ID: 731762 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 731764 - Posted: 29 Mar 2008, 13:01:45 UTC - in response to Message 731762.  
Last modified: 29 Mar 2008, 13:21:49 UTC

Hi all,

http://lunatics.kwsn.net/ is down again.

Does anybody know what happened?

heinz


Not sure Heinz. If & when it's back up, could you help me move all the bench tests into a new folder in TestProject [for safe keeping]? We can arrange them nicely in folders by machine type and coordinate by PM. I had plans to do this when the site went down :C [PS server problems are here again, I'm going to block SVN at the router for a while to keep things safe, remind me if I forget to wake things up... The sky is fallin'! ....]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 731764 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 731772 - Posted: 29 Mar 2008, 13:33:08 UTC - in response to Message 731727.  


Jason......
Your efforts are lost on a few.
They do not understand......
Your coding will benefit the whole project when released........
And some day parts of it may be embedded into the stock app......
That's the way it went with the Chicken...........
Just rock on..............


Let's just make it clear now that it's Alex's code, ported with lots of help from Alex, and Joe and others as well. [Including problems found and resolved that JDWhale mentions earlier]

I'm guessing that Alex will soon be making a v9 release so we can start the cycle again :D ["The Wheel it Turns, Forever, Round and Round", Kai, last of the Brunnen-G, the Lexx]


Let's also make it clear that the hoopla from prior "efforts" along these lines, by other individuals who are not currently involved, seemed to be more about improving "self" than "everyone". I'm glad that this time around people are actually collaborating and appear to have everyone's best interest at heart...
ID: 731772 · Report as offensive
archae86

Send message
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 731773 - Posted: 29 Mar 2008, 13:33:45 UTC - in response to Message 729172.  
Last modified: 29 Mar 2008, 13:40:50 UTC

On msattler's Penryn host, some recent returned results show:

Windows optimized S@H Enhanced application by Alex Kan
Version info: OS X SSSE3 (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan


Now there is a new batch, this one appearing thus in stderr out
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSE4.1 (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE4.1 Win32 rev 20 Pre-Release, Ported by : Jason G, Joe Segur, Alex Kan, Raistmer

Again, sadly, the variety of Angle Range so far is very slight. To facilitate comparison, I've used a linear Angle Range scale, magnified in a very small range.



In the legend, I've called the current one KanJasG20, and the previous one Akan_JGport. In the expanded view, it appears the two are equal in performance. I suspect the "upward stragglers" may be moment of conversion mixed cases, though I've not checked that guess.


I've also updated moments ago the graphs I posted for two JDWhale hosts. They should appear in the previous messages in this thread if you use the update button on your browser.

I should have mentioned in my first JDWhale host image postings that all data collection here was done using Fred W's "data vac".
ID: 731773 · Report as offensive
Profile JDWhale
Volunteer tester
Avatar

Send message
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 731798 - Posted: 29 Mar 2008, 14:16:05 UTC

Beautiful!!! I've got to get me a "Frozen Penny"... LOL

I noticed on the Whale plots the 20-minute savings per WU between versions 0.1 and 0.2, best depicted by the mid-range WUs that fill my current cache.

BOINC ON!!!!
ID: 731798 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 731810 - Posted: 29 Mar 2008, 14:31:18 UTC
Last modified: 29 Mar 2008, 14:31:51 UTC

Frozen Penny munches on with the forbidden code......
Purely for testing purposes, of course.......got a couple of data vacs out there scoping the results.....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 731810 · Report as offensive
Profile hiamps
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 731840 - Posted: 29 Mar 2008, 15:23:46 UTC

That's really cool you were able to get this going...Boinc on and let the rest of us have a go soon....
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 731840 · Report as offensive
Profile John Clark
Volunteer tester
Avatar

Send message
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 731852 - Posted: 29 Mar 2008, 15:49:32 UTC
Last modified: 29 Mar 2008, 15:50:29 UTC

I look forward to the trial results and conclusions of the Jason/JD/et al. port of AK's code to Windows-driven Core 2 Duos and Quads coming out, and to being able to run the same myself.

Clearly you guys need to BOINC on and we can do the same soon.

One non-frozen Penny waiting in the wings, a little slower (by >600MHz) but still eager to try out the new fare!
It's good to be back amongst friends and colleagues



ID: 731852 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.