Windows port of Alex v8 code

JDWhale (Volunteer tester) · Joined: 6 Apr 99 · Posts: 921 · Credit: 21,935,817 · RAC: 3
Message 731651 - Posted: 29 Mar 2008, 5:33:28 UTC - in response to Message 731639.

> I'm not trying to be a "task master" here, carrying a whip. I also simply do not have the skillset to dive in and help, not without taking some serious time to get up to speed. I'm just trying to be an advocate for the bulk of the user base... so that it helps the project more than just targeting "newer" systems would...

The question that comes to mind is "What percentage of that bulk runs optimized applications?" If the percentage is not high, then providing an optimized app has little impact, and even less if the performance gains are not great. Sorry if that sounds cruel; I don't mean it that way, I'm just being a realist.

I tried to build for SSE2, but there are a great many errors, and the code looks like it has already been modified once at many of those locations. It might be fairly simple to make it work, but I regret that I have no desire to learn the routines/intrinsics. Coupled with the fact that my Intel evaluation licenses expire in a little over 2 weeks, there are other things I'd rather pursue, possibly running VTune to identify hotspots/bottlenecks, or substituting more IPP routines to improve readability/maintainability as long as performance does not suffer.
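JDWhale's question can be made concrete with a rough back-of-envelope model. The sketch below is only an illustration under stated assumptions: a hypothetical fraction `p` of total project throughput comes from hosts that switch to the optimized app, and on those hosts the app multiplies throughput by `s`. The project-wide gain is then p × (s − 1). The function name and the model are illustrative assumptions, not project figures.

```python
def project_gain(p: float, s: float) -> float:
    """Fractional increase in whole-project throughput (0.02 means +2%).

    p: assumed fraction (0..1) of total throughput that comes from hosts
       running the optimized application.
    s: assumed speedup on those hosts (1.2 means 20% more work per hour).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    # Hosts outside the fraction p are unchanged; hosts inside it do s times
    # the work, so the total grows by p * s + (1 - p) - 1 = p * (s - 1).
    return p * (s - 1.0)


print(f"{project_gain(0.10, 1.2):.1%}")  # small uptake: 2.0%
print(f"{project_gain(1.00, 1.2):.1%}")  # same speedup on every host: 20.0%
```

With a made-up uptake of 10% and a 1.2x speedup, the project as a whole gains only about 2%; the same speedup on every host gives 20%, which is why delivering optimizations through the stock application matters so much more than the size of any one speedup.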

jason_gee (Volunteer developer, Volunteer tester) · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Message 731654 - Posted: 29 Mar 2008, 5:38:47 UTC - in response to Message 731649. Last modified: 29 Mar 2008, 5:42:36 UTC

> This code in SSE3, if it's as fast as its Intel equivalent, would be fantastic news for those with Opteron CPUs (PC4 has an Opteron 165 in it that ran 24/7 @ 2.60GHz). My PC4 is shut down because it performs so poorly on SSE2, compared to the Intel quads or even the Intel duals I've had, as to be not worth running. But with this new code it could be worth putting back online eventually, as in after the 1st of the year, maybe. :D

From what I've seen of others' tests (I don't personally own any AMD since my Athlon died years ago), results on AMD come out at around 1.07 to 1.49 times the output of 2.4V SSE2A, depending on angle range, averaging about 1.2 times. Not as huge a leap as on the SSSE3-capable machines (1.3 to 1.5x 2.4V SSSE3), but a leap nonetheless.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
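Throughput multiples like these are easy to misread as time savings. A quick conversion, assuming "n times the output" means n times as many work units per hour on the same host: each work unit then takes 1/n of the time, so 1.2x output is about 16.7% less CPU time per work unit, not 20%.

```python
def time_saving(speedup: float) -> float:
    """Fraction of CPU time saved per work unit for a given throughput multiple.

    A build with `speedup` times the output per hour finishes each work unit
    in 1/speedup of the original time.
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Range and average quoted above for AMD hosts, relative to 2.4V SSE2A.
for s in (1.07, 1.20, 1.49):
    print(f"{s:.2f}x output -> {time_saving(s):.1%} less CPU time per WU")
```

For the quoted figures this gives roughly 6.5%, 16.7% and 32.9% less CPU time per work unit.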

Brian Silvers · Joined: 11 Jun 99 · Posts: 1681 · Credit: 492,052 · RAC: 0
Message 731662 - Posted: 29 Mar 2008, 6:00:35 UTC - in response to Message 731651.

> ... The question that comes to mind is "What percentage of that bulk runs optimized applications?" If the percentage is not high, then providing an optimized app has little impact, and even less if the performance gains are not great. ...

I agree with what you're saying, but certain other folks may recognize where I'm going with the battery of questions I've been asking you (you and Jason) and Joe... Perhaps long-term I'd be able to help with it, but instead of asking more people to join, another way to get more work done is to make as many of the existing hosts as possible more efficient. The project did that with the partially optimized stock application, but full optimization was left to the anonymous platform due to "lack of resources". It would be great if full optimization could eventually be delivered to the wider participant base by default, rather than by choice.

Yes, yes, and it does have to do with real cross-project parity as well... Some projects have optimizations and some don't, and among those that do, you have full vs. partial, with the partial leaving a loophole via what you all are doing. That is fantastic work, but it goes against actual "parity" because the everyday user doesn't know about it. When they find out about it and start using it, it skews the comparison used to justify "equality", and credit reductions are demanded or claims of bad citizenship are thrown about, etc., etc., etc... So there are benefits to having "anonymous platform" used only for an "OS platform", not for both "OS platform" and "optimization level". It will be a long mountain to climb, for sure, but if it were easy, we'd already have it... IMO, YMMV, etc., etc., etc...

jason_gee (Volunteer developer, Volunteer tester) · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Message 731673 - Posted: 29 Mar 2008, 6:21:31 UTC - in response to Message 731662. Last modified: 29 Mar 2008, 6:23:30 UTC

> ... Yes, yes, and it does have to do with real cross-project parity as well... So there are benefits to having "anonymous platform" used only for an "OS platform", not for both "OS platform" and "optimization level". ...

This belongs in a different thread. I had disparaging remarks made directly at me for spending time trying to introduce (further) optimisations into the stock v6 code; if there is now better support for this, I'd reconsider going back to that next. No-one defended my stance then, so I have all but abandoned my part in that project (for now).

Brian Silvers · Joined: 11 Jun 99 · Posts: 1681 · Credit: 492,052 · RAC: 0
Message 731676 - Posted: 29 Mar 2008, 6:29:58 UTC - in response to Message 731673.

> This belongs in a different thread. I had disparaging remarks made directly at me for spending time trying to introduce (further) optimisations into the stock v6 code... No-one defended my stance then, so I have all but abandoned my part in that project (for now).

I'm sorry you experienced that. I was not aware that had happened. I don't think there is any more support for it, other than me. Nobody else has really said they believe as I do, either publicly or in PM... until you... Maybe Joe, if I stretch and interpret things a certain way... but I think he's not really taking a position, only stating facts and answering questions... Anyway, carry on... ;-)

jason_gee (Volunteer developer, Volunteer tester) · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Message 731680 - Posted: 29 Mar 2008, 6:46:32 UTC - in response to Message 731676. Last modified: 29 Mar 2008, 7:22:38 UTC

> I'm sorry you experienced that. I was not aware that had happened. ... Anyway, carry on... ;-)

Will do. For what it's worth, I'm quite over it. AFAIK Joe's position has long been that "the biggest overall efficiency gains for the project will be made by incorporating small improvements back into the stock application", whereas mine is a slight extension of that: "improvements must first be found elsewhere before they can be incorporated". For example, the stock code optimisations currently include a sizeable proportion of algorithmic improvements from the earlier Chicken/AK/Joe & Ben work, using a so-called "dispatch wrapper" mechanism...

Back on topic of porting AK v8: I see no reason some of the newer SSE3 and SSSE3 improvements in this AK version 8 technical leap shouldn't gradually filter back through the SSE2 variants and into stock, but I expect the process to be a long one involving "shoehorns and crowbars". Without the mechanisms that allow these new builds, and the tireless efforts of those involved (well before my time), IMO it is unlikely that improvements much beyond basic SSE would have made it into stock...

Jason
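The "dispatch wrapper" idea amounts to choosing, once at startup, the fastest code path the CPU supports. A minimal sketch of that selection logic follows, in Python for brevity. It assumes a hypothetical set of feature names already detected elsewhere (in a real build this would come from CPUID); the variant table and names are illustrative, every variant shares one plain reference body so the example runs anywhere, and it is not the actual stock-code mechanism.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Kernel = Callable[[Sequence[float]], float]


def _reference(xs: Sequence[float]) -> float:
    # Stand-in body shared by all hypothetical variants below; a real build
    # would have a separately compiled SIMD routine per entry.
    return sum(x * x for x in xs)


# Hypothetical variant table, best first: (name, required CPU features, kernel).
VARIANTS: List[Tuple[str, FrozenSet[str], Kernel]] = [
    ("ssse3", frozenset({"sse2", "sse3", "ssse3"}), _reference),
    ("sse3", frozenset({"sse2", "sse3"}), _reference),
    ("sse2", frozenset({"sse2"}), _reference),
    ("generic", frozenset(), _reference),  # portable fallback, needs nothing
]


def select_variant(cpu_features: FrozenSet[str]) -> Tuple[str, Kernel]:
    """Return the first (fastest) variant whose requirements the CPU meets."""
    for name, required, kernel in VARIANTS:
        if required <= cpu_features:
            return name, kernel
    raise RuntimeError("no usable variant")  # unreachable: 'generic' needs nothing


name, kernel = select_variant(frozenset({"sse", "sse2", "sse3"}))
print(name, kernel([1.0, 2.0, 3.0]))  # sse3 14.0
```

In a C/C++ application the equivalent is usually a function pointer set once at startup, so each call pays only an indirect-call cost, and keeping a fallback with no requirements guarantees a usable path on any CPU.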

[AF>france>pas-de-calais]symaski62 (Volunteer tester) · Joined: 12 Aug 05 · Posts: 258 · Credit: 100,548 · RAC: 0
Message 731718 - Posted: 29 Mar 2008, 9:44:47 UTC

CPU type: GenuineIntel Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz [x86 Family 6 Model 15 Stepping 13]
Number of CPUs: 2
Operating System: Microsoft Windows Vista Home Edition, Service Pack 1, (06.00.6001.00)
Memory: 1022.64 MB
Cache: 976.56 KB
Measured floating point speed: 1702.56 million ops/sec
Measured integer speed: 3777.03 million ops/sec
Name: 13fe08ac.8009.12342.10.7.58
CPU time: 11,066.09 sec

vs.

CPU type: GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [x86 Family 6 Model 15 Stepping 11]
Number of CPUs: 4
Operating System: Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
Memory: 3069.5 MB
Cache: 488.28 KB
Measured floating point speed: 2313.62 million ops/sec
Measured integer speed: 5000.1 million ops/sec
Name: 13fe08ac.8009.12342.10.7.58
CPU time: 7,543.28 sec

:)

SETI@Home Informational message -9 result_overflow
With an overall disability rating of 80%, I put a lot of effort into contributing to the community and expressing myself; thank you for being understanding.
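The two CPU times above are for the same work unit name, so a quick ratio puts them in context. This sketch divides out the nominal clock difference; it assumes nothing else matters, which is not true here (the hosts also differ in core count, cache, OS and whatever else was running), so treat the residual as a rough indication only.

```python
E2160_CPU_S = 11_066.09  # Pentium Dual E2160 @ 1.80 GHz, CPU time for the WU
Q6600_CPU_S = 7_543.28   # Core 2 Quad Q6600 @ 2.40 GHz, same WU name

time_ratio = E2160_CPU_S / Q6600_CPU_S  # how many times longer the E2160 took
clock_ratio = 2.40 / 1.80               # nominal clock advantage of the Q6600
per_clock = time_ratio / clock_ratio    # part not explained by clock speed alone

print(f"time ratio {time_ratio:.3f}, clock ratio {clock_ratio:.3f}, per clock {per_clock:.3f}")
# time ratio 1.467, clock ratio 1.333, per clock 1.100
```

So the E2160 took about 1.47 times as long; the 1.33 clock ratio accounts for most of that, leaving roughly 10% that clock speed alone does not explain.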

kittyman (Volunteer tester) · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004
Message 731725 - Posted: 29 Mar 2008, 10:08:36 UTC - in response to Message 731680.

> ... Without the mechanisms that allow these new builds, and the tireless efforts of those involved (well before my time), IMO it is unlikely that improvements much beyond basic SSE would have made it into stock... Jason

Jason...... Your efforts are lost on a few. They do not understand...... Your coding will benefit the whole project when released........ And some day maybe parts of it will be embedded into the stock app...... That's the way it went with the Chicken........... Just rock on..............

"Freedom is just Chaos, with better lighting." Alan Dean Foster

jason_gee (Volunteer developer, Volunteer tester) · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Message 731727 - Posted: 29 Mar 2008, 10:19:02 UTC - in response to Message 731725. Last modified: 29 Mar 2008, 10:22:07 UTC

> Jason...... Your efforts are lost on a few. They do not understand...... Your coding will benefit the whole project when released........ Just rock on..............

Let's just make it clear now that it's Alex's code, ported with lots of help from Alex, Joe and others as well [including finding and resolving the problems JDWhale mentioned earlier]. I'm guessing that Alex will soon be making a v9 release so we can start the cycle again :D
["The Wheel it Turns, Forever, Round and Round", Kai, last of the Brunnen-G, the Lexx]

SATAN · Joined: 27 Aug 06 · Posts: 835 · Credit: 2,129,006 · RAC: 0
Message 731735 - Posted: 29 Mar 2008, 10:39:46 UTC

Well, JD/Jason, a big congrats to the pair of you. As for Alex, I hope something is coming: the increase from running in quad-channel mode is only around 8%. Not bad, I know, but I was hoping for slightly more; I would have been happy with 10-12%.

jason_gee (Volunteer developer, Volunteer tester) · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Message 731741 - Posted: 29 Mar 2008, 11:07:50 UTC - in response to Message 731735.

> ... the increase from running in quad-channel mode is only around 8%. Not bad, I know, but I was hoping for slightly more; I would have been happy with 10-12%.

Hmmm, from the code I've been seeing, the memory access patterns would be really efficient. I'd guess, without having run extensive profiles yet, that a Mac running v8 code might be CPU-bound with quad-channel memory... The next CPU upgrade would fix that, wouldn't it? (Does Apple do that?)

SATAN · Joined: 27 Aug 06 · Posts: 835 · Credit: 2,129,006 · RAC: 0
Message 731747 - Posted: 29 Mar 2008, 11:16:57 UTC

Jason, the next Mac Pro update will probably be to Nehalem come next January, so we are stuck with what we've got. I might take out the stock 2GB and see what happens. I'll do a little more searching to try to find the most efficient crunching method.

_heinz (Volunteer tester) · Joined: 25 Feb 05 · Posts: 744 · Credit: 5,539,270 · RAC: 0
Message 731762 - Posted: 29 Mar 2008, 12:56:23 UTC

Hi all, http://lunatics.kwsn.net/ is down again. Does anybody know what happened? heinz

jason_gee (Volunteer developer, Volunteer tester) · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Message 731764 - Posted: 29 Mar 2008, 13:01:45 UTC - in response to Message 731762. Last modified: 29 Mar 2008, 13:21:49 UTC

> Hi all, http://lunatics.kwsn.net/ is down again. Does anybody know what happened? heinz

Not sure, Heinz. If and when it's back up, could you help me move all the bench tests into a new folder in TestProject [for safekeeping]? We can arrange them nicely in folders by machine type and coordinate by PM. I had plans to do this when the site went down :C
[PS: the server problems are here again. I'm going to block SVN at the router for a while to keep things safe; remind me if I forget to wake things up... The sky is fallin'!...]

Brian Silvers · Joined: 11 Jun 99 · Posts: 1681 · Credit: 492,052 · RAC: 0
Message 731772 - Posted: 29 Mar 2008, 13:33:08 UTC - in response to Message 731727.

> Let's just make it clear now that it's Alex's code, ported with lots of help from Alex, Joe and others as well. ...

Let's also make it clear that the hoopla from prior "efforts" along these lines, by other individuals who are not currently involved, seemed to be more about improving "self" than "everyone". I'm glad that this time around people are actually collaborating and appear to have everyone's best interests at heart...

archae86 · Joined: 31 Aug 99 · Posts: 909 · Credit: 1,582,816 · RAC: 0
Message 731773 - Posted: 29 Mar 2008, 13:33:45 UTC - in response to Message 729172. Last modified: 29 Mar 2008, 13:40:50 UTC

On msattler's Penryn host, some recent returned results show:
Windows optimized S@H Enhanced application by Alex Kan
Version info: OS X SSSE3 (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan

Now there is a new batch, which appears thus in the stderr output:
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSE4.1 (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE4.1 Win32 rev 20 Pre-Release, Ported by : Jason G, Joe Segur, Alex Kan, Raistmer

Again, sadly, the variety of angle range so far is very slight. To facilitate comparison, I've used a linear angle range scale, magnified over a very small range. In the legend, I've called the current build KanJasG20 and the previous one Akan_JGport. In the expanded view, the two appear equal in performance. I suspect the "upward stragglers" may be mixed cases from the moment of conversion, though I've not checked that guess.

I've also just updated the graphs I posted for two JDWhale hosts. They should appear in the previous messages in this thread if you use your browser's refresh button. I should have mentioned in my first postings of the JDWhale host images that all data collection here was done using Fred W's "data vac".
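Comparing two builds is only fair within similar angle ranges, since CPU time per work unit varies with angle range; that is why the plots above are laid out against it. A minimal sketch of that kind of comparison follows, assuming results have already been collected as (build, angle range, CPU seconds) records. The record layout, bin width and function names are hypothetical, and this is not how Fred W's data vac works.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (build label, angle range, CPU seconds).
Result = Tuple[str, float, float]


def mean_cpu_by_bin(results: Iterable[Result], bin_width: float) -> Dict[Tuple[str, int], float]:
    """Average CPU time for each (build, angle-range bin) pair."""
    groups: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for build, angle_range, cpu_s in results:
        groups[(build, int(angle_range // bin_width))].append(cpu_s)
    return {key: mean(times) for key, times in groups.items()}


def relative_speed(results: Iterable[Result], old: str, new: str, bin_width: float) -> Dict[int, float]:
    """old/new mean CPU-time ratio per bin; > 1 means the new build is faster.

    Only bins containing results from both builds are compared, so a build
    that happened to receive faster or slower work units is not flattered.
    """
    means = mean_cpu_by_bin(results, bin_width)
    old_bins = {b for (build, b) in means if build == old}
    new_bins = {b for (build, b) in means if build == new}
    return {b: means[(old, b)] / means[(new, b)] for b in sorted(old_bins & new_bins)}
```

Called as `relative_speed(records, "Akan_JGport", "KanJasG20", bin_width)` with the legend labels above, a ratio near 1.0 in every shared bin would match the "equal in performance" reading; bins seen by only one build are left out rather than guessed.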

JDWhale (Volunteer tester) · Joined: 6 Apr 99 · Posts: 921 · Credit: 21,935,817 · RAC: 3
Message 731798 - Posted: 29 Mar 2008, 14:16:05 UTC

Beautiful!!! I've got to get me a "Frozen Penny"... LOL. I noticed on the Whale plots the 20-minute savings per WU between versions 0.1 and 0.2, best depicted by these mid-range WUs that fill my current cache. BOINC ON!!!!

kittyman (Volunteer tester) · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004
Message 731810 - Posted: 29 Mar 2008, 14:31:18 UTC. Last modified: 29 Mar 2008, 14:31:51 UTC

Frozen Penny munches on with the forbidden code...... Purely for testing purposes, of course....... Got a couple of data vacs out there scoping the results.....

hiamps (Volunteer tester) · Joined: 23 May 99 · Posts: 4292 · Credit: 72,971,319 · RAC: 0
Message 731840 - Posted: 29 Mar 2008, 15:23:46 UTC

That's really cool that you were able to get this going... BOINC on, and let the rest of us have a go soon....

Official Abuser of Boinc Buttons... And no good credit hound!

John Clark (Volunteer tester) · Joined: 29 Sep 99 · Posts: 16515 · Credit: 4,418,829 · RAC: 0
Message 731852 - Posted: 29 Mar 2008, 15:49:32 UTC. Last modified: 29 Mar 2008, 15:50:29 UTC

I look forward to the trial results and conclusions of the Jason/JD/et al. port of AK's code to Windows-driven Core 2 Duos and Quads, and to being able to run it myself. Clearly you guys need to BOINC on, and we can do the same soon. One non-frozen Penny waiting in the wings, a little slower (by >600MHz) but still eager to try out the new fare!

It's good to be back amongst friends and colleagues

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.