Windows port of Alex v8 code
Author | Message |
---|---|
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
After 13 hours of nonstop porting, the first WU processed validates... resultid=783586431 The processor isn't very impressive: an E4500 running @ 2418 MHz on a crippled ECS 945GCT-M MoBo with dysfunctional Crucial Ballistix memory that fails about every 6-7 hours. OS is Windows XP running diskless with BoincPE. The build uses: MS Visual Studio 2005 Professional, Intel C++ 10.1.020, Intel IPP 5.3.2.073. Here is the compile command line: /c /O3 /Og /Ob2 /Oi /Ot /Oy /GT /GA /I "C:\\Program Files\\Intel\\IPP\\5.3.2.073\\ia32\\tools\\staticlib" /I "C:\\Program Files\\Intel\\IPP\\5.3.2.073\\ia32\\include" /I "C:\\Program Files\\Intel\\Compiler\\C++\\10.1.020\\IA32\\include" /I "../../../boinc/win_build" /I "../../../boinc" /I "../../../boinc/api" /I "../../../boinc/api/win" /I "../../../boinc/client/win" /I "../../../boinc/" /I "../../../boinc/lib" /I "../../image_libs" /I "../../jpeglib" /I "../../db" /I "../../glut" /I "../../" /I "../" /D "USE_IPP" /D "USE_SSSE3" /D "USE_I386_OPTIMIZATIONS" /D "USE_I386_CORE2" /D "__INTEL_COMPILER" /D "WIN32" /D "_MT" /D "NDEBUG" /D "_WINDOWS" /D "CLIENT" /D "_CONSOLE" /D "NBOINC_APP_GRAPHICS" /D "_MBCS" /D "_VC80_UPGRADE=0x0710" /GF /FD /EHsc /MT /Zp16 /GS- /Gy /GR /Yc"..\\StdAfx.h" /Fp".\\Release/seti_boinc.pch" /Fo".\\Release/" /W3 /nologo /Zi /Gd /TP /FI "win-config.h" /fp:fast /Qprec-div- /Qprec-sqrt- /Qfp-speculationfast /QxO I'm not very familiar with developing on Windows with these tools (first try; I just installed VS2005 Pro & the Intel Compiler yesterday). If anyone has suggestions for better options, please let me know and I'll try them. I'll let it run while I get a few hours' sleep and check some other ARs. Now it's 3:35am, time to get some shut-eye. These WUs have already been detached by the project, so nothing is lost by trying to crunch them with a first attempt at ported code. Regards, JDWhale |
David Send message Joined: 19 May 99 Posts: 411 Credit: 1,426,457 RAC: 0 |
Interesting, thanks. |
John Clark Send message Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
Many thanks for the effort. We look forward to the results. It's good to be back amongst friends and colleagues |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
1,574 seconds for a VHAR at 2.4GHz is very honorable - impressive even, especially for a first run after a session like that. Congratulations! Enjoy your well-earned sleep, but I hope you had time to crack a celebratory beer first. Once you get it bedded down and checked at other ARs, I'd be happy to give it a benchmarking run on one of the quaddies where we've already got plots for stock, Chicken and Crunch3r. Edit - 781770844 has come in too. 5,897 seconds for 63.98 cr, and valid. Looking good. |
W-K 666 Send message Joined: 18 May 99 Posts: 19062 Credit: 40,757,560 RAC: 67 |
"1,574 seconds for a VHAR at 2.4GHz is very honorable - impressive even, especially for a first run after a session like that. Congratulations! Enjoy your well-earned sleep, but I hope you had time to crack a celebratory beer first." That's an impressive speed-up; your previous 63.98 cr units took ~7,100 secs. |
David Send message Joined: 19 May 99 Posts: 411 Credit: 1,426,457 RAC: 0 |
"Edit - 781770844 has come in too. 5,897 seconds for 63.98 cr, and valid. Looking good." Wow, it takes my Quaddies 5,500-5,800 seconds to earn that sort of credit (give or take a minute), and that's clocked at 3033 MHz for the media PC and 3105 MHz for mine. Compare that to JDWhale's PC, which is a reasonable bit slower, so I'd say the code is really working well. As long as the science is accurate, I'd say it's a real winner |
Adri Send message Joined: 27 Apr 07 Posts: 56 Credit: 132,673 RAC: 0 |
WOW!!! THIS IS AWESOME!!!! Now, we shall wait till this gets incorporated into another update of Crunchers app... eeekkk!!!! |
Fred W Send message Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Looks like you're going to be the most popular guy on NC for a while ;) For a .38 AR WU, you've knocked 12.5% off the time it takes my Q6600 clocked at 3336MHz running Crunch3r's SSSE3. I'll post a quick comparison chart when there are a few more results to go on (perhaps this evening UTC), but for a true measure I support Richard H's proposal that, once it is bedded down, you let him run it on a box that has already been used for benchmarking other apps - that way we reduce the variability introduced by hardware. Excellent piece of work so far, F. |
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
"Looks like you're going to be the most popular guy on NC for a while ;)" *** Hi there, *** impressive piece of work John, think a lot of the WINDOWERS are waiting for such results. Done a great job! |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
I'll admit that I didn't check that slew of options you have already, and I admit that I'm a bit rusty in Visual Studio, but if you don't have it enabled, you may want to enable runtime checks. I would guess this probably has already been done, but it won't hurt. What it will do is give you the ability to handle runtime problems, such as array out of bounds, among other things... It will also give you the capability of generating output to file and continue on. I found it useful for tracking down array problems with null terminated strings where the prior developer had not allowed for the null. IIRC, it will also report on uninitialized variables... I'm heading out to a friend's house...and probably won't be able to post until after 10pm EDT tonight... |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi JDWhale, Below are the compiler options for comparison [fairly similar] from the seti_boinc project of 'our' pre alpha port of the same AK_v8 code... Expect significant speedup particularly with regards to pulse finding on core 2 [SSSE3] and higher [SSE4.1] machines, Some decent improvement on SSE3 p4 builds also. Testing offline with knabench (available from the Lunatics site) or similar would allow direct comparison between apps (say against 2.4V SSSE3 or a stock app) with the same WU, and can test with pre-shortened workunits speeding development, and enhancing repeatability. Yes /O2 optimisations test a few percent faster than /O3 with our builds, as they [Lunatics] did also with 2.2b and 2.4 / 2.4V. I would be interested to know if you showed improvement using /O3. Jason [P.S my new Intel stuff arrives soon, so I haven't bothered updating for a while ... Must go to IPP5.3 ASAP]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Adrian Taylor Send message Joined: 22 Apr 01 Posts: 95 Credit: 10,933,449 RAC: 0 |
well done JDWhale i'm so glad that someone has gone from the land of BS on this and into the world of actually doing something...! i hope you have great success, although as a mac user i should be miffed, but its all for the science eh ? also glad that ? isnt involved, how refreshing :-) keep up the good work regards adrian 63. (1) (b) "music" includes sounds wholly or predominantly characterised by the emission of a succession of repetitive beats |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49 |
"After 13 hours nonstop porting, the first WU processed validates...resultid=783586431" Impressive, Sounds very promising, So I'll keep an eye on this. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Hi all, Thanks for all the positive feedback... but all is not well with the WHALEapp just yet. Some WUs enter what seems to be a "wait state", mostly within the first minute of execution. Not what you want for benchmarking or a daily driver :-( But those VHAR WUs sure haul a**! I'll try changing some compiler options as suggested and try again. I really don't want to debug this; too many years on the job have burned me out. I'll answer the obvious question before someone asks... Yes, those constant 913 second run times are true for the 17 credit VHAR WUs. The only thing was that I set BOINC to run on only "1" processor on this E4500 Core2 Duo. No memory or cache contention, the whole 2MB L2 mostly for the single process. Just what "High Priority" WUs deserve! I had to shut down the system to replace that flaky Ballistix memory; the only sticks I had lying around were unmatched 2 x 1GB DDR2-667. This system actually runs better single channel than with the unmatched sticks, but because I've configured the RAM drive (remember BOINC_PE) to use 768MB, that didn't leave a whole lot of memory for the OS & processes, so I went with the unmatched sticks for now. Anyway, I'm putting the little C2Duo back on KWSN_2.4V_SSSE3_MB.exe for the time being. Cheers to all, John |
Gecko Send message Joined: 17 Nov 99 Posts: 454 Credit: 6,946,910 RAC: 47 |
JDWhale: GREAT job!!! Thanks for your efforts & the surprise. It's always awesome to see continued examples of the talent and generosity that make up the community. Hats off to you, sir! BTW, please check your PM. |
Sir Ulli Send message Joined: 21 Oct 99 Posts: 2246 Credit: 6,136,250 RAC: 0 |
|
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Okay... we're back on... I found the error of my ways in gaussFit.cpp. The substitution for the convolution call was incorrect and was causing a buffer overrun in the output buffer. [Edit] Thanks for the tip, you know who you are![/edit] That's what I get for pulling a replacement function out of thin air, not knowing the exact behavior of the original, and not carefully reading the user guide for the IPP replacement. 12 hours and a pint of Vodka can have that effect. I digress... Unfortunately, I also changed some compile options and don't have any VHARs left for comparison. Previously crunched WUs cannot be used for comparison, as the host was left unattended and WUs were stalling, thus favoring the WU left running without contention. I've launched a direct comparison of WhaleApp vs. 2.4V... on WUs from the same splitter block (I hope that term is correct)... 2.4V wuid=236103546 wuid=236103433 vs. WhaleApp wuid=236103598 wuid=236103652 The WhaleApp WUs haven't reported yet; I just kicked them off simultaneously and will post an update as soon as they finish, I predict before 23:00 UTC. Similarly, the 2.4V WUs above were kicked off simultaneously, virtually at the same time. (Never mind the crud at the beginning of the list; it just shows that I tried to crunch them with the earlier WhaleApp, but they both stalled less than a minute into the run.) -------------------------- I know that this is not the best host, but it is representative of what is currently available for folks on a budget. Intel Core2Duo E4500 @ 2418 MHz on ECS 945GCT-M with mismatched DDR2-667 Running WindowsXP diskless via BoincPE BOINC On..On... |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49 |
"Okay... we're back on... I found the error of my ways in gaussFit.cpp. The substitution for the convolution call was incorrect and causing buffer overrun to the output buffer. [Edit] Thanks for the tip, you know who you are![/edit] That's what I get for pulling a replacement function out of thin air, not knowing the exact behavior of the original, and not carefully reading the user guide for the IPP replacement. 12 hours and a pint of Vodka can have that effect. I digress..." I always wondered what Whales did with those large brains, Drink vodka and do math like crazy. ;) Glad You got It sorted, Now as soon as It validates, You'll need some guinea pigs, Er lab rats, Er volunteers. Yeah that's It (Couldn't help Myself). The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
SATAN Send message Joined: 27 Aug 06 Posts: 835 Credit: 2,129,006 RAC: 0 |
John, well done and congratulations on putting your money where your mouth is. Let's hope they validate. |
Logan Send message Joined: 26 Jan 07 Posts: 743 Credit: 918,353 RAC: 0 |
:) SSE3 2.4 from Lunatics (C2D version, really SSSE3) and 6.10 v3 from Crunch3r inside. ;) Good combination, Joker... Try it! It's the best I've tested to date... Logan. BOINC FAQ Service (now also available in Spanish) |
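
The gaussFit.cpp overrun JDWhale describes is a common trap when a general-purpose convolution routine is swapped in for a hand-written one. The thread shows neither the original call nor the IPP function that replaced it, so what follows is only a minimal sketch in plain C++ (no IPP) of the general idea. The names `convolve_full` and `convolve_same` are hypothetical and come from neither code base. A full linear convolution of n samples with an m-tap kernel yields n + m - 1 samples, so a caller that sized its output buffer for n samples gets written past the end unless the result is trimmed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full linear convolution. The result has signal.size() + kernel.size() - 1
// samples, which is longer than the input signal whenever the kernel has
// more than one tap.
std::vector<float> convolve_full(const std::vector<float>& signal,
                                 const std::vector<float>& kernel) {
    assert(!signal.empty() && !kernel.empty());
    std::vector<float> out(signal.size() + kernel.size() - 1, 0.0f);
    for (std::size_t i = 0; i < signal.size(); ++i) {
        for (std::size_t j = 0; j < kernel.size(); ++j) {
            out[i + j] += signal[i] * kernel[j];
        }
    }
    return out;
}

// "Same"-size convolution: keep only the central signal.size() samples of
// the full result, so the output fits a buffer sized like the input.
// (Assumption: the caller wants the centered window; other code may want
// the leading samples instead.)
std::vector<float> convolve_same(const std::vector<float>& signal,
                                 const std::vector<float>& kernel) {
    const std::vector<float> full = convolve_full(signal, kernel);
    const auto offset = static_cast<std::ptrdiff_t>((kernel.size() - 1) / 2);
    const auto count = static_cast<std::ptrdiff_t>(signal.size());
    return std::vector<float>(full.begin() + offset,
                              full.begin() + offset + count);
}
```

For a 4-sample signal and a 3-tap kernel, `convolve_full` returns 6 samples, so copying it into a 4-sample output buffer would write two elements past the end; `convolve_same` returns exactly 4.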
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.