Windows port of Alex v8 code


Message boards : Number crunching : Windows port of Alex v8 code

Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 20,026,572
RAC: 1,962
United States
Message 727273 - Posted: 17 Mar 2008, 8:36:15 UTC

After 13 hours nonstop porting, the first WU processed validates...resultid=783586431

The processor isn't very impressive: an E4500 running @ 2418 MHz on a crippled ECS 945GCT-M MoBo with dysfunctional Crucial Ballistix memory that fails about every 6-7 hours. OS is Windows XP running diskless with BoincPE.

Build is using :
MS Visual Studio 2005 Professional
Intel C++ 10.1.020
Intel IPP 5.3.2.073


Here is the compile commandline:

/c /O3 /Og /Ob2 /Oi /Ot /Oy /GT /GA /I "C:\Program Files\Intel\IPP\5.3.2.073\ia32\tools\staticlib" /I "C:\Program Files\Intel\IPP\5.3.2.073\ia32\include" /I "C:\Program Files\Intel\Compiler\C++\10.1.020\IA32\include" /I "../../../boinc/win_build" /I "../../../boinc" /I "../../../boinc/api" /I "../../../boinc/api/win" /I "../../../boinc/client/win" /I "../../../boinc/" /I "../../../boinc/lib" /I "../../image_libs" /I "../../jpeglib" /I "../../db" /I "../../glut" /I "../../" /I "../" /D "USE_IPP" /D "USE_SSSE3" /D "USE_I386_OPTIMIZATIONS" /D "USE_I386_CORE2" /D "__INTEL_COMPILER" /D "WIN32" /D "_MT" /D "NDEBUG" /D "_WINDOWS" /D "CLIENT" /D "_CONSOLE" /D "NBOINC_APP_GRAPHICS" /D "_MBCS" /D "_VC80_UPGRADE=0x0710" /GF /FD /EHsc /MT /Zp16 /GS- /Gy /GR /Yc"..\StdAfx.h" /Fp".\Release/seti_boinc.pch" /Fo".\Release/" /W3 /nologo /Zi /Gd /TP /FI "win-config.h" /fp:fast /Qprec-div- /Qprec-sqrt- /Qfp-speculationfast /QxO

I'm not very familiar developing on Windows with these tools (first try, I just installed VS2005PRO & Intel Compiler yesterday). If anyone has suggestions for better options, please let me know and I'll try.


I'll let it run while I get a few hours' sleep and check some other ARs. Now it's 3:35am, time to get some shut-eye.

These WUs have already been detached by the project, so nothing is lost by trying to crunch them with a first attempt at ported code.

Regards,
JDWhale
____________

Profile David
Volunteer tester
Joined: 19 May 99
Posts: 411
Credit: 1,328,396
RAC: 0
Australia
Message 727274 - Posted: 17 Mar 2008, 8:39:41 UTC

Interesting, thanks.

____________

Profile John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 727275 - Posted: 17 Mar 2008, 9:12:14 UTC

Many thanks for the effort. We look forward to the results.
____________
It's good to be back amongst friends and colleagues



Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8670
Credit: 51,867,196
RAC: 49,385
United Kingdom
Message 727280 - Posted: 17 Mar 2008, 9:25:30 UTC
Last modified: 17 Mar 2008, 9:30:30 UTC

1,574 seconds for a VHAR at 2.4GHz is very honorable - impressive even, especially for a first run after a session like that. Congratulations! Enjoy your well-earned sleep, but I hope you had time to crack a celebratory beer first.

Once you get it bedded down and checked at other ARs, I'd be happy to give it a benchmarking run on one of the quaddies where we've already got plots for stock, Chicken and Crunch3r.

Edit - 781770844 has come in too. 5,897 seconds for 63.98 cr, and valid. Looking good.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8704
Credit: 25,179,994
RAC: 29,028
United Kingdom
Message 727290 - Posted: 17 Mar 2008, 10:20:43 UTC - in response to Message 727280.
Last modified: 17 Mar 2008, 10:21:03 UTC

1,574 seconds for a VHAR at 2.4GHz is very honorable - impressive even, especially for a first run after a session like that. Congratulations! Enjoy your well-earned sleep, but I hope you had time to crack a celebratory beer first.

Once you get it bedded down and checked at other ARs, I'd be happy to give it a benchmarking run on one of the quaddies where we've already got plots for stock, Chicken and Crunch3r.

Edit - 781770844 has come in too. 5,897 seconds for 63.98 cr, and valid. Looking good.

That's an impressive speedup; your previous 63.98 cr units took ~7,100 secs.

Profile David
Volunteer tester
Joined: 19 May 99
Posts: 411
Credit: 1,328,396
RAC: 0
Australia
Message 727291 - Posted: 17 Mar 2008, 10:23:06 UTC - in response to Message 727280.

Edit - 781770844 has come in too. 5,897 seconds for 63.98 cr, and valid. Looking good.


Wow, it takes my Quaddies 5,500-5,800 seconds to earn that sort of credit (give or take a minute), and that's clocked at 3033 MHz for the media PC and 3105 MHz for mine. Compare that to JDWhale's PC, which is a reasonable bit slower, so I'd say the code is really working well.

As long as the science is accurate then I'd say it's a real winner.
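
A rough way to compare hosts running at different clock speeds is to multiply run time by clock rate, which gives an approximate cycle count for the same work. The sketch below does that for the figures quoted in this thread. The function names are made up for illustration, and the measure deliberately ignores memory speed, cache size and how many tasks share the chip, so treat it as a ballpark only.

```cpp
#include <cassert>

// Approximate work in millions of clock cycles: run time (s) x clock (MHz).
// Assumption: ignores memory speed, cache size and contention between cores.
double megacycles(double seconds, double clock_mhz) {
    assert(seconds > 0.0 && clock_mhz > 0.0);
    return seconds * clock_mhz;
}

// Fractional run-time reduction when going from old_seconds to new_seconds.
double reduction(double old_seconds, double new_seconds) {
    assert(old_seconds > 0.0);
    return (old_seconds - new_seconds) / old_seconds;
}
```

With the numbers above, `megacycles(5897, 2418)` gives 14,258,946 against 16,681,500 for `megacycles(5500, 3033)`, and `reduction(7100, 5897)` comes out at roughly 0.17.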
____________

Profile Adri
Volunteer tester
Joined: 27 Apr 07
Posts: 56
Credit: 132,673
RAC: 0
Malaysia
Message 727301 - Posted: 17 Mar 2008, 11:35:41 UTC

WOW!!! THIS IS AWESOME!!!!

Now, we shall wait till this gets incorporated into another update of Crunchers app... eeekkk!!!!
____________

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 727303 - Posted: 17 Mar 2008, 11:55:29 UTC

Looks like you're going to be the most popular guy on NC for a while ;)

For a .38 AR WU, you've knocked 12.5% off the time it takes my Q6600 clocked at 3336MHz running Crunch3r's SSSE3.

I'll post a quick comparison chart when there are a few more results to go on (perhaps this evening UTC), but for a true measure I support Richard H's proposal that, once it is bedded down you let him run it on a box that has already been used for benchmarking other apps - that way we reduce the variability introduced by hardware.

Excellent piece of work so far,

F.
____________

Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,901,007
RAC: 2,731
Netherlands
Message 727306 - Posted: 17 Mar 2008, 12:25:49 UTC - in response to Message 727303.

Looks like you're going to be the most popular guy on NC for a while ;)

For a .38 AR WU, you've knocked 12.5% off the time it takes my Q6600 clocked at 3336MHz running Crunch3r's SSSE3.

I'll post a quick comparison chart when there are a few more results to go on (perhaps this evening UTC), but for a true measure I support Richard H's proposal that, once it is bedded down you let him run it on a box that has already been used for benchmarking other apps - that way we reduce the variability introduced by hardware.

Excellent piece of work so far,

F.



*** Hi there , ***

Impressive piece of work, John. I think a lot of the WINDOWERS are waiting for such results.

You've done a great job!
____________

Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 727308 - Posted: 17 Mar 2008, 12:46:21 UTC - in response to Message 727273.
Last modified: 17 Mar 2008, 12:47:28 UTC


I'm not very familiar developing on Windows with these tools (first try, I just installed VS2005PRO & Intel Compiler yesterday). If anyone has suggestions for better options, please let me know and I'll try.


I'll admit that I didn't check that slew of options you have already, and I admit that I'm a bit rusty in Visual Studio, but if you don't have it enabled, you may want to enable runtime checks. I would guess this probably has already been done, but it won't hurt. What it will do is give you the ability to handle runtime problems, such as array out of bounds, among other things... It will also give you the capability of generating output to file and continue on. I found it useful for tracking down array problems with null terminated strings where the prior developer had not allowed for the null. IIRC, it will also report on uninitialized variables...
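
In Visual C++ these checks are the /RTC options (for example /RTC1). They are intended for unoptimized debug builds rather than a release command line like the one quoted. The null-terminator problem mentioned above usually comes down to a buffer sized for the characters only. A minimal sketch of the safe sizing follows; the helper name is hypothetical and not from any project code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy a C string into a buffer that has room for the terminating '\0'.
// Sizing the buffer as strlen(s) alone would leave no room for the
// terminator, so the copy would write one byte past the end.
std::vector<char> copy_with_terminator(const char* s) {
    assert(s != nullptr);
    const std::size_t len = std::strlen(s);
    std::vector<char> buf(len + 1);        // +1 for the terminator
    std::memcpy(buf.data(), s, len + 1);   // copies the '\0' as well
    return buf;
}
```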

I'm heading out to a friend's house...and probably won't be able to post until after 10pm EDT tonight...
____________

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5057
Credit: 73,919,452
RAC: 12,055
Australia
Message 727312 - Posted: 17 Mar 2008, 12:57:10 UTC
Last modified: 17 Mar 2008, 13:29:28 UTC

Hi JDWhale,
Below are the compiler options for comparison [fairly similar] from the seti_boinc project of 'our' pre-alpha port of the same AK_v8 code... Expect significant speedup, particularly with regard to pulse finding, on Core 2 [SSSE3] and higher [SSE4.1] machines, and some decent improvement on SSE3 P4 builds also.

Testing offline with knabench (available from the Lunatics site) or similar would allow direct comparison between apps (say against 2.4V SSSE3 or a stock app) with the same WU, and can test with pre-shortened workunits speeding development, and enhancing repeatability.

Yes /O2 optimisations test a few percent faster than /O3 with our builds, as they [Lunatics] did also with 2.2b and 2.4 / 2.4V. I would be interested to know if you showed improvement using /O3.

Jason
[P.S. My new Intel stuff arrives soon, so I haven't bothered updating for a while ... Must go to IPP 5.3 ASAP]


/c /O2 /Og /Ob2 /Oi /Ot /GA /I "C:\Program Files\Intel\IPP\5.2\ia32\tools\staticlib" /I "C:\Program Files\Intel\IPP\5.2\ia32\include" /I "C:\Program Files\Intel\Compiler\C++\10.1.013\IA32\include" /I "../../../boinc/win_build" /I "../.." /I "../../../boinc" /I "../../../boinc/api" /I "../../../boinc/api/win" /I "../../../boinc/client/win" /I "../../../boinc/" /I "../../../boinc/lib" /I ".." /I "../../image_libs" /I "../../jpeglib" /I "../../db" /I "../../glut" /I "../../" /I "../../../lib/fftw-3.0.1/api" /D "WIN32" /D "_WIN32" /D "_MT" /D "NDEBUG" /D "_WINDOWS" /D "CLIENT" /D "_CONSOLE" /D "USE_I386_OPTIMIZATIONS" /D "USE_IPP" /D "USE_I386_CORE2" /D "__SSE__" /D "__SSE2__" /D "__SSE3__" /D "__INTEL_COMPILER" /D "_ATL_MIN_CRT" /D "_MBCS" /D "_VC80_UPGRADE=0x0710" /GF /Gm /MT /GS- /Gy /GR /Fo"D:\BoincSeti_Prog\sinbad_repositories\AK_v8\client\win_build\\Release SSSE3\Intermediate\seti_boinc/" /W3 /nologo /Zi /Gd /TP /FI "win-config.h" /fp:fast /Qprec-div- /Qprec-sqrt- /Qipo- /Qftz /QxO

____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Adrian Taylor
Volunteer tester
Joined: 22 Apr 01
Posts: 94
Credit: 9,068,394
RAC: 107,274
United Kingdom
Message 727322 - Posted: 17 Mar 2008, 13:39:37 UTC - in response to Message 727273.

well done JDWhale

i'm so glad that someone has gone from the land of BS on this and into the world of actually doing something...!

i hope you have great success, although as a mac user i should be miffed, but its all for the science eh ?

also glad that ? isnt involved, how refreshing :-)

keep up the good work

regards

adrian
____________
63. (1) (b) "music" includes sounds wholly or predominantly characterised by the emission of a succession of repetitive beats

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46541
Credit: 36,892,357
RAC: 5,321
United States
Message 727333 - Posted: 17 Mar 2008, 14:52:30 UTC - in response to Message 727273.

Impressive, sounds very promising, so I'll keep an eye on this.
____________
My Facebook, War Commander, 2015

Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 20,026,572
RAC: 1,962
United States
Message 727347 - Posted: 17 Mar 2008, 16:15:03 UTC
Last modified: 17 Mar 2008, 16:17:18 UTC

Hi all,

Thanks for all the positive feedback... but all is not well with the WHALEapp just yet. Some WU's enter what seems to be "wait state", mostly within the first minute of execution. Not what you want for benchmarking or daily driver :-( But those VHAR WUs sure haul a**! I'll try changing some compiler options as suggested and try again. I really don't want to debug this, too many years on the job have burned me out.

I'll answer the obvious question before someone asks... Yes those constant 913 second run times are true for the 17 credit VHAR WUs. Only thing was I set BOINC to run on only "1" processor on this E4500 Core2 Duo. No memory or cache contention, the whole 2MB L2 mostly for the single process. Just what "High Priority" WUs deserve!

I had to shut down the system to replace that flaky Ballistix memory, and the only sticks I had lying around were unmatched 2 x 1GB DDR2-667. This system actually runs better single channel than with the unmatched sticks, but because I've configured the RAM drive (remember BOINC_PE) to use 768MB, that didn't leave a whole lot of memory for the OS & processes, so I went with the unmatched sticks for now.

Anyway, I'm putting the little C2Duo back on KWSN_2.4V_SSSE3_MB.exe for the time being.

Cheers to all,
John
____________

Gecko
Volunteer tester
Joined: 17 Nov 99
Posts: 440
Credit: 6,062,469
RAC: 652
United States
Message 727365 - Posted: 17 Mar 2008, 17:16:13 UTC
Last modified: 17 Mar 2008, 17:16:25 UTC

JDWhale: GREAT job!!! Thanks for your efforts & the surprise. It's always awesome to see continued examples of the talent and generosity that make up the community.

Hats off to you, sir!

BTW, please check your PM.
____________

Profile Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,135,885
RAC: 219
Germany
Message 727408 - Posted: 17 Mar 2008, 19:53:59 UTC

Very good job, I am looking at this...

Thanks @JDWhale, for your good work...

Greetings from Germany NRW
Ulli


Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 20,026,572
RAC: 1,962
United States
Message 727432 - Posted: 17 Mar 2008, 21:20:27 UTC
Last modified: 17 Mar 2008, 21:21:35 UTC

Okay... we're back on... I found the error of my ways in gaussFit.cpp. The substitution for the convolution call was incorrect and was causing a buffer overrun in the output buffer. [Edit] Thanks for the tip, you know who you are![/edit] That's what I get for pulling a replacement function out of thin air, not knowing the exact behavior of the original, and not carefully reading the user guide for the IPP replacement. 12 hours and a pint of Vodka can have that effect. I digress...

Unfortunately, I also changed some compile options and don't have any VHARs left for comparison. Previously crunched WUs cannot be used for comparison, as the host was left unattended and WUs were stalling, thus favoring the WU left running without contention.
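
For anyone wondering how a convolution swap can overrun its output: the post doesn't give the actual call or buffer sizes, so the following is only a generic sketch of one common cause, not the gaussFit.cpp code. A full linear convolution of inputs of length n1 and n2 produces n1 + n2 - 1 outputs, so an output buffer sized like either input is too small. Library routines that compute a full convolution generally expect the caller to provide that many elements. The function name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full linear convolution of a and b.
// The result has a.size() + b.size() - 1 elements; allocating only
// a.size() elements for the output is the kind of mistake that
// overruns the buffer.
std::vector<float> convolve_full(const std::vector<float>& a,
                                 const std::vector<float>& b) {
    assert(!a.empty() && !b.empty());
    std::vector<float> out(a.size() + b.size() - 1, 0.0f);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            out[i + j] += a[i] * b[j];  // highest index is n1 + n2 - 2
        }
    }
    return out;
}
```

If only the first n1 samples are wanted, the safer pattern is to convolve into a scratch buffer of the full length and copy out the part that is needed.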

I've launched a direct comparison of WhaleApp vs. 2.4V...on WU's from same splitter block (I hope that term is correct)...

2.4V
wuid=236103546
wuid=236103433

vs.

WhaleApp
wuid=236103598
wuid=236103652


The WhaleApp WUs haven't reported yet; I just kicked them off simultaneously and will post an update as soon as they finish, I predict before 23:00 UTC.

Similarly, the 2.4V WUs above were kicked off simultaneously, at virtually the same time. (Never mind the crud at the beginning of the list; it just shows that I tried to crunch them with the earlier WhaleApp, but they both stalled less than a minute into the run.)
--------------------------

I know that this is not the best host, but it is representative of what is currently available for folks on a budget.

Intel Core2Duo E4500 @ 2418 MHz on ECS 945GCT-M with mismatched DDR2-667 Running WindowsXP diskless via BoincPE




BOINC On..On...
____________

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46541
Credit: 36,892,357
RAC: 5,321
United States
Message 727445 - Posted: 17 Mar 2008, 21:58:45 UTC - in response to Message 727432.

I always wondered what Whales did with those large brains: drink vodka and do math like crazy. ;) Glad you got it sorted. Now, as soon as it validates, you'll need some guinea pigs, er, lab rats, er, volunteers. Yeah, that's it (couldn't help myself).
____________
My Facebook, War Commander, 2015

Profile SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,094,367
RAC: 0
United Kingdom
Message 727449 - Posted: 17 Mar 2008, 22:16:25 UTC

John, well done and congratulations on putting your money where your mouth is.

Let's hope they validate.
____________

Profile Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 727452 - Posted: 17 Mar 2008, 22:22:44 UTC - in response to Message 727445.
Last modified: 17 Mar 2008, 22:57:51 UTC

:)

SSE3 2.4 from Lunatics (C2D version, really SSSE3) and 6.10 v3 from Crunch3r inside. ;) Good combination, Joker... Try it!

It's the best I've tested to date...
____________
Logan.

BOINC FAQ Service (Ahora, también disponible en Español/Now available in Spanish)


Copyright © 2014 University of California