Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again)

Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1796264 - Posted: 15 Jun 2016, 5:32:02 UTC - in response to Message 1796259.  

I have good news, Beta 3 is working perfectly on the PNY LC 580 card, with 3 wu's at a time running, there are no wu's in the 'waiting to run' state, nor are there any wu's running real slow, all are running normally, now I have not seen a guppi run yet, but so far so good!

You say Beta 3 is running. However, Beta3 isn't an application; it's just an installer.
Are you running CUDA or the SoG application?
Grant
Darwin NT
ID: 1796264 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796270 - Posted: 15 Jun 2016, 6:13:21 UTC

There really isn't an AP CPU SSE4.1 app in this new beta Lunatics installer, correct? The installer defaults to the old SSE4.1 app choice for AMD CPUs as being preferred over AVX. However, there is no SSE4.1 AP app now that Joe Segur has left ... correct?? The SSE4.1 choice falls back on the older SSE3 AP app. We AMD users really should choose the AVX option, as that one actually gets installed and is faster than SSE3. Once someone confirms my observation, I will rerun the installer and go back to the AVX app.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796270 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1796275 - Posted: 15 Jun 2016, 6:36:35 UTC - in response to Message 1796270.  

There really isn't an AP CPU SSE4.1 app in this new beta Lunatics installer, correct?

Nope.
It's the same installer as the previous one, with the same applications as before with the addition of the SoG application.

As the original post says,
This is basically the v0.44 installer (64-bit only, to start with), with the following changes.

MB v7 legacy support removed
Cuda23 app removed (to make space in the user interface, only)
r3430 SoG app for NVidia included


Beta2 gave the choice of 32 or 64 bit installer & included the new r3430 ATI & Intel GPU versions
Beta3 uses the latest version of the NVidia SoG application- r3472
Grant
Darwin NT
ID: 1796275 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1796279 - Posted: 15 Jun 2016, 6:50:13 UTC - in response to Message 1796270.  

The SSE4.1 choice falls back on the older SSE3 AP app. We AMD users really should choose the AVX option as that one actually gets installed and is faster than SSE3. Once someone confirms my observation, I will rerun the installer and go back to the AVX app.

Yes, it would be helpful if some AMD users could compare the speed of the AVX app with the 'best of the rest' - still only SSE3 so far.

If AVX turns out to be the best for AMD now, I'll disable the special selection for AMD and pre-select AVX when available - at least until somebody steps into Joe's shoes and completes the CPU app range.
ID: 1796279 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796295 - Posted: 15 Jun 2016, 7:27:52 UTC - in response to Message 1796279.  

I always have the dickens trying to find a specific task that just finished in my lists. Still trying to find examples of a MB task that used the SSE3 app versus my normal AVX app. I only was using the SSE3 app for a short while after trying the 0.45 Beta Lunatics installer. Any tasks that were completed with the SSE3 app in the past are long gone now to pick examples from. Just from my recollection .... the AVX app is about 15% faster on non-VLAR tasks and about 20% faster on VLAR tasks compared to SSE3. At least on my AMD CPU hardware. There needs to be more input from other crunchers to form a valid consensus probably before making a judgement call about changing the installer defaults.
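For anyone who wants to turn a recollection like that into a number, here is a minimal sketch of the arithmetic. It reads "X% faster" as a reduction in run time relative to the SSE3 baseline, which is an assumption about what was meant. The run times in it are hypothetical placeholders, not measurements from this thread, and `percent_faster` is just an illustrative helper name.

```python
def percent_faster(baseline_seconds, candidate_seconds):
    """Percentage reduction in run time of the candidate relative to the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


def mean(values):
    """Arithmetic mean of a non-empty list of run times."""
    return sum(values) / len(values)


# Hypothetical run times in seconds, NOT measurements from this thread.
sse3_times = [4000.0, 4200.0, 3800.0]
avx_times = [3400.0, 3570.0, 3230.0]

if __name__ == "__main__":
    print(round(percent_faster(mean(sse3_times), mean(avx_times)), 1))
```

Comparing averages only makes sense for tasks of similar type (VLAR against VLAR, and so on), so the two lists should be built from comparable work units.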
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796295 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796297 - Posted: 15 Jun 2016, 7:34:58 UTC
Last modified: 15 Jun 2016, 7:55:27 UTC

Is there a way to get the r3472 app in the installer to identify itself in the Manager as either 8.00 or 8.12 version level? The newly formed app_info has both cases defined. I now have one machine using the 8.12 identifier from the manual installation of the app and the other machine still shows the 8.00 identifier in the Manager from using the beta installer. Now I have two different entries for the SoG app in BoincTasks which is sorta hard to read since they are listed in different places.

[Edit] Never mind, I see the other machine is starting to populate with 8.12 tasks.
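As a rough way to see which version numbers an app_info.xml defines, the sketch below lists every `app_version` entry. The element names follow the anonymous-platform app_info.xml layout as I understand it. The fragment itself, including the plan class value, is made up for illustration and is not copied from the installer's file. It also assumes the Manager shows `version_num` divided by 100, so 800 reads as 8.00 and 812 as 8.12.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: values here are invented for this sketch
# and are not taken from the real Lunatics app_info.xml.
SAMPLE = """
<app_info>
  <app><name>setiathome_v8</name></app>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <plan_class>opencl_nvidia_SoG</plan_class>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>812</version_num>
    <plan_class>opencl_nvidia_SoG</plan_class>
  </app_version>
</app_info>
"""


def format_version(version_num):
    """Render 812 as '8.12' (assumed display convention)."""
    return f"{version_num // 100}.{version_num % 100:02d}"


def list_app_versions(xml_text):
    """Return (app_name, display_version, plan_class) for each app_version."""
    root = ET.fromstring(xml_text)
    entries = []
    for app_version in root.findall("app_version"):
        name = app_version.findtext("app_name")
        number = int(app_version.findtext("version_num"))
        plan_class = app_version.findtext("plan_class", default="")
        entries.append((name, format_version(number), plan_class))
    return entries


if __name__ == "__main__":
    for entry in list_app_versions(SAMPLE):
        print(entry)
```

Two entries that differ only in `version_num` would explain seeing both 8.00 and 8.12 listed for what is really the same application.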
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796297 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1796302 - Posted: 15 Jun 2016, 8:07:35 UTC - in response to Message 1796295.  

At least on my AMD CPU hardware. There needs to be more input from other crunchers to form a valid consensus probably before making a judgement call about changing the installer defaults.

Which looks to be AMD FX-8370 and FX-8350. I'm not familiar with how many AMD CPU types are AVX-enabled these days - but yes, I'd welcome additional input.
ID: 1796302 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34428
Credit: 79,922,639
RAC: 80
Germany
Message 1796305 - Posted: 15 Jun 2016, 8:44:22 UTC - in response to Message 1796302.  

At least on my AMD CPU hardware. There needs to be more input from other crunchers to form a valid consensus probably before making a judgement call about changing the installer defaults.

Which looks to be AMD FX-8370 and FX-8350. I'm not familiar with how many AMD CPU types are AVX-enabled these days - but yes, I'd welcome additional input.


AFAIK all modern AMD CPUs and APUs are AVX enabled.


With each crime and every kindness we birth our future.
ID: 1796305 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1796308 - Posted: 15 Jun 2016, 9:00:56 UTC - in response to Message 1796305.  
Last modified: 15 Jun 2016, 9:01:14 UTC

AFAIK all modern AMD CPUs and APUs are AVX enabled.

But that's what I'm not familiar with. What is 'all', and what is 'modern'?

As usual, most of my detail comes from Wikipedia. So,

Comparison of AMD processors is incomplete.

List of AMD microprocessors doesn't mention supported features.

But putting the two together, Bulldozer, Bobcat and Jaguar - 2011 and later - should all be AVX enabled? Now to turn that into model numbers identified by BOINC in host listings...
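One possible way to get from host listings to a yes/no answer is to read the family number out of the model string rather than matching model names. The sketch below assumes host listings carry a suffix of the form `[Family N Model M Stepping S]`, and it assumes AMD family 21 (15h, Bulldozer-derived) and family 22 (16h, Jaguar) support AVX. It treats family 20 (14h, Bobcat) as lacking AVX, which is my understanding of that core but is worth checking against a real Bobcat host. The function names are hypothetical.

```python
import re

# Assumed set of AMD family numbers with AVX support:
# 21 (0x15) = Bulldozer-derived cores, 22 (0x16) = Jaguar.
# Family 20 (0x14, Bobcat) is deliberately left out; see the note above.
AMD_AVX_FAMILIES = {21, 22}

# Assumed format of the bracketed suffix in a host's CPU model string.
FAMILY_RE = re.compile(r"\[Family (\d+) Model (\d+) Stepping (\d+)\]")


def amd_family(model_string):
    """Return the family number from a model string, or None if absent."""
    match = FAMILY_RE.search(model_string)
    return int(match.group(1)) if match else None


def amd_has_avx(model_string):
    """Guess AVX support for an AMD host from its family number alone."""
    return amd_family(model_string) in AMD_AVX_FAMILIES


if __name__ == "__main__":
    fx = "AMD FX(tm)-8350 Eight-Core Processor [Family 21 Model 2 Stepping 0]"
    print(amd_has_avx(fx))
```

A family-based check only narrows the candidates. The feature flags reported by the client on the host itself remain the authoritative answer.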
ID: 1796308 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1796309 - Posted: 15 Jun 2016, 9:24:40 UTC - in response to Message 1796305.  

AFAIK all modern AMD CPUs and APUs are AVX enabled.

Problem is defining modern.

Not so long ago someone was calling their video card Mid-Range. In a list of all hardware from 18 years ago to now, it was.
As far as current hardware went (even without including Pascal), it was on the bottom rung.
Grant
Darwin NT
ID: 1796309 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1796310 - Posted: 15 Jun 2016, 9:32:34 UTC - in response to Message 1795542.  

. . I think you made the same mistake I did, .....

. . Another brilliant "D'oh!" moment.

Off-topic:
If you use these dots only to make indented text - why not use Alt+255
Like this: 
     I think you made the same mistake I did, .....

     Another brilliant "D'oh!" moment.


P.S.
Alt+255 = press Alt, type 255 (use the Right-num-keys, not the top), release Alt
Then Copy/Paste that ->                <- space (nbsp)                as many times you like.
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1796310 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1796319 - Posted: 15 Jun 2016, 10:09:22 UTC - in response to Message 1796279.  


If AVX turns out to be the best for AMD now, I'll disable the special selection for AMD and pre-select AVX when available - at least until somebody steps into Joe's shoes and completes the CPU app range.

AFAIK Joe never did AP builds. All of those come solely from me (for Windows).
And there are no AVX-specific optimizations in AP, as far as I can recall now.
Any advantage from an AVX build could come only from auto-optimization by the compiler.
Also, FFTW has separate AVX paths, so this speedup will be enabled automatically (even with the SSE3 main app binary) on an AVX-compatible host.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1796319 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1796324 - Posted: 15 Jun 2016, 10:45:27 UTC - in response to Message 1796319.  

The references to AP in recent posts to this thread all refer back to Keith Myers' post 1796270:

There really isn't an AP CPU SSE4.1 app in this new beta Lunatics installer, correct? The installer defaults to the old SSE4.1 app choice for AMD CPUs as being preferred over AVX. However, there is no SSE4.1 AP app now that Joe Segur has left ... correct?? The SSE4.1 choice falls back on the older SSE3 AP app. We AMD users really should choose the AVX option, as that one actually gets installed and is faster than SSE3. Once someone confirms my observation, I will rerun the installer and go back to the AVX app.

Let's look at the page:


(image taken on a pre-AVX Intel, so enabled options and pre-selects are different)

I think it's clear that Keith's question relates to MB choices, not AP. Let's read it as if it was written that way, please, and drop AP from the conversation for now?
ID: 1796324 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796402 - Posted: 15 Jun 2016, 17:04:33 UTC - in response to Message 1796324.  


I think it's clear that Keith's question relates to MB choices, not AP. Let's read it as if it was written that way, please, and drop AP from the conversation for now?

Yes, I goofed in my mention of AP tasks. I MEANT MB tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796402 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796404 - Posted: 15 Jun 2016, 17:06:21 UTC - in response to Message 1796305.  


AFAIK all modern AMD CPUs and APUs are AVX enabled.

I know my processors are AVX enabled because SIV64 tells me so in its main page. At least I hope it is correct.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796404 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796407 - Posted: 15 Jun 2016, 17:13:49 UTC - in response to Message 1796319.  


Also, FFTW has separate AVX paths, so this speedup will be enabled automatically (even with the SSE3 main app binary) on an AVX-compatible host.

Somewhere along the way I snagged some optimized FFTW libraries. They were supposed to be better than the stock libraries.
libfftw3f-3-3-4_x86.dll 2.24 MB
libfftw3f-3-3-4_x86sse41.dll 2.24 MB
libfftw3f-3avx.dll 2.45 MB
libfftw3f-3ssse3.dll 1.95 MB
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796407 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1796420 - Posted: 15 Jun 2016, 18:48:17 UTC - in response to Message 1796407.  

And they are indeed better, but only by quite a small margin.
One needs to understand that big benefits usually come from sources hand-optimized for SIMD.
That is what I mean by the AVX path in FFTW.
A compiler's vectoriser can rarely reach a similar speedup.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1796420 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1796502 - Posted: 16 Jun 2016, 3:42:05 UTC

Preliminary testing looks like there is very little increase in time to the blc guppi work units.

Maybe 20-30 secs but very negligible.

Non-guppi mb is hard to tell as these work units are not from the same tape as before the change in version.

Most of these mb are coming from 06my10 and 07my10.

Since I don't have any from before from that tape, it's hard to compare. I looked at my other machine and the times are pretty close to each other, even though these are not cloned machines.

So I'm going to say very little difference in time.

However, I did notice that the % of CPU has dropped noticeably. You didn't slip in a use sleep command did you?
ID: 1796502 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796509 - Posted: 16 Jun 2016, 3:57:09 UTC - in response to Message 1796420.  

And they are indeed better, but only by quite a small margin.
One needs to understand that big benefits usually come from sources hand-optimized for SIMD.
That is what I mean by the AVX path in FFTW.
A compiler's vectoriser can rarely reach a similar speedup.

So my decrease in task completion time is solely due to using the optimized FFTW library libfftw3f-3avx.dll, renamed to the stock library name libfftw3f-3-3-4_x64.dll? And it has nothing to do with using the r3430 or r3472 app?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796509 · Report as offensive
Marco Franceschini
Volunteer tester
Avatar

Send message
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1796560 - Posted: 16 Jun 2016, 7:46:55 UTC - in response to Message 1796407.  

Hi Keith, those are my own recompiled FFTW 3.3.4 libraries, built from Matteo Frigo's original code.
They were cross-compiled under Ubuntu 15.10 (with the GNU gcc C/C++ compiler 5.2 and maximum optimization settings for various architectures, i.e. Core 2, Sandy Bridge, Ivy Bridge, Haswell etc.), with FMA enabled for Haswell and later CPUs.
The AVX SIMD instruction set can be slower than others such as SSE2 because of the memory bandwidth it requires (mostly on notebook systems).
Some further improvements may be possible when FFTW 3.3.5 is released.

Marco.
ID: 1796560 · Report as offensive


 