Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again)

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1796308 - Posted: 15 Jun 2016, 9:00:56 UTC - in response to Message 1796305.  
Last modified: 15 Jun 2016, 9:01:14 UTC

AFAIK all modern AMD CPUs and APUs are AVX enabled.

But that's what I'm not familiar with. What is 'all', and what is 'modern'?

As usual, most of my detail comes from Wikipedia. So,

Comparison of AMD processors is incomplete.

List of AMD microprocessors doesn't mention supported features.

But putting the two together, Bulldozer, Bobcat and Jaguar - 2011 and later - should all be AVX enabled? Now to turn that into model numbers identified by BOINC in host listings...
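It may be simpler to test the CPU feature flags a host reports than to translate family names into model numbers. That matters here because Bobcat, unlike Bulldozer and Jaguar, does not implement AVX, so a rule based on family and year would misclassify it. The sketch below is a minimal illustration. It assumes the host listing gives a whitespace-separated list of lowercase flags in the style of Linux /proc/cpuinfo, where SSE3 appears as `pni`. The exact BOINC format, the helper names, and the sample strings are all assumptions, not taken from a real host.

```python
# Minimal sketch: classify a host by the CPU feature flags it reports.
# ASSUMPTION: flags are whitespace-separated, /proc/cpuinfo-style names.
from typing import Optional, Set

# Most capable first: (flag name as reported, human-readable label).
SIMD_LEVELS = [
    ("avx", "AVX"),
    ("sse4_1", "SSE4.1"),
    ("ssse3", "SSSE3"),
    ("pni", "SSE3"),  # /proc/cpuinfo calls SSE3 "pni"
    ("sse2", "SSE2"),
]


def parse_flags(feature_string: str) -> Set[str]:
    """Split a reported feature string into a set of lowercase flags."""
    return {flag.lower() for flag in feature_string.split()}


def has_avx(feature_string: str) -> bool:
    """True if the CPU reports AVX.

    A complete check would also confirm that the OS saves AVX register
    state (the OSXSAVE bit plus XGETBV on x86); this only reads the flags.
    """
    return "avx" in parse_flags(feature_string)


def best_simd_level(feature_string: str) -> Optional[str]:
    """Highest level in SIMD_LEVELS that the host reports, or None."""
    flags = parse_flags(feature_string)
    for flag, label in SIMD_LEVELS:
        if flag in flags:
            return label
    return None


if __name__ == "__main__":
    # Made-up sample strings, loosely shaped like Bobcat- and Bulldozer-class CPUs.
    bobcat_like = "fpu mmx sse sse2 pni ssse3 sse4a"
    bulldozer_like = "fpu mmx sse sse2 pni ssse3 sse4_1 sse4_2 avx sse4a xop fma4"
    print(best_simd_level(bobcat_like))     # SSSE3
    print(best_simd_level(bulldozer_like))  # AVX
```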
ID: 1796308

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13742
Credit: 208,696,464
RAC: 304
Australia
Message 1796309 - Posted: 15 Jun 2016, 9:24:40 UTC - in response to Message 1796305.  

AFAIK all modern AMD CPUs and APUs are AVX enabled.

The problem is defining 'modern'.

Not so long ago someone was calling their video card Mid-Range. In a list of all hardware from 18 years ago to now, it was.
As far as current hardware went (even without including Pascal), it was on the bottom rung.
Grant
Darwin NT
ID: 1796309

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1796310 - Posted: 15 Jun 2016, 9:32:34 UTC - in response to Message 1795542.  

. . I think you made the same mistake I did, .....

. . Another brilliant "D'oh!" moment.

Off-topic:
If you use these dots only to indent text, why not use Alt+255?
Like this: 
     I think you made the same mistake I did, .....

     Another brilliant "D'oh!" moment.


P.S.
Alt+255 = hold down Alt, type 255 on the numeric keypad on the right (not the number keys along the top), then release Alt
Then Copy/Paste that ->                <- space (nbsp)                as many times you like.
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1796310

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1796319 - Posted: 15 Jun 2016, 10:09:22 UTC - in response to Message 1796279.  


If AVX turns out to be the best for AMD now, I'll disable the special selection for AMD and pre-select AVX when available - at least until somebody steps into Joe's shoes and completes the CPU app range.

AFAIK Joe never did AP builds. All of those were mine alone (for Windows).
And there are no AVX-specific optimizations in AP, as far as I can recall now.
Any advantage of the AVX build could come only from the compiler's automatic optimizations.
Also, FFTW has separate AVX paths, so that speedup will be enabled automatically (even with an SSE3 main app binary) on an AVX-capable host.
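The last point is that FFTW chooses its SIMD code path when it runs, so an AVX-capable host can benefit even when the main app binary was built for SSE3. FFTW's real mechanism is more involved. The sketch below only illustrates the general idea of run-time dispatch, and the kernel names and flag sets in it are hypothetical.

```python
# Minimal sketch of run-time dispatch (not FFTW's actual code): the SIMD
# variant is chosen from the host's CPU flags when the program runs,
# not when the main binary is compiled.
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], float], List[float]]


def scale_portable(xs: Sequence[float], k: float) -> List[float]:
    """Plain scalar loop that runs on any CPU."""
    return [k * x for x in xs]


# HYPOTHETICAL stand-ins: in a real library these would be separate
# hand-written SSE2/AVX routines; here they reuse the portable body.
scale_sse2: Kernel = scale_portable
scale_avx: Kernel = scale_portable

# Candidates in order of preference, each with the CPU flags it requires.
CANDIDATES: List[Tuple[str, Set[str], Kernel]] = [
    ("avx", {"avx"}, scale_avx),
    ("sse2", {"sse2"}, scale_sse2),
    ("portable", set(), scale_portable),
]


def select_kernel(host_flags: Set[str]) -> Tuple[str, Kernel]:
    """Return the most preferred variant whose required flags the host has."""
    for name, required, kernel in CANDIDATES:
        if required <= host_flags:  # subset test: every required flag present
            return name, kernel
    raise RuntimeError("no usable kernel")  # unreachable: 'portable' needs nothing


if __name__ == "__main__":
    name, scale = select_kernel({"sse2", "pni", "avx"})
    print(name, scale([1.0, 2.0], 3.0))  # avx [3.0, 6.0]
```

The main program never names a SIMD level; only the selection step looks at the host, which is why the same binary speeds up on an AVX machine.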
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1796319

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1796324 - Posted: 15 Jun 2016, 10:45:27 UTC - in response to Message 1796319.  

The references to AP in recent posts to this thread all refer back to Keith Myers' post 1796270

There really isn't an AP CPU SSE4.1 app in this new beta Lunatics installer, correct? The installer defaults to the old SSE4.1 app choice for AMD CPUs, preferring it over AVX. However, there is no SSE4.1 AP app now that Joe Segur has left ... correct?? The SSE4.1 choice falls back on the older SSE3 AP app. We AMD users really should choose the AVX option, as that one actually gets installed and is faster than SSE3. Once someone confirms my observation, I will rerun the installer and go back to the AVX app.

Let's look at the page:

[Image: screenshot of the installer page]
(image taken on a pre-AVX Intel CPU, so the enabled options and pre-selections are different)

I think it's clear that Keith's question relates to MB choices, not AP. Let's read it as if it was written that way, please, and drop AP from the conversation for now?
ID: 1796324

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796402 - Posted: 15 Jun 2016, 17:04:33 UTC - in response to Message 1796324.  


I think it's clear that Keith's question relates to MB choices, not AP. Let's read it as if it was written that way, please, and drop AP from the conversation for now?

Yes, I goofed in my mention of AP tasks. I MEANT MB tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796402

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796404 - Posted: 15 Jun 2016, 17:06:21 UTC - in response to Message 1796305.  


AFAIK all modern AMD CPUs and APUs are AVX enabled.

I know my processors are AVX enabled because SIV64 tells me so on its main page. At least I hope it is correct.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796404

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796407 - Posted: 15 Jun 2016, 17:13:49 UTC - in response to Message 1796319.  


Also, FFTW has separate AVX paths so this speedup will be enabled automatically (even with SSE3 main app binary) on AVX-compatible host.

Somewhere along the way I snagged some optimized FFTW libraries. They were supposed to be better than the stock ones.
libfftw3f-3-3-4_x86.dll 2.24 MB
libfftw3f-3-3-4_x86sse41.dll 2.24 MB
libfftw3f-3avx.dll 2.45 MB
libfftw3f-3ssse3.dll 1.95 MB
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796407

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1796420 - Posted: 15 Jun 2016, 18:48:17 UTC - in response to Message 1796407.  

And they are indeed better, but by quite a small margin.
One needs to understand that big benefits usually come from sources hand-optimized for SIMD.
That's what I mean by the AVX path in FFTW.
A compiler's vectoriser can rarely reach a similar speedup.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1796420

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1796502 - Posted: 16 Jun 2016, 3:42:05 UTC

Preliminary testing suggests there is very little increase in time on the blc guppi work units.

Maybe 20-30 secs, which is negligible.

For non-guppi MB it is hard to tell, as these work units are not from the same tape as the ones I ran before the version change.

Most of these MB tasks are coming from 06my10 and 07my10.

Since I don't have any from that tape from before the change, it's hard to compare. I looked at my other machine and the times are pretty close to each other, even though these are not cloned machines.

So I'm going to say very little difference in time.

However, I did notice that the CPU % has dropped noticeably. You didn't slip in a -use_sleep command, did you?
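Because work units from different tapes are not directly comparable, one option is to compare run times only within the same tape. A minimal sketch, assuming you log (app version, tape, run time) triples yourself; the record layout is hypothetical and the numbers in the usage example are made up.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL record layout: (app_version, tape_name, run_time_seconds).
Record = Tuple[str, str, float]


def median_by_tape(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Median run time for each (app_version, tape) group."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for version, tape, seconds in records:
        groups[(version, tape)].append(seconds)
    return {key: median(times) for key, times in groups.items()}


def compare_versions(records: Iterable[Record], old: str, new: str) -> Dict[str, float]:
    """Change in median run time (new minus old, seconds) per tape.

    Tapes seen under only one version are skipped, since there is
    nothing like-for-like to compare against.
    """
    medians = median_by_tape(records)
    old_tapes = {tape for (version, tape) in medians if version == old}
    new_tapes = {tape for (version, tape) in medians if version == new}
    return {
        tape: medians[(new, tape)] - medians[(old, tape)]
        for tape in sorted(old_tapes & new_tapes)
    }


if __name__ == "__main__":
    # Made-up run times in seconds, for illustration only.
    records = [
        ("r3430", "06my10", 600.0),
        ("r3430", "06my10", 620.0),
        ("r3472", "06my10", 650.0),
        ("r3472", "07my10", 700.0),  # no r3430 baseline for this tape, so skipped
    ]
    print(compare_versions(records, "r3430", "r3472"))  # {'06my10': 40.0}
```

Medians are used rather than means so that one unusually long or short task does not dominate a small sample.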
ID: 1796502

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1796509 - Posted: 16 Jun 2016, 3:57:09 UTC - in response to Message 1796420.  

And they are indeed better, but by quite a small margin.
One needs to understand that big benefits usually come from sources hand-optimized for SIMD.
That's what I mean by the AVX path in FFTW.
A compiler's vectoriser can rarely reach a similar speedup.

So my decrease in task completion time is solely due to using the optimized FFTW library libfftw3f-3avx.dll, renamed to the stock file name libfftw3f-3-3-4_x64.dll? And it has nothing to do with using the r3430 or r3472 app?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1796509

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65757
Credit: 55,293,173
RAC: 49
United States
Message 1796510 - Posted: 16 Jun 2016, 4:04:24 UTC

Well, I'm as happy as a clam with SoG in Lunatics 0.45 Beta3.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1796510

Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1796560 - Posted: 16 Jun 2016, 7:46:55 UTC - in response to Message 1796407.  

Hi Keith, those are my own FFTW 3.3.4 libraries, recompiled from Matteo Frigo's original code.
They were cross-compiled under Ubuntu 15.10 with the GNU C/C++ compiler (GCC 5.2), using maximum optimization settings for each target architecture (e.g. Core 2, Sandy Bridge, Ivy Bridge, Haswell), with FMA enabled for Haswell and later CPUs.
The AVX SIMD instruction set can be slower than others such as SSE2 because of the memory bandwidth it requires (mostly on notebook systems).
Some further improvements may come when FFTW 3.3.5 is released.

Marco.
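The memory-bandwidth point can be illustrated with a simple roofline-style bound: a kernel can go no faster than the lower of the processor's arithmetic peak and memory bandwidth multiplied by arithmetic intensity (floating-point operations per byte moved). This is a rough model with made-up numbers, not a measurement of FFTW. It shows why doubling the SIMD width (4 single-precision lanes for SSE, 8 for AVX) need not help a memory-bound kernel, but it does not model the effects that could make AVX slower outright.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth x intensity."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def avx_over_sse(sse_peak: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Predicted AVX/SSE speedup, ASSUMING AVX simply doubles the compute peak."""
    sse = attainable_gflops(sse_peak, bandwidth_gb_s, flops_per_byte)
    avx = attainable_gflops(2 * sse_peak, bandwidth_gb_s, flops_per_byte)
    return avx / sse


if __name__ == "__main__":
    # Made-up numbers: 24 GFLOP/s SSE peak, 10 GB/s of memory bandwidth.
    for intensity in (0.5, 2.0, 10.0):
        print(intensity, avx_over_sse(24.0, 10.0, intensity))  # speedups 1.0, 1.0, 2.0
```

Only at the highest intensity, where the kernel is compute-bound, does the wider SIMD unit pay off in this model.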
ID: 1796560

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1796561 - Posted: 16 Jun 2016, 7:51:45 UTC - in response to Message 1796502.  


However, I did notice that the CPU % has dropped noticeably. You didn't slip in a -use_sleep command, did you?

There will be a new build quite soon, with many more changes; it would be interesting to see how it reacts.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1796561

Stephen "Heretic" Crowdfunding Project Donor* Special Project $75 donor Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1796587 - Posted: 16 Jun 2016, 13:02:11 UTC - in response to Message 1796310.  
Last modified: 16 Jun 2016, 13:05:30 UTC

. . I think you made the same mistake I did, .....

. . Another brilliant "D'oh!" moment.

Off-topic:
If you use these dots only to indent text, why not use Alt+255?
Like this: 
     I think you made the same mistake I did, .....

     Another brilliant "D'oh!" moment.


P.S.
Alt+255 = hold down Alt, type 255 on the numeric keypad on the right (not the number keys along the top), then release Alt
Then Copy/Paste that ->                <- space (nbsp)                as many times you like.



Do you mean like this? Thank you for that advice; I have found the other method tedious :)

This is so much better :)

[edit] a pity it didn't work though. I wonder why it worked for you and not for me. It shows as spaces in my edit window but disappears when posted, same as using the spacebar.
ID: 1796587

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1796828 - Posted: 17 Jun 2016, 11:48:45 UTC - in response to Message 1796587.  

[edit] a pity it didn't work though.

Use the [Quote] button and copy/paste the following line into Notepad:
->               <- space (nbsp)

Then, in future, copy/paste the "spaces" between the arrows from Notepad.

P.S.
- Do not copy the line above from the page; copy it from the typing box after pressing the [Quote] button.
- Use the [Preview] button before posting to see whether "it did work".

Alt+255 (and similar codes) have existed since MS-DOS times.

"Your spaces" are normal spaces (hex 20 20 20 ...), while "my spaces" are A0 A0 A0 ...:
20 20 20 20 20 20 54 68 69 73 20 69 73 20 73 6F 20 6D 75 63 68 20 62 65 74 74 65 72 :      This is so much better
2D 3E A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 3C 2D                            : ->               <-
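The hex dump above can be reproduced in a few lines of Python. This minimal sketch assumes a single-byte Windows code page (cp1252), where U+00A0 NO-BREAK SPACE is the byte A0 (in UTF-8 it would be C2 A0). It also shows where Alt+255 comes from: in the MS-DOS code page 437, byte 255 (FF) is that same no-break space. The whitespace-collapsing function is a rough, hypothetical stand-in for how HTML rendering treats ordinary spaces, which is why typed spaces vanish while the A0 characters survive; it is not the forum's actual code.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE


def hex_dump(text: str, encoding: str = "cp1252") -> str:
    """Space-separated uppercase hex bytes of `text` in the given encoding."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def collapse_html_whitespace(text: str) -> str:
    """Collapse runs of ASCII whitespace to one space, roughly as HTML does.

    An explicit character class is used on purpose: Python's \\s would
    also match U+00A0, which HTML does not treat as collapsible.
    """
    return re.sub(r"[ \t\n\r\f]+", " ", text)


if __name__ == "__main__":
    print(hex_dump("->" + NBSP * 3 + "<-"))        # 2D 3E A0 A0 A0 3C 2D
    print(bytes([255]).decode("cp437") == NBSP)   # True
    print(repr(collapse_html_whitespace("      This is")))  # ' This is'
```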

 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1796828

Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1796976 - Posted: 18 Jun 2016, 1:17:57 UTC - in response to Message 1796828.  

[edit] a pity it didn't work though.

Use the [Quote] button and copy/paste the following line into Notepad:
->               <- space (nbsp)

Then, in future, copy/paste the "spaces" between the arrows from Notepad.

P.S.
- Do not copy the line above from the page; copy it from the typing box after pressing the [Quote] button.
- Use the [Preview] button before posting to see whether "it did work".

Alt+255 (and similar codes) have existed since MS-DOS times.

"Your spaces" are normal spaces (hex 20 20 20 ...), while "my spaces" are A0 A0 A0 ...:
20 20 20 20 20 20 54 68 69 73 20 69 73 20 73 6F 20 6D 75 63 68 20 62 65 74 74 65 72 :      This is so much better
2D 3E A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 3C 2D                            : ->               <-

  test
^
worked for me ...
though I've found the [pre]/[/pre] tags to be more useful ...
ID: 1796976

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1796980 - Posted: 18 Jun 2016, 1:44:13 UTC

OK, I was trying to find the post where Keith talked about completion times under the new app. I'm sure it's here, but I forgot where it was.

Now that I've had 24 hours with it running, I think he is right.

For non-guppi work units, it looks like a 200-240 sec increase, not the 100 sec I originally thought (these are running multiple work units at once, not single work units per card). It took a while to get enough work units to look at all of the different types.

I've switched back to r3430 and the times have come back down on new work from the same tape.

So for now, I'm sticking with that.

No real change to Guppi work units.
ID: 1796980

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65757
Credit: 55,293,173
RAC: 49
United States
Message 1796998 - Posted: 18 Jun 2016, 4:05:58 UTC

For guppis, 45-50 mins; for non-guppis, 10-25 mins. Mind you, I'm doing 3 WUs at a time on r3472/SoG, on a PNY LC 580 card @ 857 MHz, plus there are 4 CPU tasks being done on AVX.
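When several tasks share one card, the elapsed time per task overstates the time per result. A quick sketch of the arithmetic using the figures above (3 tasks at a time, 45-50 minutes per guppi), assuming the concurrent tasks overlap completely, which is only approximately true; the function names are hypothetical.

```python
def tasks_per_hour(concurrent: int, minutes_per_task: float) -> float:
    """Results per hour when `concurrent` tasks run side by side."""
    return concurrent * 60.0 / minutes_per_task


def effective_minutes_per_result(concurrent: int, minutes_per_task: float) -> float:
    """Wall-clock minutes per completed result."""
    return minutes_per_task / concurrent


if __name__ == "__main__":
    for minutes in (45.0, 50.0):  # guppi range from the post above
        print(minutes, tasks_per_hour(3, minutes), effective_minutes_per_result(3, minutes))
```

At 45 minutes per task and 3 at a time, that works out to 4 guppis per hour, or 15 minutes of card time per result.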
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1796998

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1797005 - Posted: 18 Jun 2016, 4:28:53 UTC - in response to Message 1796980.  

Zalster, to throw more wood on the fire .... I just changed my computers to remove the -use_sleep option from the MB txt file. !WOW! That really goosed the engines. I am now doing GUPPI VLAR in 12-15 minutes versus 22-24 minutes with sleep, and non-VLAR Arecibo tasks in 5-9 minutes versus 12-15 minutes.

CPU is at 90-100% usage, with MB tasks basically taking a full CPU core each. I think that is normal for the SoG app, from what I've read about its history. Funny thing is that the system lag is actually lower than it was with the sleep function, and I haven't changed my very aggressive MB txt file parameters. I wonder if the sleeping actually affected the video graphics engine, causing more lag than normal. With sleep I saw only about 50% CPU usage and NO red in the SIV64 CPU utilization traces. There's RED all over the place now, but the systems are stable as always and text entry and mouse cursor movement are acceptable.

I still think the r3430 app is faster than the newer r3472 app. I am trying that version out now since it was supposed to play nice with the sleep parameter. Not sure I need it now. I will have to revert to the older r3430 app and compare throughput, but I need a few days of data collection to form a baseline.
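These observations fit the usual trade-off when a host thread waits for the GPU: busy-polling keeps a CPU core near 100% but notices completion at once, while sleeping between polls frees the core at the cost of some wake-up latency. How SoG actually implements -use_sleep isn't described here, so this is only a generic sketch; `wait_until` and its `sleep_s` parameter are hypothetical.

```python
import time
from typing import Callable


def wait_until(done: Callable[[], bool], sleep_s: float = 0.0, timeout_s: float = 5.0) -> int:
    """Poll `done()` until it returns True; return the number of polls made.

    sleep_s == 0 busy-waits: lowest latency, but keeps a core busy.
    sleep_s > 0 sleeps between polls: frees the core, but completion can
    be noticed up to sleep_s late.
    Raises TimeoutError if `done()` stays False for timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if done():
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        if sleep_s > 0:
            time.sleep(sleep_s)


if __name__ == "__main__":
    # Simulate a GPU job that finishes 50 ms from now, waited on both ways.
    finish_at = time.monotonic() + 0.05
    spins = wait_until(lambda: time.monotonic() >= finish_at)
    finish_at = time.monotonic() + 0.05
    naps = wait_until(lambda: time.monotonic() >= finish_at, sleep_s=0.01)
    print(spins, naps)  # busy-waiting polls far more often
```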
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1797005


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.