Stock vs Lunatics: CPU throughput

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806409 - Posted: 1 Aug 2016, 17:39:18 UTC

I switched my workstation back to stock and noticed the most amazing thing: for the first time in recent memory my CPU temps are actually lower than my GPU! 5 degrees and declining (which is shocking given the GPU temps are actually higher).



I know that S@H uses libfftw, which should use AVX, and I assume that's where the heavy SIMD-lifting is happening (maybe I'm wrong?). Based on my experiences with SSSE3-SSE4.2 I can't imagine any other processor feature making as big a difference for this as AVX. The only other theory I can think of is that maybe stock is being built with an older version of VS -- I recall x64 codegen being improved in more recent builds (I think it was in VS2012?). Or maybe the Lunatics version is being built as a VEX app so it can use AVX throughout without transition penalties?

I don't mean to take a dig at Lunatics, but I was also fascinated to see the default CUDA load making extremely good use of my GPU -- I used to run 3 apps at a time (on the left) but the right is the default 1 task OpenCL. From what I can tell the OpenCL app is keeping the GPU busier even with one task at a time (I haven't tuned anything at all). That, or OpenHWM is only measuring the first SM cluster or something wacky like that?

The other worthless anecdote after switching was my power to the wall is a bit lower -- I assume this is consistent with the CPU running cooler.

This begs the question: can you mix and match? Lunatics CPU + OpenCL SoG/SaH GPU? And what's stopping the CPU optimizations from getting into the main build?
ID: 1806409
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1806429 - Posted: 1 Aug 2016, 18:40:21 UTC - in response to Message 1806409.  
Last modified: 1 Aug 2016, 18:43:13 UTC

This begs the question: can you mix and match? Lunatics CPU + OpenCL SoG/SaH GPU? And what's stopping the CPU optimizations from getting into the main build?

That's what Richard's Beta Lunatics installer allows you to do: install the CPU and CUDA or OpenCL applications without losing your cache or in-progress work. SoG performs much better than SaH.


And what's stopping the CPU optimizations from getting into the main build?

No idea; however, the more optimised the stock application becomes, the lower the Credit payout becomes.
Grant
Darwin NT
ID: 1806429
The_Matrix
Volunteer tester
Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1806441 - Posted: 1 Aug 2016, 19:28:32 UTC - in response to Message 1806409.  
Last modified: 1 Aug 2016, 19:43:51 UTC

I can't imagine any other processor feature making as big a difference for this as AVX.


I would be interested in the impact on AMD CPUs that support AVX etc. Any workunit samples?

I am asking because I am about to install Lunatics 64-bit again, this time on an AMD system, to see how it does.

SSE3 is recommended over AVX, I see...
ID: 1806441
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806456 - Posted: 1 Aug 2016, 20:39:43 UTC - in response to Message 1806409.  

And what's stopping the CPU optimizations from getting into the main build?

They do, but slowly.
Also, AKv8-based CPU Lunatics apps take quite a different approach to host diversity than stock, so it would be inappropriate to simply replace stock with them.
They could be distributed alongside the current stock apps, as the AstroPulse opt apps are.
Maybe at some point that will be done.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1806456
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1806459 - Posted: 1 Aug 2016, 20:44:11 UTC - in response to Message 1806441.  
Last modified: 1 Aug 2016, 20:45:41 UTC

I can't imagine any other processor feature making as big a difference for this as AVX.


I would be interested in the impact on AMD CPUs that support AVX etc. Any workunit samples?

I am asking because I am about to install Lunatics 64-bit again, this time on an AMD system, to see how it does.

SSE3 is recommended over AVX, I see...


You can always check my results.
I'm running the AVX version on my FX CPU.
It's slightly faster than the other Lunatics versions, but it depends on the CPU/memory/host combination.
Needless to say, it's much faster than stock.


With each crime and every kindness we birth our future.
ID: 1806459
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1806467 - Posted: 1 Aug 2016, 21:29:12 UTC

As the installer says, SSE4.2 was reported to be preferable over AVX on AMD CPUs - some years ago, when we had an SSE4.2 build available, courtesy of Joe Segur.

It is certainly possible that AVX may be better than SSE3, which is the next-best I've been provided with for v8. Try them both, and see what you think.
ID: 1806467
Al (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1806475 - Posted: 1 Aug 2016, 22:29:21 UTC - in response to Message 1806467.  

Richard, I've seen Joe's name mentioned sometimes when the CPU app is brought up. Is there a thread about what happened and why he decided to move on? From everything I've gathered, he was a great asset to the project, and it appears that his leaving has left a huge hole in that side of development. Or do you have anything you could share about it? Thanks!

ID: 1806475
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806476 - Posted: 1 Aug 2016, 22:33:25 UTC - in response to Message 1806475.  
Last modified: 1 Aug 2016, 22:38:46 UTC

Richard, I've seen Joe's name mentioned sometimes when the CPU app is brought up. Is there a thread about what happened and why he decided to move on? From everything I've gathered, he was a great asset to the project, and it appears that his leaving has left a huge hole in that side of development. Or do you have anything you could share about it? Thanks!

I'm afraid it was not his decision... He was fairly old, and suddenly all his activity on both the SETI boards and the Lunatics site stopped. A month or two could be internet difficulties (he is in a "hard-internetizable" area), but for such a long time... I'm afraid the worst happened :(
P.S. And indeed he was a very valuable and knowledgeable member of the team. His experience (and the manner of communication he always showed) is missed a lot...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1806476
Al (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1806502 - Posted: 2 Aug 2016, 0:28:11 UTC - in response to Message 1806476.  

Aww crap, that stinks if that is true. Anyone here know him personally, who might be able to share with us a more definitive answer? He sounds like he was a great guy, I wonder if I had ever talked to him back when I was more active here around 2009-2011? :-(

ID: 1806502
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1806504 - Posted: 2 Aug 2016, 0:31:48 UTC - in response to Message 1806502.  

Did anyone know him in RL and could try to contact him?
ID: 1806504
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1806575 - Posted: 2 Aug 2016, 7:30:28 UTC - in response to Message 1806504.  

Did anyone know him in RL and could try to contact him?

He keeps his private life very private, and only really communicated on these boards about technical matters - a true number cruncher.

I did find a physical address and telephone number for the house he shares with his sister, and passed them on to the project team (who also miss him) - I don't know whether or how hard they tried to follow it up.
ID: 1806575
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806771 - Posted: 3 Aug 2016, 11:48:16 UTC - in response to Message 1806575.  

AFAIK Mike tried to call him by phone and I sent E-mail... all w/o any success :(
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1806771
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1806776 - Posted: 3 Aug 2016, 12:21:03 UTC - in response to Message 1806771.  

AFAIK Mike tried to call him by phone and I sent E-mail... all w/o any success :(


Yes, I tried a few times.


With each crime and every kindness we birth our future.
ID: 1806776
_heinz
Volunteer tester
Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 1806863 - Posted: 3 Aug 2016, 20:48:21 UTC

Hmm, talking about Josef W. Segur, alias Joe
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It looks like the same way we lost the Lunatics board moderator Simon, from Salzburg, Austria, whose last activity was on 26 Aug 2007, 09:53:05 am.
I called his father, who works in Switzerland as a doctor, but did not get any information from him. He said he didn't know anything about him.
Nobody knows what happened... still sadness and helplessness.
ID: 1806863
_heinz
Volunteer tester
Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 1806971 - Posted: 4 Aug 2016, 7:01:41 UTC - in response to Message 1806863.  

I looked up Joe's last activity on Lunatics.
Date Registered: 05 Jul 2006, 06:52:16 pm
Last Active: 06 Sep 2015, 08:11:33 pm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Joe is a generous, helpful gentleman with excellent knowledge.
We are missing him. I wish him all the best.

Respectfully _heinz
ID: 1806971
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1807008 - Posted: 4 Aug 2016, 9:46:27 UTC - in response to Message 1806409.  



I know that S@H uses libfftw, which should use AVX, and I assume that's where the heavy SIMD-lifting is happening (maybe I'm wrong?). Based on my experiences with SSSE3-SSE4.2 I can't imagine any other processor feature making as big a difference for this as AVX. The only other theory I can think of is that maybe stock is being built with an older version of VS -- I recall x64 codegen being improved in more recent builds (I think it was in VS2012?). Or maybe the Lunatics version is being built as a VEX app so it can use AVX throughout without transition penalties?

I don't mean to take a dig at Lunatics, but I was also fascinated to see the default CUDA load making extremely good use of my GPU -- I used to run 3 apps at a time (on the left) but the right is the default 1 task OpenCL. From what I can tell the OpenCL app is keeping the GPU busier even with one task at a time (I haven't tuned anything at all). That, or OpenHWM is only measuring the first SM cluster or something wacky like that?

The other worthless anecdote after switching was my power to the wall is a bit lower -- I assume this is consistent with the CPU running cooler.

This begs the question: can you mix and match? Lunatics CPU + OpenCL SoG/SaH GPU? And what's stopping the CPU optimizations from getting into the main build?



Hi Shaggie76, your assumptions are correct. But AVX SIMD can be slower than SSE2 on some systems (e.g. 256-bit transfers can strain the memory subsystem), as Matteo Frigo notes on the FFTW pages.
There's a new FFTW library, 3.3.5, with extended SIMD support beyond SSE, SSE2 and AVX (for example AVX2 with FMA, available on Haswell and later microarchitectures).
http://www.fftw.org/

http://www.fftw.org/release-notes.html

Thanks.

Marco71.
ID: 1807008
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1807031 - Posted: 4 Aug 2016, 13:17:05 UTC - in response to Message 1807008.  
Last modified: 4 Aug 2016, 13:17:44 UTC

But AVX SIMD can be slower than SSE2 on some systems (e.g. 256-bit transfers can strain the memory subsystem), as Matteo Frigo notes on the FFTW pages.

I haven't done AVX research, but some time ago there were AMD CPUs with SSE3 support that ran SSE2-only calculations faster than SSE3-enabled ones.
SSE3 was implemented, but with so many cycles per instruction that using it for speed was not possible.
Supporting an instruction set does not by itself mean that using that instruction set will speed things up.
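
One way to act on that point is to time each candidate build on the same workload and keep whichever is actually fastest on the host. The following is only a minimal sketch in Python: `pick_fastest`, the build labels and the zero-argument callables are hypothetical stand-ins, not anything the Lunatics installer or the stock app does.

```python
import statistics
import time


def pick_fastest(candidates, runs=5, clock=time.perf_counter):
    """Return (best_label, median_seconds_by_label).

    `candidates` maps a hypothetical build label (for example "sse2" or
    "sse3") to a zero-argument callable that runs the same reference
    workload with that build.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    medians = {}
    for label, run in candidates.items():
        samples = []
        for _ in range(runs):
            start = clock()
            run()
            samples.append(clock() - start)
        # The median is less sensitive to one unusually slow run than the mean.
        medians[label] = statistics.median(samples)
    best = min(medians, key=medians.get)
    return best, medians
```

In practice each callable would launch one build of the app against the same test workunit; the `clock` parameter exists only so the timing source can be swapped out when checking the logic.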
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1807031
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1807032 - Posted: 4 Aug 2016, 13:34:04 UTC - in response to Message 1807008.  


Hi Shaggie76, your assumptions are correct. But AVX SIMD can be slower than SSE2 on some systems (e.g. 256-bit transfers can strain the memory subsystem), as Matteo Frigo notes on the FFTW pages.
There's a new FFTW library, 3.3.5, with extended SIMD support beyond SSE, SSE2 and AVX (for example AVX2 with FMA, available on Haswell and later microarchitectures).

I'm well aware of how AVX2 may not be ideal when used in isolation -- I did a bunch of experiments for Xbox One and PlayStation 4 and left the code #ifdef'd out :) I also ran the same tests on a Haswell (mobile) chip and saw improvements in specific cases (and even when it was only about as fast, the instruction cache was under much less pressure). So my gut is saying that for an app like S@H an extra build target with VEX encoding might be a big enough win [for my Haswell-E desktop] to explain the improved throughput.
ID: 1807032
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1807051 - Posted: 4 Aug 2016, 16:23:58 UTC
Last modified: 4 Aug 2016, 16:24:26 UTC

I'm in the process of cross-compiling again under Linux with gcc 5.4 (with -O3 and -march/-mtune switch settings for haswell, ivybridge, core2 with SSE4.1, etc.).
So far only AVX2/FMA, AVX and SSE4.1 SIMD builds.
The library is named libfftw3f-3-3-4_x64.dll to avoid changes in configuration files.
FFTW 3.3.5 supports the following SIMD ISAs:
https://github.com/FFTW/fftw3/tree/master/simd-support
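
For anyone unsure which of those three builds a given CPU can run at all, here is a rough Python sketch, assuming a Linux x86 host that lists its feature flags on a `flags` line in /proc/cpuinfo. The build labels and helper names are made up for illustration and do not correspond to file names in the shared folder.

```python
# Preference order mirrors the three builds mentioned above. The labels
# are illustrative; the flag names are the ones Linux reports on x86.
BUILD_REQUIREMENTS = [
    ("avx2_fma", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse4_1", {"sse4_1"}),
]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags from a Linux cpuinfo file."""
    with open(path) as handle:
        for line in handle:
            key, _, value = line.partition(":")
            if key.strip() == "flags":
                return set(value.split())
    # No "flags" line found (for example on a non-x86 host).
    return set()


def eligible_builds(flags):
    """List the builds this CPU can run, newest instruction set first."""
    return [name for name, needed in BUILD_REQUIREMENTS if needed <= flags]
```

Being eligible only means the build will run; as noted earlier in the thread, whether it is actually the fastest on a particular host still has to be measured.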




https://drive.google.com/folderview?id=0B9iU4E_jpim0Y3ExUm9lZk5zN0E&usp=sharing

Marco71.
ID: 1807051