AVX Extensions - Ongoing development?

Message boards : Number crunching : AVX Extensions - Ongoing development?

Vipin Palazhi
Joined: 29 Feb 08
Posts: 286
Credit: 167,386,578
RAC: 0
India
Message 1074436 - Posted: 5 Feb 2011, 19:09:25 UTC

@ Raistmer & Richard,

Thanks for the clarification.
______________

ID: 1074436
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1074462 - Posted: 5 Feb 2011, 20:05:01 UTC - in response to Message 1074431.  

> Definitely! I'm getting a new SB soon to replace my aging colo Q6600

If you're running more than 2 SATA devices, keep in mind the likely eventual failure of the 3Gb/s SATA controller on the current 6-series chipset. The 6Gb/s controller isn't affected. Boards with the fixed chipset probably won't reach retail until late March or early April.
Grant
Darwin NT
ID: 1074462
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1074477 - Posted: 5 Feb 2011, 20:56:06 UTC
Last modified: 5 Feb 2011, 20:56:48 UTC

Regarding Linux and CUDA:

There is already a Linux app available (AFAIK, only the 64-bit version works): lunatics.kwsn.net/seti-mb-cuda-for-linux.

(AFAIK, it's currently not suitable for Fermi cards.)
ID: 1074477
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21253
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1074738 - Posted: 6 Feb 2011, 15:36:30 UTC - in response to Message 1074477.  

> There is already a Linux app available (AFAIK, only the 64-bit version works): lunatics.kwsn.net/seti-mb-cuda-for-linux.
>
> (AFAIK, it's currently not suitable for Fermi cards.)


The Fermi version is already being worked on. Looking good already...

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1074738
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 1074834 - Posted: 6 Feb 2011, 20:14:08 UTC - in response to Message 1074462.  

>> Definitely! I'm getting a new SB soon to replace my aging colo Q6600
>
> If you're running more than 2 SATA devices, keep in mind the likely eventual failure of the 3Gb/s SATA controller on the current 6-series chipset. The 6Gb/s controller isn't affected. Boards with the fixed chipset probably won't reach retail until late March or early April.


Sure, Grant, I'll wait until the bug has been ironed out, but I'll be using the 6Gb/s ports anyway. The server does over a million hits a day, so it has to be as meaty as possible. As it is, the database and web servers are separate machines.
ID: 1074834
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1074978 - Posted: 7 Feb 2011, 6:40:26 UTC - in response to Message 1074834.  

> Sure, Grant, I'll wait until the bug has been ironed out...

I was going to, but I've been intending to upgrade for over 18 months now & just couldn't wait. I'd ordered my system just a couple of days before the news broke. Luckily, my supplier was prepared to supply a 4-port add-on SATA card at cost price; otherwise I'd have cancelled the order & would have had to wait another 3 or so months.
By then the 6-core systems & new chipsets would be out, & then I'd have to consider one of those...
Grant
Darwin NT
ID: 1074978
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1075204 - Posted: 7 Feb 2011, 23:20:50 UTC - in response to Message 1074978.  

I read something interesting about the Bulldozer (AMD) line of CPUs last night. It involves AVX, so I figured I'd share. The Intel design has 8 256-bit FPUs. Of course, those can also do 8 (non-AVX) 128-bit operations, 8 64-bit operations, etc. AMD, on the other hand, also has 8 256-bit FPUs, but when you are running (non-AVX) 128-bit code, you get SIXTEEN, because each Bulldozer FPU can "split" in two and run two 128-bit instructions in the 256-bit space. So, theoretically, AMD will be twice as fast for 128-bit floating-point calculations. The same should also be true for 64-bit (16, not 32, though, because they only split once). I figured Intel was going to run away with the top CPUs list for SETI, but if this works as well as it sounds on paper, the new king of the crunching hill may be AMD! Oddly enough, that may mean it would be better NOT to run AVX on Bulldozer CPUs... which would be a very interesting turn of events, imo.

http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=137432&start=825

The guy who wrote the piece works for AMD, so I presume he knows what he's talking about. Of course, this is "unofficial", meaning it's not an announcement from AMD, so it might be best to take it with a grain of salt until we get real benchmarks and/or information from the horse's mouth. I thought it was fascinating, though, so I figured I'd pass it along.

-baron_iv
Proud member of:
GPU Users Group
ID: 1075204
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1075213 - Posted: 7 Feb 2011, 23:51:39 UTC - in response to Message 1075204.  

> I read something interesting about the Bulldozer (AMD) line ...


Plus Bulldozer will support FMA4.
ID: 1075213
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1079593 - Posted: 20 Feb 2011, 1:42:19 UTC

Anything new on this? It's been almost 2 weeks since the last post. I've upgraded to Windows 7 SP1 so I can beta test whenever someone comes up with an application with AVX support for the Sandy Bridge processors.

Thanks in advance for replies and interest.
-baron_iv
Proud member of:
GPU Users Group
ID: 1079593
Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 648
Credit: 228,292,957
RAC: 0
United States
Message 1079601 - Posted: 20 Feb 2011, 2:24:32 UTC

I just received another 2600K on Wednesday, so my offer of development hardware was pushed back. These days getting a board is somewhat of a challenge, but I was able to get one before they were pulled. I was also kind of holding out for the release of SP1 for Windows 7. Being a TechNet member, I already have it, but I need to image the machine before installing it. I do have new install disks with SP1 integrated, so the new machine might become the target.

Todd
ID: 1079601
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1079715 - Posted: 20 Feb 2011, 14:35:53 UTC

Well, I'm still more than happy to offer any of the project's developers access to my 2600K system via VNC, if they need it to develop for AVX. I'd also love to test any new apps. I think I said it above, but I'm running Windows 7 x64 SP1, so software-wise, I'm ready to go for testing.
-baron_iv
Proud member of:
GPU Users Group
ID: 1079715
outlaw
Joined: 6 Mar 00
Posts: 43
Credit: 17,063,897
RAC: 0
Canada
Message 1080150 - Posted: 21 Feb 2011, 21:01:02 UTC - in response to Message 1079715.  

I posted a link to the 32-bit version.

Unfortunately, I haven't had the time yet to fix the 64-bit build issues...
ID: 1080150
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1080202 - Posted: 21 Feb 2011, 23:42:31 UTC - in response to Message 1080150.  

> I posted a link to the 32-bit version.
>
> Unfortunately, I haven't had the time yet to fix the 64-bit build issues...


I saw that, but won't it throw errors if I run it on the 64-bit version of BOINC on Win64?
-baron_iv
Proud member of:
GPU Users Group
ID: 1080202
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1080284 - Posted: 22 Feb 2011, 7:04:02 UTC - in response to Message 1080202.  

>> I posted a link to the 32-bit version.
>>
>> Unfortunately, I haven't had the time yet to fix the 64-bit build issues...
>
> I saw that, but won't it throw errors if I run it on the 64-bit version of BOINC on Win64?


BOINC and the SETI@home apps are two different things. You should be fine: I run the 64-bit Lunatics apps for the non-CUDA work, but the Lunatics CUDA apps are 32-bit. ;)
Traveling through space at ~67,000mph!
ID: 1080284
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1080346 - Posted: 22 Feb 2011, 14:41:41 UTC

OK, running the build from above. To me, it actually APPEARS to be running MUCH slower than the SSE 4.1 build, based on how fast the progress numbers are going up and the estimated time to completion. So I paused 4 old tasks and started 4 new ones with the new app (while continuing to run 4 old ones). It's taken about 8 minutes for 1%, which is CONSIDERABLY longer than it would have taken with the optimized Lunatics application. I'll go ahead and let all these run to completion so we can see whether they validate, but I suspect that after that, I'll go back to the SSE 4.1 app for now... which finishes a full unit in about 125-140 minutes on average.

Can anyone tell me exactly what applications I need to build the SETI apps from source with AVX instructions/optimizations? All of this is so easy on Linux, since it comes with compilers, but I have no clue how to compile anything on Windows. I guess now is as good a time as any to learn. :)

I looked at Intel's site and there are a dozen different apps there for download: Visual compiler, C++ compiler, C compiler, several different composers, etc., etc. Then, when I went to download one, it said I had to have a Microsoft product of some sort. All rather confusing, if you ask me. Makes me miss good ol' GCC and the SIMPLICITY of Linux! I mean, I compiled my entire operating system from time to time in Linux, and it was easy compared to even figuring out what applications to use on Windows. hehe
-baron_iv
Proud member of:
GPU Users Group
ID: 1080346
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1080372 - Posted: 22 Feb 2011, 15:42:44 UTC - in response to Message 1080346.  
Last modified: 22 Feb 2011, 16:25:43 UTC

> Can anyone tell me exactly what applications I need to build the SETI apps from source with AVX instructions/optimizations? All of this is so easy on Linux, since it comes with compilers,


For Windows builds:
- AKv8b used the Intel compiler 10.1 Update 3 (Pro licence), on top of Visual Studio 2005 Professional SP1, including the Platform SDK.
- Stock uses the Dev-C++/MinGW environment (I believe, though I use Visual Studio to work with the stock codebase).

I'd better chip in my 2 cents, because after working with these codebases for a few years, I believe expecting AVX optimisation to simply 'fall out' of compilers is not an approach that's going to yield satisfactory results.

Recompilation with some compiler flag turned on is not optimisation (I wish it was :) ).

With respect to AVX, it all boils down to the fact that the 'truly optimised' core components of the multibeam applications, both stock and AKv8b, are hand-optimised & vectorised pieces of code, rather than generic code that depends heavily on the compiler.

In some areas, use of the Intel compiler/libraries & profile-guided optimisation accounts for 10-20% (or so) of the difference between stock & AKv8; the rest comes from a combination of optimisations at a higher, algorithmic level and at a lower, microarchitectural level that take the higher-level algorithm details into account. (Compilers can't do that :) )

Simply changing compiler options will not do for this style of code, as it is not generic C/C++ but primarily intrinsics & assembly. No compiler option can perform optimisation of the kind needed; it must be done by hand.

For example, AFAIK no compiler exists that will convert hand-written SSSE3 intrinsics, such as those in Alex Kan's excellent pulsefinding code, into AVX, since the move to 256-bit width is a fundamental architectural shift that needs higher-level consideration, along with extensive expansion & re-factoring of earlier-generation code to do the new architecture justice.

That isn't to say recompilation with new libraries & AVX turned on won't give *some* benefit. It will most definitely (if done right), but don't be under the illusion that compilers will (or can) do all the work that's needed to get the most out of the new instruction set.

If you (or anyone) happen to be interested in learning more about optimisation, there are plenty of resources I could point you to, as well as plenty of code around here that could do with a few thousand man-hours of refinement as homework :D

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1080372
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1080397 - Posted: 22 Feb 2011, 16:57:41 UTC - in response to Message 1080372.  

Unfortunately, that sounds WAY over my head. I was hoping it would be as simple as taking the source and recompiling. It looks like AVX is going to take a while to make it into SETI. I understand, though: you guys work hard, using your own free time to make these things work on different hardware. If I knew anything at all about programming, I'd jump right in on this project, but all I'd do is make a giant mess.

I'm still offering VNC access to my Sandy Bridge system to any developers who want to give this a shot, if it's not something they can do on their own hardware. More and more people are using SB CPUs every day, so I'm guessing requests for this will grow over time. As of now, it seems there aren't that many of us who have upgraded. If there's anything I can do to help the effort, please don't hesitate to ask. :)
-baron_iv
Proud member of:
GPU Users Group
ID: 1080397
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1080409 - Posted: 22 Feb 2011, 17:24:09 UTC - in response to Message 1080397.  

> If there's anything I can do to help the effort, please don't hesitate to ask. :)


No probs. When I started exploring the multibeam code I was reasonably comfortable with the Stock & Lunatics stuff, but even now, years later, I need a good cup of tea & a lie-down to recover after looking at particular portions (especially Alex's pulsefinding :) ). That's saying something, since I'm a coffee & beer drinker.

I really enjoy seeing new technologies come online, and the interest they stir. The current 'state of the art' applications happen to be a bit of a patchwork quilt of many contributions, and can be a challenge given all the different styles & mixed stages of completion.

AVX is going to be a worthwhile endeavour, but not something I'm taking lightly myself, nor rushing with high short-term expectations. I've got Todd's and your own offers of VNC access under my belt, as well as the tools.

Now all I need is some time to finish the optimisations still in progress for CUDA cards (including pre-emptive work for Kepler), which, thanks to an 'accidentally & hugely productive public testing run' with the power-spectrum pipeline unit tests, amounts to a ~60% rewrite of the CUDA application so far, before moving on to the more challenging pulse-finding code... Phew! It's a big year for technology :D

Jason



"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1080409
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1080437 - Posted: 22 Feb 2011, 22:24:31 UTC

I'm almost beginning to wonder if I should have held out for Bulldozer. Its design may make it an absolute BEAST when it comes to pure number crunching, since the 8 256-bit vector units can be split up into 16 128-bit or 16 64-bit units (or 32, etc.). AMD has fallen behind in the years since they stunned the market with their K7 and K8 series, which were better than anything Intel was doing at that time, imo. I've always liked AMD because they seem to offer a ton of value, but Intel has kind of run away with the performance crown for quite some time now. It's going to be fascinating to see if Bulldozer can bulldoze Sandy Bridge. I mean, the rumors are that BD is 50% faster than Core i7, which would be incredible, to say the least (and fairly unlikely as well).

For those of you interested in the AMD/Intel race, Ars Technica did a really in-depth series of articles a few years ago about the K7/K8 vs. the P4/Core 2.

http://www.arstechnica.com/cpu/3q99/k7_theory/k7-one-1.html
This one is particularly interesting because he went in-depth into what made the K7 so revolutionary. That was just before I started SETI, but I suspect there was a significant jump in RAC for those who had K7 systems. I wish we had some sort of record of what new technology does for SETI, like a long-term graph that shows RAC over time and correlates it with new technology like the K7, K8, P4, Core2, C2Q, GPU, i-series, etc. I would assume the biggest jump of all time was when GPUs were brought into the project. That probably had more of an impact than the CPU revisions did over many years.


-baron_iv
Proud member of:
GPU Users Group
ID: 1080437
outlaw
Joined: 6 Mar 00
Posts: 43
Credit: 17,063,897
RAC: 0
Canada
Message 1081293 - Posted: 25 Feb 2011, 20:17:37 UTC - in response to Message 1080346.  

> ... It's taken about 8 minutes for 1%, which is CONSIDERABLY longer than it would have taken with the optimized Lunatics application. I'll go ahead and let all these run to completion so we can see whether they validate, but I suspect that after that, I'll go back to the SSE 4.1 app for now...


Thanks, good info.



ID: 1081293
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.