AVX Extensions - Ongoing development?
Author | Message |
---|---|
Vipin Palazhi Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 |
@Raistmer & Richard, thanks for the clarification. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Definitely! I'm getting a new SB soon to replace my aging colo q6600" If you're running more than 2 SATA devices, keep in mind the likely failure of the 3Gb/s SATA controller. The 6Gb/s controller isn't affected. Boards with the fixed chipset probably won't reach retail until late March or early April. Grant Darwin NT |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
On Linux and CUDA: there is already a Linux app available (AFAIK, only the 64-bit version works): lunatics.kwsn.net/seti-mb-cuda-for-linux. (AFAIK, it is currently not suitable for Fermi.) |
ML1 Joined: 25 Nov 01 Posts: 21253 Credit: 7,508,002 RAC: 20 |
"There is already a Linux app available (AFAIK, only the 64bit work): lunatics.kwsn.net/seti-mb-cuda-for-linux." The Fermi version is already being worked on. Looking good already... Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
"Definitely! I'm getting a new SB soon to replace my aging colo q6600" Sure Grant, I will wait until the bug has been ironed out, but I will use 6Gb/s anyway. The server does over a million hits a day, so it has to be as meaty as possible. As it is, the db and web server are separate machines. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Sure Grant, I will wait until the bug has been ironed out..." I was going to, but I've been intending to upgrade for over 18 months now & just couldn't wait. I'd ordered my system just a couple of days before the news broke. Luckily my supplier was prepared to supply a 4-port add-on SATA card at cost price, otherwise I'd have cancelled the order & would have had to wait another 3 or so months. By which time the 6-core systems & new chipsets would be out & then I'd have to consider one of those... Grant Darwin NT |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
I read something interesting about the Bulldozer (AMD) line of CPUs last night. It involves AVX, so I figured I'd share. The Intel design has 8 256-bit FPUs. Of course those can do 8 (non-AVX) 128-bit, 8 64-bit, etc. AMD, on the other hand, also has 8 256-bit, but when you are running (non-AVX) 128-bit, you get SIXTEEN (because each Bulldozer FPU can "split" in two and run 2 128-bit instructions in the 256-bit space). So, theoretically, AMD will be twice as fast for 128-bit floating-point calculations. The same should also be true for 64-bit (16, not 32 though, because they only split once). I figured that Intel was going to run away with the top CPUs list for SETI, but if this works as well as it sounds on paper, the new king of the crunching hill may be AMD! Oddly enough, that may mean it would be better NOT to run AVX on Bulldozer CPUs, which would be a very interesting turn of events, imo. http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=137432&start=825 The guy who wrote the piece works for AMD, so I presume he knows what he's talking about. Of course this is "unofficial", meaning it is not an announcement from AMD, so it might be best to take it with a grain of salt until we get real benchmarks and/or information from the horse's mouth. I thought it was fascinating though, so I figured I'd pass it along. -baron_iv Proud member of: GPU Users Group |
Frizz Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
"I read something interesting about the bulldozer (amd) line ..." Plus Bulldozer will support FMA4. |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
Anything new on this? It's been almost 2 weeks since there was a post. I've upgraded to Windows 7 SP1 so I can beta test whenever someone comes up with an application with AVX support for the Sandy Bridge processors. Thanks in advance for replies and interest. -baron_iv Proud member of: GPU Users Group |
Todd Hebert Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
I just received another 2600K on Wednesday, so my offer of development hardware was pushed back. These days getting a board is somewhat of a challenge, but I was able to get one before they were pulled. Also, I was kinda holding out for the release of SP1 for Win7. Being a TechNet member I have gotten it, but I need to do an image of the machine prior to the install. I do have new install disks that have SP1 integrated, so the new machine might become the target. Todd |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
Well, I'm still more than happy to offer any of the project's developers access to my 2600K system via VNC, if they need it to develop for AVX. I'd also love to test any new apps. I think I said it above, but I'm running Windows 7 x64 SP1, so software-wise, I'm ready to go for testing. -baron_iv Proud member of: GPU Users Group |
outlaw Joined: 6 Mar 00 Posts: 43 Credit: 17,063,897 RAC: 0 |
I posted a link to the 32 bit version. Unfortunately I haven't had the time, yet, to fix the 64 bit build issues... |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
"I posted a link to the 32 bit version." I saw that, but won't it throw errors if I run it on the 64-bit version of BOINC on Win64? -baron_iv Proud member of: GPU Users Group |
-BeNt- Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
"I posted a link to the 32 bit version." BOINC and the SETI@home apps are two different things. You should be fine. I run 64-bit Lunatics apps for the non-CUDA stuff, but the Lunatics CUDA apps are 32-bit. ;) Traveling through space at ~67,000mph! |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
Ok, running the build from above. To me, it actually APPEARS to be running MUCH slower than the SSE 4.1 build, based on how fast the numbers are going up and the time to completion. So I paused 4 old tasks and began 4 new ones with the new app (while continuing to run 4 old ones). It's taken about 8 minutes for 1%, which is CONSIDERABLY longer than it would have taken with the optimized Lunatics application. I will go ahead and let all these run to completion so we can know if they are valid, but I suspect that after that, I will go back to the SSE 4.1 app for now, which finishes a full unit in about 125-140 minutes on average. Can anyone tell me exactly what applications I need to build the SETI apps from source with AVX instructions/optimizations? All of this is so easy on Linux, since it comes with compilers, but I have no clue how to compile anything on Windows. I guess now is as good a time as any to learn. :) I looked at Intel's site and there are a dozen different apps there for download: Visual compiler, C++ compiler, C compiler, several different composers, etc. Then when I went to download one, it said I had to have a Microsoft product of some sort. All rather confusing if you ask me. Makes me miss good ol' GCC and the SIMPLICITY of Linux! I mean, I compiled my entire operating system from time to time in Linux, and it was easy compared to even figuring out what applications to use on Windows. hehe -baron_iv Proud member of: GPU Users Group |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Can anyone tell me exactly what applications I need to build the seti apps from source with AVX instructions/optimizations? All of this is so easy on linux, since it comes with compilers," For Windows builds: - AKv8b used Intel compiler 10.1 update 3 (pro licence), over Visual Studio 2005 Professional with SP1, including the Platform SDK. - Stock uses a Dev-C++/MinGW environment (I believe, though I use Visual Studio to work with the stock codebase). I'd better chip in my 2 cents because, after working with these codebases for a few years, I believe expecting AVX optimisation to 'fall out' of compilers is not an approach that's going to yield satisfactory results. Recompilation with some compiler flag turned on is not optimisation (I wish it was :) ). With respect to AVX, it all boils down to the fact that the 'true optimised' core components of the multibeam applications, both stock and AKv8b, are actually hand-optimised & vectorised pieces of code, rather than overly compiler-dependent (generic) code. In some areas, use of the Intel compiler/libraries & profile-guided optimisations accounts for 10-20% (or so) of the difference between stock & AKv8, the rest being a combination of optimisations at a higher algorithmic level, and at a lower microarchitectural level taking higher-level algorithm details into account. (Compilers can't do that :) ) Simply changing compiler options will not do for this style of code, as it is not generic C/C++ but primarily intrinsics & assembly. No compiler option is capable of optimisation of the nature necessary, and it must be done by hand. That is, for example, AFAIK no compiler exists that will convert hand-written SSSE3 intrinsics, such as those found in Alex Kan's excellent pulsefinding, into AVX, since the 256 bittage is a fundamental architectural shift that is going to need higher-level consideration, along with extensive expansion & refactoring of earlier-generation code to do the new architecture justice.
That isn't to say recompilation with new libraries & AVX turned on won't give *some* benefit. It most definitely will (if done right), but don't be under the illusion that compilers will (or can) do all the work that's needed to get the most out of the new instruction set. If you, or anyone, happens to be interested in finding out more about optimisation, there are plenty of resources I could point you in the direction of, as well as plenty of code around here that could do with a few thousand man-hours of refinement as homework :D Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
Unfortunately, sounds like that's WAY over my head. I was hoping it would be as simple as taking the source and recompiling. Looks like AVX instructions are gonna take a while to make it into SETI. I understand though: you guys work hard, using your own free time to make these things work on different hardware. If I knew anything at all about programming, I'd jump right in on this project, but all I'd do is make a giant mess. I still offer VNC access to my Sandy Bridge system for any developers who want to give this a shot, if it's not something they can do on their own hardware. There are more and more people using SB CPUs every day, so I'm guessing the requests for this will grow over time. As of now, it seems that there aren't that many of us who have upgraded. If there's anything I can do to help the effort, please don't hesitate to ask. :) -baron_iv Proud member of: GPU Users Group |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"If there's anything I can do to help the effort, please don't hesitate to ask. :)" No probs. When I started exploring the multibeam code I was reasonably comfortable with the Stock & Lunatics stuff, but even now, years later, I need a good cup of tea & a lie down to recover after looking at particular portions (especially Alex's pulsefinding :) ). That's saying something, since I'm a coffee & beer drinker. I really enjoy seeing new technologies come online, and the interest they stir. The current 'state of the art' applications happen to be a bit of a patchwork quilt of many contributions, and can be a challenge with all the styles & mixed stages of completion. AVX is going to be a worthwhile endeavour, but not something I'm taking lightly myself, nor rushing with high short-term expectations. I've got Todd's and your own offers of VNC access under the belt, as well as the tools. Now all I need is some time to complete the optimisations still on my plate for Cuda cards (including pre-emptive Kepler), which, thanks to an 'accidentally & hugely productive public testing run' with power-spectrum pipeline unit tests, amounts to a ~60% rewrite of the Cuda application so far, before moving on to the more challenging pulse-finding code .... Phew! It's a big year for technology :D Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
baron_iv Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
I'm almost beginning to wonder if I should have held out for Bulldozer. The design of Bulldozer may make it an absolute BEAST when it comes to pure number crunching, since the 8 256-bit vector units can be split up into 16 128-bit or 16 64-bit (or 32, etc). AMD has fallen behind in the years since they stunned the market with their K7 and K8 series, which were better than anything Intel was doing at that time, imo. I've always liked AMD because they seem to offer a ton of value, but Intel has kinda run away with the performance crown for quite some time now. It's going to be fascinating to see if Bulldozer can bulldoze Sandy Bridge. I mean, the rumors are that BD is 50% faster than Core i7, which would be incredible, to say the least (and fairly unlikely as well). For those of you who are interested in the AMD/Intel race, ArsTechnica did a really in-depth series of articles a few years ago about the K7/K8 vs P4/Core2. http://www.arstechnica.com/cpu/3q99/k7_theory/k7-one-1.html This one is particularly interesting because he went in-depth into what made the K7 so revolutionary. That's just before I'd started SETI, but I suspect that with K7 there was a significant jump in RAC for those who had K7 systems. I wish we had some sort of record of what new technology does for SETI, like a long-term graph that shows things in terms of RAC and correlates that with new technology like the K7, K8, P4, Core2, C2Q, GPU, i-series, etc. I would assume the biggest jump of all time was when GPUs were brought into the project. Probably had more of an impact than the CPU revisions did over many years. -baron_iv Proud member of: GPU Users Group |
outlaw Joined: 6 Mar 00 Posts: 43 Credit: 17,063,897 RAC: 0 |
"... It's taken about 8 minutes for 1%, which is CONSIDERABLY longer than it would have taken with the optimized lunatics application. I will go ahead and let all these run to completion so we can know if they are valid, but I suspect that after that, I will go back to the SSE 4.1 app for now..." Thanks, good info. |
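
To illustrate jason_gee's point earlier in the thread that hand-written SSE intrinsics do not become AVX just by recompiling, here is a minimal sketch in C. It is not taken from the SETI@home, AKv8b or Lunatics sources; the function names and the sum-of-squares kernel are hypothetical, chosen only because a power sum is easy to follow. The assumption is a GCC- or Clang-style build where `-mavx` (or `/arch:AVX` on MSVC) defines `__AVX__`; without that flag the AVX function simply falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define HAVE_SSE 1
#include <xmmintrin.h>   /* 128-bit SSE intrinsics: __m128, _mm_*_ps */
#endif

#if defined(__AVX__)     /* defined by -mavx (GCC/Clang) or /arch:AVX (MSVC) */
#define HAVE_AVX 1
#include <immintrin.h>   /* 256-bit AVX intrinsics: __m256, _mm256_*_ps */
#endif

/* Generic C. This is the only version a compiler flag can re-vectorise by itself. */
float sum_squares_scalar(const float *x, size_t n)
{
    float sum = 0.0f;
    assert(x != NULL);
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * x[i];
    return sum;
}

/* Hand-written SSE. The 4-float width is fixed by the __m128 type, the
 * _mm_* calls and the loop stride, so building with AVX enabled does not
 * turn this into an 8-wide loop. */
float sum_squares_sse(const float *x, size_t n)
{
#ifdef HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);           /* load 4 floats */
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));  /* 4 running sums */
    }
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]
         + sum_squares_scalar(x + i, n - i);      /* leftover elements */
#else
    return sum_squares_scalar(x, n);              /* no SSE available */
#endif
}

/* Hand-ported AVX. Types, intrinsics, stride and the final reduction all
 * had to be rewritten by hand for the 8-float width. */
float sum_squares_avx(const float *x, size_t n)
{
#ifdef HAVE_AVX
    __m256 acc = _mm256_setzero_ps();
    float lanes[8];
    float total = 0.0f;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(x + i);              /* load 8 floats */
        acc = _mm256_add_ps(acc, _mm256_mul_ps(v, v));  /* 8 running sums */
    }
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; ++k)
        total += lanes[k];
    return total + sum_squares_scalar(x + i, n - i);    /* leftover elements */
#else
    return sum_squares_scalar(x, n);                    /* built without AVX */
#endif
}
```

All three functions return the same value for inputs whose partial sums are exactly representable; for general data the results can differ in the last bits because the additions happen in a different order. Whether the 8-wide loop is actually faster depends on the hardware and on memory bandwidth, which is the kind of higher-level consideration the post describes.
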
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.