Why do the F@H chaps get all the fun? Folding on ATI GPUs
Author | Message |
---|---|
mr.kjellen Send message Joined: 4 Jan 01 Posts: 195 Credit: 71,324,196 RAC: 0 |
Rant First there was the news that Folding@home would work on the PS3... And now it seems it will work on ATI GPUs as well! FAH on ATI Will there ever be a spin-off that BOINC users (and projects, for that matter) will benefit from? /Rant :) /Anton |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Interesting. Maybe their data doesn't need the precision SETI's does. Or perhaps the SETI@Home programmers haven't tried the Radeon X1900XT yet, as it seems that's the only model that is fast enough to crunch, according to that article. They even said the previous generation model, the Radeon X1800XT, is "considerably slower" than the X1900XT. My guess is that, even though it's a cool idea, they are severely limiting themselves in available crunchers since not everyone has $400-$600 to purchase the high end video cards - not to mention third party cards that might not be cooled properly for long, full loads, which will result in the inevitable graphics card crash along with upset users. Maybe once the X1900XT and future cards become mainstream, it might be worth it. But it is an interesting pioneering effort. |
Francis Noel Send message Joined: 30 Aug 05 Posts: 452 Credit: 142,832,523 RAC: 94 |
|
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
I'm going to go out on a limb and guess the function F@H are doing on the graphics card (GPU) is Fast Fourier Transforms. Hmm, checked the site... the system is "gromacs". Have to read up on that. I have seen a few web projects that have libraries capable of doing FFT with various vendors' video cards. Currently FFT is using about 18% of the time for a seti WU crunching on Pentium 4 & higher CPUs (from the code I've compiled and profiled)...so only this portion would be speeded up. Also you have to upload the data to the graphics card RAM, FFT it, then download it again to system memory. These 3 things have to take less time than the FFT on the regular computer CPU to be a benefit. |
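The 18% figure above puts a ceiling on what offloading the FFT can buy. Here is a rough sketch using Amdahl's law, assuming the rest of the work unit stays on the CPU. The function names and the timings in the break-even check are hypothetical illustrations, not measurements from SETI or any GPU.

```python
def overall_speedup(fraction, part_speedup):
    """Amdahl's law: whole-job speedup when only `fraction` of the
    run time is accelerated by a factor of `part_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def offload_pays_off(cpu_fft_s, upload_s, gpu_fft_s, download_s):
    """Break-even test from the post: upload + GPU FFT + download
    must together take less time than the FFT on the CPU."""
    return upload_s + gpu_fft_s + download_s < cpu_fft_s


FFT_FRACTION = 0.18  # share of WU run time spent in FFT, as quoted in the post

# Overall gain for a 2x, 10x, and infinitely fast FFT.
for s in (2.0, 10.0, float("inf")):
    print(s, round(overall_speedup(FFT_FRACTION, s), 3))

# Hypothetical timings in seconds: the GPU FFT is fast, but bus transfers
# can still erase the gain.
print(offload_pays_off(cpu_fft_s=1.0, upload_s=0.2, gpu_fft_s=0.3, download_s=0.2))
print(offload_pays_off(cpu_fft_s=1.0, upload_s=0.5, gpu_fft_s=0.3, download_s=0.5))
```

Under that assumption even an infinitely fast FFT caps the overall gain at 1 / 0.82, roughly 1.22x, and the transfer times decide whether any of it is realised.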
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"Also you have to upload the data to the graphics card RAM, FFT it, then download it again to system memory. These 3 things have to take less time than the FFT on the regular computer CPU to be a benefit." That was my understanding too. Sending the data across the PCIe/AGP bus, minus latencies, caused too much of a slowdown (weakest link in the chain and all that). By the time all is said and done, it's simply faster on the main CPU. Not to mention the fact that this seems to only work well with Radeon X1900 XT type hardware, as anything less tested too slow, and the fact that most people out there are using some form of integrated graphics, such as Intel's Graphics Extreme series. You're not really reaching the widest audience this way. |
Alex Kan Send message Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
"Maybe their data doesn't need the precision SETI's does." I'd like to take this opportunity to clear up a commonly-held misconception about SETI. SETI doesn't need as much precision as you might think it does, even for valid science. First of all, with a few exceptions, the only thing computed in double precision is the FLOP counter--everything else is done in single precision. Second, for strong similarity, the validation limits for SETI WUs require accuracy to within 1 part in 1000, which works out to 10 bits of significand. This is easily achieved in single precision, IEEE 754 compliance or not. Speaking of IEEE 754 compliance, it's also worth pointing out that while GPUs don't guarantee IEEE 754 compliance, neither does the Intel compiler with its default settings, and presumably not with the settings that optimizers use to compile SETI apps. As for GPUs, just because they're meant for graphics doesn't mean that they aren't held to accuracy requirements as well. In the case of OpenGL (and ARB_fragment_program by extension), the accuracy requirement for floating-point calculations is 1 part in about 10000. GPUs released in the past couple years are accurate to within a few ULPs, which is accurate enough. |
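To put numbers on the precision argument above: a tolerance of 1 part in 1000 needs about 10 bits, while IEEE 754 single precision carries a 24-bit significand. Here is a small sketch of that arithmetic. The helper names are illustrative and not part of any SETI code.

```python
import math
import struct

SINGLE_SIGNIFICAND_BITS = 24  # IEEE 754 binary32: 23 stored bits + 1 implicit bit


def bits_needed(parts):
    """Bits of significand needed to resolve 1 part in `parts`."""
    return math.ceil(math.log2(parts))


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relative_error_single(x):
    """Relative error introduced by storing x in single precision."""
    return abs(to_single(x) - x) / abs(x)


print(bits_needed(1000))     # 10 bits for the validation limit quoted above
print(bits_needed(10000))    # 14 bits for the OpenGL figure quoted above
print(SINGLE_SIGNIFICAND_BITS - bits_needed(1000))  # 14 bits of headroom
print(relative_error_single(1234.5678) < 1e-3)      # True
```

Storing a value in single precision costs a relative error of at most 2**-24, several orders of magnitude tighter than the 1-in-1000 limit. Error accumulated over a long chain of operations is a separate question that this sketch does not address.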
EricVonDaniken Send message Joined: 17 Apr 04 Posts: 177 Credit: 67,881 RAC: 0 |
"Currently FFT is using about 18% of the time for a seti WU crunching on Pentium 4 & higher CPUs (from the code I've compiled and profiled)...so only this portion would be speeded up." Interesting. Can you post the rest of the profile of where time goes when crunching a seti WU? Does that profile change drastically for WUs with different expected execution times? |
EricVonDaniken Send message Joined: 17 Apr 04 Posts: 177 Credit: 67,881 RAC: 0 |
"Maybe their data doesn't need the precision SETI's does." So why did I get such negative pushback when I suggested using GPUs to help with s@h crunching some months back? One of the "grand old men" of the DB community, Jim Gray, used GPUs to assist in external sorting in an algorithm he called "GPUTeraSort". As one can see from the name, it was intended to handle =lots= of data. From the abstract of _GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management_: "Our algorithm uses the data and task parallelism on the GPU to perform memory-intensive and compute-intensive tasks while the CPU is used to perform I/O and resource management. We therefore exploit both the high bandwidth GPU memory interface and the lower bandwidth CPU memory interface..." "...In practice, a 3GHz P4 PC with a $265 nVidia 7800GT is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60GB for a penny;..." Since GPUs with at least the performance of the 7800GT are becoming standard even in modern laptops, it would seem logical to explore using similar techniques for BOINC and BOINC projects. |
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
"Currently FFT is using about 18% of the time for a seti WU crunching on Pentium 4 & higher CPUs (from the code I've compiled and profiled)...so only this portion would be speeded up." I've begun looking into GPUFFTW, a GPU-based FFT library... I might try to incorporate it into seti as a lark. All the profiling I've done is over at Simon's (chicken) site. I'll post a highlight here. I've done the Windows profile, and Josef Segur has done one for *nix. For different CPUs and speeds of memory these percentages would vary a bit, but the order of the functions should be about the same. The number in the 3rd column is the total of that function and the sub-functions it calls to complete its work.

Windows 5.15 - Intel C++ compiler - Athlon 64 X2 3800+ (% of WU run time used)

Function Name | Solo | With calls | Notes |
---|---|---|---|
GetFixedPoT | 16.43% | 26.78% | cache miss totals |
analyze_pot | 10.35% | | |
f_GetPeak | 8.74% | | |
w7_ipps_cRadix4InvNorm_32fc | 6.60% | 14.97% | Intel FFT totals |
f_GetTrueMean | 5.24% | | sum of floats loop |
find_pulse | 4.90% | 20.45% | |
sse_tableSum2 | 4.47% | | new sub call of find_pulse |
w7_ipps_cRadix4Inv_32fc | 3.84% | | |
GaussFit | 3.65% | | |
v_ChirpData | 3.37% | 6.23% | |
CalcTrigArray | 2.86% | | |
memcpy | 2.76% | | |

*nix 5.17 on DevC++/MinGW, optimization: O2 - 1.4 GHz Pentium-M. Note: FFT on released 5.15 and 5.17 is done in the separate FFTW library (http://www.FFTW.org), and so the percentages are off because those times are not included.

% of run time | Function | Notes |
---|---|---|
37.90% | find_pulse() | |
11.09% | v_Transpose4() | Cache misses |
6.04% | v_ChirpData() | |
5.28% | CalcTrigArray() | |
5.24% | GaussFit() | |
5.22% | f_GetChiSq() | |
4.71% | f_GetTrueMean() | |
3.61% | FindSpikes() | |
3.29% | f_GetPeak() | |
2.57% | lcgf() | |
2.51% | find_triplets() | |
2.36% | v_GetPowerSpectrum() | |
EricVonDaniken Send message Joined: 17 Apr 04 Posts: 177 Credit: 67,881 RAC: 0 |
"All the profiling I've done is over at Simon's (chicken) site..." Got a pointer or a link to where? |
Astro Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
On a related topic, Dr. Anderson posted the following on the BOINC dev mailing list: David Anderson to Tigher, boinc_dev, 12:53 pm: "I asked MS for help porting BOINC to Xbox about a year ago. They said no. I asked them again 2 days ago, after the F@h/playstation story came out. They said they'd think about it." |
Alex Kan Send message Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
"So why did I get such negative pushback when I suggested using GPUs to help with s@h crunching some months back?" You got negative pushback for the very reason that I made my post up there--namely, that people think that GPUs aren't accurate enough for SETI. (Note that you asked me a similar question last time. :P) And hey, if you think people are being unfairly critical of your ideas, what better course of action could there be than to prove them wrong? On a related note, don't think that just because you've made a good suggestion to the SETI boards, it's going to implement itself. The majority of the people on this board are crunchers, not coders. The actual SETI/BOINC developers have other fish to fry, like getting 5.18 out so they can analyze the data from their new multi-beam data recorder. As for active optimizers, I believe I can count them on one hand, perhaps two if I'm feeling optimistic, and we seem to have our hands full enough at the moment. "Since GPUs with at least the performance of the 7800GT are becoming standard even in modern laptops, it would seem logical to explore using similar techniques for BOINC and BOINC projects." Just because they're out in modern GPUs doesn't mean we all have them yet. Both of my primary machines, a desktop and a laptop, are still running R300-class GPUs. However, since this isn't the first time that you've brought up the topic of using GPUs for SETI, I'd certainly be interested in seeing a proof-of-concept implementation. I haven't done much work on figuring out how SETI's analysis maps to GPU computation, but perhaps you have. |
Pooh Bear 27 Send message Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
The software for SETI is open source, so go ahead and try and see if you can make it work faster on a GPU. You want it done so much, it's up to you to do it. Prove to yourself it can be done. Instead of asking everyone else to do your bidding. It's time to take the reins in your own hands. My movie https://vimeo.com/manage/videos/502242 |
EricVonDaniken Send message Joined: 17 Apr 04 Posts: 177 Credit: 67,881 RAC: 0 |
"The software for SETI is open source, so go ahead and try and see if you can make it work faster on a GPU. You want it done so much, it's up to you to do it. Prove to yourself it can be done. Instead of asking everyone else to do your bidding. It's time to take the reins in your own hands." First, and most important, I have not "asked everyone else to do my bidding". That is a gross mischaracterization of my posts and my POV. Second, I have no need to "prove to myself it can be done". I =know= it can be done. There is no need for another existence proof beyond the myriad number that already exist in domain after domain, including some other BOINC projects. Third, as Alex has rightfully noted most around here are crunchers, not s@h or BOINC coders. At present that includes me. I have a job and other life responsibilities. I do not have the time or $$$ to spare for spending 6-12 months getting to know the code base well enough and then rearchitecting it from the ground up by myself. I =certainly= don't want to do it if the negative response I've seen from the coding community is any indication of how poorly such an effort would be received. So unless or until I see someone "official" supporting the idea, it is going to remain no more than a suggestion I make from time to time. If some folks come out of the woodwork to cooperate with me in an effort to "Make It So", I'll reevaluate my position. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Benher: Actually, that's on Windows 2000 SP4. The DevC++/MinGW combination uses the GCC compiler but produces Windows binaries. It's what Eric Korpela is using to produce the stock Windows builds; my intent was to profile with a build as close to stock as possible. "37.90% find_pulse()" That's certainly true, but my guess is FFTW would have been less than 18% since it has built-in optimizations and I was profiling a generic i386 build. Joe |
Alex Kan Send message Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
"Third, as Alex has rightfully noted most around here are crunchers, not s@h or BOINC coders. At present that includes me." When I said that the majority of forum posters are "crunchers, not coders," I wasn't referring to SETI or BOINC coders exclusively--I actually meant that they're not coders, period. Given that I'm releasing optimized clients in an unofficial capacity, would you lump me into the category of "s@h or BOINC coder?" I have a day job too, but at the end of the day I find SETI interesting enough to merit the amount of work I've put into it. Besides, since Crunch3r's departure, I feel that the attitude towards optimizers has generally been one of respect, given how few of us are still working on it. I would hardly describe the response to Simon's work with optimized clients as negative. Lastly, optimizing SETI only costs you money when you start needing Intel software tools. :) "So unless or until I see someone 'official' supporting the idea, it is going to remain no more than a suggestion I make from time to time." Well, if I'm "official" enough for you, then it's on. Mapping the computation to the GPU is a different matter (as I've said before), but I'm up for discussing the concepts. I may be stuck with R300 GPUs in my machines at home, but I'm pretty sure I understand SETI analysis well enough. But as far as people popping up from time to time saying how nice it would be if we could run SETI on our PS3/Xbox 360/GPU/toaster...trust me, there's no shortage of those people already. :P |
KWSN - Chicken of Angnor Send message Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
I have to side with Alex here. There has been an overwhelmingly positive response to recent optimization efforts. It's a classic case of "if you build it, they will come". Of course, if you don't but just talk about it, you're bound not to feel this effect much. Not to put down your prospective contribution, Eric, but it's time to stop talking and start doing :o) I believe I've posted much the same thing in a previous thread in reply to your posts (the original "are there any sites providing optimized clients" thread). Should you choose to do the latter, head on over to http://lunatics.at (used to be http://www.zadra.org/seti_enhanced but has since moved to a new URL). Simply register, message me your username and participate. So far, there are lots of capable people who are actively involved in sharing methods and results. One more can't hurt :o) You know how it goes - you have to give some to get some back. So put your knowledge into action. Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
Diego -=Mav3rik=- Send message Joined: 1 Jun 99 Posts: 333 Credit: 3,587,148 RAC: 0 |
Let's rig our wrist-watches to crunch SETI WUs. |
Team AUSTRALIA (AlexD) Send message Joined: 1 Jun 99 Posts: 54 Credit: 66,602 RAC: 0 |
"Let's rig our wrist-watches to crunch SETI WUs." /applause and /rofl |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.