Why do the F@H chaps get all the fun? Folding on ATI GPUs

Profile mr.kjellen
Volunteer tester
Joined: 4 Jan 01
Posts: 195
Credit: 71,324,196
RAC: 0
Sweden
Message 405921 - Posted: 25 Aug 2006, 13:25:09 UTC

Rant
First there was the news that Folding@home would work on the PS3... And now it seems it will work on ATI GPUs as well!
FAH on ATI
Will there ever be a spin-off that BOINC users (and projects, for that matter) will benefit from?
/Rant

:) /Anton
ID: 405921 · Report as offensive
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15682
Credit: 83,150,612
RAC: 22,421
United States
Message 405981 - Posted: 25 Aug 2006, 14:50:31 UTC

Interesting.


Maybe their data doesn't need the precision SETI's does. Or perhaps the SETI@Home programmers haven't tried the Radeon X1900XT yet, as it seems that's the only model that is fast enough to crunch, according to that article. They even said the previous generation model, the Radeon X1800XT, is "considerably slower" than the X1900XT.

My guess is that, even though it's a cool idea, they are severely limiting themselves in available crunchers since not everyone has $400-$600 to purchase the high end video cards - not to mention third party cards that might not be cooled properly for long, full loads, which will result in the inevitable graphics card crash along with upset users.

Maybe once the X1900XT and future cards become mainstream, it might be worth it. But it is an interesting pioneering effort.
ID: 405981 · Report as offensive
Profile Francis Noel
Joined: 30 Aug 05
Posts: 452
Credit: 137,960,675
RAC: 0
Canada
Message 406018 - Posted: 25 Aug 2006, 15:26:35 UTC

Yep, just saw that at http://folding.stanford.edu/FAQ-ATI.html

I'm kinda jealous.
mambo
ID: 406018 · Report as offensive
Profile Benher
Volunteer developer
Volunteer tester

Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 406722 - Posted: 26 Aug 2006, 0:20:34 UTC - in response to Message 405921.  
Last modified: 26 Aug 2006, 0:34:38 UTC

I'm going to go out on a limb and guess the function F@H are doing on the graphics card (GPU) is Fast Fourier Transforms. Hmm, checked site... the system is "gromacs". Have to read up on that. I have seen a few web projects that have libraries capable of doing FFT with various vendors' video cards.

Currently FFT is using about 18% of the run time for a SETI WU crunching on Pentium 4 and higher CPUs (from the code I've compiled and profiled)... so only this portion would be sped up.

Also you have to upload the data to the graphics card RAM, FFT it, then download it again to system memory. These 3 things have to take less time than the FFT on the regular computer CPU to be a benefit.
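
To put rough numbers on both points, here is a minimal Python sketch. Only the 18% figure comes from the profiling mentioned above; the function names and the sample timings are made up for illustration and are not taken from the SETI code. The first function is Amdahl's law, the second is just the "three things must take less time" comparison.

```python
def max_overall_speedup(fraction, part_speedup=float("inf")):
    """Amdahl's law: overall speedup when `fraction` of the run time
    is accelerated by a factor of `part_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def offload_is_worth_it(upload_s, gpu_fft_s, download_s, cpu_fft_s):
    """True only if upload + GPU FFT + download beats the CPU FFT time."""
    return upload_s + gpu_fft_s + download_s < cpu_fft_s


fft_fraction = 0.18  # share of WU run time spent in FFT (figure from the post)

# Ceiling if the FFT became free: 1 / 0.82, about 1.22x overall.
ceiling = max_overall_speedup(fft_fraction)

# A (hypothetical) 10x faster FFT gives about 1.19x overall.
ten_times = max_overall_speedup(fft_fraction, 10.0)

# Hypothetical timings in seconds for a single transform.
fast_bus = offload_is_worth_it(0.002, 0.001, 0.002, 0.010)  # 0.005 < 0.010
slow_bus = offload_is_worth_it(0.005, 0.001, 0.005, 0.010)  # 0.011 > 0.010
```

So even a perfect GPU FFT would shave off less than a fifth of the run time, and only if the bus transfers don't eat the gain.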
ID: 406722 · Report as offensive
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15682
Credit: 83,150,612
RAC: 22,421
United States
Message 406758 - Posted: 26 Aug 2006, 1:34:53 UTC - in response to Message 406722.  

Also you have to upload the data to the graphics card RAM, FFT it, then download it again to system memory. These 3 things have to take less time than the FFT on the regular computer CPU to be a benefit.


That was my understanding too. Sending the data across the PCIe/AGP bus, minus latencies, caused too much of a slowdown (weakest link in the chain and all that). By the time all is said and done, it's simply faster on the main CPU.

Not to mention the fact that this seems to work well only with Radeon X1900 XT-class hardware, as anything less tested too slow, and the fact that most people out there are using some form of integrated graphics, such as Intel's Extreme Graphics series. You're not really reaching the widest audience this way.
ID: 406758 · Report as offensive
Alex Kan
Volunteer developer

Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 406907 - Posted: 26 Aug 2006, 4:37:22 UTC - in response to Message 405981.  
Last modified: 26 Aug 2006, 4:40:01 UTC

Maybe their data doesn't need the precision SETI's does.

I'd like to take this opportunity to clear up a commonly-held misconception about SETI.

SETI doesn't need as much precision as you might think it does, even for valid science. First of all, with a few exceptions, the only thing computed in double precision is the FLOP counter--everything else is done in single precision. Second, for strong similarity, the validation limits for SETI WUs require accuracy to within 1 part in 1000, which works out to 10 bits of significand. This is easily achieved in single precision, IEEE 754 compliance or not.

Speaking of IEEE 754 compliance, it's also worth pointing out that while GPUs don't guarantee IEEE 754 compliance, neither does the Intel compiler with its default settings, and presumably not with the settings that optimizers use to compile SETI apps. As for GPUs, just because they're meant for graphics doesn't mean that they aren't held to accuracy requirements as well. In the case of OpenGL (and ARB_fragment_program by extension), the accuracy requirement for floating-point calculations is 1 part in about 10000. GPUs released in the past couple years are accurate to within a few ULPs, which is accurate enough.
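
The "10 bits" figure is easy to check: 1 part in 1000 needs log2(1000) bits, just under 10, while IEEE 754 single precision carries a 24-bit significand. The Python sketch below uses only the standard library; the helper names are hypothetical and have nothing to do with the SETI source, and the test value 1/3 is an arbitrary choice.

```python
import math
import struct


def bits_for_relative_accuracy(parts):
    """Significand bits needed to resolve 1 part in `parts`."""
    return math.ceil(math.log2(parts))


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


needed_bits = bits_for_relative_accuracy(1000)  # 10, since 2**10 = 1024 >= 1000
single_precision_bits = 24                      # IEEE 754 binary32 significand

# Relative error introduced by storing an arbitrary value in single precision.
value = 1.0 / 3.0
rel_error = abs(to_float32(value) - value) / value

# The validation limit quoted above: accurate to within 1 part in 1000.
within_limit = rel_error < 1.0 / 1000
```

Single precision leaves roughly 14 bits of headroom over that limit, which is why a few ULPs of error here and there don't matter for validation.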
ID: 406907 · Report as offensive
EricVonDaniken

Joined: 17 Apr 04
Posts: 177
Credit: 67,881
RAC: 0
United States
Message 407244 - Posted: 26 Aug 2006, 14:25:58 UTC - in response to Message 406722.  
Last modified: 26 Aug 2006, 14:26:16 UTC

Currently FFT is using about 18% of the time for a seti WU crunching on Pentium 4 & higher CPUs (from the code I've compiled and profiled)...so only this portion would be speeded up.

Interesting. Can you post the rest of the profile of where time goes when crunching a seti WU?

Does that profile change drastically for WU's with different expected execution times?
ID: 407244 · Report as offensive
EricVonDaniken

Joined: 17 Apr 04
Posts: 177
Credit: 67,881
RAC: 0
United States
Message 407256 - Posted: 26 Aug 2006, 14:51:53 UTC - in response to Message 406907.  
Last modified: 26 Aug 2006, 14:53:43 UTC

Maybe their data doesn't need the precision SETI's does.

I'd like to take this opportunity to clear up a commonly-held misconception about SETI.

SETI doesn't need as much precision as you might think it does, even for valid science. First of all, with a few exceptions, the only thing computed in double precision is the FLOP counter--everything else is done in single precision. Second, for strong similarity, the validation limits for SETI WUs require accuracy to within 1 part in 1000, which works out to 10 bits of significand. This is easily achieved in single precision, IEEE 754 compliance or not.

Speaking of IEEE 754 compliance, it's also worth pointing out that while GPUs don't guarantee IEEE 754 compliance, neither does the Intel compiler with its default settings, and presumably not with the settings that optimizers use to compile SETI apps. As for GPUs, just because they're meant for graphics doesn't mean that they aren't held to accuracy requirements as well. In the case of OpenGL (and ARB_fragment_program by extension), the accuracy requirement for floating-point calculations is 1 part in about 10000. GPUs released in the past couple years are accurate to within a few ULPs, which is accurate enough.

So why did I get such negative pushback when I suggested using GPUs to help with s@h crunching some months back?

One of the "grand old men" of the DB community, Jim Gray, used GPUs to assist in external sorting in an algorithm he called "GPUTeraSort".
As one can see from the name, it was intended to handle =lots= of data.

From the abstract of _GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management_:
"Our algorithm uses the data and task parallelism on the GPU to perform memory-intensive and compute-intensive tasks while the CPU is used to perform I/O and resource management. We therefore exploit both the high bandwidth GPU memory interface and the lower bandwidth CPU memory interface..."
"...In practice, a 3GHz P4 PC with a $265 nVidia 7800GT is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60GB for a penny;..."

Since GPUs with at least the performance of the 7800GT are becoming standard even in modern laptops, it would seem logical to explore using similar techniques for BOINC and BOINC projects.


ID: 407256 · Report as offensive
Profile Benher
Volunteer developer
Volunteer tester

Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 407325 - Posted: 26 Aug 2006, 16:52:51 UTC - in response to Message 407244.  
Last modified: 26 Aug 2006, 17:01:44 UTC

Currently FFT is using about 18% of the time for a seti WU crunching on Pentium 4 & higher CPUs (from the code I've compiled and profiled)...so only this portion would be speeded up.

Interesting. Can you post the rest of the profile of where time goes when crunching a seti WU?

Does that profile change drastically for WU's with different expected execution times?


I've begun looking into GPUFFTW, a GPU-based FFT library... I might try to incorporate it into SETI as a lark.

All the profiling I've done is over at Simon's (chicken) site. I'll post a highlight here.

I've done a Windows profile, and Josef Segur has done one for *nix. For different CPUs and memory speeds these percentages would vary a bit, but the order of the functions should be about the same.

The number in the 3rd column is the total of that function and the sub-functions it calls to complete its work.

Windows 5.15 - Intel C++ compiler - Athlon 64 X2 3800+
                               % of WU run time used
Function Name                  Solo      With calls   Notes
GetFixedPoT                    16.43%    26.78%       cache miss totals
analyze_pot                    10.35%
f_GetPeak                       8.74%
w7_ipps_cRadix4InvNorm_32fc     6.60%    14.97%       Intel FFT totals
f_GetTrueMean                   5.24%                 sum of floats loop
find_pulse                      4.90%    20.45%
sse_tableSum2                   4.47%                 new sub call of find_pulse
w7_ipps_cRadix4Inv_32fc         3.84%
GaussFit                        3.65%
v_ChirpData                     3.37%     6.23%
CalcTrigArray                   2.86%
memcpy                          2.76%



*nix 5.17 on DevC++/MinGW optimization: O2 - 1.4 GHz Pentium-m
(Note: FFT on released 5.15 and 5.17 is done in the separate FFTW library, http://www.FFTW.org, so the percentages are off because those times are not included.)
37.90% find_pulse()
11.09% v_Transpose4()   - cache misses
 6.04% v_ChirpData()
 5.28% CalcTrigArray()
 5.24% GaussFit()
 5.22% f_GetChiSq()
 4.71% f_GetTrueMean()
 3.61% FindSpikes()
 3.29% f_GetPeak()
 2.57% lcgf()
 2.51% find_triplets()
 2.36% v_GetPowerSpectrum()
ID: 407325 · Report as offensive
EricVonDaniken

Joined: 17 Apr 04
Posts: 177
Credit: 67,881
RAC: 0
United States
Message 407353 - Posted: 26 Aug 2006, 17:32:08 UTC

All the profiling I've done is over at Simon's (chicken) site...

Got a pointer or a link to where?
ID: 407353 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 407366 - Posted: 26 Aug 2006, 18:19:30 UTC

In a related topic on the BOINC dev mailing list, Dr. Anderson posted the following:

David Anderson to Tigher, boinc_dev
12:53 pm (1 hour ago)

I asked MS for help porting BOINC to Xbox about a year ago.
They said no.
I asked them again 2 days ago, after the F@h/playstation story came out.
They said they'd think about it.

ID: 407366 · Report as offensive
Alex Kan
Volunteer developer

Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 407661 - Posted: 27 Aug 2006, 0:52:03 UTC - in response to Message 407256.  

So why did I get such negative pushback when I suggested using GPUs to help with s@h crunching some months back?

You got negative pushback for the very reason that I made my post up there--namely, that people think that GPUs aren't accurate enough for SETI. (Note that you asked me a similar question last time. :P) And hey, if you think people are being unfairly critical of your ideas, what better course of action could there be than to prove them wrong?

On a related note, don't think that just because you've made a good suggestion to the SETI boards, it's going to implement itself. The majority of the people on this board are crunchers, not coders. The actual SETI/BOINC developers have other fish to fry, like getting 5.18 out so they can analyze the data from their new multi-beam data recorder. As for active optimizers, I believe I can count them on one hand, perhaps two if I'm feeling optimistic, and we seem to have our hands full enough at the moment.

Since GPUs with at least the performance of the 7800GT are becoming standard even in modern laptops, it would seem logical to explore using similar techniques for BOINC and BOINC projects.

Just because they're out in modern GPUs doesn't mean we all have them yet. Both of my primary machines, a desktop and a laptop, are still running R300-class GPUs.

However, since this isn't the first time that you've brought up the topic of using GPUs for SETI, I'd certainly be interested in seeing a proof-of-concept implementation. I haven't done much work on figuring out how SETI's analysis maps to GPU computation, but perhaps you have.
ID: 407661 · Report as offensive
Profile Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3222
Credit: 4,603,826
RAC: 0
United States
Message 407674 - Posted: 27 Aug 2006, 1:15:01 UTC

The software for SETI is open source, so go ahead and try and see if you can make it work faster on a GPU. You want it done so much, it's up to you to do it. Prove to yourself it can be done. Instead of asking everyone else to do your bidding. It's time to take the reins in your own hands.

ID: 407674 · Report as offensive
EricVonDaniken

Joined: 17 Apr 04
Posts: 177
Credit: 67,881
RAC: 0
United States
Message 407683 - Posted: 27 Aug 2006, 1:40:17 UTC - in response to Message 407674.  
Last modified: 27 Aug 2006, 1:46:36 UTC

The software for SETI is open source, so go ahead and try and see if you can make it work faster on a GPU. You want it done so much, it's up to you to do it. Prove to yourself it can be done. Instead of asking everyone else to do your bidding. It's time to take the reins in your own hands.

First, and most important, I have not "asked everyone else to do my bidding".
That is a gross mischaracterization of my posts and my POV.

Second, I have no need to "prove to myself it can be done". I =know= it can be done. There is no need for another existence proof beyond the myriad number that already exist in domain after domain, including some other BOINC projects.

Third, as Alex has rightfully noted most around here are crunchers, not s@h or BOINC coders. At present that includes me.
I have a job and other life responsibilities.
I do not have the time or $$$ to spare for spending 6-12 months getting to know the code base well enough and then rearchitecting it from the ground up by myself.
I =certainly= don't want to do it if the negative response I've seen from the coding community is any indication of how poorly such an effort would be received.

So unless or until I see someone "official" supporting the idea, it is going to remain no more than a suggestion I make from time to time.
If some folks come out of the woodwork to cooperate with me in an effort to "Make It So", I'll reevaluate my position.
ID: 407683 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 407757 - Posted: 27 Aug 2006, 4:27:31 UTC - in response to Message 407325.  
Last modified: 27 Aug 2006, 4:31:16 UTC

Benher:
I've done windows profile, and Josef Segur has done for *nix.
...
*nix 5.17 on DevC++/MinGW optimization: O2 - 1.4 GHz Pentium-m


Actually, that's on Windows 2000 SP4. The DevC++/MinGW combination uses the GCC compiler but produces Windows binaries. It's what Eric Korpela is using to produce the stock Windows builds; my intent was to profile with a build as close to stock as possible.

37.90% find_pulse()
(Note:- FFT on released 5.15 and 5.17 is done in FFTW separate library, and so the percentages are off because those times are not included)


That's certainly true, but my guess is FFTW would have been less than 18% since it has built-in optimizations and I was profiling a generic i386 build.
Joe
ID: 407757 · Report as offensive
Alex Kan
Volunteer developer

Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 409061 - Posted: 28 Aug 2006, 8:00:27 UTC - in response to Message 407683.  

Third, as Alex has rightfully noted most around here are crunchers, not s@h or BOINC coders. At present that includes me.
I have a job and other life responsibilities.
I do not have the time or $$$ to spare for spending 6-12 months getting to know the code base well enough and then rearchitecting it from the ground up by myself.
I =certainly= don't want to do it if the negative response I've seen from the coding community is any indication of how poorly such an effort would be received.

When I said that the majority of forum posters are "crunchers, not coders," I wasn't referring to SETI or BOINC coders exclusively--I actually meant that they're not coders, period. Given that I'm releasing optimized clients in an unofficial capacity, would you lump me into the category of "s@h or BOINC coder?" I have a day job too, but at the end of the day I find SETI interesting enough to merit the amount of work I've put into it.

Besides, since Crunch3r's departure, I feel that the attitude towards optimizers has generally been one of respect, given how few of us are still working on it. I would hardly describe the response to Simon's work with optimized clients as negative.

Lastly, optimizing SETI only costs you money when you start needing Intel software tools. :)

So unless or until I see someone "official" supporting the idea, it is going to remain no more than a suggestion I make from time to time.
If some folks come out of the woodwork to cooperate with me in an effort to "Make It So", I'll reevaluate my position.

Well, if I'm "official" enough for you, then it's on. Mapping the computation to the GPU is a different matter (as I've said before), but I'm up for discussing the concepts. I may be stuck with R300 GPUs in my machines at home, but I'm pretty sure I understand SETI analysis well enough.

But as far as people popping up from time to time saying how nice it would be if we could run SETI on our PS3/Xbox 360/GPU/toaster...trust me, there's no shortage of those people already. :P
ID: 409061 · Report as offensive
Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 409277 - Posted: 28 Aug 2006, 16:43:11 UTC
Last modified: 28 Aug 2006, 16:43:53 UTC

I have to side with Alex here.
There has been an overwhelmingly positive response to recent optimization efforts.

It's a classic case of "if you build it, they will come". Of course, if you don't but just talk about it, you're bound not to feel this effect much.

Not to put down your prospective contribution Eric, but it's time to stop talking and start doing :o) I believe I've posted much the same thing in a previous thread and reply to your posts (the original "are there any sites providing optimized clients" thread).

Should you choose to get involved, head on over to http://lunatics.at (used to be http://www.zadra.org/seti_enhanced but has since moved to a new URL). Simply register, message me your username and participate. So far, there are lots of capable people who are actively involved in sharing methods and results. One more can't hurt :o)

You know how it goes - you have to give some to get some back. So put your knowledge into action.

Regards,
Simon.
ID: 409277 · Report as offensive
Profile Diego -=Mav3rik=-
Joined: 1 Jun 99
Posts: 333
Credit: 3,587,148
RAC: 0
Message 410767 - Posted: 30 Aug 2006, 4:29:05 UTC - in response to Message 409277.  

Let's rig our wrist-watches to crunch SETI WUs.
ID: 410767 · Report as offensive
Team AUSTRALIA (AlexD)

Joined: 1 Jun 99
Posts: 54
Credit: 66,602
RAC: 0
Australia
Message 410829 - Posted: 30 Aug 2006, 5:08:42 UTC - in response to Message 410767.  
Last modified: 30 Aug 2006, 5:08:58 UTC

Let's rig our wrist-watches to crunch SETI WUs.

/applause and /rofl
ID: 410829 · Report as offensive

©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.