Random Musings About the Value of CPUs vs CUDA
Author | Message |
---|---|
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
What "toys" and what achievements are you talking about? Just a small note: Larrabee is rumored to be Intel's new high-performance GPU, so it will compete with nVidia's and ATi's higher-end offerings. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
What "toys" and what achievements are you talking about? From the AnandTech article: "Well, it is important to keep in mind that this is first and foremost NOT a GPU. It's a CPU. A many-core CPU that is optimized for data-parallel processing." But it is meant to be used as a replacement for current GPUs (as far as I understand from that article). Maybe even this hybrid can coexist with nVidia GPUs in a single case? ;) If so, everything I said about the GPU as a co-processor remains valid for this new chip too. |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
My post was not only for you.. ..it was for everyone on the board who doesn't like CUDA.. [said in kind words..] :-) |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
How would the performance compare? A PCIe 2.0 GPU in a PCIe 1.0 slot versus a PCIe 2.0 GPU in a PCIe 2.0 slot. How big would the slowdown be with the PCIe 1.0 slot? --------------------------------------- In the future, will the SETI@home CUDA app always need a CPU core for crunching? ..and the PCIe slot for communication/crunching? --------------------------------------- What about the architecture? Would it be better to combine an AMD CPU with an nVIDIA GPU? More performance? Thanks! |
Gecko Joined: 17 Nov 99 Posts: 454 Credit: 6,946,910 RAC: 47 |
Actually, there was much work w/ SIMD and ASM that occurred well before the Seti-Enhanced transition. The initial Seti-BOINC 4.x application was well optimized by the time the Enhanced transition took place in May '06. Tetsuji Maverick Rai did significant SIMD hand coding/optimizing in 2005, as did Harold Naparst, Hans Dorn and, of course, Crunch3r. On the PPC front, Alex Kan basically hand-wrote vectorized code for almost the entire PPC application in 2005 to maximize VMX (AltiVec) instruction usage, minimize L1 & L2 thrashing and make the most efficient use of the larger L2 caches in G4 & G5 PPCs. This is why a 1.25 GHz PPC7455 G4 would produce similar RAC to a P4 running at 2x the clock w/ the then-available optimized -doze apps. The Top Comp list in late '05 & early '06 was mostly PPC Macs until Core2 arrived. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
(Gives a big kitty grin).....The Core 2's are what weaned me away from my OC'd Semprons.....what a relief. "Time is simply the mechanism that keeps everything from happening all at once." |
Voyager Joined: 2 Nov 99 Posts: 602 Credit: 3,264,813 RAC: 0 |
Is it dumb not to use CUDA? I see GPUs for less money than a memory upgrade, $59, with what sounds like more improvement. I'm in Puget Sound, WA. What I found on Newegg were low-enders: 9400 $59, 9500 $69, 9600 $100. Just thinking there's maybe more bang for the buck than a memory upgrade, which may add ~5% to my crunching. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
In the future, will the SETI@home CUDA app always need a CPU core for crunching? It already doesn't need a full core for crunching. See the other threads: CPU load is ~3-5%. |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
In the future, will the SETI@home CUDA app always need a CPU core for crunching? Yes.. so the GPU communicates with the CPU.. over the PCIe slot.. while crunching.. Maybe in the future the S@H GPU CUDA app won't need to communicate with the CPU while crunching WUs? Then a 'high' slot (PCIe 2.0) would no longer be needed. GPU-only crunching without support from the CPU. Only 'upload' and 'download' of the WU over the PCIe slot to the GPU. |
Iona Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0 |
Maybe, in the future, but you'd still need some sort of external controller to do all the 'house-work' - never mind the PSU, case, OS, LAN/net connection, data storage etc...... Better to get rid of the bugs, first. Probably a mistake, for me to make comments on this subject, surrounded by lots of 'big hitters'! lol Don't take life too seriously, as you'll never come out of it alive! |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Communication is very limited. See the previous posts in this thread. With heavy communication over PCI-E, computing speed would be too low. Data is loaded onto the GPU in big batches, then the results are retrieved. Full GPU processing could be even less effective - the GPU is very poor where a lot of branching takes place (when threads that run in lockstep disagree on a branch, the GPU has to go down both branch directions; it can't take a single branch as a CPU does). So the GPU is best for stream computations, while the CPU is best at handling program logic. |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
On 2 ... do you think your GPU will do any better ... today, (time taken to send through PCI express + Time doing the FFT + Time sending it back ) > (doing FFT on Core i7 ) End of story. |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
In the future, will the SETI@home CUDA app always need a CPU core for crunching? Hahaha ... ok, so let's see what a G92 does on a Celeron ... lol, try to get this to Top 1. NV claims that the GPU is the center of everything; that is totally inaccurate and wrong. On top of this, they even already claim victory on their website ... what a joke. It is the same for most of the CUDA claims. In French, we call this farting in the wind and saying that you caused the storm ... lol |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
On 2 ... do you think your GPU will do any better ... I think it will do more processing than the CPU could in the time it spends sending data to the GPU. And good data-access locality helps the GPU too - it can keep the needed data in GPU memory and not use PCI-E heavily. Why do you refuse to look at the GPU not as another CPU, better or worse than yours, but as a co-processor? It's possible to do almost the whole WU inside the GPU. You only need to pass the initial data array there and retrieve the results from it. Look at the task size - in the ideal case, the data array that needs to be fed in is not SO big. How many data transfers the current CUDA MB does is a question of optimising this app, not of CUDA technology itself. And it seems there are not many PCI-E transfers in the current CUDA MB either - CPU load is really low.
Again... the point is that the GPU can do an FFT (for example) at the same time as the CPU is doing ANOTHER FFT. If the CPU does 10 FFTs while the GPU finishes one FFT (that's not the case, it's just an example) - well, FINE, you will do almost 11 FFTs instead of just 10 FFTs. Almost - because some CPU share is needed to feed the GPU. Are you claiming this share is so big that the CPU could do 11 FFTs in the same time period if it did not feed the GPU ?? The story has changed dramatically, hasn't it, from "the GPU is going to be 4x faster than a Phenom" to "the GPU could do some FFTs" ... If your GPU does some FFTs, you are farting in the wind, with jeans on (the PCI Express bus). And for the moment, you are the one who needs to provide data showing that a GPU can accelerate a Core i7 by more than 1% ... because that would be a very expensive 1%. I am not even talking about the dual Nehalem that is coming ... your GPU will be a drop of water in the sea. lol |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
No, only the beginning of the story. You just don't realize (are you sure you know about parallel computation? ;) ) that the CPU is NOT SITTING IDLE while the GPU does the FFT. So you posted the WRONG expression. You need to compare (time taken to send through PCI express + time sending it back) ? (doing FFT on Core i7). You should not include the GPU FFT time here. Moreover, PCI-E transfers can be asynchronous with respect to the CPU, so the CPU need not wait for PCI-E in this case either. So what? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
OMG.... Any NUMBERS, please? These are only your claims, nothing more. I see that my 9600GSO does a short task in <7 min, while a Q9450 at 2.66 GHz takes ~11-12 min for the same task. So, more than 1 additional core! If you want more precise numbers, I will post all the benchmarks (already posted elsewhere) in this thread too. I'm really tired of your unproven claims. NUMBERS???? |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
today, (time taken to send through PCI express + Time doing the FFT + Time sending it back ) > (doing FFT on Core i7 ) True. (at least for FFTs shorter than about 64K) End of story. False. The task is not to return FFT data, it is to FFT and process the data, then return extracted meta information. 6.06 may not yet do as much as it should on the GPU, but the small amount of CPU time needed indicates it's doing fairly well in that regard. If it were able to do all setiathome_enhanced WUs without crashing or finding false signals it would be a worthy addition to our crunching capabilities. Joe |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
today, (time taken to send through PCI express + Time doing the FFT + Time sending it back ) > (doing FFT on Core i7 ) Joe, why should we compare these numbers? Let's compare the time to process a full task on the GPU with one FFT on the CPU ... What sense is there in this comparison at all? |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
You are the one claiming a performance improvement, where are your numbers????? How can we verify them???? From now on, I know you are a fanboy. I will keep adjusting your claims; right now, the public code accelerates nothing, too buggy. This is a fact, you can't change it. |
KWSN Sir Clark Joined: 17 Aug 02 Posts: 139 Credit: 1,002,493 RAC: 8 |
Like Vyper, BOINC Manager showed the % rapidly counting down in chunks, but only after several hours of crunching, so I don't know what's going on. Well, I've ended up with over 20 CUDA WUs all of a sudden. So far only one failure, which reset the video driver. So far so good. Going up in chunks of between 0.04% and 0.12% a second. It's taken 12 minutes for a 14-credit WU; longer WUs seem to be taking approx 30 minutes. Compared to the roughly 4 hours pre-CUDA, I'm seeing approx 8x speed-up so far for the long WUs. I'm not sure what the difference is compared to my initial attempt with CUDA, which had slow WUs. |
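
The arithmetic behind the co-processor argument in this thread (the "10 FFTs versus almost 11" example, and the serial inequality "send + GPU FFT + send back > CPU FFT") can be sketched as a toy throughput model. This is only a minimal sketch under stated assumptions, not a measurement: the function names, the `cpu_feed_fraction` parameter and every timing value are hypothetical, and the model assumes the CPU keeps crunching while the GPU works, losing only a fixed share of its time to feeding the GPU.

```python
def serial_offload_time(t_send, t_gpu_fft, t_recv):
    """Latency of one FFT if the CPU sits idle during the whole GPU round trip."""
    return t_send + t_gpu_fft + t_recv


def ffts_cpu_only(period, t_cpu_fft):
    """FFTs completed in `period` by one CPU core working alone."""
    return period / t_cpu_fft


def ffts_with_coprocessor(period, t_cpu_fft, t_gpu_round_trip, cpu_feed_fraction):
    """FFTs completed in `period` when CPU and GPU work concurrently.

    cpu_feed_fraction is the share of the CPU core spent feeding the GPU
    (hypothetical parameter; the thread reports roughly 3-5% CPU load).
    """
    cpu_ffts = period * (1.0 - cpu_feed_fraction) / t_cpu_fft
    gpu_ffts = period / t_gpu_round_trip
    return cpu_ffts + gpu_ffts


def speedup(t_before, t_after):
    """Ratio of run times, e.g. minutes per work unit before and after."""
    return t_before / t_after


if __name__ == "__main__":
    # Illustrative values only: the CPU finishes 10 FFTs while the GPU finishes 1.
    print(ffts_cpu_only(10.0, 1.0))
    print(ffts_with_coprocessor(10.0, 1.0, 10.0, 0.05))
    # Run times quoted in the thread: ~4 hours vs ~30 minutes per long WU.
    print(speedup(240.0, 30.0))
```

In this model the GPU adds throughput whenever `t_cpu_fft / t_gpu_round_trip` exceeds `cpu_feed_fraction`, even when the serial round trip for a single FFT is slower than doing that FFT on the CPU. Both positions in the thread can therefore hold at once: the serial inequality can be true while the combined CPU plus GPU system still completes more work per unit time.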