Random Musings About the Value of CPUs vs CUDA
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
OMG... well, I'll talk in your language: bring me one of these hosts, I'll insert my poor 9600GSO there and then show you that this top host + my GSO will outperform this top host w/o a GPU. You just demonstrated that none of the current top 100 (or fewer) is using a GPU. And so what?? Just wait: one of these top hosts will use a GPU and will become faster than it was before. |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
In your dreams. The PCI Express bus will slow you down enough that it will never get there on a very fast CPU, sorry!!!! You are dreaming. SETI is not like Folding@home. And I forgot: power-wise, it is an NV disaster! lol |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
LoL :) And what does this code do? What function in MB or AP can it speed up? It's just one single SIMD loop, not an app, not a function - are you joking when you talk about the code you posted?? The current AK8 and even the opt AP already have many such SIMD loops - what do you want to demonstrate with your own one? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, computing over the PCI-E bus is just nonsense. Data is fed to the GPU, then the GPU processes the data (in its own memory space, no PCI-E bus communication involved), then the results go back. As long as data feeding/retrieving makes up a small fraction of the total data processing time, this solution will free the main CPU for additional work. |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
Look at the date, dude ... At that time, SIMD was not so common. I have never heard somebody challenging my ASM capabilities ... hahaha this is the best, I think you need to start doing your homework. lol. I designed many of the instructions you use every day, want to teach me how to use them? lol. Discussion is over ... use Google, dude! |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
I SEE, YOU ARE A MAGICIAN ... YOU SEND DATA FROM CPU TO GPU, AND IT DOES NOT TAKE PCI EXPRESS TIME ... INTERESTING VOODOO, YOU'VE GOT TO GIVE US THE "RECIPE" LOL. LOOK AT THE CODE OF SETI FOR CUDA, VTUNE IT, AND STOP MAKING INACCURATE STATEMENTS. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
1) Yes, I looked at the date; it's about the time when Lunatics did the AK8 port. And the highly SIMDified KWSN V2.4 was surely online long before your post anyway :P So what did you demonstrate with that single loop? What part of the code?? 2) Well, I have been reading your posts for a pretty long time already and yes, your ASM capabilities are doubtful to me, sorry. I have a habit of trusting benchmarks, not loud words. When I build an app with your code in it, benchmark it against the code that did the same task before and see some speedup - then I will trust the words more. So far there is nothing from you to benchmark ;) 3) I don't want to teach you assembler, nor even the intrinsics that you used in your post (that's not assembler, for your info ;) ). Your claims are just not founded on any checkable basis. You claim you develop new asm instructions - fine, great claim. And so? How does this claim apply to the current topic? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Sure it takes time. Moreover, it takes CPU time too. That is what this statement was about:
And yes, this feeding/retrieving takes a small fraction, ~3-4%, of Q9450 time. |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
Again, the claims NV made were on a Phenom running non-optimized code ... if you use a Core i7 with optimized code, you punish the G92 big time, and if you use a Skulltrail, the punishment is even bigger ... and the MAC ... lol! the top! On highly performing processors, the time to send data through PCI Express is heavy compared to fast DDR3 RAM for Nehalem. Sorry, it is all dreamland ... I'll believe it when I see it at TOP 1. VTune tells me it is not going to work; SETI uses the same pulse and compare too many times ( FindPulse() ) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Again, the claims NV made were on a Phenom running non-optimized code ... if you use a Core i7 with optimized code, you punish the G92 big time, and if you use a Skulltrail, the punishment is even bigger ... and the MAC ... lol! the top!" 1) I know nothing about NVidia's claims about some Phenom; I have no Phenom available. I have a Yorkfield Q9450 and a GeForce 9600GSO, and I do benchmarks on this host. 2) PCI-E will be used only at the beginning of the computations and after the computations; as I already said, it's nonsense to do computations over the PCI-E bus. But memory access is used constantly during processing, because current SETI datasets don't fit in the L1 and L2 caches (the situation will be the same with the L3 cache too - too many cores). |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
"Again, the claims NV made were on a Phenom running non-optimized code ... if you use a Core i7 with optimized code, you punish the G92 big time, and if you use a Skulltrail, the punishment is even bigger ... and the MAC ... lol! the top!" On 2 ... do you think your GPU will do any better ... If you look at findpulse, or AlexK's version of it, it still uses data locality a lot, more than you think for sure. Most of it is cached. Same for the FFT. You need an increase in mem traffic, due to the increase of compute power. Nehalem does very well at FFT, it does them faster than the GPU, 8 by 8; your GPU without cache will struggle in findpulse, and the FFTs in parallel will not use the max bandwidth of the load ports. I did my homework, I know what I have to do :) The rest is in your imagination, good luck with this, I am done for the day. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"On 2 ... do you think your GPU will do any better ..." I think it will do more processing for the time the CPU spends sending data to the GPU. And great data access locality helps the GPU too - it can keep the needed data in GPU memory and not use PCI-E heavily. Why do you refuse to look at the GPU not as another CPU, better or worse than yours, but as a co-processor? It's possible to do almost a whole WU inside the GPU. You only need to pass the initial data array there and retrieve the results from it. Look at the task size - in the ideal case, not SO big a data array needs to be fed. How many data transfers there are in the current CUDA MB is a question of optimisation of this app, not of CUDA technology itself. And it seems there are not many PCI-E transfers in the current CUDA MB either - CPU load is really low.
Again... the point is that the GPU can do an FFT (for example) at the same time as the CPU is doing ANOTHER FFT. If the CPU does 10 FFTs while the GPU finishes one FFT (it's not the case, it's just an example) - well, FINE, you will do almost 11 FFTs instead of just 10 FFTs. Almost - because some CPU share is needed to feed the GPU. Are you claiming this share is so big that the CPU could do 11 FFTs in the same time period if it did not feed the GPU?? Addon: If that's so for high-end CPUs - please provide benchmark data. For non-high-end CPUs I know it's not true. The CPU can't do 11 FFTs per time period w/o the GPU (in my example). |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"About what 'toys' and what achievements you talk ??" I suspect a rather oblique reference to Larrabee. As to the cache references, I've no idea. The memory bandwidth on any mid-range to high-end video card is much more than that of a CPU, particularly the latest models. And as for the PCIe references? I'm guessing the work is processed by the CPU to make it suitable for the GPU, then it goes through the PCIe bus to the GPU & the GPU processes & returns the result. Not much data would be sent to the GPU, and bugger all would come back in the way of a result. And when you consider the speed of what is the 1st generation of the GPU application, compared to what would be the 6th (or more) of the CPU application, it shows just how fast the GPU is; and how much faster it can be with time & application development. Grant Darwin NT |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"About what 'toys' and what achievements you talk ??" Hm, it's Intel's achievement. Is this person == Intel?? If so, well, I will look at benchmarks for this new CPU :) And again, even this new CPU can benefit from a co-processor ;)
With a highly optimized app you should compare with L2 bandwidth, not with memory bandwidth (if data locality is so high that almost all processing is performed inside the L2 cache w/o main memory accesses).
The PCI-E bus is surely slower than the memory bus, but it is used much-much-much less often. It's roughly the same as comparing HDD speed with memory speed - sure, the HDD is much-much-much slower, but it's used only for checkpointing!
Yes. And the less CPU pre-processing is needed, the more viable the CUDA solution will be.
Sure. |
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
Now y'all did it: Made me remove my filter on Who? just to see what the brouhaha was all about. . . *yawn* The filter is back on. |
Ehran Joined: 21 Dec 03 Posts: 4 Credit: 894,870 RAC: 0 |
i just got the new client for boinc and updated the drivers to use cuda. what i'm seeing is that units run through considerably faster than before. my problem lies in that it seems to corrupt about 2/3 of the work units instead of doing them properly. it also seems to cause my video drivers to fail and recover every time it pooches a work unit which does nothing good for my temper. |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"i just got the new client for boinc and updated the drivers to use cuda." LOL...ya mean I am not the only cruncher here with a temper??? "Time is simply the mechanism that keeps everything from happening all at once." |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"i just got the new client for boinc and updated the drivers to use cuda." Please look at the other threads (and maybe even better, the threads on the beta site - there is too much noise here) about the current CUDA errors. There are scripts (2 of them already) and a modified app build that could help you diminish the effects of the bugs in the current app version (we are all waiting for a more stable one soon). Summary of current bugs: driver crashing/freezing/overflows on VLARs (tasks with AR <0.1). Crashing/overflows on VLARs with AR~0.13. Overflows on some VHARs (AR>~2.7). I recommend aborting all VLARs with AR <0.1 and keeping an eye on, or also aborting, all other tasks from the "risk group". That way you will get much more stable and productive work with the current CUDA MB versions. Hope the new version will be easier to use :) |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
"i just got the new client for boinc and updated the drivers to use cuda." Remember that this CUDA .exe should be considered beta even though it is on main. Bugs will be ironed out later, I assure you. It's a pity that this thread has become a pissing contest over something that I would still call the "beta" phase. Only time will tell if CUDA is here to stay or if it gets replaced by other hardware. I urge all to tone it down a bit; to avoid the frustration there is always the option to disable CUDA in s@h preferences until these "beta" bugs are ironed out. My 2 cents only.. Kind regards to all Vyper _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"i just got the new client for boinc and updated the drivers to use cuda." Well, you just reaffirmed my contention that this should have been left in Beta testing until it was truly ready....... Oh well.......a small step for Seti.......a little step back for the whole project....... Another step forward when it gets ironed out..... "Time is simply the mechanism that keeps everything from happening all at once." |
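
The arithmetic behind the co-processor argument in the thread (a CPU doing 10 FFTs while the GPU does one, minus the CPU share spent feeding the GPU) can be put into a few lines of Python. This is only a rough sketch under stated assumptions, not a benchmark: the function names and the `feed_fraction` parameter are made up for illustration, the 10 and 1 FFT counts are the thread's own hypothetical example, and 0.035 is simply the midpoint of the ~3-4% feeding overhead quoted for the Q9450. It also assumes the CPU and GPU work fully in parallel.

```python
def combined_throughput(cpu_units, gpu_units, feed_fraction):
    """Work units finished per time period by a CPU that also feeds a GPU.

    cpu_units     -- units the CPU alone finishes in one period (10 FFTs in the example)
    gpu_units     -- units the GPU finishes in the same period (1 FFT in the example)
    feed_fraction -- hypothetical share of CPU time spent feeding/retrieving GPU data
    """
    if not 0.0 <= feed_fraction <= 1.0:
        raise ValueError("feed_fraction must be between 0 and 1")
    # The CPU loses feed_fraction of its own output; the GPU's output is added on top.
    return cpu_units * (1.0 - feed_fraction) + gpu_units


def break_even_feed_fraction(cpu_units, gpu_units):
    """Feed fraction at which CPU + GPU only matches the CPU working alone."""
    # Solve cpu_units * (1 - f) + gpu_units == cpu_units for f, capped at 1.
    return min(1.0, gpu_units / cpu_units)


if __name__ == "__main__":
    # Illustrative numbers only: 10 CPU FFTs, 1 GPU FFT, 3.5% feeding overhead.
    print(combined_throughput(10, 1, 0.035))
    print(break_even_feed_fraction(10, 1))
```

With these numbers the host finishes about 10.65 FFTs per period instead of 10, and the GPU stops paying off only once feeding it costs the CPU more than 10% of its time. Whether a very fast CPU stays under that break-even point is exactly what the thread disputes, and only a benchmark can settle it.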