1)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 854063)
Posted 16 Jan 2009 by Francois Piednoel
Post:
Keep trying ...
Show the best you can get ...
|
2)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 853706)
Posted 15 Jan 2009 by Francois Piednoel
Post:
Addendum to the RAC analysis:
During the second 24 hours (without any restarts, which probably helped), I got a CUDA credit award of 3347, plus 354 extra pendings. It's getting complicated to go back through them all and do the CUDA/CPU credit adjustments, so let's just leave the raw score as a nice round 3,700 credit per day.
Now that LHC has finally finished its mammoth work issue, I switched the two spare CPU cores to Einstein, to check out their new v6.10 application - I think it's the first time an SSE2 optimisation has been released as the stock app for Windows in the current S5R4 run. (But don't all rush at once - they're talking of starting S5R5 in the next day or two). Using the same calculations as before, I got an idealised "CPU RAC" for the Q9300 of 2664, and a "Wall RAC" of 2591 - a loss of 2.7% to CUDA and system overhead.
Let's be clear: the title of the thread is "Random Musings About the Value of CPUs vs CUDA". I have the right to say that the CPU is faster, because it is.
But if you were to equip an EVGA X58 3X SLI Core i7 motherboard with three GTX 280 graphics cards, the graphics cards would outperform the CPU cores.
Maybe. We will see if it can make it to the Top 20.
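The 2.7% figure in the quoted RAC analysis can be checked from the two numbers given there. The sketch below is only that arithmetic; the function name is made up for illustration.

```python
def overhead_loss(ideal_rac: float, wall_rac: float) -> float:
    """Fraction of the idealised CPU RAC that is lost to CUDA and system overhead."""
    if ideal_rac <= 0:
        raise ValueError("ideal_rac must be positive")
    return (ideal_rac - wall_rac) / ideal_rac


# Figures quoted in the post: idealised "CPU RAC" 2664, "Wall RAC" 2591.
loss = overhead_loss(2664, 2591)
print(f"loss to overhead: {loss:.1%}")  # loss to overhead: 2.7%
```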
|
3)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 853644)
Posted 15 Jan 2009 by Francois Piednoel
Post:
Addendum to the RAC analysis:
During the second 24 hours (without any restarts, which probably helped), I got a CUDA credit award of 3347, plus 354 extra pendings. It's getting complicated to go back through them all and do the CUDA/CPU credit adjustments, so let's just leave the raw score as a nice round 3,700 credit per day.
Now that LHC has finally finished its mammoth work issue, I switched the two spare CPU cores to Einstein, to check out their new v6.10 application - I think it's the first time an SSE2 optimisation has been released as the stock app for Windows in the current S5R4 run. (But don't all rush at once - they're talking of starting S5R5 in the next day or two). Using the same calculations as before, I got an idealised "CPU RAC" for the Q9300 of 2664, and a "Wall RAC" of 2591 - a loss of 2.7% to CUDA and system overhead.
Let's be clear: the title of the thread is "Random Musings About the Value of CPUs vs CUDA". I have the right to say that the CPU is faster, because it is.
|
4)
Message boards :
Number crunching :
CUDA and the BLUE SCREEN OF DEATH
(Message 853622)
Posted 15 Jan 2009 by Francois Piednoel
Post:
I expect more and more chips to go bye-bye; they are not designed to run for that long. See Charlie's articles on The Inquirer about this.
This is my personal opinion.
Who?
|
5)
Message boards :
Number crunching :
Phenom II Released
(Message 853621)
Posted 15 Jan 2009 by Francois Piednoel
Post:
Quite an interesting set of benchmarks tests for the Phenom II directly compared on clock speed to others.
Looks like AMD are giving Intel a good race. The question is whether AMD can ramp up their clock speed and fab to jump Intel's next offering...
Happy crunchin',
Martin
lol ... on AMDZone ... duh!
|
6)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 850035)
Posted 6 Jan 2009 by Francois Piednoel
Post:
... Shameless marketing, speaking about the good, and only the good. ...
[...]
Bottom line: One more lie.
And Who is throwing the most Marketing?
And as for "Farting in the wind", that must be a very French term!
All very entertaining :-)
Please underwhelm us with your top-line RAC and then show the code that did it. (Nota Bene: We know the trick for gaining short term explosive RACs...)
Let the fun run!
Happy fast crunchin',
Martin
If I take the number 1 position, do you agree to stop posting for a year?
lol
OMG... today is Monday, your RAC is 333. So what? What about your promises? ... Just bragging?
It sounds like you have not figured out how the RAC is calculated yet. Based on a small increase over 2 or 3 days, you can guess what the final RAC is going to be ...
"Happy and prosperous are the poor in spirit, for theirs is the kingdom of heaven"
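The point about guessing the final RAC from a short-term increase can be sketched as follows. This assumes RAC behaves as an exponentially weighted average of daily credit with a one-week half-life, which is how BOINC describes it; pending credit, validation delays and irregular reporting are ignored. The function names are hypothetical, and the 18 000 per day rate in the usage lines is only an illustrative input.

```python
HALF_LIFE_DAYS = 7.0  # assumption: RAC decays with a half-life of one week


def decay_weight(days: float) -> float:
    """Weight kept by the old average after `days` days."""
    return 0.5 ** (days / HALF_LIFE_DAYS)


def rac_after(rac_now: float, daily_credit: float, days: float) -> float:
    """RAC after `days` days of earning a constant `daily_credit` per day."""
    w = decay_weight(days)
    return rac_now * w + daily_credit * (1.0 - w)


def estimate_final_rac(rac_start: float, rac_end: float, days: float) -> float:
    """Steady-state RAC implied by two readings taken `days` days apart."""
    if days <= 0:
        raise ValueError("days must be positive")
    w = decay_weight(days)
    return (rac_end - rac_start * w) / (1.0 - w)


# Illustrative only: a host starting at RAC 333 that earns 18 000 credits/day.
reading_after_3_days = rac_after(333, 18_000, 3)
print(round(estimate_final_rac(333, reading_after_3_days, 3)))
```

Under this model the RAC after two or three days is still far below its final value, but the size of the jump is enough to recover the steady-state figure.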
|
7)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849562)
Posted 5 Jan 2009 by Francois Piednoel
Post:
... Shameless marketing, speaking about the good, and only the good. ...
[...]
Bottom line: One more lie.
And Who is throwing the most Marketing?
And as for "Farting in the wind", that must be a very French term!
All very entertaining :-)
Please underwhelm us with your top-line RAC and then show the code that did it. (Nota Bene: We know the trick for gaining short term explosive RACs...)
Let the fun run!
Happy fast crunchin',
Martin
If I take the number 1 position, do you agree to stop posting for a year?
lol
|
8)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849401)
Posted 4 Jan 2009 by Francois Piednoel
Post:
Yet again it appears that another marketing drive, full of lies, is about to start.
The CUDA app is buggy yes, of that there is no doubt. When it works however it is clearly faster than a CPU. So far I have noticed it completes the VLAR units in 360 - 420 seconds of actual time and the longer units in approximately 1450 seconds. Now on the MacPro running at 3.2GHz these units take 700 seconds and 3500 seconds respectively. And if you look at my RAC for Seti you see it has increased since I started running CUDA.
Obviously it will never be faster than a Quad or Octo chip, however it is faster than a single CPU.
I have the NV stuff running on another account ... I don't see it going faster than Core i7. Sorry, it just does not.
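The timings in the quoted post can be turned into per-task speed-ups and rough throughput figures. The sketch assumes the Mac Pro times are per core, that each CPU core runs one task independently with perfect scaling, and that the GPU runs one task at a time. None of that is stated in the thread, and no Core i7 timings are given, so it does not settle the Core i7 question.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU finishes a single task."""
    return cpu_seconds / gpu_seconds


def tasks_per_hour(seconds_per_task: float, parallel_tasks: int = 1) -> float:
    """Throughput when `parallel_tasks` tasks run side by side."""
    return parallel_tasks * 3600.0 / seconds_per_task


# Quoted timings: VLAR 360-420 s on CUDA vs 700 s on one 3.2 GHz core,
# longer units about 1450 s on CUDA vs 3500 s on one core.
vlar_speedup = (speedup(700, 420), speedup(700, 360))
long_speedup = speedup(3500, 1450)

gpu_rate = tasks_per_hour(1450)        # one GPU, one long task at a time
one_core_rate = tasks_per_hour(3500)   # a single CPU core
quad_rate = tasks_per_hour(3500, 4)    # four cores, assumed perfect scaling

print(f"long-unit speed-up over one core: {long_speedup:.2f}x")
print(f"GPU beats one core: {gpu_rate > one_core_rate}, quad beats GPU: {quad_rate > gpu_rate}")
```

With these inputs the GPU is ahead of a single core but behind four cores, which matches the quoted claim.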
|
9)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849391)
Posted 4 Jan 2009 by Francois Piednoel
Post:
The interesting part is that you'll get a cheaper way to accelerate with the CPU than with the GPU ... For example, going to dual Nehalem will give you better scaling than adding GPUs. That is already the case, and with the mainstream Nehalem coming, it will get even better. The super mega high-end card from NV will have to drop to $120 to be competitive in terms of RAC/$ ... I'll let you do the calculation.
So, Mister NV, can we get those NV G92s for $120?
Remember, the way to calculate "how good the GPU is" forces you to include a Core i7 in the price; otherwise, your GPU is not competitive... hehehehhe
Well yeah if you don't count the cost of the complete system Nehalem will be cheaper, kind of...
If you put as much effort into work as you do talking smack online you must be the most productive person at Intel.
BTW a PD 805 feeds a 9600GT I already have fine. Guess what is the cheapest way for me to increase PPD? Buy a completely new system or just use the GPU already installed...
My job is my hobby; I never have the feeling of working ... the downside: I never stop working :)
|
10)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849357)
Posted 4 Jan 2009 by Francois Piednoel
Post:
.....I'm convinced the top combination will be Core i7 plus Nvidia GPUs later this year. It won't be cheap, it may not take top honors in energy efficiency, but it will surely be able to do huge numbers of WUs in very little time. Joe
I agree completely. Once things get sorted over the next couple of months, I think we'll see some pretty massive #s by those running i7 and 3x of GTX260, 280 or even dual or Tri GTX295 which would mean 4/6 GPUs. Point is: total RAC will be WAY higher than just i7 alone, or even dual slot i7 boards.
Today, GPU app performance is dependent upon CPU. Who?'s point about the performance difference between Celeron + GPU vs. i7 + GPU is valid. However, if I run ONE i7 920, I can also potentially add 3x GTX295s for a total of 6 GPUs + 4 CPU cores. Hummm......IDT it's a question of "which" type of processor I prefer or can do the most....I want ALL 10 cores crunching to their MAX ability in 1 rig.
Like I said above, once app/Boinc/quota/scheduling issues are sorted...total RAC should be enormous w/ this kind of set-up. It's a sobering thought considering that the current app and crunching "methodology" only scratch the surface of potential. The GPU app still acts/crunches in many ways like the CPU app & is essentially running code & methodology designed for CPU. It's more of a "port" to GPU than a GPU-designed application. There's a HUGE difference (read: handicap) in this approach.
But... it WORKS and is a logical "1st" step. The next couple of years will be VERY different from the past couple. It really is the beginning of a new era so to speak ;>)
The interesting part is that you'll get a cheaper way to accelerate with the CPU than with the GPU ... For example, going to dual Nehalem will give you better scaling than adding GPUs. That is already the case, and with the mainstream Nehalem coming, it will get even better. The super mega high-end card from NV will have to drop to $120 to be competitive in terms of RAC/$ ... I'll let you do the calculation.
So, Mister NV, can we get those NV G92s for $120?
Remember, the way to calculate "how good the GPU is" forces you to include a Core i7 in the price; otherwise, your GPU is not competitive... hehehehhe
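The RAC/$ calculation left to the reader can be sketched like this. The post gives no RAC or price figures other than the $120 target, so every number in the usage lines is a placeholder chosen for illustration, and the function names are hypothetical.

```python
def rac_per_dollar(rac: float, price: float) -> float:
    """Credit per day obtained for each dollar spent."""
    if price <= 0:
        raise ValueError("price must be positive")
    return rac / price


def break_even_price(addon_rac: float, reference_rac: float, reference_price: float) -> float:
    """Price at which an add-on matches the reference upgrade's RAC per dollar."""
    return addon_rac * reference_price / reference_rac


# Placeholder inputs, NOT measurements: a CPU upgrade adding 6000 RAC for $240
# and a GPU adding 3000 RAC.
cpu_upgrade_rac, cpu_upgrade_price = 6000.0, 240.0
gpu_rac = 3000.0

print(rac_per_dollar(cpu_upgrade_rac, cpu_upgrade_price))              # 25.0
print(break_even_price(gpu_rac, cpu_upgrade_rac, cpu_upgrade_price))   # 120.0
```

Whether the host CPU belongs in the GPU's price is exactly what the thread argues about; adding it to `price` lowers the GPU's RAC/$, while counting only the card for a system that already exists raises it.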
|
11)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849355)
Posted 4 Jan 2009 by Francois Piednoel
Post:
In another month or so, we may be in position for a decent RAC competition. BOINC needs to have settings which will allow a host to do both CUDA and CPU work, the S@H app needs to have bugs fixed, the quota/day needs to be adjusted, etc. Until then it would take a lot of effort to get enough astropulse work to occupy the CPUs while the GPU does setiathome_enhanced work. I can't imagine anyone investing that much effort over a long enough time to achieve a stable RAC.
Meanwhile, there have been enough independent posts here about actual speeds of GPU crunching that I'm convinced the top combination will be Core i7 plus Nvidia GPUs later this year. It won't be cheap, it may not take top honors in energy efficiency, but it will surely be able to do huge numbers of WUs in very little time. Joe
Well, the SETI top 20 is a small pissing contest of course, and those who get there or near it are going to do what they can to stay in that position.
Count me in on that p***ng contest too.
But you're so right that the code will need polishing, and until that has occurred it is rather meaningless, as far as I'm concerned, to point out what the GPU app can and can't do.
All I'm saying is that I have known several people over the years who tried to do an S@H app conversion. Mimo and Hans Dorn are just some of them.
The GPU app now matches 100% what I told them, when they started developing a GPU version, about how a GPU app would benefit the most.
And that is exactly by letting the BOINC app know about, and take care of, sending the workload to the right .exe for that specific platform..
All in all, even though I think the .exe is beta with regard to the specific VLAR issue, I'm also astonished at how well the app offloads the CPU so it can do other work.
I mean, it is truly awesome how they preprocess the WU to make it "streamable" for the GPU to do its work, and when that is done, let the ride begin..
One way to "cheat" is to let BOINC know which WUs are compatible with the present GPU app without freaking out. When Boinc.exe can later send the same tasks (S@H 605 or 606 apps) to both CPU and GPU, it could be implemented so that whichever device gets there first takes the appropriate WU, if the queue isn't in high-priority mode. Otherwise, BOINC knows that these GPU-compatible WUs are assigned to the GPU queue, but if the CPU queue starts to drain, Boinc.exe automatically assigns them to cpu.exe; or, if cpu.exe can automatically fetch work from the GPU queue, that is fine too.
Well enough mumbling from me :) , it is going to be an exciting year..
Wonder if ATI is promoting an AMD Stream version also ;) ..
I hope that .exe will be bug-free before it is released into the wild .. Lol
Kind Regards to all
Vyper
It did not stop the NV VPs from claiming victory, which says a lot about how reckless they are. All CUDA claims are the same, if you look at them closely.
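The two-queue idea Vyper describes in the quote (GPU-compatible WUs go to a GPU queue, and the CPU takes from it once its own queue drains) can be sketched as below. This is not how BOINC's scheduler is implemented; the class and field names are hypothetical, and the high-priority mode he mentions is not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkUnit:
    name: str
    gpu_compatible: bool  # e.g. False for units the GPU app cannot handle


class TwoQueueDispatcher:
    """Keeps one queue per device type; the CPU may take GPU work when idle."""

    def __init__(self) -> None:
        self.cpu_queue: deque = deque()
        self.gpu_queue: deque = deque()

    def submit(self, wu: WorkUnit) -> None:
        # GPU-compatible work is assigned to the GPU queue by default.
        (self.gpu_queue if wu.gpu_compatible else self.cpu_queue).append(wu)

    def next_for(self, device: str) -> Optional[WorkUnit]:
        if device == "gpu":
            return self.gpu_queue.popleft() if self.gpu_queue else None
        if device == "cpu":
            if self.cpu_queue:
                return self.cpu_queue.popleft()
            # CPU queue drained: take from the tail of the GPU queue so the
            # GPU keeps the work it would have reached first.
            return self.gpu_queue.pop() if self.gpu_queue else None
        raise ValueError(f"unknown device: {device}")


dispatcher = TwoQueueDispatcher()
for unit in (WorkUnit("vlar_1", False), WorkUnit("mid_1", True), WorkUnit("mid_2", True)):
    dispatcher.submit(unit)
print(dispatcher.next_for("cpu").name)  # vlar_1
```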
|
12)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849091)
Posted 4 Jan 2009 by Francois Piednoel
Post:
Re: Larrabee
For us mere mortals, hard to place 100% faith in the current info re: Larrabee. Only Who? knows for sure...BUT....if it debuts this year in 45nm process w/ 32 and 24 core versions, then I'm slightly underwhelmed. There appears to be much speculation that Larrabee will be a @ 2 Tflop (single precision) card @ launch. Logical to assume this will be the 32 core variety.
The HD4870X2 is already @ 2.4 Tflops, the GTX295 to be released in a few days will be @ 2 Tflops. I understand that Flops alone is not the end-all basis for performance expectations, but for the purpose of "general" comparison, this is the power available in the market today.
Larrabee will be available to consumers, when? Starting late '09/early '10?
Assuming that specific compilers, libraries & development tools are available @ the same time, just maybe a Seti application could be available for the Gen-1 Larrabee by March 2010?
I think we can expect to see AMD & Nvidia introduce at least 2, possibly 3 more generations of products of significantly greater processing power by then.
We will certainly see Cuda mature more and I'm confident that our gang of volunteer developers will have some surprises in store for GPU processing after 14-15 more months of development experience. Lastly, CUDA may not be the only API implementation for Seti GPU processing for very long. I expect we will see some success w/ OpenCL based Seti applications for Nvidia and AMD by 2010.
IMO, it seems Larrabee is behind & will be playing catch-up when it debuts later this year. That said, there's no question it should still provide for a massive increase in Seti crunching power.
This is only my personal opinion based on available, if not speculative info & some interpolation.
I can't say much about Larrabee. I don't understand why NV is trying to do CUDA enabling; it is giving Intel a home run on this. They know perfectly well that at computational work Lrb will kick them big time, and if they don't know, they should think a little about what makes a chip with more than 1 billion transistors fast: process technology!!!!
Anyway, there is no need for Lrb to beat CUDA on SETI; the CPU will do it. (Correction: "The CPU already does it.")
|
13)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 849083)
Posted 4 Jan 2009 by Francois Piednoel
Post:
I suspect the "consistent and reproducible SET@home workload" was poorly chosen, otherwise the tendency to crash and/or produce false positives would have been caught, and they don't say if HT was in use on the 965. Still, if the GTX 280 can do say four tasks in 391 seconds while 3/4 or 7/8 of the 965 is still available for other work, that's a productivity increase. I look forward to seeing what a Larrabee based GPU card can do on a similar test. Joe
This is a classic of CUDA claims: take the super mega corner case, and sell it as if it were the general case ...
As long as they don't show up with an impressive RAC average, it is all BS ...
|
14)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848944)
Posted 3 Jan 2009 by Francois Piednoel
Post:
This is my benchmark:
It's NOT a benchmark result. You really don't know that?
WHERE is yours? I can get to 18 000 RAC ... that's what matters; then I move on to another similar project, get good at it, and move on again ...
My RAC was 18 000 on 11 Nov 2008. Yours was still around 7000 and did not progress ... how come? Your NV stuff should make it go to 25 000 if we listen to your claims ... duh!
You can look up my own hosts' statistics yourself on any statistics server - I don't hide my hosts :P
And I didn't make any claims. I provide facts. Claims are your prerogative ;)
You made many claims about CUDA performance and spilled many liters of dirt on CUDA. So I want to see the data that justifies such behavior.
Now about the RAC of my quad:
Yes, it's something that should increase if CUDA can speed things up; indeed, you are right about that.
Before the CUDA MB release it did mostly SETI with the AK6 app.
After the CUDA MB release it did AP + CUDA for a few days; now it does Einstein@home on its CPU cores and the CUDA MB app on the GPU.
Moreover, I do regular standalone testing on this host too, because I'm interested in debugging and speeding up the CUDA app, not just in blaming CUDA. These reasons led to the RAC drop (at least the RAC for SETI).
When I finish standalone testing, let's look at the sustained RAC of this host (the total one, not just for SETI - SETI now runs only on the GPU, the CPU does Einstein). I hope this answers the question about the RAC of this host (my total RAC consists of a few hosts - some of them not always available, some with no connection to CUDA at all - so total RAC can't be used as an indicator at all).
Really, the RAC is not the ultimate SETI benchmark????????????
Whatever, dude ... Classified: FANBOY.
|
15)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848938)
Posted 3 Jan 2009 by Francois Piednoel
Post:
Show something that can make your curve go like this; then you can say that your technology will improve SETI.
If you can't show something like this, well, you have no impact on your average RANKING over the other users, and you are farting in the wind.
See the gain of Skulltrail on my curve, from the Skulltrail proto in December 2007 to May 2008, when I moved on to another project. (See, it gets steeper.)
Those are real benchmarks over time; if you can't show this, and only a few units, it is misleading at best. (Trust me, I learned this from working on my own code: it can look very good on the Lunatics bench tool and be only as good as the code from Alex in reality ... I learned how to shut up after this.)
You can see that when I added Nehalem, my work ranking immediately started to gain compared to other users, showing that I was crunching faster than the average user. This is how you see on SETI whether you have a technology that will win.
Your curve has looked nothing like this recently.
There is no breakthrough in your curve; you did not get any faster recently, so stop telling us the opposite. (My guess is that you bought a Core 2 in August 2008 ... hehehe)
Best regards ....
PS: You keep dancing around, changing from subject to subject. Show us the RAC with and without NV; then it may be true ... otherwise, you are in lalalala land.
|
16)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848935)
Posted 3 Jan 2009 by Francois Piednoel
Post:
Those are just a few units ... we want the average RAC with and without; this is what matters, the rest is farting in the wind.
LoL, those are very revealing words, indeed! :)
Do you know that it's the same set that was used for PGOing the AK8 opt app? Different ARs are represented here; total execution time reflects the performance of the app being tested on the whole SETI@home data set.
So it's not "just a few units" at all for anyone who did any benchmarking for SETI before...
And don't speak about RAC with me, your RAC is still dropping, waiting for Monday ...
Your numbers? Your benchmarking tools (apparently you never used the Lunatics toolset)? Anything from you that could be reproduced?? Loud words again?
This is my benchmark:
WHERE is yours? I can get to 18 000 RAC ... that's what matters; then I move on to another similar project, get good at it, and move on again ...
My RAC was 18 000 on 11 Nov 2008. Yours was still around 7000 and did not progress ... how come? Your NV stuff should make it go to 25 000 if we listen to your claims ... duh!
|
17)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848932)
Posted 3 Jan 2009 by Francois Piednoel
Post:
Those are just a few units ... we want the average RAC with and without; this is what matters, the rest is farting in the wind.
LoL, those are very revealing words, indeed! :)
Do you know that it's the same set that was used for PGOing the AK8 opt app? Different ARs are represented here; total execution time reflects the performance of the app being tested on the whole SETI@home data set.
So it's not "just a few units" at all for anyone who did any benchmarking for SETI before...
And don't speak about RAC with me, your RAC is still dropping, waiting for Monday ...
Your numbers? Your benchmarking tools (apparently you never used the Lunatics toolset)? Anything from you that could be reproduced?? Loud words again?
Just show an impressive RAC or just close your mouth. You can say whatever you want, but you can't show a good RAC ...
that should make a point! lol ....
|
18)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848926)
Posted 3 Jan 2009 by Francois Piednoel
Post:
OMG.... Any NUMBER please ?
You are the one claiming performance improvement, where are your numbers?????
How can we verify them????
From now on, I know you are a fanboy. I will keep correcting your claims. Right now, the public code accelerates nothing; it is too buggy. This is a fact, you can't change it.
1) OK, my numbers (surely you should have read that thread before if you are so interested in GPU/CPU performance comparisons ;) )
http://setiathome.berkeley.edu/beta/forum_thread.php?id=1440
The thread is called "CUDA MB benchmarking", a pretty straightforward name, isn't it?
It's not very good manners to answer a question with a question, right? So, your benchmarks?
2) I don't know how you can verify them - at the least you need that so-hated GPU with CUDA support :P But others can do it with ease - SETI CUDA (like all SETI CPU versions) can be run in standalone mode, and there is a very handy benchmarking tool from Lunatics that automates the testing process. If you had ever done some measurements with the SETI app and not just said loud words about future "neha", "lara", "aha" ;) and so on and so forth, you would know how to use it.
Those are just a few units ... we want the average RAC with and without; this is what matters, the rest is farting in the wind.
You found some units that do show some gain. From my long experience in SETI, that does not mean the other 99% of the units will not decelerate by 5X ...
|
19)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848920)
Posted 3 Jan 2009 by Francois Piednoel
Post:
As with Vyper, BOINC Manager showed the % rapidly counting down in chunks, but only after several hours of crunching, so I don't know what's going on.
Well, giving it another go now.
I've decided to give CUDA another go this time without any other projects vying for CPU time but I've got to get through two Astropulses and several WCG
Well, I've ended up with over 20 CUDA WUs all of a sudden
So far only one failure which reset the video driver.
So far so good. Going up in chunks of 0.04% a second.
It's taken 12 minutes for a 14 credit WU, longer WUs seem to be taking approx 30 minutes.
Compared to the roughly 4 hours pre-CUDA I'm seeing approx 8x speed-up so far for the long WUs.
I'm not sure what the difference is compared to initial attempt
Can you point out the units?
|
20)
Message boards :
Number crunching :
Random Musings About the Value of CPUs vs CUDA
(Message 848919)
Posted 3 Jan 2009 by Francois Piednoel
Post:
Today, (time taken to send through PCI Express + time doing the FFT + time sending it back) > (time doing the FFT on Core i7)
True. (at least for FFTs shorter than about 64K)
End of story.
False. The task is not to return FFT data, it is to FFT and process the data, then return extracted meta information. 6.06 may not yet do as much as it should on the GPU, but the small amount of CPU time needed indicates it's doing fairly well in that regard. If it were able to do all setiathome_enhanced WUs without crashing or finding false signals it would be a worthy addition to our crunching capabilities. Joe
Well, after the FFT, you process findpulse() if I am right ... and the data is in the cache, usually in the L2, and for sure in the L3 with Core i7. There is no time to do extra FFTs, and that is why it will not help.
It may help on a very low-end CPU, but then the findpulse will be slow too. (Low-end CPUs usually have less cache, and will cache miss.)
Why do you think NV took a Phenom to compare to? They knew exactly that they could not accelerate Core i7.
who?
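The inequality at the top of this post, and Joe's objection to it, can both be expressed with a simple cost model. The sketch below assumes a fixed per-transfer latency plus size divided by bandwidth; the bandwidth, latency and compute times in the usage lines are invented placeholders, not measurements of PCI Express, a GTX 280 or a Core i7.

```python
def gpu_total_s(bytes_up: float, bytes_down: float, pcie_bytes_per_s: float,
                gpu_compute_s: float, latency_s: float = 0.0) -> float:
    """Upload time + GPU compute time + download time, in seconds."""
    upload = latency_s + bytes_up / pcie_bytes_per_s
    download = latency_s + bytes_down / pcie_bytes_per_s
    return upload + gpu_compute_s + download


def offload_wins(bytes_up: float, bytes_down: float, pcie_bytes_per_s: float,
                 gpu_compute_s: float, cpu_compute_s: float,
                 latency_s: float = 0.0) -> bool:
    """True when the GPU round trip is shorter than doing the work on the CPU."""
    total = gpu_total_s(bytes_up, bytes_down, pcie_bytes_per_s, gpu_compute_s, latency_s)
    return total < cpu_compute_s


# A 64K-point single-precision complex buffer: 65536 points * 8 bytes each.
fft_bytes = 65536 * 8

# Case 1 (placeholder timings): ship the data, do only the FFT, ship it all back.
fft_only = offload_wins(fft_bytes, fft_bytes, 4e9, 50e-6, 300e-6, latency_s=10e-6)

# Case 2 (placeholder timings): FFT and signal search both stay on the GPU,
# and only a small result block (1 KiB here) comes back.
whole_pipeline = offload_wins(fft_bytes, 1024, 4e9, 1e-3, 5e-3, latency_s=10e-6)

print(fft_only, whole_pipeline)  # False True
```

With these placeholder numbers the FFT-only round trip loses to the CPU, while keeping the follow-up processing on the GPU and returning only the extracted results wins. Which case applies in practice depends on real timings that the thread does not provide.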
|