Random Musings About the Value of CPUs vs CUDA
Author | Message |
---|---|
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
Today, (time taken to send through PCI Express + time doing the FFT + time sending it back) > (doing the FFT on Core i7). Well, after the FFT you process findpulse(), if I am right ... and the data is in the cache, usually in the L2, and for sure in the L3 with Core i7. There is no time to do extra FFTs, and that is why it will not help. It may help on a very low-end CPU, but then findpulse will be slow too (low-end CPUs usually have less cache, and will cache miss). Why do you think NV took a Phenom to compare to? They knew exactly that they could not accelerate Core i7. who? |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
"Like Vyper, Boinc Manager showed the % rapidly counting down in chunks, but only after several hours of crunching, so I don't know what's going on." Can you point out the units? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
1) OK, my numbers (you really should read that thread first if you are so interested in GPU/CPU performance comparisons ;) ): http://setiathome.berkeley.edu/beta/forum_thread.php?id=1440 The thread is called "CUDA MB benchmarking", a pretty straightforward name, isn't it? It's not very good manners to answer a question with a question, right? So, your benchmarks? 2) I don't know how you can verify them; at the least you need such a hated GPU with CUDA support :P But others can do it with ease: SETI CUDA (like all SETI CPU versions) can be run in standalone mode, and there is a very handy benchmarking tool from Lunatics that automates the testing process. If you ever did some measurements with the SETI app, and not just said loud words about future "neha", "lara", "aha" ;) and so on and so forth, you should know how to use it. ADDON: 3) And, BTW, your RAC is 252 now, still dropping... waiting for Monday, right? ;) |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
Those are just a few units ... we want the average RAC with and without; this is what matters, the rest is farting in the wind. You found some units that do show some gain; from my long experience in SETI, that does not mean the other 99% of the units will not decelerate by 5X ... |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
LoL, very revealing words, indeed! :) Do you know that it's the same set that was used for PGOing the AK8 opt app? Different ARs are represented here, so the total execution time reflects the performance of the app being tested on the whole SETI@home data set. So it's not "just a few units" at all for anyone who did any benchmarking for SETI before... And don't talk about RAC with me; your RAC is still dropping, waiting for Monday ... Your numbers? Your benchmarking tools (apparently you never used the Lunatics toolset)? Anything from you that could be reproduced?? Loud words again? |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
Just show an impressive RAC or just close your mouth. You can say whatever you want, but you can't show a good RAC ... that should make the point! lol .... |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
This is my benchmark: WHERE is yours? I can get to 18,000 RAC ... that is what matters; then I move on to another similar project, get good at it, and move on again ... My RAC was 18,000 on 11 Nov 2008. Yours was still around 7,000 and did not progress ... how come? Your NV stuff should make it go to 25,000 if we listen to your claims ... duh! |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
[offtopic] This picture just shows that your RAC is dropping and will probably continue dropping. This project is not about the highest RAC at all (btw, a high RAC can be produced artificially with ease; I think you know the methods :) ), it's about computations done. If you can sustain a RAC near your maximum, well, fine, no questions in this area (although this has no connection to the CUDA question at all). Can you? Your graph demonstrates that you can't. Your total is too low to say anything good about the systems you use. So don't knock your past RAC. RAC is valuable only if you can keep it for a long time. Everything else is just a blown record. [/offtopic] And returning to the CUDA benchmarking question: ANY reproducible data from you? Sorry, you can't shut my mouth with this graph that says nothing. |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
Show something that can make your curve go like this; then you can say that your technology will improve SETI. If you can't show something like this, well, you have no impact on your average RANKING against the other users, and you are farting in the wind. See the gain from Skulltrail on my curve, from the Skulltrail proto in December 2007 to May 2008, when I moved on to another project (see, it gets steeper). Those are real benchmarks over time; if you can't show this, and only a few units, it is misleading at best. (Trust me, I learned this from working on my own code: it can look very good on the Lunatics bench tool and be only as good as the code from Alex in reality ... I learned how to shut up after this.) You can see that when I added Nehalem, my work ranking immediately started to gain compared to other users, showing that I was crunching faster than the average user; this is how you see on SETI if you have a technology that will win. Your curve does not look anything like this recently. There is no breakthrough in your curve; you did not get any faster recently, so stop telling us the opposite. (My guess is that you bought a Core 2 in August 2008 ... hehehe) Best regards .... PS: You keep dancing around, changing from subject to subject. Show us RAC with and without NV; then it may be true ... otherwise, you are in lalalala land. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"This is my benchmark:" It's NOT a benchmark result. You really don't know that?
You can look at my own hosts' statistics yourself on any statistics server; I don't hide my hosts :P And I didn't make any claims. I provide facts. Claims are your prerogative ;) You made many claims about CUDA performance and spilled many liters of dirt on CUDA. So I want to see the data that permits such behavior from you. Now about the RAC of my quad: yes, it's something that should increase if CUDA can speed things up; indeed, you are right about that. Before the CUDA MB release it did mostly SETI with the AK6 app. After the CUDA MB release it did AP + CUDA for a few days; now it does Einstein@home on its CPU cores and the CUDA MB app on the GPU. Moreover, I do regular standalone testing on this host too, because I'm interested in debugging and speeding up the CUDA app, not just in blaming CUDA. These reasons lead to the RAC drop (at least the RAC for SETI). When I finish the standalone testing, let's look at the sustained RAC of this host (the total one, not just for SETI: SETI is now only on the GPU, the CPU does Einstein). I hope this answers the question about the RAC of this host (my total RAC consists of a few hosts, some of them not always available, some with no connection to CUDA at all, so the total RAC can't be used as an indicator at all). |
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
"This is my benchmark:" Really, the RAC is not the ultimate SETI benchmark???????????? Whatever, dude ... Classified: FANBOY. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"PS: You keep dancing around, changing from subject to subject. Show us RAC with and without NV; then it may be true ... otherwise, you are in lalalala land." No, you and I are surely in different lands ;) I'm not dancing around, not my style. I still want to see benchmark results from you: timings in seconds for a workunit completed on your CPU and completed on your GPU, with a loaded CPU, with an idle CPU, and so on. Any real data that can be reproduced, discussed and so on. You wanted to talk about RAC; OK, I can talk about RAC too, but it's not my question. You want RAC with GPU and w/o GPU (but you should speak about sustained RAC; we all know RAC is a very variable thing)? OK, you will receive it too, just later: right now I have no production host with CUDA and a sustained RAC (see earlier post). But where is your RAC with and w/o CUDA? What do you show us? Some fast CPU? Fine, and what? Where is the comparison with CUDA? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, REALLY. Sustained RAC, maybe, but RAC itself is too variable a thing; is this info really new to you? And better to avoid classifying my person; you really will not like it if I start to classify you, right? ;) And who is dancing now? Just for the record, I renew my question: where are your timings? |
KWSN Sir Clark Joined: 17 Aug 02 Posts: 139 Credit: 1,002,493 RAC: 8 |
"Like Vyper, Boinc Manager showed the % rapidly counting down in chunks, but only after several hours of crunching, so I don't know what's going on." One from the first batch: Task 1. One from the recent batch: Task 2. I can't see any difference in the flop count or anything else reported in the Task information, yet Task 1 took hours rather than the minutes Task 2 did. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"today, (time taken to send through PCI express + Time doing the FFT + Time sending it back ) > (doing FFT on Core i7 )" The output from the FFT is converted to a Power Spectrum, and Spike finding is always done. For some chirp/fft pairs the full array of Power Spectrums is transposed and Pulse, Triplet, or Gaussian searches are done. "It may help on very low end CPU, but then, the findpulse will be slow too. (low end CPU usually have less cache, and will cache miss)" Perhaps you missed this part of the NVIDIA Press Release? "ii Based on a consistent and reproducible SETI@home workload. Time-to-compute is measured and lower time is better. NVIDIA® GeForce® GTX 280-based system processes workload on the NVIDIA GPU and is based on an NVIDIA nForce® 780i SLI™-based motherboard, NVIDIA GTX 280 GPU, Intel Core i7 965 CPU, 2GB DDR2 DRAM and processes the workload in 391 seconds. “Fastest consumer multicore CPU-based system” processes the entire workload on CPU and is based on an ATI Radeon HD4870 GPU, Intel x58-based motherboard, Intel Core i7 965, 3GB DDR3 DRAM and processes the workload in 670 seconds." I suspect the "consistent and reproducible SETI@home workload" was poorly chosen, otherwise the tendency to crash and/or produce false positives would have been caught, and they don't say if HT was in use on the 965. Still, if the GTX 280 can do, say, four tasks in 391 seconds while 3/4 or 7/8 of the 965 is still available for other work, that's a productivity increase. I look forward to seeing what a Larrabee based GPU card can do on a similar test. Joe |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Ah, I just realized, you posted my rank in BOINC? LoL, do you really think it says something about CUDA performance?? Is this the real "analytical" approach of your firm? ROFL. Just for your information: BOINC rank depends on the performance of my own hosts (all of them, not just the CUDA-enabled ones), on their uptime, on the work available from the projects I participate in, and finally on the performance of all the other hosts involved in the comparison! Do you really think this value can say anything about CUDA performance on my host? It just can't. But I provided values that illustrate CUDA performance on my host, exactly CUDA performance, not something else. Can you provide the same data about your host or not? Pretty simple question. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"There is no breakthrough in your curve; you did not get any faster recently, so stop telling us the opposite. (My guess is that you bought a Core 2 in August 2008 ... hehehe)" 1) I described the reasons why you don't see any speedups on this graph: too many factors are in play, and that's why such graphs CAN'T be used as benchmarks. 2) You are right, I installed BOINC on the quad in August :) And it gave a great performance boost; it's now the fastest of my hosts and will soon have completed more work than all my other hosts did, no doubt about that. |
Voyager Joined: 2 Nov 99 Posts: 602 Credit: 3,264,813 RAC: 0 |
I would hope this thread would not just be arguing, but would provide simple info on CUDA. It looks like it's here for now, so I hope I can learn from the boards. I don't want to be granted a PhD in theory; please, something understandable and helpful. |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Here are Intel and nVIDIA fans.. ;-D I'm a SETI@home fan! :-D So I would like to have the best hardware to crunch in less time! Sorry for my ignorance.. I think my post will get lost in this thread.. So please, back to topic.. How would the performance be? Larrabee? When will it be available to buy? :-) AND, can the S@H CUDA app run on this GPU? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
"Larrabee?" Rumours are, sometime later this year. Like anything new, expect prices to be excessive and performance to be OK at best. V2/2nd revision is expected mid 2010 and will more likely give a better idea of just what it is capable of. "AND, the S@H-CUDA-app can run on this GPU?" Given that a Larrabee video card will essentially be a whole bunch of modified x86 CPUs on the same silicon, theoretically it should be possible to run Seti on it with minimal changes. From my previous link: "Programming for Larrabee. The Larrabee programming model is what sets it apart from the competition. While competing GPU architectures have become increasingly programmable over the years, Larrabee starts from a position of being fully programmable. To the developer, it appears as exactly what it is - an arrangement of fully cache coherent x86 microprocessors. The first iteration of Larrabee will hide this fact from the OS through its graphics driver, but future versions of the chip could conceivably populate task manager just like your desktop x86 cores do today." Given that the initial card will probably have 24 or so cores, and the later one (2010) 64 or more, that will mean the possibility of processing that many Work Units at a time (or close to it, depending on what mode your video card is running in for your desktop & applications). Wild personal speculation: like this first Seti CUDA effort, I expect the initial Larrabee to be pretty underwhelming: lots of potential, reasonable performance, but plenty of work still to be done. However, as Intel revise the microcode, work on the caches, chip-to-chip communications, data transfer, etc., and if developers get on the bandwagon, late 2010/early 2011 could be a very interesting period for distributed computing. Grant Darwin NT |
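
The timing argument that opens this page (PCI Express transfer + GPU FFT + transfer back versus the FFT on the CPU) and the two press-release figures Joe quotes can be written down in a few lines of Python. This is only a sketch: the function names and the millisecond timings are made up for illustration, and only the 391 s and 670 s figures come from the press release quoted in the thread. It says nothing about which side of the inequality a real Core i7 or GTX 280 falls on; that needs measured timings.

```python
def offload_pays_off(t_to_gpu, t_gpu_fft, t_from_gpu, t_cpu_fft):
    """Return True when the GPU round trip is faster than the CPU FFT.

    All arguments are durations in the same unit (e.g. milliseconds).
    """
    return (t_to_gpu + t_gpu_fft + t_from_gpu) < t_cpu_fft


def speedup(baseline_seconds, candidate_seconds):
    """Ratio of baseline time to candidate time (higher is better)."""
    return baseline_seconds / candidate_seconds


# Hypothetical timings, NOT measurements: transfer costs dominate here,
# so sending the FFT to the GPU does not pay off.
print(offload_pays_off(t_to_gpu=2, t_gpu_fft=1, t_from_gpu=2, t_cpu_fft=4))

# Same hypothetical transfer costs, but a slower CPU FFT: now it pays off.
print(offload_pays_off(t_to_gpu=2, t_gpu_fft=1, t_from_gpu=2, t_cpu_fft=8))

# Figures quoted from the NVIDIA press release in Joe's post.
GPU_SYSTEM_SECONDS = 391
CPU_SYSTEM_SECONDS = 670
print(f"{speedup(CPU_SYSTEM_SECONDS, GPU_SYSTEM_SECONDS):.2f}x")
```

With these inputs the script prints `False`, `True`, and `1.71x`. The last figure is a whole-workload ratio for the two systems described in the press release, so it also covers the parts of the pipeline (power spectrum, spike and pulse finding) that the per-FFT inequality leaves out.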
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.