CUDA cards: SETI crunching speeds

Message boards : Number crunching : CUDA cards: SETI crunching speeds

Previous · 1 · 2 · 3 · 4 · 5 · 6 . . . 8 · Next

Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14683
Credit: 200,643,578
RAC: 874
United Kingdom
Message 862792 - Posted: 6 Feb 2009, 18:46:07 UTC - in response to Message 862783.  

I'm not so sure that wallclock time is accurate....

I would expect that the wallclock time would be accurate, but it isn't necessarily what we want to measure.

A user at Einstein yesterday helped us pin down an elusive BOINC bug to do with DCF: wallclock time continues to be counted when a computer is in hibernation. And this afternoon, I had BOINC suspended while I did some fiddly editing in client_state.xml - according to the script, one of my VLARs (off-chart) took 45 minutes longer than the others.

Ideally, we need to find a way of measuring (and persuading BOINC to record) the length of time a GPU is actively working on a task.
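A minimal sketch of that idea (hypothetical code, not BOINC's actual accounting; the `ActiveTimer` name and the sleep durations are invented for illustration): accumulate elapsed time only while the task is actively in flight, so suspensions between bursts are not counted.

```python
import time

class ActiveTimer:
    """Accumulates elapsed time only between start() and stop() calls,
    so time spent suspended between bursts is not counted."""
    def __init__(self):
        self.total = 0.0
        self._t0 = None

    def start(self):
        self._t0 = time.monotonic()

    def stop(self):
        if self._t0 is not None:
            self.total += time.monotonic() - self._t0
            self._t0 = None

t = ActiveTimer()
t.start(); time.sleep(0.2); t.stop()   # active burst, counted
time.sleep(0.3)                        # suspended: not counted
t.start(); time.sleep(0.2); t.stop()   # second active burst, counted
print(f"active {t.total:.2f}s")        # roughly 0.4s, not 0.7s
```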
ID: 862792 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862793 - Posted: 6 Feb 2009, 18:51:56 UTC - in response to Message 862783.  

I'm not so sure that wallclock time is accurate. I'm showing one I did on CPU and it is showing a wallclock time of 10515.8 but a CPU time of 9269.751. Another one showed 9785.6 W/C and 9349.06 CPU.



This is the 10515.8 one http://setiathome.berkeley.edu/result.php?resultid=1150583601


So what? What were you trying to illustrate with that example? Wall clock time greater than CPU time: all OK. If CPU time were greater (without a task restart), that would be strange!
Actually, the wall clock time posted in stderr is 2-3 seconds less than the "actual" wall clock time; some time is needed for runtime initialization before time begins to be counted.

BTW, the difference between wall clock time and CPU time is a good indicator of OS overhead (if the cores are not overloaded).
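The distinction is easy to demonstrate; a small sketch (illustrative only: the `crunch` workload and the 0.5 s sleep are stand-ins, with the sleep playing the role of time the OS gave to other work):

```python
import time

def crunch(n):
    # stand-in for a work unit: pure CPU-bound arithmetic
    s = 0
    for i in range(n):
        s += i * i
    return s

w0, c0 = time.perf_counter(), time.process_time()
crunch(500_000)
time.sleep(0.5)   # stands in for time the OS spent elsewhere
wall = time.perf_counter() - w0
cpu = time.process_time() - c0
print(f"wall={wall:.2f}s cpu={cpu:.2f}s difference={wall - cpu:.2f}s")
```

`process_time` only advances while this process is on the CPU, so the sleep inflates the wall clock figure but not the CPU figure.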
ID: 862793 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862797 - Posted: 6 Feb 2009, 18:59:38 UTC - in response to Message 862792.  


Ideally, we need to find a way of measuring (and persuading BOINC to record) the length of time a GPU is actively working on a task.


IMHO a good approach for this would be to take only the lowest wall clock time for a given host at a given AR.
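That aggregation can be sketched as follows (the host IDs, angle ranges, and timings below are made-up sample data, not real results):

```python
from collections import defaultdict

# hypothetical (host_id, angle_range, wallclock_secs) samples
results = [
    (4606186, 0.44, 1450.2),
    (4606186, 0.44, 1398.7),
    (4606186, 0.44, 1521.0),
    (4777950, 0.44, 2010.5),
]

# keep only the minimum wall clock time per (host, AR) pair,
# on the theory that the fastest run had the least interference
best = defaultdict(lambda: float("inf"))
for host, ar, wall in results:
    best[(host, ar)] = min(best[(host, ar)], wall)

for (host, ar), wall in sorted(best.items()):
    print(host, ar, wall)
```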
ID: 862797 · Report as offensive
Profile perryjay
Volunteer tester
Avatar

Send message
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 862817 - Posted: 6 Feb 2009, 19:33:22 UTC - in response to Message 862793.  

What I was trying to show is that I noticed the difference between the two times even though that WU was run on my CPU, not my graphics card. I was wondering why there was a 1000+ second difference.


PROUD MEMBER OF Team Starfire World BOINC
ID: 862817 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862823 - Posted: 6 Feb 2009, 19:42:17 UTC - in response to Message 862817.  
Last modified: 6 Feb 2009, 19:52:16 UTC

What I was trying to show is I noticed the difference in the two times even though that WU was run on my CPU not my graphics card. I was wondering why there was 1000+ seconds difference.

I hope I have already shed some light on this topic.
But, for example, try to figure out what the wall clock time and CPU time will be if you run the CPU app and (for example) watch video on a single-core CPU ;)

(ADDON: the answer is that wall clock time will be bigger, and the bigger the less powerful that single-core CPU is; the wall clock time minus CPU time difference will be especially high if you watch high-def video :) )
ID: 862823 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 862842 - Posted: 6 Feb 2009, 20:27:29 UTC
Last modified: 6 Feb 2009, 20:35:02 UTC

My PC never enters hibernation. Ever.
So the wall time shows exactly how long the GPU took to process a certain unit PLUS the CPU time for feeding that GPU. The difference between wall clock time and CPU time should give exactly the GPU time.
That is, if I don't do anything else with the GPU in that time (not using the PC at night).

And for now, it looks like one core of a C2D running at 2.8-3 GHz is equal to my GF9500GT - using all the instructions available in that core (SSE3 or higher).
If NV compared GPU performance with a CPU that doesn't use all the instructions it has, that's just silly... it's like saying I can beat up Mike Tyson if his hands and legs are tied up.

I guess rather that they compared the CPU times in both cases, ignoring the actual GPU time...
Don't get me wrong, it is nice to have another processor, but let's keep it real: it is not 10 times faster!
ID: 862842 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862846 - Posted: 6 Feb 2009, 20:35:02 UTC - in response to Message 862842.  

Anyway, this thread is not about whether nVidia told the truth ;)
The info provided in this thread can be used for choosing the best performance/cost solution, for example.
If you see that a GPU priced more than 2 times higher than some other GPU gives only (for example) a 20% speed increase... well, maybe it's worth buying 2 of the cheaper GPUs... and so on.
ID: 862846 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 862848 - Posted: 6 Feb 2009, 20:35:34 UTC - in response to Message 862787.  
Last modified: 6 Feb 2009, 20:42:05 UTC

... and another 8600GT from Martin (ML1) - no host ID, I'm afraid.

That's on host 4606186.

... Linux user (running SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r), so that's a first for this charting sequence. So it's particularly useful to have Brodo's little point plumb in the middle of Martin's trendline: on the limited evidence so far, the GPU speeds of the Linux and Windows apps are the same. Just need to get the Linux CPU usage down a bit, so the CPU can do something useful at the same time.

So... Does that suggest that the CPU is doing a busy-wait for Linux?

Thanks for the graphing and a very interesting comparison there.

For anyone interested and trying the Linux app on Linux/*nix, I've got the scraper scripted to give a nice csv output.

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 862848 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 862864 - Posted: 6 Feb 2009, 21:50:30 UTC

If the OS is doing its job correctly, it will account only for those time periods when the task is actively running on the CPU. When the task is suspended or sleeping it will not, or should not, accumulate CPU time. The accounting can drift from absolute reality if the task suspends within a time slice and the accounting is done on a per-time-slice basis ...

In the old big-iron days this was of huge importance, because it drove the billing for time. I'm not sure how accurate MS and others are these days, when the time accounting mostly serves to determine scheduling priorities, where accuracy matters less ...
ID: 862864 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 862915 - Posted: 7 Feb 2009, 0:09:57 UTC
Last modified: 7 Feb 2009, 0:32:59 UTC

I did run the script and I have sent the results as 2 PMs... I missed the post that gave the email address.
I will send you more, zipped, as you wanted; sorry for the long PM.
I am using Raistmer's apps, 3 CPUs. The host is:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4777950
ID: 862915 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 863054 - Posted: 7 Feb 2009, 8:12:29 UTC - in response to Message 862848.  

...
So... Does that suggest that the CPU is doing a busy-wait for Linux?
...
Martin

Of course it suggests that. For standalone testing of the stock builds there's a -poll command line argument for when you want to get the fastest possible GPU crunch. Otherwise it uses some kind of interrupt scheme. I suggest asking Crunch3r directly.
                                                          Joe
ID: 863054 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 863155 - Posted: 7 Feb 2009, 14:52:27 UTC - in response to Message 863054.  
Last modified: 7 Feb 2009, 14:55:12 UTC

Well, OK. Here are the test results (PG0444 was used):

S:\Chii_bench\1>.\AppTimes.exe .\MB_6.08_mod_CPU_team_CUDA.exe
145.018 secs Elapsed
36.769 secs CPU time

S:\Chii_bench\1>.\AppTimes.exe .\MB_6.08_mod_CPU_team_CUDA.exe -poll
141.164 secs Elapsed
138.576 secs CPU time

What do you think: is such a decrease in elapsed time worth such an increase in CPU consumption?
I think not, at least for my config.
It could be useful indeed if you have some really slow CPU plugged into a motherboard with a PCI-Express slot holding a top-end GPU. Maybe then you would need to poll indeed....

I wonder what results the guys with top GPUs will receive? My own is only a 9600GSO...
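Working through the arithmetic on the figures above (no new data, only the elapsed/CPU numbers from this post):

```python
# Raistmer's measurements from this post (PG0444 work unit)
no_poll = {"elapsed": 145.018, "cpu": 36.769}
poll    = {"elapsed": 141.164, "cpu": 138.576}

speedup = no_poll["elapsed"] / poll["elapsed"]
saved = no_poll["elapsed"] - poll["elapsed"]
extra_cpu = poll["cpu"] - no_poll["cpu"]
print(f"-poll gives a {speedup:.3f}x speedup ({saved:.1f}s saved) "
      f"at a cost of {extra_cpu:.1f}s extra CPU time")
```

So -poll saves under 4 seconds of elapsed time while costing over 100 seconds of CPU time that a CPU task could have used, which is why it looks like a bad trade on this config.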
ID: 863155 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 863169 - Posted: 7 Feb 2009, 15:56:36 UTC - in response to Message 863155.  
Last modified: 7 Feb 2009, 15:57:37 UTC


What do you think: is such a decrease in elapsed time worth such an increase in CPU consumption?
I think not, at least for my config.


Is at least that scheme improving the responsiveness? I mean, can you play a 3D game without suspending the CUDA units? I would give away some CPU time for that...
BTW, my system has become more responsive in the last 2 days, since I started using your CUDA apps... I don't know if it is a fluke or whether the OS "learned" something about that app :)?
ID: 863169 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 863171 - Posted: 7 Feb 2009, 15:59:57 UTC - in response to Message 863169.  

Never tried it.
I don't think that increasing the load on the GPU and CPU can make things more responsive, though.... Now the CPU is used more and the GPU is used slightly more too - why should it perform more smoothly for another consumer in this situation?...
ID: 863171 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 863173 - Posted: 7 Feb 2009, 16:04:23 UTC - in response to Message 863171.  

I was hoping that polling is a way to get control back from the GPU more often, so that when other apps need the GPU they have more chances to get its "attention". But I DON'T know :)
A 3DMark or a Passmark run should tell...
ID: 863173 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863180 - Posted: 7 Feb 2009, 16:32:42 UTC - in response to Message 863054.  

...
So... Does that suggest that the CPU is doing a busy-wait for Linux?
...

Of course it suggests that. For standalone testing of the stock builds there's a -poll command line argument when you want to get the fastest possible GPU crunch. Otherwise it uses some kind of interrupt scheme...

Good answer and a good test from Raistmer, thanks.

No indication as to where the "-poll" is included. A compile/build option?

A quick hexdump suggests that it is a debug build...

OK, it stays on low priority for the time being. PM -> Crunch3r unless he's watching.

Very good first test. It certainly works! Can the interrupt version be tested next please? ;-)

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863180 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863182 - Posted: 7 Feb 2009, 16:39:31 UTC - in response to Message 863169.  

How do you think, is such decrease in elapsed time worth such CPU consumption increase ?
I think not, at least for my config.

Is at least that scheme improving the responsiveness?

I very much doubt that.

Polling is very wasteful of all resources in most (almost all) cases.

Interrupts and better scheduling will improve responsiveness. For example, there are three versions of the Linux kernel tick: a 1ms 'tick' (1kHz sample) for interactive work, a slower 2ms(?) tick (500Hz sample) for servers, and a yet slower tick for a (high-throughput but unresponsive) batch system.
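The contrast can be sketched with a blocking wait (illustrative Python only, standing in for the driver's interrupt scheme; the thread and sleep here are invented for the example):

```python
import threading
import time

done = threading.Event()

def interrupt_style():
    # blocking wait: the thread sleeps in the kernel until woken,
    # consuming essentially no CPU while it waits
    done.wait()

# The busy-wait (polling) equivalent would be
#     while not done.is_set(): pass
# which pins one core at 100% for the entire wait.

worker = threading.Thread(target=interrupt_style)
worker.start()
time.sleep(0.1)   # stands in for the GPU doing work meanwhile
done.set()        # "interrupt": wake the waiting thread
worker.join()
print("woken without spinning")
```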


More generally, poor response usually means you need more RAM, faster disks, or a new higher performance system. Or you're running dog-slow software!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863182 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 863210 - Posted: 7 Feb 2009, 18:00:44 UTC - in response to Message 863180.  
Last modified: 7 Feb 2009, 18:03:02 UTC


Very good first test. It certainly works! Can the interrupt version be tested next please? ;-)

Happy crunchin',
Martin


Either I don't understand you, or you didn't understand my results.

The first test was the "interrupt" version, i.e. my Windows build running without any command line switches. The second was the same Windows build running with the -poll switch (the regime that seems to be embedded in the Linux version by default). You can see how CPU usage increases in this mode of work and how diminishing the GPU speed improvement is.
What "interrupt" version were you talking about?
ID: 863210 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863216 - Posted: 7 Feb 2009, 18:11:19 UTC - in response to Message 863210.  

The first test was the "interrupt" version, i.e. my Windows build running without any command line switches. The second was the same Windows build running with the -poll switch (the regime that seems to be embedded in the Linux version by default). You can see how CPU usage increases in this mode of work and how diminishing the GPU speed improvement is.

What "interrupt" version were you talking about?

A mix of translations...?

For the Linux version I see what appears to be near-continuous 100% CPU, which suggests "-poll" is being used somewhere.

Hence, is there a Linux version available, or possible, that uses non-polling and so avoids the busy-wait waste?

Is Crunch3r the only one who has managed to make a Linux build?

Cheers,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863216 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863217 - Posted: 7 Feb 2009, 18:13:55 UTC - in response to Message 863210.  

Very good first test. It certainly works! Can the interrupt version be tested next please? ;-)

Either I don't understand you, or you didn't understand my results...

The confusion there is due to my brevity losing context...

The "test" I was meaning was the test I'm running with Crunch3r's Linux compile.

Your test is a different "test" in this thread, showing poll vs non-poll.

Thanks for the confirmation there with your test!

Cheers,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863217 · Report as offensive


 