CUDA cards: SETI crunching speeds

Message boards : Number crunching : CUDA cards: SETI crunching speeds

Previous · 1 · 2 · 3 · 4 · 5 · 6 . . . 8 · Next

Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14683
Credit: 200,643,578
RAC: 874
United Kingdom
Message 862792 - Posted: 6 Feb 2009, 18:46:07 UTC - in response to Message 862783.  

I'm not so sure that wallclock time is accurate....

I would expect that the wallclock time would be accurate, but it isn't necessarily what we want to measure.

A user at Einstein yesterday helped us pin down an elusive BOINC bug to do with DCF: wallclock time continues to be counted when a computer is in hibernation. And this afternoon, I had BOINC suspended while I did some fiddly editing in client_state.xml - according to the script, one of my VLARs (off-chart) took 45 minutes longer than the others.

Ideally, we need to find a way of measuring (and persuading BOINC to record) the length of time a GPU is actively working on a task.
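A minimal sketch of that idea (hypothetical code, not BOINC's actual accounting; the `ActiveTimer` name and the sleep durations are invented for illustration): accumulate elapsed time only while the task is actively in flight, so suspensions between bursts are not counted.

```python
import time

class ActiveTimer:
    """Accumulates elapsed time only between start() and stop() calls,
    so time spent suspended between bursts is not counted."""
    def __init__(self):
        self.total = 0.0
        self._t0 = None

    def start(self):
        self._t0 = time.monotonic()

    def stop(self):
        if self._t0 is not None:
            self.total += time.monotonic() - self._t0
            self._t0 = None

t = ActiveTimer()
t.start(); time.sleep(0.2); t.stop()   # active burst, counted
time.sleep(0.3)                        # suspended: not counted
t.start(); time.sleep(0.2); t.stop()   # second active burst, counted
print(f"active {t.total:.2f}s")        # roughly 0.4s, not 0.7s
```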
ID: 862792 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862793 - Posted: 6 Feb 2009, 18:51:56 UTC - in response to Message 862783.  

I'm not so sure that wallclock time is accurate. I'm showing one I did on CPU and it is showing a wallclock time of 10515.8 but a CPU time of 9269.751. Another one showed 9785.6 W/C and 9349.06 CPU.



This is the 10515.8 one http://setiathome.berkeley.edu/result.php?resultid=1150583601


So what? What were you trying to illustrate with that example? Wall clock time greater than CPU time: all OK. If CPU time were greater (without a task restart), that would be strange!
Actually, the wall clock time posted in stderr is 2-3 seconds less than the "actual" wall clock time; some time is needed for runtime initialization before time begins to be counted.

BTW, the difference between wall clock time and CPU time is a good indicator of OS overhead (if the cores are not overloaded).
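The distinction is easy to demonstrate; a small sketch (illustrative only: the `crunch` workload and the 0.5 s sleep are stand-ins, with the sleep playing the role of time the OS gave to other work):

```python
import time

def crunch(n):
    # stand-in for a work unit: pure CPU-bound arithmetic
    s = 0
    for i in range(n):
        s += i * i
    return s

w0, c0 = time.perf_counter(), time.process_time()
crunch(500_000)
time.sleep(0.5)   # stands in for time the OS spent elsewhere
wall = time.perf_counter() - w0
cpu = time.process_time() - c0
print(f"wall={wall:.2f}s cpu={cpu:.2f}s difference={wall - cpu:.2f}s")
```

`process_time` only advances while this process is on the CPU, so the sleep inflates the wall clock figure but not the CPU figure.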
ID: 862793 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862797 - Posted: 6 Feb 2009, 18:59:38 UTC - in response to Message 862792.  


Ideally, we need to find a way of measuring (and persuading BOINC to record) the length of time a GPU is actively working on a task.


IMHO a good approach for this would be to take only the lowest wall clock time for a given host at a given AR.
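That aggregation can be sketched as follows (the host IDs, angle ranges, and timings below are made-up sample data, not real results):

```python
from collections import defaultdict

# hypothetical (host_id, angle_range, wallclock_secs) samples
results = [
    (4606186, 0.44, 1450.2),
    (4606186, 0.44, 1398.7),
    (4606186, 0.44, 1521.0),
    (4777950, 0.44, 2010.5),
]

# keep only the minimum wall clock time per (host, AR) pair,
# on the theory that the fastest run had the least interference
best = defaultdict(lambda: float("inf"))
for host, ar, wall in results:
    best[(host, ar)] = min(best[(host, ar)], wall)

for (host, ar), wall in sorted(best.items()):
    print(host, ar, wall)
```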
ID: 862797 · Report as offensive
Profile perryjay
Volunteer tester
Avatar

Send message
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 862817 - Posted: 6 Feb 2009, 19:33:22 UTC - in response to Message 862793.  

What I was trying to show is that I noticed the difference between the two times even though that WU was run on my CPU, not my graphics card. I was wondering why there was a 1000+ second difference.


PROUD MEMBER OF Team Starfire World BOINC
ID: 862817 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862823 - Posted: 6 Feb 2009, 19:42:17 UTC - in response to Message 862817.  
Last modified: 6 Feb 2009, 19:52:16 UTC

What I was trying to show is I noticed the difference in the two times even though that WU was run on my CPU not my graphics card. I was wondering why there was 1000+ seconds difference.

I hope I have already shed some light on this topic.
But, for example, try to figure out what the wall clock time and CPU time will be if you run the CPU app and (for example) watch video on a single-core CPU ;)

(ADDON: the answer is that wall clock time will be bigger, and the bigger the less powerful that single-core CPU is; the wall clock time minus CPU time difference will be especially high if you watch high-def video :) )
ID: 862823 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 862842 - Posted: 6 Feb 2009, 20:27:29 UTC
Last modified: 6 Feb 2009, 20:35:02 UTC

My PC never enters hibernation. Ever.
So the wall time shows exactly how long the GPU took to process a certain unit PLUS the CPU time for feeding that GPU. The difference between wall clock time and CPU time should give exactly the GPU time.
That is, if I don't do anything else with the GPU in that time (not using the PC at night).

And for now, it looks like one core of a C2D running at 2.8-3 GHz is equal to my GF9500GT - using all the instructions available in that core (SSE3 or higher).
If NV compared GPU performance with a CPU that doesn't use all the instructions it has, that's just silly... it's like saying I can beat up Mike Tyson if his hands and legs are tied up.

I guess rather that they compared the CPU times in both cases, ignoring the actual GPU time...
Don't get me wrong, it is nice to have another processor, but let's keep it real: it is not 10 times faster!
ID: 862842 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 862846 - Posted: 6 Feb 2009, 20:35:02 UTC - in response to Message 862842.  

Anyway, this thread is not about whether nVidia told the truth ;)
The info provided in this thread can be used for choosing the best performance/cost solution, for example.
If you see that a GPU priced more than 2 times higher than some other GPU gives only (for example) a 20% speed increase... well, maybe it's worth buying 2 of the cheaper GPUs... and so on.
ID: 862846 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 862848 - Posted: 6 Feb 2009, 20:35:34 UTC - in response to Message 862787.  
Last modified: 6 Feb 2009, 20:42:05 UTC

... and another 8600GT from Martin (ML1) - no host ID, I'm afraid.

That's on host 4606186.

... Linux user (running SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r), so that's a first for this charting sequence. So it's particularly useful to have Brodo's little point plumb in the middle of Martin's trendline: on the limited evidence so far, the GPU speeds of the Linux and Windows apps are the same. Just need to get the Linux CPU usage down a bit, so the CPU can do something useful at the same time.

So... Does that suggest that the CPU is doing a busy-wait for Linux?

Thanks for the graphing and a very interesting comparison there.

For anyone interested and trying the Linux app on Linux/*nix, I've got the scraper scripted to give a nice csv output.

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 862848 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 862864 - Posted: 6 Feb 2009, 21:50:30 UTC

If the OS is doing its job correctly, it will account only for those time periods when the task is actively running on the CPU. When the task is suspended or sleeping it will not, or should not, accumulate CPU time. The accounting can drift from absolute reality if the task suspends within a time slice and the accounting is done on a per-time-slice basis ...

In the old big-iron days this was of huge importance, because it drove the billing for time. I'm not sure how accurate MS and others are these days, when the time accounting mostly serves to determine scheduling priorities, where accuracy matters less ...
ID: 862864 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 862915 - Posted: 7 Feb 2009, 0:09:57 UTC
Last modified: 7 Feb 2009, 0:32:59 UTC

I did run the script and I have sent the results as 2 PMs... I missed the post that gave the email address.
I will send you more, zipped, as you wanted; sorry for the long PM.
I am using Raistmer's apps, 3 CPUs. The host is:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4777950
ID: 862915 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 863054 - Posted: 7 Feb 2009, 8:12:29 UTC - in response to Message 862848.  

...
So... Does that suggest that the CPU is doing a busy-wait for Linux?
...
Martin

Of course it suggests that. For standalone testing of the stock builds there's a -poll command line argument for when you want to get the fastest possible GPU crunch. Otherwise it uses some kind of interrupt scheme. I suggest asking Crunch3r directly.
                                                          Joe
ID: 863054 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 863155 - Posted: 7 Feb 2009, 14:52:27 UTC - in response to Message 863054.  
Last modified: 7 Feb 2009, 14:55:12 UTC

Well, OK. Here are the test results (PG0444 was used):

S:\Chii_bench\1>.\AppTimes.exe .\MB_6.08_mod_CPU_team_CUDA.exe
145.018 secs Elapsed
36.769 secs CPU time

S:\Chii_bench\1>.\AppTimes.exe .\MB_6.08_mod_CPU_team_CUDA.exe -poll
141.164 secs Elapsed
138.576 secs CPU time

What do you think: is such a decrease in elapsed time worth such an increase in CPU consumption?
I think not, at least for my config.
It could be useful indeed if you have some really slow CPU plugged into a motherboard with a PCI-Express slot holding a top-end GPU. Maybe then you would need to poll indeed....

I wonder what results the guys with top GPUs will receive? My own is only a 9600GSO...
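Working through the arithmetic on the figures above (no new data, only the elapsed/CPU numbers from this post):

```python
# Raistmer's measurements from this post (PG0444 work unit)
no_poll = {"elapsed": 145.018, "cpu": 36.769}
poll    = {"elapsed": 141.164, "cpu": 138.576}

speedup = no_poll["elapsed"] / poll["elapsed"]
saved = no_poll["elapsed"] - poll["elapsed"]
extra_cpu = poll["cpu"] - no_poll["cpu"]
print(f"-poll gives a {speedup:.3f}x speedup ({saved:.1f}s saved) "
      f"at a cost of {extra_cpu:.1f}s extra CPU time")
```

So -poll saves under 4 seconds of elapsed time while costing over 100 seconds of CPU time that a CPU task could have used, which is why it looks like a bad trade on this config.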
ID: 863155 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 863169 - Posted: 7 Feb 2009, 15:56:36 UTC - in response to Message 863155.  
Last modified: 7 Feb 2009, 15:57:37 UTC


What do you think: is such a decrease in elapsed time worth such an increase in CPU consumption?
I think not, at least for my config.


Is at least that scheme improving the responsiveness? I mean, can you play a 3D game without suspending the CUDA units? I would give away some CPU time for that...
BTW, my system has become more responsive in the last 2 days, since I started using your CUDA apps... I don't know if it is a fluke or whether the OS "learned" something about that app :)?
ID: 863169 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 863171 - Posted: 7 Feb 2009, 15:59:57 UTC - in response to Message 863169.  

Never tried it.
I don't think that increasing the load on the GPU and CPU can make things more responsive, though.... Now the CPU is used more and the GPU is used slightly more too - why should it perform more smoothly for another consumer in this situation?...
ID: 863171 · Report as offensive
Profile SoNic

Send message
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 863173 - Posted: 7 Feb 2009, 16:04:23 UTC - in response to Message 863171.  

I was hoping that polling is a way to get control back from the GPU more often, so that when other apps need the GPU they have more chances to get its "attention". But I DON'T know :)
A 3DMark or a Passmark run should tell...
ID: 863173 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863180 - Posted: 7 Feb 2009, 16:32:42 UTC - in response to Message 863054.  

...
So... Does that suggest that the CPU is doing a busy-wait for Linux?
...

Of course it suggests that. For standalone testing of the stock builds there's a -poll command line argument when you want to get the fastest possible GPU crunch. Otherwise it uses some kind of interrupt scheme...

Good answer and a good test from Raistmer, thanks.

No indication as to where the "-poll" is included. A compile/build option?

A quick hexdump suggests that it is a debug build...

OK, it stays on low priority for the time being. PM -> Crunch3r unless he's watching.

Very good first test. It certainly works! Can the interrupt version be tested next please? ;-)

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863180 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863182 - Posted: 7 Feb 2009, 16:39:31 UTC - in response to Message 863169.  

How do you think, is such decrease in elapsed time worth such CPU consumption increase ?
I think not, at least for my config.

Is at least that scheme improving the responsiveness?

I very much doubt that.

Polling is very wasteful of all resources in most (almost all) cases.

Interrupts and better scheduling will improve responsiveness. For example, there are three versions of the Linux kernel tick: a 1ms 'tick' (1kHz sample) for interactive work, a slower 2ms(?) tick (500Hz sample) for servers, and a yet slower tick for a (high-throughput but unresponsive) batch system.
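The contrast can be sketched with a blocking wait (illustrative Python only, standing in for the driver's interrupt scheme; the thread and sleep here are invented for the example):

```python
import threading
import time

done = threading.Event()

def interrupt_style():
    # blocking wait: the thread sleeps in the kernel until woken,
    # consuming essentially no CPU while it waits
    done.wait()

# The busy-wait (polling) equivalent would be
#     while not done.is_set(): pass
# which pins one core at 100% for the entire wait.

worker = threading.Thread(target=interrupt_style)
worker.start()
time.sleep(0.1)   # stands in for the GPU doing work meanwhile
done.set()        # "interrupt": wake the waiting thread
worker.join()
print("woken without spinning")
```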


More generally, poor response usually means you need more RAM, faster disks, or a new higher performance system. Or you're running dog-slow software!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863182 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 863210 - Posted: 7 Feb 2009, 18:00:44 UTC - in response to Message 863180.  
Last modified: 7 Feb 2009, 18:03:02 UTC


Very good first test. It certainly works! Can the interrupt version be tested next please? ;-)

Happy crunchin',
Martin


Either I don't understand you, or you didn't understand my results.

The first test was the "interrupt" version, i.e. my Windows build running without any command line switches. The second was the same Windows build running with the -poll switch (the regime that seems to be embedded in the Linux version by default). You can see how CPU usage increases in this mode of work and how diminishing the GPU speed improvement is.
What "interrupt" version were you talking about?
ID: 863210 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863216 - Posted: 7 Feb 2009, 18:11:19 UTC - in response to Message 863210.  

The first test was the "interrupt" version, i.e. my Windows build running without any command line switches. The second was the same Windows build running with the -poll switch (the regime that seems to be embedded in the Linux version by default). You can see how CPU usage increases in this mode of work and how diminishing the GPU speed improvement is.

What "interrupt" version were you talking about?

A mix of translations...?

For the Linux version I see what appears to be near-continuous 100% CPU, which suggests "-poll" is being used somewhere.

Hence, is there a Linux version available, or possible, that uses non-polling and so avoids the busy-wait waste?

Is Crunch3r the only one who has managed to make a Linux build?

Cheers,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863216 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21453
Credit: 7,508,002
RAC: 20
United Kingdom
Message 863217 - Posted: 7 Feb 2009, 18:13:55 UTC - in response to Message 863210.  

Very good first test. It certainly works! Can the interrupt version be tested next please? ;-)

Either I don't understand you, or you didn't understand my results...

The confusion there is due to my brevity losing context...

The "test" I was meaning was the test I'm running with Crunch3r's Linux compile.

Your test is a different "test" in this thread, showing poll vs non-poll.

Thanks for the confirmation there with your test!

Cheers,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 863217 · Report as offensive


 