CUDA cards: SETI crunching speeds

Jack Shaftoe
Joined: 19 Aug 04
Posts: 44
Credit: 2,343,242
RAC: 0
United States
Message 864310 - Posted: 11 Feb 2009, 12:38:39 UTC - in response to Message 864189.  

It's early days yet, and we don't have data from a full set of cards. But on this very preliminary evidence, using a very preliminary SETI application, the 2xx series cards don't seem to show a performance boost here at SETI commensurate with their pricing premium. This may change, but at the present (early) stage of CUDA development, I'm glad I opted for 9800-range cards.


Thank you Richard, cheers.
ID: 864310
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864316 - Posted: 11 Feb 2009, 13:28:39 UTC - in response to Message 864199.  
Last modified: 11 Feb 2009, 13:30:09 UTC

... OK, now for the killer comparison with a couple of days at high priority...

If the app is polling, then I'd expect little change in wall-clock time for the WUs when the priority changes. If the CPU is genuinely maxed out doing useful work, then the wall-clock times should change proportionately. Here's hoping for a consistent mix of WUs to show something useful!

Such is my hypothesis!!

Well... With just a few noisy plots to compare, the low-priority runs look to be slower and much more variable than the higher-priority runs. That suggests the CPU is the limiting factor, at least. So perhaps there is indeed no polling, or at least only limited polling.

I've also got a log of CPU utilisation running, and there do appear to be brief occasions where the CUDA task uses less than the maximum CPU for its work, dropping to as low as 90% rather than 99-100%.
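
(For anyone wanting to replicate the logging, a minimal sketch using only the stock top and grep - the 10 s interval and the 'boinc' pattern are just my choices; adjust the pattern to your app's process name:)

rm -f cpu_util.log
while true; do
    date >> cpu_util.log                           # timestamp each sample
    top -b -n 1 | grep -i boinc >> cpu_util.log    # snapshot the boinc/CUDA processes
    sleep 10
done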

Someone on Linux with a much more powerful CPU than my old clunker is needed to test!

Still curious as to why the AMD X2 always has one core maxed out, even for quite a low-spec graphics card...

Is not as much of the work handed over to the GPU as expected?...


Still scraping data.

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864316
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 864346 - Posted: 11 Feb 2009, 15:34:38 UTC
Last modified: 11 Feb 2009, 15:38:30 UTC

Does 'polling' mean the CPU is 'waiting'?
The Google translator doesn't know 'polling'... ;-)

If the CPU (one core of it) were at 100% all the time... then it's no longer GPU crunching, is it? ;-)

When my two GTX260 Core216 cards are crunching, each GPU gets ~7% of the whole CPU.
That means ~28% of one core per GPU.

During the first ~25 sec. of every WU, the CPU support rises to 25% of the whole CPU
[~100% of one core].

AMD Phenom II X4 940 BE @ 3.0 GHz.


Or do you mean that with more CPU support for the GPU all the time, the crunching speed would be faster?
ID: 864346
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864397 - Posted: 11 Feb 2009, 19:46:45 UTC - in response to Message 864346.  
Last modified: 11 Feb 2009, 19:48:53 UTC

Does 'polling' mean the CPU is 'waiting'?
The Google translator doesn't know 'polling'... ;-)

"Polling" means the CPU continuously repeatedly checks for whether the GPU has finished. This is also called a "busy loop", in that the CPU is kept 100% busy just running a loop to check (poll) for a finish condition. This is very wasteful of the CPU.

Polling is similar to the 'do nothing' "idle loop", which is exactly the wasted CPU time that BOINC is trying to replace with useful work!
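
(A minimal shell sketch of the difference - 'long_gpu_job' is a hypothetical stand-in for the GPU work:)

rm -f /tmp/gpu_done
( long_gpu_job; touch /tmp/gpu_done ) &    # flag file marks the finish condition

# Polling / "busy loop": the CPU spins at 100%, asking "done yet?" non-stop:
while [ ! -f /tmp/gpu_done ]; do
    :                                      # do nothing, check again straight away
done

# The efficient alternative is a blocking wait - the CPU sleeps until done:
# wait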

If the CPU (one core of it) were at 100% all the time... then it's no longer GPU crunching, is it? ;-)

Either it is 100% occupied feeding and retrieving data to/from the GPU, or it is in a "busy loop" pestering (polling) the GPU to see whether it is still busy or now idle.

When my two GTX260 Core216 cards are crunching, each GPU gets ~7% of the whole CPU.
That means ~28% of one core per GPU.

During the first ~25 sec. of every WU, the CPU support rises to 25% of the whole CPU
[~100% of one core].

AMD Phenom II X4 940 BE @ 3.0 GHz.

OK... So my cores are only running at 2.6GHz but they are kept at 90% to 100% busy...


Or do you mean that with more CPU support for the GPU all the time, the crunching speed would be faster?

That is my question.


My suspicion of polling is now lessened by more recent data. The rate of results appears to be roughly proportional to the CPU time, which suggests that the system is CPU-limited. If polling were used, then there should be less of a slowdown in results when the CPU time is reduced by setting a lower priority.
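
(For anyone repeating the priority experiment, a sketch using the standard renice tool - the 'setiathome' pattern is only a guess at the CUDA app's process name:)

renice 19 -p $(pgrep -f setiathome)   # drop the app to lowest priority for the "low" runs
renice 0 -p $(pgrep -f setiathome)    # restore for the "high" runs (raising priority may need root)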

So... Is the Linux version less efficient than the Windows versions? Or is this a quirk of using the 8600GT 256MByte hardware?

Crunch3r has hinted that with a more powerful CPU, he sees the CPU idle whilst the GPU is kept busy. That strongly suggests there is no polling for that case.


Anyone with a top-end Intel CPU running an 8600GT GPU on Linux for comparison?

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864397
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 864409 - Posted: 11 Feb 2009, 20:22:56 UTC - in response to Message 864397.  
Last modified: 11 Feb 2009, 20:23:21 UTC

Very strange indeed.
I saw no speed increase from polling for my 9600 running with a 2.6GHz quad.
I think your 8600 should be slower than my 9600GSO, so it should require even less CPU for feeding, not more... But you see CPU limits where I don't. Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)
ID: 864409
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 864428 - Posted: 11 Feb 2009, 21:39:57 UTC - in response to Message 864409.  

Very strange indeed.
I saw no speed increase from polling for my 9600 running with a 2.6GHz quad.
I think your 8600 should be slower than my 9600GSO, so it should require even less CPU for feeding, not more... But you see CPU limits where I don't. Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)


It's actually the same on Windows. If you run an 8+1, 4+1 or 2+1 setup you'll starve the GPU. GPU crunching times on a Xeon V8 + 8800GT increased by 30 to 45 min depending on the AR (running an 8+1 setup).

You can even watch that behavior in the BOINC Manager... simply stop all CPU tasks and you'll see the GPU crunching speed increase.

Join BOINC United now!
ID: 864428
SoNic
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 864443 - Posted: 11 Feb 2009, 22:59:35 UTC

I don't believe that, because the temperature of the GPU is as high as it can get whether the config is set for 2 CPUs or for 2+1 CPUs (I have a C2D). When I exit from Crysis (BOINC stopped, of course) the GPU temp is the same as during crunching.
I have a baseline now - 3000-3200 sec for a 42-credit unit. I will try for a day with 2 CPUs so that one of the cores will be dedicated to feeding the GPU.
ID: 864443
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 864454 - Posted: 11 Feb 2009, 23:23:42 UTC - in response to Message 864428.  

Very strange indeed.
I saw no speed increase from polling for my 9600 running with a 2.6GHz quad.
I think your 8600 should be slower than my 9600GSO, so it should require even less CPU for feeding, not more... But you see CPU limits where I don't. Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)


It's actually the same on Windows. If you run an 8+1, 4+1 or 2+1 setup you'll starve the GPU. GPU crunching times on a Xeon V8 + 8800GT increased by 30 to 45 min depending on the AR (running an 8+1 setup).

You can even watch that behavior in the BOINC Manager... simply stop all CPU tasks and you'll see the GPU crunching speed increase.



I did a special test: I ran CUDA with no BOINC CPU tasks, then with 1, 2, 3, 4 - the elapsed times differ VERY little, both for the tested CUDA app and for the CPU apps.
So it's not the same on Windows, at least on my own host.

ID: 864454
SoNic
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 864460 - Posted: 11 Feb 2009, 23:39:20 UTC
Last modified: 11 Feb 2009, 23:40:23 UTC

I have the first unit in. The temperature is the same (I don't know why) but the time is at 1700 sec now. I will test some more. WinXP here.
ID: 864460
SoNic
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 864621 - Posted: 12 Feb 2009, 11:34:45 UTC
Last modified: 12 Feb 2009, 11:36:51 UTC

The average speed after a few units increased just a little bit - from 3200 to 3000 sec/unit... So in my case the CPU feeds the GPU OK even with CPU+1 units running. But I only have a GF 9500GT.
ID: 864621
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864655 - Posted: 12 Feb 2009, 15:29:32 UTC - in response to Message 864409.  
Last modified: 12 Feb 2009, 15:32:30 UTC

Very strange indeed.

... Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)

Mmmm... Still curious...

Sorry, this area is Linux & *nix only. The only Windows I have here is Win95C. I don't relish a 1 hour+ Windows install for a look-see test... Or... Err nope. Linux is already on here so I'd have to start physically swapping HDDs to avoid a Windows install blindly overwriting the Linux... :-(

To try to replicate this for a test, a good bet would be Mandriva Linux One 2009. That's likely the closest to the system here. You may need to add the "contrib" repositories to install:

dkms-nvidia-current-180.22-1mdv2009.0
nvidia-current-devel-180.22-1mdv2009.0

You can then install BOINC with the ".sh" install script in your home directory, add the Crunch3r build, and crunch on.
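
(A sketch of the package install, assuming the standard urpmi tools and that the "contrib" media are already added - exact package versions will vary:)

urpmi dkms-nvidia-current nvidia-current-devel   # DKMS kernel module + CUDA dev files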


Hope that helps,

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864655
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864701 - Posted: 12 Feb 2009, 17:12:39 UTC - in response to Message 864655.  
Last modified: 12 Feb 2009, 17:13:46 UTC

Further thoughts:

You may well also need:

x11-driver-video-nvidia-current


An 'easy' way to sort that lot out is to call up XFdrake, or to select the nVidia drivers from the graphics card setup.

Unless you have more than 256MBytes of VRAM, you'll have to log out of the desktop and drop down to a command terminal (so that you free up as much VRAM as possible). Eg: run boinc, run the graphical boincmgr to set up SETI, exit, log out, then log in via a text terminal (select from the login menu, or simply Ctrl-Alt-F1) and run:

cd BOINC
./boinc >boinc.log 2>&1 &    # run the client headless, log stdout+stderr, put it in the background


... And see what happens.

Use:

top

to see what the processes are doing.

Good luck!

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864701
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 864767 - Posted: 12 Feb 2009, 20:13:54 UTC - in response to Message 864701.  

Well, it seems it's not a "Live CD/DVD"; more packages need to be added and so on... I'm not ready to jump into Linux configs just yet.
This question will be sorted out sooner or later by comparing the RACs of similar hosts under Linux/Windows...
ID: 864767
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 865001 - Posted: 13 Feb 2009, 12:17:13 UTC

Just upgraded my video card from a 9400GT to a 9500GT... same machine, no other changes.

Very preliminary results (overnight):
The 9400GT processed AR .44xxx WUs in ~5100 secs each (~30 CS/hr).
The 9500GT processes AR .44xxx WUs in ~3060 secs each (~50 CS/hr) - about a 40% reduction in run time.
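
(A quick check of that 40% with bc:)

echo "scale=2; 1 - 3060/5100" | bc   # fraction of run time saved -> .40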

Not much to compare since S@H only seems to be sending .44xxx WUs currently. The 9400GT DID process other ARs faster (up to ~37CS/hr at times), but I would expect better performance from the 9500GT at those ARs as well.

The 9400GT has 16 stream processors; the 9500GT has 32. The price difference at Microcenter is about $11.00 more for the 9500GT.
ID: 865001
Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 865036 - Posted: 13 Feb 2009, 14:32:52 UTC

OK, got the new host up. She has a c1070 260 and 9500gt.


As soon as we can get some work I will shoot some new data over. It will be neat doing 8 units at once. *grins*
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 865036
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865186 - Posted: 13 Feb 2009, 22:27:10 UTC


To my knowledge...

More 'processor cores' and more 'shader MHz' mean more crunching speed...

ID: 865186
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865260 - Posted: 14 Feb 2009, 3:04:45 UTC
Last modified: 14 Feb 2009, 3:07:27 UTC

I found something confusing...

The same WU.

The CPU time is only ~2 sec. different.
O.K., it's not the real GPU crunching time... ;-)

But the longer the CPU support, the longer the GPU crunching time, or?

So in reality it's maybe ~10 sec. different.

Why?
I thought the GTX 260 would be much faster than an 8800 GTS...
Is it because of the Intel/AMD architecture?
Why doesn't the BOINC message show the real GFLOPS?

What do you think?


WU true angle range is :  0.447869


AMD Phenom II X4 940 BE @ 3.0 GHz

GeForce GTX 260 (OC Edition)
           totalGlobalMem = 939261952 
           sharedMemPerBlock = 16384 
           regsPerBlock = 16384 
           warpSize = 32 
           memPitch = 262144 
           maxThreadsPerBlock = 512 
           clockRate = 1458000 
           totalConstMem = 65536 
           major = 1 
           minor = 3 
           textureAlignment = 256 
           deviceOverlap = 1 
           multiProcessorCount = 27 

[112 GFLOPS - message in BOINC]

804 GFLOPS (stock GPU) - read in a report

Shader: 216 - 1458 MHz

CPU time 118.0469 

With Raistmer's V7 mod

---------------------------------------------------

Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz

GeForce 8800 GTS 
           totalGlobalMem = 335216640 
           sharedMemPerBlock = 16384 
           regsPerBlock = 8192 
           warpSize = 32 
           memPitch = 262144 
           maxThreadsPerBlock = 512 
           clockRate = 1350000 
           totalConstMem = 65536 
           major = 1 
           minor = 0 
           textureAlignment = 256 
           deviceOverlap = 0 
           multiProcessorCount = 12 

518 GFLOPs - read in a report

Shader: 128 - 1350 MHz

CPU time 120.3594 

stock app
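
(FWIW, both "read in a report" figures match the usual peak-rate arithmetic of shaders x shader clock x 3 flops per clock (MAD + MUL). A sketch with bc, assuming the stock 1242 MHz shader clock for the GTX 260 rather than the 1458 MHz OC figure above:)

echo "216 * 1.242 * 3" | bc   # GTX 260 Core 216 @ 1242 MHz -> ~804.8 GFLOPS
echo "128 * 1.350 * 3" | bc   # 8800 GTS @ 1350 MHz -> ~518.4 GFLOPS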

ID: 865260
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 865307 - Posted: 14 Feb 2009, 8:26:44 UTC - in response to Message 865260.  

Better try to get elapsed times for 2 WUs with the same AR for comparison.
ID: 865307
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865508 - Posted: 14 Feb 2009, 22:17:45 UTC - in response to Message 865307.  
Last modified: 14 Feb 2009, 23:15:14 UTC

Better try to get elapsed times for 2 WUs with the same AR for comparison.


But the current app(s) don't have a feature to show the real GPU crunching time... ;-)

-----------------------------------------------------------------

Something confusing...

For a 44.x WU my rig needs ~120 sec. of CPU support. [AR=0.44x]
For a 72.x WU my rig needs ~60 sec. of CPU support... [AR=0.14x]

And then subtract the ~25 sec. of 100% CPU load before the GPU crunching starts...

O.K., O.K., I don't have the real GPU crunching time... but it's something confusing to report... ;-)


EDIT:
I looked at the rig and saw...
The AR=0.44x WUs need around 8 min. - everything fine.
The AR=0.14x WUs need around 15 min. - and the rig draws less wattage and the BOINC Manager is a little bit sluggish...
[GPU crunching times in the BOINC Manager]
ID: 865508
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 865512 - Posted: 14 Feb 2009, 22:27:31 UTC - in response to Message 865508.  

You will be less confused if you use elapsed time instead of CPU time.
It roughly equals the GPU time. And performance is determined by elapsed time, not by anything else.
ID: 865512