GPU crunch rig: my experiences with 2 x GTX260 & CUDA

Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 863943 - Posted: 10 Feb 2009, 0:46:40 UTC
Last modified: 10 Feb 2009, 1:32:42 UTC

I would like to share my experiences.. :-)


My setup: an AMD quad-core (Phenom II X4 940 BE @ 4 x 3.0 GHz), otherwise idle, and two overclocked GTX260 Core216 55nm cards.
[GPU-only crunching rig]


With everything at stock I had 2 x 0.19* CPUs, 1 CUDA in the BOINC messages.
* If I remember correctly it was 0.18 at the beginning, or did my eyes deceive me? ;-) Or is it possible that this varies even with everything at stock?


So the 0.04 CPUs value doesn't appear on every system.


Then I installed Raistmer's V7 mod and got 2 x 0.04 CPUs, 1 CUDA.


Then I deleted these entries from app_info.xml:
<plan_class>cuda</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>

Since then I have 2 x 1.00 CPUs, 1 CUDA.
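
For context, these tags sit inside the <app_version> section of app_info.xml. A minimal sketch of what such a section can look like (the file and app names here are placeholders, not my exact setup):

<app_info>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>setiathome_CUDA.exe</name>  <!-- placeholder executable name -->
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>608</version_num>
        <plan_class>cuda</plan_class>            <!-- the three entries I deleted -->
        <avg_ncpus>0.040000</avg_ncpus>
        <max_ncpus>0.040000</max_ncpus>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>setiathome_CUDA.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>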



BOINC is a little bit 'different'.. what it calls a CPU usually means one core of the CPU.
So what do these 0.04, 0.19 or 1.00 CPUs mean?

Does 0.04 mean 4 % of one core, or of the whole CPU?



If I crunch only on the GPUs, one core shows ~ 30 % usage - with ups and downs - the whole time.
So on the CPU: 2 cores at ~ 30 % and 2 cores with only very small ups and downs, GPUs at 100 %. [Task Manager: ~ 7 % CPU for each app]
Sometimes all/other cores show higher usage because of BOINC work.. uploads and so on.. [sometimes 25 % CPU usage for boinc.exe alone]

So it's not really true that the CPU idles if you do only GPU crunching.. it's ~ 7 % CPU usage for every GPU on my system.
That's ~ 14 % CPU in total, and with two more GPUs it would be ~ 28 % CPU usage just for 4 x GPU crunching.

----------------------------------------------------------------------------
I hope the SETI@home devs will not reduce the CPU support, like the GPUGrid people did.
That would also reduce the performance of the GPU..
I think (and this holds for every rig) the GPU has more performance than one core of the CPU.
So reducing the CPU support for GPU crunching would result in less performance from the whole rig.
----------------------------------------------------------------------------

At the beginning of every WU, for the first ~ 25 sec., I see 100 % usage jumping across some/all cores. [Task Manager: 25 % CPU]
From core #0 to #2 to #1 and so on.. not fixed to just one core.
Doesn't this cause cache misses and reduce performance?
Maybe info for the BOINC devs? [CPU affinity]

I also had a cc_config set to 6 CPUs.


With no cc_config I had ~ 5 % CPU for every GPU, ~ 2 % less than with 6 CPUs.
Or was that only because of the reboot?

And the other two (idle) cores were also less loaded..
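
For reference, this is roughly what I mean by 'a cc_config set to 6 CPUs' - the <ncpus> option in cc_config.xml makes the client act as if the host had that many CPUs:

<cc_config>
    <options>
        <ncpus>6</ncpus>  <!-- client behaves as if there were 6 CPUs -->
    </options>
</cc_config>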


The whole time I have ~ 95 KB of RAM (mobo) in use per GPU crunching.


If I have a bunch of WUs with the same AR in the queue, the remaining-time estimate shows the real wall-clock crunch time per WU..
I get 8 min. for what I guess are AR=0.44x WUs. [uploaded but not yet reported]


At ~ 03:15 UTC I (or BOINC automatically) will make the report, and then I'll see how many WUs/day my nice rig has done.. :-D
[In ~ 2.5 hours]
If anyone wants to know how much it was.. I will post it here later (tomorrow).. ;-D


Ahh.. and.. it's good that I didn't enable RRI.. because if every WU finishes after 8 min., theoretically I have an upload every 4 min..
If** I had 4 GPUs I would have an upload every 2 min.. with RRI enabled I would have a 24/7 direct connection to the Berkeley server.. :-D

** maybe soon ;-D
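
For reference: RRI = 'return results immediately'. If I understand it right, this corresponds to the <report_results_immediately> option in cc_config.xml; a sketch, assuming that option name:

<cc_config>
    <options>
        <report_results_immediately>0</report_results_immediately>  <!-- 0 = off: finished WUs are reported in batches -->
    </options>
</cc_config>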

Power consumption of the whole rig:
CPU idle (without power saving mode) and GPUs idle (I think with power saving mode, because of the lower clocks): ~ 150 W
Full GPU load only: ~ 350 W
So ~ 100 W for every GTX260 Core216 55nm GPU.
They are OCed GPUs (so a little more W).. more info on my profile here..
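
Spelled out, the per-card figure is just the load delta split over the two cards:

$P_{\text{GPU}} \approx \frac{350\,\mathrm{W} - 150\,\mathrm{W}}{2} = 100\,\mathrm{W}$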


Thanks a lot for reading my looong post..! :-D
And I hope someone can answer my 'hidden' questions here in this post.. ;-D



EDIT:
Please don't mix up the parts about CPU usage and core usage..
They are different things.. ;-)

EDIT #2:
After one day online.. and with the validator server not working, I had ~ 20,000 pendings..
If I subtract the overclaim (- 26 % ?)*** then I have a RAC of ~ 14,800 for 2 x GTX260 Core216 55nm (OC Edition).

*** I have nearly only 57.x-credit WUs.. which are really worth only 42.x. [AR=0.44x]
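
Spelled out (the ~ 26 % comes from the claimed vs. granted credit of these WUs):

$\frac{42}{57} \approx 0.74 \;\Rightarrow\; \text{overclaim} \approx 26\,\%, \qquad 20{,}000 \times 0.74 \approx 14{,}800$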
ID: 863943
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 863991 - Posted: 10 Feb 2009, 3:46:49 UTC
Last modified: 10 Feb 2009, 4:14:58 UTC

After 24 hours (its 2nd day online), my rig automatically reported 370 WUs.


My pendings* after two days are now ~ 43,000. [2 GPUs]

Subtracting the overclaim of 26 %, that would be ~ 31,820. [2 GPUs]

So ~ 15,910 credits/day. [2 GPUs]

So one of my OCed GTX260 Core216 cards: ~ 7,955 credits/day.
[I guess all WUs were AR=0.44x]
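
The calculation, using the same 26 % correction as in the first post:

$43{,}000 \times 0.74 = 31{,}820, \qquad 31{,}820 / 2\ \text{days} = 15{,}910\ \text{credits/day}, \qquad 15{,}910 / 2\ \text{GPUs} \approx 7{,}955\ \text{credits/day per card}$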

A little bit short of the ~ 10,000 at GPUGrid.. :-(


Hmm.. so maybe a change of the credit system.
Or more optimization of the SETI@home app..
:-)


Or would I get ~ 10,000 with a 'normal' mix of SETI@home ARs?


* It's a good indicator, because the validator hasn't worked for the last two days.. ;-D


EDIT:
I found a mistake in the first post..
Every CUDA app gets ~ 95 MB of mobo RAM, not ~ 95 KB..
ID: 863991
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 863999 - Posted: 10 Feb 2009, 4:23:10 UTC - in response to Message 863991.  

Forecasting eventual RAC from the current work distribution probably has at least 10% uncertainty. But if GPUGRID is only ~20% higher than S@H Enhanced CUDA it seems likely that improvements will tend to close the gap.

The slowness at VLAR seems related to the length of the arrays used for finding pulses, those at 32K, 16K, and 8K. The longest at 0.44 AR is about 14K and may share that slowness to some extent; the AR has to be above 0.8 before there are no pulse-finding arrays of 8K or longer. So if an improvement is found for the VLAR case, it may well improve midrange ARs too.
                                                               Joe
ID: 863999
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 864006 - Posted: 10 Feb 2009, 4:49:14 UTC - in response to Message 863999.  
Last modified: 10 Feb 2009, 4:53:27 UTC

...
But if GPUGRID is only ~20% higher than S@H Enhanced CUDA it seems likely that improvements will tend to close the gap.
...


Note:

I got this info (~ 10,000 RAC for one GTX260/280 at GPUGrid) from other people..
I'm not a GPUGrid member.

I'm just a 'crazy' SETI@home fan.. ;-D
ID: 864006
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 864837 - Posted: 12 Feb 2009, 23:57:54 UTC


Is there a rig out there with 8 slot openings?


To support 4 double-slot GPUs.

With maybe good airflow for the GPUs..


My current one has only 7, and until now I haven't found a rig with 8..


Thanks a lot!

ID: 864837
Mike Davis
Volunteer tester

Joined: 17 May 99
Posts: 240
Credit: 5,402,361
RAC: 0
Isle of Man
Message 864854 - Posted: 13 Feb 2009, 0:41:08 UTC

Sure...

Lian-Li PC-V2110B Aluminium Super Full Tower - Black (No PSU)


£233.44 inc VAT
£202.99 ex VAT


Simple and stylish, the Lian-Li V2110 offers large capacity, excellent cooling and solid construction. Lian-Li put a lot of effort into the external finishing; all the external parts are finished in hair-line brushed anodized aluminium with no sharp edges! All-new V series, with new style, new structure, and better quality! The huge internal space fits E-ATX motherboards and graphics cards up to 395 mm long, there is room for 8 hard drives, and a lot of internal space for a liquid cooling system. It is ideal for gamers and pro users!

- Case Type Super Full Tower
- Body Material Aluminum
- 5.25" drive bay (External) 7
- 3.5" drive bay (External)
- 3.5" drive bay (Internal) 8
- Expansion Slot 8
- Motherboard E-ATX, ATX, M-ATX
- System Fan (Front) 14cm Ball Bearing Fan x 1 (800~980~1180 RPM)Factory Setting to Mid speed: 980RPM
- System Fan (Rear) 12cm Ball Bearing Fan x 2 (1020~1240~1500 RPM)Factory Setting to Mid speed: 1240RPM
- USB 2.0 x 4
- IEEE1394 x 1
- E-SATA x 1
- AC97 + HD Audio
ID: 864854
OzzFan
Volunteer tester

Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 864886 - Posted: 13 Feb 2009, 2:22:33 UTC - in response to Message 864837.  


Is there a rig out there with 8 slot openings?


To support 4 double-slot GPUs.

With maybe good airflow for the GPUs..


My current one has only 7, and until now I haven't found a rig with 8..


Thanks a lot!


No. The ATX standard only allows 7 expansion slots on the back, so any motherboard that follows the ATX standard can have at most 7 slots (not counting shared slots, which still make for a total of 7 usable slots). MicroATX (mATX) can have a maximum of 4 expansion slots.
ID: 864886
OzzFan
Volunteer tester

Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 864887 - Posted: 13 Feb 2009, 2:25:35 UTC - in response to Message 864854.  

Sure...

Lian-Li PC-V2110B Aluminium Super Full Tower - Black (No PSU)


Even though that case exceeds the ATX spec, it'll be tough to find a motherboard that actually has 8 usable expansion slots. Any manufacturer selling an 8-expansion-slot motherboard would be outside the ATX spec as well, meaning they wouldn't sell many boards unless buyers sought out cases like this, which aren't very popular.
ID: 864887
Profile Grebuloner

Joined: 4 Apr 05
Posts: 19
Credit: 20,588,464
RAC: 0
United States
Message 864892 - Posted: 13 Feb 2009, 2:50:47 UTC - in response to Message 864854.  
Last modified: 13 Feb 2009, 2:53:26 UTC

You could also go for a Thermaltake Armor+ (VH6000BWS), much cheaper than the Lian-Li (at least here in the US). It's got 10 slots and a large side fan blowing directly onto the expansion-slot area, which would help with the airflow of 4 GPUs crammed in there.

@OzzFan:

I think he's looking to put a GTX260 in the bottom slot of his motherboard, which would cause it to overhang beyond the standard 7th slot cover.
Eating more cheese on Thursdays.
ID: 864892
OzzFan
Volunteer tester

Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 864943 - Posted: 13 Feb 2009, 5:46:22 UTC - in response to Message 864892.  

@OzzFan:

I think he's looking to put a GTX260 in the bottom slot of his motherboard, which would cause it to overhang beyond the standard 7th slot cover.


Wouldn't that assume there's an available PCI Express slot at the end of the motherboard that is capable of supporting a graphics card (PCIe x16)? Or even 4 PCIe x16 slots spaced properly apart to do what he's asking?
ID: 864943
Profile Westsail and *Pyxey*
Volunteer tester

Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 864995 - Posted: 13 Feb 2009, 11:11:14 UTC

So I went to the store to buy an Opteron server for the new Tesla, but alas I have returned with a Q6600 on an EVGA mobo with 3 slots. I figured I could stick with just a 1 kW power supply with three cards. So right now I've got the 260, the C1060 and the GTX in there. It's sweet; I'm having to reinstall Windows right now, but I'll have her back tonight.

I'll likely make a YouTube vid with all the processes running. It does 8 workunits at a time and made about 3k RAC in a little under two hours with the CPU idle.. this is going to be really cool. Problem is, until the teamwork app can use more than one GPU I'm going to be barefoot (stock apps). I might have a go at an app_info for V8+MB CUDA x3.

Anyway, this should get interesting after some validation; if she runs fine I'm going to swap the 200s for 3x Tesla C1060s.

The Beast

Top hosts list, here I come. I've never had a top-100 machine in my life. Aloha everyone, and thank you all for your efforts.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 864995
Profile Grebuloner

Joined: 4 Apr 05
Posts: 19
Credit: 20,588,464
RAC: 0
United States
Message 865015 - Posted: 13 Feb 2009, 13:21:26 UTC - in response to Message 864943.  

Here

When he built his machine he got a 790FX mobo (MSI K9A2) that has 4 evenly spaced PCIe x16 slots, so there is one on the bottom rung. It may only be x8 electrically, but at PCIe 2.0 speeds that's not a bottleneck.

Eating more cheese on Thursdays.
ID: 865015
Profile Woyteck - Boinc Busters Poland

Joined: 3 Jun 99
Posts: 49
Credit: 3,203,845
RAC: 0
Poland
Message 865349 - Posted: 14 Feb 2009, 12:41:57 UTC

What we really need is a motherboard with two 32-lane PCI Express switching chips.
Only then could all four x16 slots be fully utilized.

I don't think such a monster has been produced yet. :(
--
Get up, stand up! Don't give up the fight!
Credits will make everybody feel high! ;-)
ID: 865349
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865444 - Posted: 14 Feb 2009, 19:41:20 UTC


The current GPUs don't need more bandwidth than PCIe 1.0 x16 while crunching.

So if the PCIe 2.0 slots are all x8, that's fine.. that equals a PCIe 1.0 x16 connection..


OTOH:
You also need enough space between the PCIe slots for double-slot GPUs.. ;-)


To my knowledge the MSI K9A2 Platinum is the only mobo with these properties:

4x PCIe 2.0 x8 and enough space for 4 double-slot GPUs.. :-D


But the other problem is that you need a case with enough space at the bottom..
And 8 slot openings..

Maybe I will modify my current rig..

Or are there more rigs out there with 8 slot openings?

ID: 865444
Profile SoNic

Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 865453 - Posted: 14 Feb 2009, 19:56:58 UTC - in response to Message 865444.  
Last modified: 14 Feb 2009, 19:58:45 UTC


But the other problem is that you need a case with enough space at the bottom..
And 8 slot openings..

Maybe I will modify my current rig..

Or are there more rigs out there with 8 slot openings?


There are some server cases that have 8 ATX slots (or more), something like this.
ID: 865453
