GPU crunch-rig, my experiences with 2*GTX260 & CUDA
Author | Message |
---|---|
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
I would like to share my experiences. :-) This is a GPU-only crunching rig: an AMD quad (Phenom II X4 940 BE @ 4 x 3.0 GHz, otherwise idle) with two overclocked GTX260 Core216 55nm cards.

With everything stock, the BOINC messages showed 2 x 0.19 CPUs, 1 CUDA. (If I remember correctly it was 0.18 at the beginning, or did I have something in my eyes? ;-) Or perhaps with stock settings this can vary, so the figure is not the same on every system.)

Then I installed Raistmer's V7 mod and got 2 x 0.04 CPUs, 1 CUDA. After that I deleted these entries in app_info.xml: <plan_class>cuda</plan_class>, <avg_ncpus>0.040000</avg_ncpus>, <max_ncpus>0.040000</max_ncpus>. Since then it shows 2 x 1.00 CPUs, 1 CUDA.

BOINC's terminology is a little 'different' here: 'CPUs' mostly means CPU cores. So what do 0.04, 0.19 or 1.00 CPUs mean? Is 0.04 four percent of one core, or of the whole CPU?

If I crunch only on the GPUs, one core sits at ~30 % usage (with ups and downs) the whole time. So: two cores at ~30 % each, two cores with only slight activity, and both GPUs at 100 %. [Task Manager: ~7 % CPU for each app.] Sometimes the other cores show higher usage too, because of BOINC housekeeping, uploads and so on [sometimes 25 % CPU for boinc.exe alone]. So it's not really true that the CPU idles if you do only GPU crunching: on my system it's ~7 % CPU per GPU, so ~14 % in total, and with two more GPUs it would be ~28 % CPU just for 4 x GPU crunching.

I hope the SETI@home devs will not reduce the CPU support the way the GPUGrid people did, because the GPU's performance would drop as well. I think (and this holds for every rig) a GPU has more performance than one CPU core, so reducing the CPU support for GPU crunching would reduce the performance of the whole rig.

At the beginning of every WU, for the first ~25 seconds, I see 100 % usage jumping around: sometimes on one core, sometimes split over several or all cores.
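For reference, here is a sketch of the kind of `<app_version>` block involved; the app name, version number and file name below are placeholders, not the actual contents of Raistmer's mod. The three lines marked "deleted" are the ones whose removal changed the reported usage from "0.04 CPUs" to "1.00 CPUs" per CUDA task:

```xml
<!-- Hypothetical excerpt from app_info.xml; real entries vary by install. -->
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>          <!-- deleted -->
    <avg_ncpus>0.040000</avg_ncpus>        <!-- deleted -->
    <max_ncpus>0.040000</max_ncpus>        <!-- deleted -->
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
    <file_ref>
        <file_name>MB_CUDA_mod.exe</file_name>  <!-- placeholder name -->
        <main_program/>
    </file_ref>
</app_version>
```

With `<avg_ncpus>`/`<max_ncpus>` absent, the client falls back to counting a full CPU per task, which matches the "2 x 1.00 CPUs, 1 CUDA" message.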
[Task Manager: 25 % CPU] It hops from core #0 to #2 to #1 and so on, not fixed to a single core. Doesn't that cause cache misses and reduce performance? Maybe useful info for the BOINC devs? [CPU affinity]

I also ran with a cc_config set to 6 CPUs. Without the cc_config I had ~5 % CPU per GPU, about 2 % less than with 6 cores (or was that only because of the reboot?). The two other (idle) cores were also less loaded.

Each running GPU task uses ~95 KB of RAM (mobo) the whole time.

With a batch of same-AR WUs in the queue, the 'remaining time' estimate shows the real wall-clock crunch time per WU: about 8 minutes for what I guess are AR=0.44x WUs. [uploaded but not yet reported]

At ~03:15 UTC I (or BOINC automatically) will report, and then I'll see how many WUs/day this nice rig has done. :-D [in ~2.5 hours] If anyone wants to know how many it was, I'll post it here tomorrow. ;-D

Ah, and it's good that I didn't enable RRI: with every WU finishing after 8 minutes, I theoretically have an upload every 4 minutes. With 4 GPUs** it would be every 2 minutes, so with RRI enabled I'd have a 24/7 connection straight to the Berkeley servers. :-D ** maybe soon ;-D

Power consumption of the whole rig: CPU idle (without power-saving mode) and GPUs idle (I think with power saving, because of the lower clocks): ~150 W. Under full GPU load: ~350 W. So ~100 W per GTX260 Core216 55nm. They are OCed GPUs (so a little more W); more info on my profile here.

Thanks a lot for reading my long post! :-D And I hope someone can answer the 'hidden' questions in it. ;-D

EDIT: Please don't confuse the parts about CPU usage and core usage; they are different things. ;-)

EDIT #2: After one day online, with the validator server not working, I had ~20,000 pendings. Subtracting the overclaim (~26 %?)***, that gives a RAC of ~14,800 for 2 x GTX260 Core216 55nm (OC Edition). *** Nearly all my WUs are 57.x-credit ones that are really worth only 42.x. [AR=0.44x] |
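The 6-CPU cc_config mentioned above would look roughly like this (a sketch; `cc_config.xml` lives in the BOINC data directory, and `<ncpus>` overrides the detected core count):

```xml
<!-- Hypothetical cc_config.xml telling the BOINC client to treat the
     host as having 6 CPUs instead of the detected count. -->
<cc_config>
    <options>
        <ncpus>6</ncpus>
    </options>
</cc_config>
```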
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
My rig automatically reported 370 WUs after 24 hours (2nd day online). My pendings* after two days are now ~43,000 [2 GPUs]. Subtracting the 26 % overclaim, that's ~31,820 [2 GPUs], so ~15,910 credits/day [2 GPUs], i.e. ~7,955 credits/day for one of my OCed GTX260 Core216 cards. [I guess all WUs were AR=0.44x]

A bit short of the ~10,000 at GPUGrid. :-( So maybe a change in the credit system, or more optimization of the SETI@home app. :-) Or would a 'normal' mix of SETI@home ARs give me ~10,000?

* It's a good indicator, because the validator hasn't worked for the last two days. ;-D

EDIT: I found a mistake in the first post: every CUDA app gets ~95 MB of system RAM, not ~95 KB. |
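The arithmetic above, spelled out (all figures are from this post; the 26 % overclaim is the poster's own estimate, not a project-published number):

```python
# Back-of-envelope credit estimate for the 2-GPU rig described above.
pending_credits = 43_000   # pending credits after two days, 2 GPUs
overclaim = 0.26           # estimated claimed-credit overestimate
days = 2
gpus = 2

granted = pending_credits * (1 - overclaim)   # ~31,820 credits
per_day = granted / days                      # ~15,910 credits/day (2 GPUs)
per_gpu_per_day = per_day / gpus              # ~7,955 credits/day per GTX260

print(round(granted), round(per_day), round(per_gpu_per_day))
```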
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Forecasting eventual RAC from the current work distribution probably has at least 10 % uncertainty. But if GPUGRID is only ~20 % higher than S@H Enhanced CUDA, it seems likely that improvements will tend to close the gap. The slowness at VLAR seems related to the length of the arrays used for finding pulses, those of 32K, 16K, and 8K. The longest at 0.44 AR is about 14K and may share that slowness to some extent; the AR has to be above 0.8 before there are no pulse-finding arrays of 8K or longer. So if an improvement is found for the VLAR case, it may well improve midrange ARs too. Joe |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
... Note: I got this info (~ 10,000 RAC for one GTX260/280 at GPUGrid) from other people.. I'm not a GPUGrid member. I'm only a SETI@home 'crazy' fan.. ;-D |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Is there a case out there with 8 slot openings, to support 4 double-slot GPUs? Ideally with good airflow for the GPUs. My current case has only 7, and so far I haven't found one with 8. Thanks a lot! |
Mike Davis Send message Joined: 17 May 99 Posts: 240 Credit: 5,402,361 RAC: 0 |
Sure... Lian-Li PC-V2110B Aluminium Super Full Tower - Black (No PSU), £233.44 inc VAT (£202.99 ex VAT).

Simple and stylish, the Lian-Li V2110 offers large capacity, excellent cooling and solid construction. Lian-Li put a lot of effort into the external finishing: all the external parts are hair-line brushed anodized aluminium with no sharp edges. The huge internal space fits an E-ATX motherboard and graphics cards up to 395 mm long, with room for 8 hard drives and plenty of space for a liquid cooling system. Ideal for gamers and pro users.

- Case Type: Super Full Tower
- Body Material: Aluminium
- 5.25" drive bays (external): 7
- 3.5" drive bays (internal): 8
- Expansion Slots: 8
- Motherboard: E-ATX, ATX, M-ATX
- System Fan (front): 14 cm ball-bearing fan x 1 (800/980/1180 RPM, factory set to mid speed, 980 RPM)
- System Fan (rear): 12 cm ball-bearing fan x 2 (1020/1240/1500 RPM, factory set to mid speed, 1240 RPM)
- USB 2.0 x 4, IEEE 1394 x 1, eSATA x 1, AC97 + HD Audio |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
No. The ATX standard only allows 7 expansion slots on the back. Therefore any motherboard that supports the ATX standard can only have 7 slots on the motherboard (not counting shared slots, which still make it a total of 7 usable slots). MicroATX (mATX) can only have a maximum of 4 expansion slots. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Sure... Even though that case is out of the ATX spec, it'll be tough to find a motherboard that actually has 8 usable expansion slots. Any manufacturer selling an 8-expansion-slot motherboard would be out of the ATX spec as well, so they wouldn't sell many boards unless buyers sought out cases like this, which aren't very popular. |
Grebuloner Send message Joined: 4 Apr 05 Posts: 19 Credit: 20,588,464 RAC: 0 |
You could also go for a Thermaltake Armor+ (VH6000BWS), much cheaper than the Lian Li (at least here in the US). It's got 10 slots, and a large side fan blowing directly on top of the expansion slot area that would help with the airflow of 4 gpus crammed in there. @OzzFan: I think he's looking to put a GTX260 on the bottom slot of his motherboard which would cause it to overhang beyond the standard 7th slot cover. Eating more cheese on Thursdays. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
@OzzFan: Wouldn't that assume there's an available PCI Express slot at the end of the motherboard that is capable of supporting a graphics card (PCIe x16)? Or even 4 PCIe x16 slots spaced properly apart to do what he's asking? |
Westsail and *Pyxey* Send message Joined: 26 Jul 99 Posts: 338 Credit: 20,544,999 RAC: 0 |
So I went to the store to buy an Opteron server for the new Tesla, but alas I have returned with a Q6600 in an ev3a mobo with 3 slots. Figured I could stick with only a 1 kW power supply with three. So right now I've got a 260, the c1080 and the GTX in there. It's sweet; I'm reinstalling Windows right now but will have her back tonight. I'll likely make a YouTube vid with all the processes running. It does 8 workunits at a time and made about 3k RAC in a little under two hours with the CPU idle. This is going to be really cool. Problem is, until the teamwork app can use more than one GPU I'm going to be barefoot (stock apps). I might have a go at an app_info for v8+MB CUDA x3. Anyway, this should get interesting. After some validation, if she runs fine, I'm going to swap the 200s for 3x 1060 Teslas. The Beast. Top hosts list, here I come; never had a top-100 machine in my life. Aloha everyone, thank you all for your efforts. "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov |
Grebuloner Send message Joined: 4 Apr 05 Posts: 19 Credit: 20,588,464 RAC: 0 |
Here When he built his machine he got a 790FX mobo (MSI K9A2) that has 4 PCIe x16 slots on it evenly spaced, so there is one on the bottom rung. It may only be x8 width, but at 2.0 speeds, it's not a bottleneck. Eating more cheese on Thursdays. |
Woyteck - Boinc Busters Poland Send message Joined: 3 Jun 99 Posts: 49 Credit: 3,203,845 RAC: 0 |
What we really need is a motherboard with two 32-lane PCI Express switching chips; only then can all four x16 slots be fully utilized. I don't think such a monster has been produced yet. :( -- Get up, stand up! Don't give up the fight! Credits will make everybody feel high! ;-) |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Current GPUs don't need more bandwidth than PCIe 1.0 x16 while crunching. So if the PCIe 2.0 slots all run at x8, that's fine: you still have the equivalent of a PCIe 1.0 x16 connection. OTOH, you also need enough space between the PCIe slots for double-slot GPUs. ;-) To my knowledge the MSI K9A2 Platinum is the only mobo with these properties: 4x PCIe 2.0 x8 and enough space for 4 double-slot GPUs. :-D But the other problem is that you need a case with enough space at the bottom, and 8 slot openings. Maybe I will modify my current case. Or are there more cases out there with 8 slot openings? |
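The "PCIe 2.0 x8 equals PCIe 1.0 x16" claim is just lane arithmetic, shown here as a quick sketch using the usual effective per-lane throughput figures (after 8b/10b encoding overhead):

```python
# PCIe 2.0 doubled the per-lane throughput over PCIe 1.0, so half the
# lanes deliver the same total bandwidth.
pcie1_per_lane_mb = 250   # PCIe 1.0: ~250 MB/s per lane
pcie2_per_lane_mb = 500   # PCIe 2.0: ~500 MB/s per lane

x16_gen1 = 16 * pcie1_per_lane_mb   # 4000 MB/s
x8_gen2 = 8 * pcie2_per_lane_mb     # 4000 MB/s
print(x16_gen1 == x8_gen2)  # True
```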
SoNic Send message Joined: 24 Dec 00 Posts: 140 Credit: 2,963,627 RAC: 0 |
There are some server cases that will have 8 ATX slots (or more), something like this. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.