2 video cards in linux. Boinc sees them as same device!
Author | Message |
---|---|
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Been going steady overnight, so think I've got it stable. Primary card has a core of 755 MHz, secondary couldn't handle that so it's at 725 MHz (I was too lazy to find the exact max for it). I'm surprised at my 285: it is running a 690 MHz core, 1476 MHz shaders and 2600 MHz memory, which is only a little up from the reference 658/1476/2484, and with the fan at 100% I have never seen it exceed 69C (it averages 62-65C). That is nice and cool for a video card under load as both the primary video card and a CUDA processor. |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Yeah, they're sp and factory OC with non-standard coolers. 66C is the highest I've seen so far (after I fixed my heat issues). The case helps a bit too (ATCS 840), as it has really good airflow and keeps the ambient temps down. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Is there a CUDA 2.3 version of the x86_64 VLAR killer app yet? I just converted my NVIDIA system to the 2.3 SDK and toolkit and the 190.18 driver. So far no funny stuff is happening; it is behaving itself well. I do notice, though, that my glxgears is about 2k FPS down from what it was with the 185 driver. Nothing I can notice in using the computer so far, though. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Is there a CUDA 2.3 version of the x86_64 VLAR killer app yet?" That was premature. My desktop started misbehaving in a serious way, with stalling and lockups, sometimes to the system level. With BOINC stopped it behaved. I went back to CUDA 2.2 and 185.18.29, and so far it is considerably better and has not misbehaved in any fashion. I guess the 190 driver isn't Linux-ready yet, or I overlooked something in the install. Not gonna chance it again for a while; give it some time for 190 and 2.3 to age some. Also, glxgears came back to the above-10k FPS scores again. |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
I'm on Ubuntu 8.10 x64 and haven't had any real issues. With BOINC going it is a bit slower, sometimes taking seconds to switch active windows, but I think that has more to do with the level of OC than the drivers. My assumption is that level of OC with it being used at full load like that is the cause, less overhead for normal use. Basically it feels like every WU is a VLAR. Broke 9k RAC today! |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"I'm on Ubuntu 8.10 x64 and haven't had any real issues. With BOINC going it is a bit slower, sometimes taking seconds to switch active windows, but I think that has more to do with the level of OC than the drivers. My assumption is that level of OC with it being used at full load like that is the cause, less overhead for normal use. Basically it feels like every WU is a VLAR." Cool! I have only broken 8k so far for this machine (8,846.37), but lately I have also had lots of downtime due to thunderstorms, plus all the changing back and forth of drivers, libraries, support programs, etc. Still, that's higher than this machine has ever had. If it shows more than a slight delay with the OC, then something may not be right; an OC should simply increase speed. That may have been the problem with mine, since my GTX is an OC card (XFX Black Edition). It's not OC'd by a lot, but it's enough. My system is OC'd too (Q6600 at 3 GHz), but that has been running for more than a year without a single burp. When I tested it last year it could go higher, but I picked a comfortable point that doesn't seem to be stressing anything and still runs at temps I like to see. With BOINC running I occasionally have a minor delay in switching desktops, and sometimes I can see the new window retain the old data for a fraction of a second, especially if it's the same application with different data on different desktops, but nothing really objectionable. I have come to accept that as normal 'CUDA-driven behavior'. Of course, when I was running MB on all 4 procs it was considerably worse. Now that I'm only using 3 of them for workunits, leaving 1 free for my desktop and CUDA to use, it's considerably better. My behavior above seems to be more video-card related: probably the other desktop finding room to work in the video card while CUDA takes up all available resources. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Thanks for that reporting script! It has proven invaluable! It shows me the V5 script is working properly, since I no longer have any VLAR/VHAR WUs on CUDA, and the only WUs the processors are getting are VLAR/VHAR. Everything else is going to CUDA. The only time CUDA catches a VLAR is if it happens to start a WU that has not been caught yet between script runs. Very rare now. I may even increase the frequency of the run from once an hour to twice an hour, since the time BOINC is shut down seems to have little impact. Working well! |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Sounds good. I noticed today my clock failed... pity. Guess it wasn't 100% stable; time to cut it down a little more and see if this will last. |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
710 MHz seems to be much more stable, but (now that the OC isn't failing) heat went up a bit. 72C on both GPUs, but room was 34C so not terrible. From the BOINC log: Wed 05 Aug 2009 07:56:47 PM KST CUDA devices: GeForce GTX 260 (driver version 0, CUDA version 1.3, 895MB, est. 119GFLOPS), GeForce GTX 260 (driver version 0, CUDA version 1.3, 896MB, est. 119GFLOPS) |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"710 MHz seems to be much more stable, but (now that the OC isn't failing) heat went up a bit. 72C on both GPUs, but room was 34C so not terrible." That actually sounds better. I have read in numerous forums that these puppies don't like much above 700 MHz, and 72C is great for that frequency and ambient temp. I need to get a small window A/C for this room. I am killing the house A/C to keep this room below 31C. The rest of the house is a walk-in freezer :P not to mention my electricity bill. (What I need is one of those small A/C units that mount on top of the computer and cover all intake areas :) It doesn't have to be powerful either. Feeding the computer 15-20C air is more than sufficient.) |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Yeah, I've got one of those fans you can put ice into... just a pain to use. Pretty happy with this now: running 6/8 cores and the two GPUs for 12 hours, I got 192 WU crunched. RAC is just under 12k now! |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Yeah, I've got one of those fans you can put ice into... just a pain to use. pretty happy with this now, running 6/8 cores and the two GPUs for 12 hours and I got 192 WU crunched, RAC is just under 12k now!" Hehe, I'm jealous. Guess I have to go out and upgrade my system to an i7 or something :P I've got an idea in my head to mess with Peltier solid-state cooling devices and see if I can get one of those to work. They would keep any processor way cooler than air or water could (examples showed an overclocked multicore processor running at 13C under full load in a 38C ambient environment). However, from what I am reading, they are a pain to spec out for your system and then touchy to implement, so... not sure. |
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Yeah, one of those would be sweet... but a bit expensive. I'm thinking about the new Corsair H50 as the reviews look good, and would leave me plenty of working space to add in the watercooled 295s (with the whole loop obviously) later. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Yeah, one of those would be sweet... but a bit expensive. I'm thinking about the new Corsair H50 as the reviews look good, and would leave me plenty of working space to add in the watercooled 295s (with the whole loop obviously) later." True. Sounds like you are planning quite a setup! However, for me, after looking it up a bit, the various test results have not convinced me it will do any better than my Zalman 9700NT is already doing. Plus I seriously do not like pushing warmed air into the case. This would require custom ductwork from the front or side panel to the radiator to be able to push air out like it should. The rear of my computer is FAR from ambient, since it has 6 exhaust fans (counting the 2 GPU fans) pushing very warm air out the back. Reversing one to meet Corsair's requirement will only bring that warmed air back in, and it will not be anywhere near ambient like they want unless I duct that fan's intake to elsewhere in the room, away from the rear of the case, which really gets messy. Blowing warmed air into the case seriously complicates keeping the air in the case cool. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
This is getting stupid now... Number of CPU tasks: 504; Number of GPU tasks: 152; Number of VLAR tasks: 249; Number of VHAR tasks: 255; Total tasks: 656. I had to lower the low ratio to 0.01 to get rid of the VLARs on the GPU. Guess there is an odd batch of raw data being split now. |
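A rough sketch of why the low ratio matters, assuming the guard behaves the way the limits in the rescheduling script posted in this thread suggest: the script divides GPU tasks by CPU tasks and refuses to run when the result falls outside a low/high band. The helper name `ratio_guard` and the 0.5 comparison value are made up for illustration; the 504/152 counts and the 0.01 limit come from the post above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the posted script): returns an empty string
# when the cache split is acceptable, or the reason the rescheduler
# would refuse to run.
sub ratio_guard {
    my ($cpu_tasks, $gpu_tasks, $low_limit, $high_limit) = @_;

    # Same convention as the posted script: with no CPU tasks the
    # ratio is treated as 1.
    my $ratio = $cpu_tasks != 0 ? $gpu_tasks / $cpu_tasks : 1;

    return "Too many tasks allocated to GPU already" if $ratio > $high_limit;
    return "Too many tasks allocated to CPU already" if $ratio < $low_limit;
    return "";
}

# Counts from the post: 504 CPU tasks and 152 GPU tasks.
printf "GPU/CPU ratio: %.3f\n", 152 / 504;

# 0.01 is the limit from the post; 0.5 is an assumed stricter value
# used only to show the guard tripping.
for my $low (0.01, 0.5) {
    my $reason = ratio_guard(504, 152, $low, 1000);
    print "low limit $low: ", ($reason || "ok to reschedule"), "\n";
}
```

With a ratio this far below 1, any low limit above it stops the script before it moves anything, which would explain having to drop the limit to get the VLARs off the GPU.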
Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Heh, I had the opposite the other day, had to move 100 non-VLAR/VHAR to the CPU. Planning a lot for this machine, slowly getting there. As for the CPU cooler, I'm pretty happy with it as is, so I can wait a bit to save up more and decide what to do with the rest. Worst case, my other machine needs an upgrade, so new parts go in the current one and these parts move over to the server. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Heh, I had the opposite the other day, had to move 100 non-VLAR/VHAR to the CPU." Wish I had that problem. I'm down to 61 CUDA WUs now and no replacements are coming. Guess I will have to up my cache to get more work and then lower it back down, or just sit tight and ignore it while the CPUs do their thing. Sounds like a good project going. I'm planning a project for next year: a dedicated cruncher/secondary desktop. It will do very little as my VNC desktop into this current workstation, but it will take some of the server monitoring load off this one. Since the GTX 285 outperforms Tesla, it will probably be an i7 of some variety with either 4 GTX 285 or 4 GTX 295 cards, so it will have 6 or 7 CPU cores crunching, reserving 1 or 2 for CUDA and the machine, and 4 or 8 CUDA tasks running. Should be good. |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Hi Joe. How do you alter this script to leave the VHARs on the CPU but still move the VLARs? My skills in this area are practically zero. BTW, thanks to you and Chuck for a very informative and interesting thread!! Regards, Brodo |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
It's an either/or situation on the VHAR: either they all go to the GPU or they all go to the CPU. I have mine set so that all VHAR go to the GPU and some 'normal' WUs go to the CPU along with the VLARs, so the CPU gets a few 'gifts'. That was an easy change. Here is my script:

    #!/usr/bin/perl -w
    $path = "client_state.xml" ;
    $results = "client_state.new" ;
    open (IN, $path);
    $NumOfCPUTasks=0;
    $NumOfGPUTasks=0;
    $NumGPUToNumCPU_high_limit=1000;
    $NumGPUToNumCPU_low_limit=0.01;
    $Num_tasks_limit=2000;
    while (<IN>) {
        if( /<workunit>/ ){ #parsing result
            $trueAR=-1;#error condition
            $WUname="";
            while(<IN>){
                if( /<\/workunit>/ ){
                    open (WU, "projects\/setiathome.berkeley.edu\/".$WUname) || die "ERROR: cant open task file";
                    while(<WU>){#reading task file and deciding where it should go
                        if( /<true_angle_range>(.*)<\/true_angle_range>/ ){
                            $trueAR=$1;
                            if( $trueAR == -1 ){
                                die "ERROR detected - cant determine AR value\n";
                            }
                            # if($trueAR < 0.13 || $trueAR > 1.127){
                            if($trueAR < 0.37){
                                $tasks{$WUname}=603;
                            }else{$tasks{$WUname}=608;}
                            last;
                        }
                    }
                    close(WU);
                    last;
                }
                if(/<app_name>(.*)<\/app_name>/){
                    if($1 ne "setiathome_enhanced"){
                        # print $1."\n";
                        last;
                    }
                }
                if( /<name>(.*)<\/name>/ ){
                    $WUname=$1;
                    # print "task:".$1."\n";
                }
                elsif( /<version_num>603<\/version_num>/ ){$NumOfCPUTasks++;}
                elsif( /<version_num>608<\/version_num>/ ){$NumOfGPUTasks++;}
            }
        }
    }
    close(IN);
    print "Number of CPU tasks before rescheduling:".$NumOfCPUTasks."\n";
    print "Number of GPU tasks before rescheduling:".$NumOfGPUTasks."\n";
    if($NumOfCPUTasks!=0){
        $GPU_to_CPU_ratio=$NumOfGPUTasks/$NumOfCPUTasks;}
    else{ $GPU_to_CPU_ratio=1;}
    if($GPU_to_CPU_ratio >$NumGPUToNumCPU_high_limit){ die "Too many tasks allocated to GPU already";}
    if($GPU_to_CPU_ratio <$NumGPUToNumCPU_low_limit){ die "Too many tasks allocated to CPU already";}
    if($NumOfCPUTasks+$NumOfGPUTasks>$Num_tasks_limit){ die "Too many tasks in cache already";}
    $NumOfCPUTasks=0;
    $NumOfGPUTasks=0;
    open (IN, $path);
    open (RES, ">".$results);
    while (<IN>) {
        if( /<result>/ ){
            $WUname="";
            print RES $_;
            $is_SETI_MB=0;
            while(<IN>){
                if( /<name>(.*)_.*/){#can be SETI MB result
                    print RES $_;
                    if($1){
                        if($tasks{$1}){
                            if($tasks{$1}==603 || $tasks{$1}==608){
                                $WUname=$1;
                                $is_SETI_MB=1;
                            }
                        }
                    }
                }elsif( /<version_num>/ ){
                    if($is_SETI_MB){;}
                    else{ print RES $_;}
                }
                elsif( /<plan_class>/ ){
                    if($is_SETI_MB){;}
                    else{ print RES $_;}
                }
                elsif( /<\/result>/ ){
                    if($is_SETI_MB){
                        if($tasks{$WUname}==603){
                            print RES " <version_num>603<\/version_num>\n";
                        }
                        if($tasks{$WUname}==608){
                            print RES " <version_num>608<\/version_num>\n";
                            print RES " <plan_class>cuda<\/plan_class>\n";
                        }
                    }
                    print RES $_;
                    last;
                }
                else{ print RES $_;}
            }
        }elsif( /<workunit>/ ){
            $WUname="";
            print RES $_;
            $is_SETI_MB=0;
            while(<IN>){
                if( /<name>(.*)<\/name>/ ){
                    $WUname=$1;
                    print RES $_;
                    if($1){
                        if($tasks{$1}){
                            if($tasks{$1}==603 || $tasks{$1}==608){
                                $WUname=$1;
                                $is_SETI_MB=1;
                            }
                        }
                    }
                }
                elsif( /<version_num>/ ){
                    if($is_SETI_MB){;}
                    else{ print RES $_;}
                }
                elsif( /<\/workunit>/ ){
                    if($is_SETI_MB){
                        if($tasks{$WUname}==603){
                            print RES " <version_num>603<\/version_num>\n";
                            $NumOfCPUTasks++;
                        }
                        if($tasks{$WUname}==608){
                            print RES " <version_num>608<\/version_num>\n";
                            $NumOfGPUTasks++;
                        }
                    }
                    print RES $_;
                    last;
                }
                else{ print RES $_;}
            }
        }else{ print RES $_;}
    }
    print "Number of CPU tasks after rescheduling:".$NumOfCPUTasks."\n";
    print "Number of GPU tasks after rescheduling:".$NumOfGPUTasks."\n";

The only change is this code in the while loop:

    # if($trueAR < 0.13 || $trueAR > 1.127){
    if($trueAR < 0.37){

The original, commented-out statement sends all VLAR and VHAR to the CPU and gives the GPU all the rest. I noticed the script you have is very different from the one I have, but mine works so I am fine with it :) My 0.37 number gives all VLAR plus a few of the lower-angle normal units to the CPU, while giving all the other ones plus VHAR to the GPU. My script above is the splitting script only. I have kept the reporting script separate and only modified that to not list the VHAR workunits but still show the VHAR count, knowing they are all on the GPU. |
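For anyone who wants to play with the split without editing the condition by hand each time, here is a minimal sketch that pulls the threshold test into one function. It is not part of the posted script: the name `classify_ar` and the option names `vlar_max`, `vhar_min` and `vhar_to_cpu` are invented for illustration. The 603 (CPU) and 608 (GPU) version numbers and the 0.13, 1.127 and 0.37 thresholds are the ones that appear in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Version numbers used by the posted script: 603 = CPU app, 608 = CUDA app.
use constant { CPU_APP => 603, GPU_APP => 608 };

# Hypothetical helper: decide where a workunit goes from its angle range.
# Defaults mirror the commented-out original test
# ($trueAR < 0.13 || $trueAR > 1.127 goes to the CPU).
sub classify_ar {
    my ($true_ar, %opt) = @_;
    my $vlar_max    = exists $opt{vlar_max}    ? $opt{vlar_max}    : 0.13;
    my $vhar_min    = exists $opt{vhar_min}    ? $opt{vhar_min}    : 1.127;
    my $vhar_to_cpu = exists $opt{vhar_to_cpu} ? $opt{vhar_to_cpu} : 1;

    return CPU_APP if $true_ar < $vlar_max;                  # VLAR
    return CPU_APP if $vhar_to_cpu && $true_ar > $vhar_min;  # VHAR, optional
    return GPU_APP;                                          # everything else
}

# Made-up angle ranges: one low, one mid-range, one high.
for my $ar (0.05, 0.42, 2.5) {
    printf "AR %.2f: original rule -> %d, 0.37 rule with VHAR on GPU -> %d\n",
        $ar,
        classify_ar($ar),
        classify_ar($ar, vlar_max => 0.37, vhar_to_cpu => 0);
}
```

Calling it with `vlar_max => 0.37, vhar_to_cpu => 0` reproduces the single `$trueAR < 0.37` test, and the defaults reproduce the commented-out line, so the VLAR and VHAR decisions can be switched independently.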
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Turns out running VHAR on the GPUs hurt scoring a lot. They seem better suited to the CPUs and score higher there, so I set the script back to what it was but increased the .13 to .33 to give the CPUs the larger-angle calculations. That seemed to help a lot. |
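A quick way to see what the .33 change does to a given workunit, sketched under a couple of assumptions: the task file carries a `<true_angle_range>` element (as the script earlier in the thread expects), and the upper 1.127 bound is unchanged from the original test. The helper names `true_angle_range` and `goes_to_cpu` and the sample value are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the angle range out of task-file text.
# Returns undef when no <true_angle_range> element is found.
sub true_angle_range {
    my ($text) = @_;
    if ($text =~ m{<true_angle_range>\s*([0-9]*\.?[0-9]+)\s*</true_angle_range>}) {
        return $1 + 0;
    }
    return;
}

# The restored two-sided test with the lower bound raised from 0.13 to 0.33:
# low and very high angle ranges go to the CPU, the rest to the GPU.
sub goes_to_cpu {
    my ($ar) = @_;
    return ($ar < 0.33 || $ar > 1.127) ? 1 : 0;
}

# Made-up sample line, not taken from a real workunit.
my $sample = "<true_angle_range>0.4267</true_angle_range>\n";
my $ar = true_angle_range($sample);
die "no angle range found\n" unless defined $ar;
print "AR $ar goes to ", (goes_to_cpu($ar) ? "CPU" : "GPU"), "\n";
```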