2 video cards in linux. Boinc sees them as same device!


Author Message
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 922384 - Posted: 30 Jul 2009, 10:52:24 UTC - in response to Message 922333.

Been going steady overnight, so I think I've got it stable. Primary card has a core of 755MHz; the secondary couldn't handle that, so it's at 725MHz (I was too lazy to find the exact max for it).

I'm very happy with these clocks, think I'll stick with this company for my next purchase. Heat is still barely over what it was before, which strikes me as odd but I guess those coolers work well.


i'm surprised at my 285. it is running 690mhz core, 1476mhz shaders and 2600mhz memory, which is only a little up from the reference 658/1476/2484.
and with the fan at 100% i have never seen it exceed 69c (averages 62-65c), which is nice and cool for a vid card under load as both a primary video card and cuda processor.

____________

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 922386 - Posted: 30 Jul 2009, 11:05:32 UTC - in response to Message 922384.

Yeah, they're sp and factory OC with non-standard coolers. 66C is the highest I've seen so far (after I fixed my heat issues). The case helps a bit too (ATCS 840), as it has really good airflow and keeps the ambient temps down.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 922834 - Posted: 1 Aug 2009, 1:57:49 UTC - in response to Message 922386.

is there a cuda 2.3 version of the x86_64 vlar killer app yet?

i just converted my nvidia system to 2.3 sdk and toolkit and the 190.18 driver.
so far no funny stuff happening it is behaving itself well.

i do notice tho that my glxgears is about 2kFPS down from what it was with the 185 driver. nothing i can notice in using the computer so far though.
____________

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 923033 - Posted: 1 Aug 2009, 20:16:24 UTC - in response to Message 922834.

is there a cuda 2.3 version of the x86_64 vlar killer app yet?

i just converted my nvidia system to 2.3 sdk and toolkit and the 190.18 driver.
so far no funny stuff happening it is behaving itself well.

i do notice tho that my glxgears is about 2kFPS down from what it was with the 185 driver. nothing i can notice in using the computer so far though.


that was premature. my desktop started misbehaving in a serious way with stalling and lockups sometimes to the system level. with boinc stopped it behaved. i went back to cuda 2.2 and 185.18.29 and so far it is considerably better and has not misbehaved in any fashion. i guess the 190 driver isn't linux ready yet or i overlooked something in the install. not gonna chance it again for a while. give it some time for 190 and 2.3 to age some. also glxgears came back to the above 10kFPS scores again.
____________

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 923062 - Posted: 1 Aug 2009, 23:45:21 UTC - in response to Message 923033.

I'm on Ubuntu 8.10 x64 and haven't had any real issues. With BOINC going it is a bit slower, sometimes taking seconds to switch active windows, but I think that has more to do with the level of OC than the drivers. My assumption is that level of OC with it being used at full load like that is the cause, less overhead for normal use. Basically it feels like every WU is a VLAR.

Broke 9k RAC today!

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 923071 - Posted: 2 Aug 2009, 1:12:14 UTC - in response to Message 923062.
Last modified: 2 Aug 2009, 1:13:18 UTC

I'm on Ubuntu 8.10 x64 and haven't had any real issues. With BOINC going it is a bit slower, sometimes taking seconds to switch active windows, but I think that has more to do with the level of OC than the drivers. My assumption is that level of OC with it being used at full load like that is the cause, less overhead for normal use. Basically it feels like every WU is a VLAR.

Broke 9k RAC today!


cool! i only broke 8k so far for this machine (8,846.37) but lately i have also had lotsa down time due to tstorms plus all the changing back and forth of drivers/libraries/support progs etc. still thats higher than this machine has ever had.

if it behaves with more than only a slight delay with oc stuff then it may not be right. oc should simply increase speed. that may have been the problem with mine since my gtx is an oc card.. xfx black edition. its not oc by a lot but its enough. also my system is oc too (q6600 at 3ghz) but that has been more than a year with not a single burp. tested last year it could go higher but i picked a comfortable point that doesnt seem to be stressing anything and still runs in temps i like to see.

with boinc running i occasionally have a minor delay in switching desktops and sometimes i can see the new window retain the old data for a fraction of a second, especially if its the same application with different data in different desktops, but nothing really objectionable. i have come to accept that as normal 'cuda-driven behavior'. of course when i was running mb on all 4 procs it was considerably worse. now that im only using 3 of them for workunits leaving 1 free for my desktop and cuda to use, its considerably better. my above behavior seems to be more vid card related probably the different desktop finding room to work in the vid card with cuda taking up all available resources.
____________

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 923360 - Posted: 3 Aug 2009, 15:13:21 UTC - in response to Message 923062.
Last modified: 3 Aug 2009, 15:15:23 UTC

thanks for that reporting script! it has proven invaluable! it shows me the V5 script is working properly since i no longer have any vlar/vhar wu for the cuda and the only wu the processors are getting are vlar/vhar. everything else is going to cuda. the only time cuda catches a vlar is if it happens to do a wu that has not been caught yet between script runs. very rare now. i may even increase the frequency of the run from once an hour to twice an hour since it seems to have little impact on the time boinc is shut down.

working well!
____________

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 923428 - Posted: 3 Aug 2009, 20:20:43 UTC - in response to Message 923360.

Sounds good. I noticed today my clock failed... pity. Guess it wasn't 100% stable, time to cut it down a little more and see if this will last.

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 923825 - Posted: 5 Aug 2009, 19:44:31 UTC - in response to Message 923428.

710MHz seems to be much more stable, but (now that the OC isn't failing) heat went up a bit. 72C on both GPUs, but the room was 34C so not terrible.

Wed 05 Aug 2009 07:56:47 PM KST CUDA devices: GeForce GTX 260 (driver version 0, CUDA version 1.3, 895MB, est. 119GFLOPS), GeForce GTX 260 (driver version 0, CUDA version 1.3, 896MB, est. 119GFLOPS)

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 923989 - Posted: 6 Aug 2009, 3:22:13 UTC - in response to Message 923825.
Last modified: 6 Aug 2009, 3:26:41 UTC

710MHz seems to be much more stable, but (now that the OC isn't failing) heat went up a bit. 72C on both GPUs, but the room was 34C so not terrible.

Wed 05 Aug 2009 07:56:47 PM KST CUDA devices: GeForce GTX 260 (driver version 0, CUDA version 1.3, 895MB, est. 119GFLOPS), GeForce GTX 260 (driver version 0, CUDA version 1.3, 896MB, est. 119GFLOPS)


that actually sounds better. i have read in numerous forums where these puppies don't like much above 700mhz. and 72c is great for that frequency and ambient temp. i need to get a small window a/c for this room. i am killing the house a/c to keep this room below 31c. the rest of the house is a walk-in freezer :P not to mention my electricity bill. ( what i need are those small a/c units that mount on the top of the computer and cover all intake areas :) dont have to be powerful either. feeding the computer 15-20c air temp is more than sufficient )
____________

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 924042 - Posted: 6 Aug 2009, 11:23:40 UTC - in response to Message 923989.

Yeah, I've got one of those fans you can put ice into... just a pain to use. pretty happy with this now, running 6/8 cores and the two GPUs for 12 hours and I got 192 WU crunched, RAC is just under 12k now!

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 924327 - Posted: 7 Aug 2009, 14:42:03 UTC - in response to Message 924042.

Yeah, I've got one of those fans you can put ice into... just a pain to use. pretty happy with this now, running 6/8 cores and the two GPUs for 12 hours and I got 192 WU crunched, RAC is just under 12k now!


hehe i'm jealous. guess i have to go out and upgrade my system to an i7 or something :P

got an idea in my head to mess with peltier solid state cooling devices and see if i can get one of those to work. they would keep any processor way cooler than air or water could (examples showed an overclocked multicore processor running at 13c full load in a 38c ambient environment), however from what i am reading they are a pain to spec out to your system and then touchy to implement so... not sure..

____________

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 924459 - Posted: 8 Aug 2009, 0:48:49 UTC - in response to Message 924327.

Yeah, one of those would be sweet... but a bit expensive. I'm thinking about the new Corsair H50 as the reviews look good, and would leave me plenty of working space to add in the watercooled 295s (with the whole loop obviously) later.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 924609 - Posted: 8 Aug 2009, 13:56:06 UTC - in response to Message 924459.

Yeah, one of those would be sweet... but a bit expensive. I'm thinking about the new Corsair H50 as the reviews look good, and would leave me plenty of working space to add in the watercooled 295s (with the whole loop obviously) later.



true. sounds like you are planning quite a setup! however, for me, after looking it up a bit the various test results have not convinced me it will do any better than my zalman 9700nt is already doing. plus i seriously do not like pushing warmed air into the case. this would require custom ductwork from front or side panel to radiator to be able to push air out like it should be. the rear of my computer is FAR from even near ambient since it has 6 exhaust fans counting the 2 gpu fans pushing very warm air out the back. reversing one to meet corsair's requirement will only bring that warmed air back in which will not be anywhere near ambient like they want unless i duct that fan input to elsewhere in the room away from the rear of the case which really gets messy. blowing warmed air into the case seriously complicates keeping ambient air in the case cool.
____________

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 925339 - Posted: 11 Aug 2009, 9:40:18 UTC - in response to Message 924609.

this is getting stupid now...

Number of CPU tasks:504
Number of GPU tasks:152
Number of VLAR tasks:249
Number of VHAR tasks:255
Total tasks: 656


i had to lower the low ratio to 0.01 to get rid of the vlars on gpu. guess there is an odd batch of raw data being split now.
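
For reference, the limit being lowered here is the GPU-to-CPU ratio guard in the rescheduling script. A minimal sketch of that arithmetic with the counts above, assuming the earlier low limit was the 0.5 that appears in the reporting script quoted later in this thread. The `ratio_verdict` helper is made up for illustration and returns a string where the real script calls die:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the ratio guard in the thread's scripts.
# It returns a verdict string instead of calling die.
sub ratio_verdict {
    my ($gpu, $cpu, $low, $high) = @_;
    # Same fallback as the scripts: treat the ratio as 1 when there are no CPU tasks.
    my $ratio = $cpu != 0 ? $gpu / $cpu : 1;
    return "too many on GPU" if $ratio > $high;
    return "too many on CPU" if $ratio < $low;
    return "ok";
}

# Counts from the report above: 152 GPU tasks, 504 CPU tasks.
printf "ratio: %.3f\n", 152 / 504;
print "low limit 0.5:  ", ratio_verdict(152, 504, 0.5, 1000), "\n";
print "low limit 0.01: ", ratio_verdict(152, 504, 0.01, 1000), "\n";
```

152/504 is roughly 0.30, so a low limit of 0.5 would stop the run with "too many on CPU", while 0.01 lets it go ahead and move the VLARs off the GPU.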
____________

Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 925464 - Posted: 12 Aug 2009, 1:57:33 UTC - in response to Message 925339.

Heh, I had the opposite the other day, had to move 100 non-VLAR/VHAR to the CPU.

Planning a lot for this machine, slowly getting there. As for the CPU cooler, I'm pretty happy with it as is, so I can wait a bit to save up more and decide what to do with the rest. Worst case, my other machine needs an upgrade, so new parts go in the current one and these parts move over to the server.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 925562 - Posted: 12 Aug 2009, 14:24:41 UTC - in response to Message 925464.

Heh, I had the opposite the other day, had to move 100 non-VLAR/VHAR to the CPU.

Planning a lot for this machine, slowly getting there. As for the CPU cooler, I'm pretty happy with it as is, so I can wait a bit to save up more and decide what to do with the rest. Worst case, my other machine needs an upgrade, so new parts go in the current one and these parts move over to the server.


wish i had that problem.. down to 61 cuda wu now and no replacements coming. guess i will have to up my cache to get more work and then lower it back down or something or just sit tight and ignore it while the cpus do their thing.

sounds like a good project going. im planning a project for next year for a dedicated cruncher/secondary desktop. it will do very little as my vnc desktop into this current workstation but it will take some of the server monitor loading off this one. since the gtx285 outperforms tesla, it will probably be an i7 of some variety with either 4 gtx285 or 4 gtx295 cards, so it will have 6 or 7 cpu cores crunching, reserving 1 or 2 for cuda/machine, and 4 or 8 cuda running. should be good.

____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1711
Credit: 204,860,582
RAC: 24,437
Australia
Message 930955 - Posted: 4 Sep 2009, 20:43:37 UTC - in response to Message 920841.

Hi Joe.
How do you alter this script to leave the VHAR's on the CPU but still move the VLAR's? My skills in this area are practically zero.

BTW Thanks to you and Chuck for a very informative and interesting thread !!

Regards
Brodo



Here's the script:

$path="client_state.xml";

open (IN, $path);

$NumOfCPUTasks=0;
$NumOfGPUTasks=0;
$NumVLAR=0;
$NumVHAR=0;
$NumGPUToNumCPU_high_limit=25;
$NumGPUToNumCPU_low_limit=0.5;

# first pass: classify every workunit by its true angle range
# ($tasks value: 1 = VLAR, 2 = VHAR, 3 = everything else)
while (<IN>) {
    if( /<workunit>/ ){
        #parsing result
        $trueAR=-1;#error condition
        $WUname="";
        while(<IN>){
            if( /<\/workunit>/ ){
                open (WU, "projects\/setiathome.berkeley.edu\/" .$WUname) || die "ERROR: cant open task file " . $WUname;
                while(<WU>){#reading task file and deciding where it should go
                    if( /<true_angle_range>(.*)<\/true_angle_range>/ ){
                        $trueAR=$1;
                        if( $trueAR == -1 ){
                            die "ERROR detected - cant determine AR value\n";
                        }
                        if($trueAR < 0.13){
                            $tasks{$WUname}=1;
                            $NumVLAR++;
                        }
                        elsif($trueAR > 1.127){
                            $tasks{$WUname}=2;
                            $NumVHAR++;
                        }else{$tasks{$WUname}=3;}
                        last;
                    }

                }
                close(WU);
                last;
            }

            if( /<name>(.*)<\/name>/ ){
                $WUname=$1;
                #print "task:\\".$1."\\ \n";
            }
            elsif( /<version_num>603<\/version_num>/ ){$NumOfCPUTasks++;}
            elsif( /<version_num>608<\/version_num>/ ){$NumOfGPUTasks++;}


        }

    }

}
close(IN);

# second pass: list VLAR and VHAR tasks currently assigned to the GPU app (608)
open (IN, $path);
while (<IN>) {
    if( /<name>(.*)<\/name>/ ){
        $WUname=$1;
        #print "task:\\".$1."\\ \n";
    }
    elsif( /<version_num>608<\/version_num>/ ){
        if($tasks{$WUname}){
            if($tasks{$WUname} == 1){
                print "VLAR on GPU: " . $WUname ."\n";}
            elsif($tasks{$WUname} == 2){
                print "VHAR on GPU: " . $WUname ."\n";}
        }
    }
}
close(IN);

# summary counts and GPU/CPU balance warnings
print "Number of CPU tasks:".$NumOfCPUTasks."\n";
print "Number of GPU tasks:".$NumOfGPUTasks."\n";
print "Number of VLAR tasks:".$NumVLAR."\n";
print "Number of VHAR tasks:".$NumVHAR."\n";
if($NumOfCPUTasks!=0){ $GPU_to_CPU_ratio=$NumOfGPUTasks/$NumOfCPUTasks;}
else{ $GPU_to_CPU_ratio=1;}
if($GPU_to_CPU_ratio >$NumGPUToNumCPU_high_limit){
    print "Too many tasks allocated to GPU already ".$GPU_to_CPU_ratio."\n";}
if($GPU_to_CPU_ratio <$NumGPUToNumCPU_low_limit){
    print "Too many tasks allocated to CPU already " .$GPU_to_CPU_ratio ."\n";}
print "Total tasks: ".($NumOfCPUTasks+$NumOfGPUTasks)."\n";



Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 935388 - Posted: 23 Sep 2009, 13:38:11 UTC - in response to Message 930955.

its an either/or situation on the VHAR. either they all go to gpu or all go to cpu. i have mine set so that all vhar go to gpu and some 'normal' wu go to cpu along with vlar so cpu gets a few 'gifts'. that was an easy change. here is my script

#!/usr/bin/perl -w
$path = "client_state.xml" ;
$results = "client_state.new" ;


open (IN, $path);

$NumOfCPUTasks=0;
$NumOfGPUTasks=0;
$NumGPUToNumCPU_high_limit=1000;
$NumGPUToNumCPU_low_limit=0.01;
$Num_tasks_limit=2000;

# first pass: read each workunit's angle range and pick its target app
# version (603 = cpu, 608 = cuda)
while (<IN>) {
    if( /<workunit>/ ){
        #parsing result
        $trueAR=-1;#error condition
        $WUname="";
        while(<IN>){
            if( /<\/workunit>/ ){
                open (WU, "projects\/setiathome.berkeley.edu\/".$WUname) || die "ERROR: cant open task file";
                while(<WU>){#reading task file and deciding where it should go
                    if( /<true_angle_range>(.*)<\/true_angle_range>/ ){
                        $trueAR=$1;
                        if( $trueAR == -1 ){
                            die "ERROR detected - cant determine AR value\n";
                        }
                        # if($trueAR < 0.13 || $trueAR > 1.127){
                        if($trueAR < 0.37){
                            $tasks{$WUname}=603;
                        }else{$tasks{$WUname}=608;}
                        last;
                    }

                }
                close(WU);
                last;
            }
            if(/<app_name>(.*)<\/app_name>/){
                if($1 ne "setiathome_enhanced"){
                    # print $1."\n";
                    last;
                }
            }
            if( /<name>(.*)<\/name>/ ){
                $WUname=$1;
                # print "task:".$1."\n";
            }
            elsif( /<version_num>603<\/version_num>/ ){$NumOfCPUTasks++;}
            elsif( /<version_num>608<\/version_num>/ ){$NumOfGPUTasks++;}

        }

    }
}
close(IN);

# refuse to reschedule if the cache is already badly unbalanced or too large
print "Number of CPU tasks before rescheduling:".$NumOfCPUTasks."\n";
print "Number of GPU tasks before rescheduling:".$NumOfGPUTasks."\n";
if($NumOfCPUTasks!=0){ $GPU_to_CPU_ratio=$NumOfGPUTasks/$NumOfCPUTasks;}
else{ $GPU_to_CPU_ratio=1;}
if($GPU_to_CPU_ratio >$NumGPUToNumCPU_high_limit){
    die "Too many tasks allocated to GPU already";}
if($GPU_to_CPU_ratio <$NumGPUToNumCPU_low_limit){
    die "Too many tasks allocated to CPU already";}
if($NumOfCPUTasks+$NumOfGPUTasks>$Num_tasks_limit){
    die "Too many tasks in cache already";}

# second pass: copy client_state.xml to client_state.new, rewriting
# version_num (and plan_class for results) on every SETI MB entry
$NumOfCPUTasks=0;
$NumOfGPUTasks=0;
open (IN, $path);
open (RES, ">".$results);
while (<IN>) {
    if( /<result>/ ){
        $WUname="";
        print RES $_;
        $is_SETI_MB=0;
        while(<IN>){
            if( /<name>(.*)_.*/){#can be SETI MB result
                print RES $_;
                if($1){
                    if($tasks{$1}){
                        if($tasks{$1}==603 || $tasks{$1}==608){
                            $WUname=$1;
                            $is_SETI_MB=1;
                        }
                    }
                }
            }elsif( /<version_num>/ ){
                if($is_SETI_MB){;}
                else{ print RES $_;}
            }
            elsif( /<plan_class>/ ){
                if($is_SETI_MB){;}
                else{ print RES $_;}
            }
            elsif( /<\/result>/ ){
                if($is_SETI_MB){
                    if($tasks{$WUname}==603){
                        print RES " <version_num>603<\/version_num>\n";

                    }
                    if($tasks{$WUname}==608){
                        print RES " <version_num>608<\/version_num>\n";
                        print RES " <plan_class>cuda<\/plan_class>\n";
                    }
                }
                print RES $_;
                last;
            }
            else{ print RES $_;}
        }
    }elsif( /<workunit>/ ){
        $WUname="";
        print RES $_;
        $is_SETI_MB=0;
        while(<IN>){
            if( /<name>(.*)<\/name>/ ){
                $WUname=$1;
                print RES $_;
                if($1){
                    if($tasks{$1}){
                        if($tasks{$1}==603 || $tasks{$1}==608){
                            $WUname=$1;
                            $is_SETI_MB=1;
                        }
                    }
                }
            }
            elsif( /<version_num>/ ){
                if($is_SETI_MB){;}
                else{ print RES $_;}
            }
            elsif( /<\/workunit>/ ){
                if($is_SETI_MB){
                    if($tasks{$WUname}==603){
                        print RES " <version_num>603<\/version_num>\n";
                        $NumOfCPUTasks++;

                    }
                    if($tasks{$WUname}==608){
                        print RES " <version_num>608<\/version_num>\n";
                        $NumOfGPUTasks++;
                    }
                }
                print RES $_;
                last;
            }
            else{ print RES $_;}
        }

    }else{ print RES $_;}
}
# close both files so client_state.new is fully written before anything reads it
close(IN);
close(RES);
print "Number of CPU tasks after rescheduling:".$NumOfCPUTasks."\n";
print "Number of GPU tasks after rescheduling:".$NumOfGPUTasks."\n";


the only change is this code in the while loop

# if($trueAR < 0.13 || $trueAR > 1.127){
if($trueAR < 0.37){

the original commented statement adds all vlar and vhar to the cpu giving the gpu all the rest.

i noticed the script you have is very different than the one i have but mine works so i am fine with it :) my 0.37 number gives all vlar plus a few of the lower angle normal units to cpu while giving all the other ones plus vhar to the gpu.

my script above is the splitting script only. i have kept the reporting script separated and only modified that to not list the vhar workunits but still show the vhar count knowing they are all on the gpu.
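
The cutoff logic described above can be sketched in isolation. This is a minimal illustration rather than the script itself: `target_version` is a made-up helper name and the sample angle ranges are invented, while 603 (cpu) and 608 (cuda) are the version numbers the script writes into client_state.new.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick the app version for a workunit from its
# true angle range. 603 = CPU app, 608 = CUDA app, as in the script above.
# $high is optional; leave it undef to keep all high-AR units on the GPU.
sub target_version {
    my ($ar, $low, $high) = @_;
    return 603 if $ar < $low;
    return 603 if defined $high && $ar > $high;
    return 608;
}

# Made-up sample angle ranges, for illustration only.
for my $ar (0.05, 0.20, 0.42, 2.50) {
    printf "AR %.2f  original: %d  0.37 cutoff: %d\n",
        $ar,
        target_version($ar, 0.13, 1.127),   # commented-out original condition
        target_version($ar, 0.37);          # single-cutoff variant
}
```

In this sketch the 0.20 sample is one of the 'gifts': the original condition leaves it on the gpu, while the 0.37 cutoff hands it to the cpu. The 2.50 sample goes the other way, since with no upper cutoff the vhar units stay on the gpu.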
____________

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 943747 - Posted: 30 Oct 2009, 0:52:50 UTC - in response to Message 935388.

turns out running VHAR on the gpus hurt scoring a lot. seems they are better suited and score higher processing on the cpus, so i set the script back to what it was but increased the .13 to .33 to give the cpus the larger angle calculations. that seemed to help a lot.


____________


Copyright © 2014 University of California