2 video cards in linux. Boinc sees them as same device!

Questions and Answers : Unix/Linux : 2 video cards in linux. Boinc sees them as same device!
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 919338 - Posted: 19 Jul 2009, 13:39:14 UTC - in response to Message 919319.  

6.4.5 has one internal scheduler for both CPU and GPU work, so that one is kind of broken. You then have to add the <ncpus>CPUs+GPUs</ncpus> (e.g. you have 4 CPUs + 2 GPUs, so this line is <ncpus>6</ncpus>) in the options section of the cc_config.xml file.

BOINC 6.6.x has a separate internal CPU and GPU scheduler. The one in 6.6.20 is still broken, the one in 6.6.36 is getting there.

The reason that not all GPUs are used is the trouble with feeding them work. The developers know about it, but since it would entail a large code change in both the client software and the server software, with special app_classes for all the different GPUs, they've chosen the stop-gap solution of only using the best card in your system. If you then want to use all GPUs, you'll be able to do so by telling BOINC that with the <use_all_gpus> line in cc_config.xml.
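
As a rough illustration of the two cc_config.xml settings mentioned in this post, here is a minimal Python sketch that builds the file's text. The helper name build_cc_config and its parameters are made up for this example. The <ncpus> and <use_all_gpus> elements inside <options> are the ones named above; which client versions honour each of them is something to check against your own BOINC release.

```python
import xml.etree.ElementTree as ET


def build_cc_config(cpus: int, gpus: int, use_all_gpus: bool = True) -> str:
    """Return cc_config.xml text (hypothetical helper, not part of BOINC)."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # 6.4.5 workaround described in the post: count each GPU as an extra CPU.
    ET.SubElement(options, "ncpus").text = str(cpus + gpus)
    # Ask the client to use every GPU instead of only the best one.
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# 4 CPUs + 2 GPUs, the example from the post, gives <ncpus>6</ncpus>.
print(build_cc_config(cpus=4, gpus=2))
```

The printed text would go into cc_config.xml in the BOINC data directory. Combining both options in one file is only for illustration: the post describes <ncpus> as the 6.4.5 workaround and <use_all_gpus> as the option for newer clients.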

It's not pretty but you get work done. May I remind you all that a year ago you didn't even have CUDA around here? BOINC itself took more than a year in Beta test before it was ripe enough to be released to the masses. ATI detection is still coming, so I can see why the developers want to wait with these big changes until those GPUs are also added to the fray. It won't do to do all your hard work twice. The 3 main developers are overworked enough already as it is, having to fix bugs in both the client software and the back-end software.


Without messing with the ncpus statement it does what I need. I only allow it to use 3 of the 4 CPUs, reserving the 4th for my workstation use and CUDA. It selects the correct number of CPUs just from the local prefs set at 75%, and it feeds both CUDA units, so at present I cannot ask for more. I found my machine got more work done, more smoothly, with 3 CPUs than with 4: work times decreased by an average of 35 min per WU when running 3. My uploaded scores were hardly affected at all, and my workstation does not suffer any pauses or other ailments caused by clogged CPUs. I tried keeping 4 CPUs and reducing the percentage of CPU time used for work, but it got worse instead of better. 3 seems to be my magic number.

Hmm, I wouldn't think device recognition and feeding would be so involved. Maybe so; I have not seen the source. For now 6.4.5 appears to be working perfectly for me. Unless my scores at the home site get severely messed up, I will probably use this until a 6.6.x version is fixed.
ID: 919338
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 919353 - Posted: 19 Jul 2009, 14:06:48 UTC

6.6.36 was already reported to not work well on Linux. I spent an evening working on installing it on my Mandriva box, only to have it fail, so I backed off to 6.4.5. You are probably better off backing off from 6.6.36 unless you want to try the latest build, which may also be unstable.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 919353
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 919380 - Posted: 19 Jul 2009, 15:23:10 UTC - in response to Message 919353.  

6.6.37 also behaved the same way. I went all the way back to 6.4.5, which works fine although its scheduling is a bit funky. I will stay with this until I hear that one of the 6.6.x series is working properly with multiple GPUs.


ID: 919380
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 919845 - Posted: 20 Jul 2009, 20:44:16 UTC - in response to Message 919130.  

Yeah, I figured out you must be using ps to see it, confirmed it myself... for some reason 6.6.20 and 6.6.36 both stick it all on device 0, but 6.4.5 works correctly (one on each).

I ended up doing a little cable management (wish I had longer cables, not much room to move them around) and moved GPU1 down to slot 3 (still running 16x since I only have the two cards) and it seems to work. Did a test last night for about 6 hours and temps were at 70C (in a closed room, stupid dog, room temp was about 32C) so I can live with that (still might look into some extra cooling... I still haven't OC'd these cards). Doing a longer (12 hour) test today to verify it's all good.

Only possible issue I see now is that GPU1 is about 1" off the PSU, but that's better than the MAYBE 1/8" I had between GPU0 and GPU1 before.



Hmm, one thing I noticed about 6.4.5: it has not gotten any new CUDA units, only CPU. It just requested 100 CPU units and no GPU. So I put 6.6.37 back to see if any GPU units were available, and they are downloading now. Looks like I'm going to have to use this to get work and then switch to 6.4.5 to make it work right?

ID: 919845
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919941 - Posted: 21 Jul 2009, 1:51:35 UTC - in response to Message 919845.  

Unless you're getting nothing but AP, CPU and GPU units are the same. Check your settings here, make sure use GPU is checked, and make sure your app_info.xml is right.

http://lunatics.kwsn.net/12-gpu-crunching/cpu-gpu-rebranding-perl-script.msg17406.html#msg17406

Grab that tool; there is a V5 on page 2 or 3, I think. It will move all VLAR and VHAR to your CPU and everything else to GPU. I had to make a couple of changes to it, so if it doesn't work let me know and I'll give you the changes I made (it couldn't open the files on my system for some reason).
ID: 919941
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 919956 - Posted: 21 Jul 2009, 2:11:11 UTC - in response to Message 919941.  

Everything is set and has been. 6.6.x versions get GPU work just fine. The SETI prefs for 'home', which is what this computer belongs to, are set to use GPU, and the local prefs are set to use GPU. 6.4.5 just did not get any GPU work when it got MB units. I don't do AP, I only do MB. I use the AK SSSE3 build for Intel on the CPUs, and for the GPU I'm using the VLAR killer:

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu

The 6.4.5 messages give no indication of not using any GPU; it simply did not get any work for them. But I got a few hundred units via 6.6.37, and once they were queued for download I switched back and let them continue to download under 6.4.5. It may be that I didn't give it long enough to 'settle in' after the long upload problems. I tend to get impatient with this stuff. Now that it's got plenty of work I will just leave 6.4.5 up and wait to see whether it runs out of CUDA units or gets more.

I thought the reassignments were only available in the Windows versions of the CUDA app. This one does not reassign; it kills them off with a computation error. It would be nice to get a CUDA 2.2 app that would reassign them. It feels like I am wasting a lot of resources on both ends by just killing them, but my Tesla cannot take VLARs. They lock it solid every time.

I just saw that URL with the script, so I'll check it out.

Thanks.

ID: 919956
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919970 - Posted: 21 Jul 2009, 2:34:43 UTC - in response to Message 919956.  

Odd. All of mine go to the GPU on 6.4.5; I use that rebranding script to move them all over (obviously make sure BOINC is off at the time). I've been using it for about a day now and it works pretty well.
ID: 919970
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 919978 - Posted: 21 Jul 2009, 3:00:46 UTC - in response to Message 919970.  

It could be because I was forcing uploads and updates manually for a little while to get a feel for things, and manual intervention may interfere with it getting work. Still, I am surprised that when it got its first 20 CPU units it did not get any GPU units at all, even though the GPUs were idle at that time. When I put 6.6.37 up it immediately got 40 GPU units. I'm just going to let it settle out a bit and see how it does without a human touching it :)


ID: 919978
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 919989 - Posted: 21 Jul 2009, 3:34:50 UTC - in response to Message 919970.  

Hehe, it left my CPUs with 54 work units :)

I had to change the path where it goes into the projects/seti directory: it had the Windows syntax of \\, so I changed it to \/ in 2 places in that line. I also separated the filename declarations with spaces at the top and added #!/usr/bin/perl as the top line, and the script worked just fine. Oh, and I had to change the max workunits from 550 to 1000.

Now all I have to do is figure out the best times to run this under cron and set up a script to stop BOINC, run it, copy the new file over and restart BOINC.

Maybe every hour? It didn't catch any VLARs at all, but I suspect that if there were any it would have caught them and rescheduled them to the CPUs. At least it didn't report any. I got v 1.5. It seems that beginning around 1.6 they started concentrating on Windows and changed it into an executable along the way.
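
The stop, run, restart sequence described in this post could be wrapped roughly as follows. This is only a sketch under stated assumptions: the function name rebrand_cycle is invented for illustration, and all three command lines are placeholders, since the thread does not say how BOINC is stopped and started on this machine. The "copy the new file over" step is left out because the file names are not given.

```python
import subprocess
from typing import Callable, Sequence


def rebrand_cycle(
    stop_cmd: Sequence[str],
    rebrand_cmd: Sequence[str],
    start_cmd: Sequence[str],
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Stop the client, run the rebranding script, then always restart."""
    run(list(stop_cmd), check=True)
    try:
        run(list(rebrand_cmd), check=True)
    finally:
        # Restart even if the script failed, so the host does not sit idle.
        run(list(start_cmd), check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # All three command lines are placeholders; substitute your own.
    def show(cmd, check=True):
        print(" ".join(cmd))

    rebrand_cycle(
        stop_cmd=["/path/to/stop-boinc"],
        rebrand_cmd=["perl", "/path/to/rebrand.pl"],
        start_cmd=["/path/to/start-boinc"],
        run=show,
    )
```

The try/finally is the one design choice worth noting: if the script dies partway through, the client is still brought back up. An hourly cron entry could then call a wrapper like this, though the right interval is the open question raised above.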



ID: 919989
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919991 - Posted: 21 Jul 2009, 3:58:23 UTC - in response to Message 919989.  

Sounds about what I had to change as well, moved a ton to my CPU (probably too many, but we'll see). Lately I've been getting a ton of VLAR and a good amount of VHAR too.
ID: 919991
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 920485 - Posted: 22 Jul 2009, 23:13:20 UTC - in response to Message 919991.  

sunu told me about BOINC 6.6.11. I have been running it for about 6 hours now and it works! It understands the devices, runs 1 WU per device, gets proper workunits for CPU and GPU, and works well with app_info.xml, the AK MB app and the 2.2 CUDA app. It seems this one is the best to run so far, until they fix the newer versions. After it got a ton of CPU units while I was out, I ran the V5 Perl script again with lower numbers plugged in, and it changed things to 111 CPU units and 422 GPU units. It still has not reported finding any VLAR or VHAR workunits to reschedule to the CPU, unless it is just silent about it. Hehe, after I ran the script, when BOINC came back up it promptly went out to fill the void of CPU units :)


ID: 920485
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 920544 - Posted: 23 Jul 2009, 2:52:49 UTC - in response to Message 920485.  

Sounds good, where did you find 6.6.11 so I can give it a try?
ID: 920544
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 920600 - Posted: 23 Jul 2009, 7:26:25 UTC - in response to Message 920544.  

You can find almost every release at this place.
http://boincdl.ssl.berkeley.edu/dl/

ID: 920600
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 920610 - Posted: 23 Jul 2009, 8:15:46 UTC - in response to Message 920600.  

Yes, that is where I got it from. Specifically:


http://boincdl.ssl.berkeley.edu/dl/boinc_6.6.11_x86_64-pc-linux-gnu.sh


ID: 920610
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 920615 - Posted: 23 Jul 2009, 9:25:12 UTC - in response to Message 920600.  

Thanks, didn't know about that one and couldn't find it from the normal download location. I'll try it out when I get home tonight!
ID: 920615
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 920625 - Posted: 23 Jul 2009, 11:00:32 UTC - in response to Message 920610.  

That works perfectly!
ID: 920625
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 920633 - Posted: 23 Jul 2009, 11:44:12 UTC - in response to Message 920625.  

Cool. I think we finally hit on the magic version, until they catch up with the fixes in the new versions.

ID: 920633
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 920638 - Posted: 23 Jul 2009, 12:01:25 UTC - in response to Message 920633.  

Yup, it still says "High priority" on stuff not due until the 9th, but it's downloading new WU regularly so I'm happy with that. I'll just periodically check for VLAR (and VHAR, once my VLAR count goes back down) on the GPU and reassign them. I modified that script to make a new one that just spits out info about the types of tasks and how many are assigned to each.
ID: 920638
Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 920662 - Posted: 23 Jul 2009, 13:36:53 UTC - in response to Message 920638.  

Cool. Yeah, it does a few funky things, but they all get processed so I don't really care. I'm hoping the 'stock' functionality of the script is sufficient, since other than changing the ratios at the top I basically have to use it as is. I can only trust that it checks CUDA WUs for VLAR and moves them to the CPU, since it doesn't report anything.
ID: 920662
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 920841 - Posted: 23 Jul 2009, 23:28:40 UTC - in response to Message 920662.  

Here's the modified script I use; it's pretty simple. Just run it (I've seen no harm in running it while BOINC is running, as it doesn't change anything) and it spits out something like:

VHAR on GPU: 12mr09ac.6289.7025.14.10.195
VHAR on GPU: 05dc08ae.32507.890.16.10.254
Number of CPU tasks:413
Number of GPU tasks:323
Number of VLAR tasks:330
Number of VHAR tasks:42
Total tasks: 736

You can't run it while downloading new WU, as it can't open the WU files to read them if they aren't there yet.

Right now (with 330 VLAR tasks) I've moved VHAR to the GPU, hence it complains about them, but you can see the CPU has 413 tasks and there are only 330 VLAR, so I should run the rebrand script again soon.
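
The reasoning from the sample output can be written out as a small calculation. This is a sketch, assuming that every VLAR task is currently on the CPU (the sample shows no "VLAR on GPU" lines) and that the sample output is complete. The function name non_vlar_on_cpu is made up for the example.

```python
def non_vlar_on_cpu(cpu_tasks: int, vlar_total: int, vlar_on_gpu: int = 0) -> int:
    """CPU tasks that are not VLAR, given the totals the report prints."""
    vlar_on_cpu = vlar_total - vlar_on_gpu
    return cpu_tasks - vlar_on_cpu


# Numbers taken from the sample output above.
cpu_tasks, gpu_tasks, vlar_tasks = 413, 323, 330

print(non_vlar_on_cpu(cpu_tasks, vlar_tasks))  # 83 tasks the GPU could take
print(round(gpu_tasks / cpu_tasks, 3))         # GPU-to-CPU ratio, 0.782
```

With 83 non-VLAR tasks sitting on the CPU, another rebranding pass would have something to move. The ratio of 0.782 lies between the 0.5 and 25 limits set in the script below, which is why neither "Too many tasks" warning appears in the sample.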

Here's the script:

#!/usr/bin/perl
# Read-only report: counts CPU and GPU tasks in client_state.xml and lists
# any VLAR/VHAR workunits assigned to the GPU app. Paths are relative, so
# run it from the BOINC data directory.
use strict;
use warnings;

my $path        = "client_state.xml";
my $project_dir = "projects/setiathome.berkeley.edu/";

my $NumOfCPUTasks = 0;
my $NumOfGPUTasks = 0;
my $NumVLAR       = 0;
my $NumVHAR       = 0;
my $NumGPUToNumCPU_high_limit = 25;
my $NumGPUToNumCPU_low_limit  = 0.5;
my %tasks;    # workunit name => 1 (VLAR), 2 (VHAR), 3 (anything else)

# Pass 1: count tasks per app version and classify each workunit by angle range.
open(my $in, '<', $path) || die "ERROR: cant open $path: $!\n";
while (<$in>) {
    next unless /<workunit>/;
    my $trueAR = -1;    # error condition
    my $WUname = "";
    while (<$in>) {
        if (/<\/workunit>/) {
            # End of the workunit block: read its task file to find the angle range.
            open(my $wu, '<', $project_dir . $WUname)
                || die "ERROR: cant open task file $WUname: $!\n";
            while (<$wu>) {
                if (/<true_angle_range>(.*)<\/true_angle_range>/) {
                    $trueAR = $1;
                    if ($trueAR == -1) {
                        die "ERROR detected - cant determine AR value\n";
                    }
                    if ($trueAR < 0.13) {
                        $tasks{$WUname} = 1;
                        $NumVLAR++;
                    }
                    elsif ($trueAR > 1.127) {
                        $tasks{$WUname} = 2;
                        $NumVHAR++;
                    }
                    else {
                        $tasks{$WUname} = 3;
                    }
                    last;
                }
            }
            close($wu);
            last;
        }

        if (/<name>(.*)<\/name>/) {
            $WUname = $1;
        }
        elsif (/<version_num>603<\/version_num>/) { $NumOfCPUTasks++; }    # counted as CPU
        elsif (/<version_num>608<\/version_num>/) { $NumOfGPUTasks++; }    # counted as GPU
    }
}
close($in);

# Pass 2: report VLAR/VHAR workunits that are assigned to the GPU app (608).
open($in, '<', $path) || die "ERROR: cant open $path: $!\n";
my $WUname = "";
while (<$in>) {
    if (/<name>(.*)<\/name>/) {
        $WUname = $1;
    }
    elsif (/<version_num>608<\/version_num>/) {
        if ($tasks{$WUname}) {
            if ($tasks{$WUname} == 1) {
                print "VLAR on GPU: " . $WUname . "\n";
            }
            elsif ($tasks{$WUname} == 2) {
                print "VHAR on GPU: " . $WUname . "\n";
            }
        }
    }
}
close($in);

print "Number of CPU tasks:" . $NumOfCPUTasks . "\n";
print "Number of GPU tasks:" . $NumOfGPUTasks . "\n";
print "Number of VLAR tasks:" . $NumVLAR . "\n";
print "Number of VHAR tasks:" . $NumVHAR . "\n";

# Warn when the GPU-to-CPU task ratio falls outside the limits set at the top.
my $GPU_to_CPU_ratio = $NumOfCPUTasks != 0 ? $NumOfGPUTasks / $NumOfCPUTasks : 1;
if ($GPU_to_CPU_ratio > $NumGPUToNumCPU_high_limit) {
    print "Too many tasks allocated to GPU already " . $GPU_to_CPU_ratio . "\n";
}
if ($GPU_to_CPU_ratio < $NumGPUToNumCPU_low_limit) {
    print "Too many tasks allocated to CPU already " . $GPU_to_CPU_ratio . "\n";
}
print "Total tasks: " . ($NumOfCPUTasks + $NumOfGPUTasks) . "\n";
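
For readers who do not use Perl, the angle-range rule in the script above can be restated in a few lines of Python. This is a sketch, not a replacement for the script: the thresholds 0.13 and 1.127 and the <true_angle_range> tag are taken from it, while the function names and the "MID" label for everything in between are invented for this example.

```python
import re
from typing import Optional

VLAR_MAX = 0.13   # the script treats angle ranges below this as VLAR
VHAR_MIN = 1.127  # the script treats angle ranges above this as VHAR
AR_PATTERN = re.compile(r"<true_angle_range>(.*?)</true_angle_range>")


def classify_angle_range(ar: float) -> str:
    """Return "VLAR", "VHAR" or "MID" (a made-up label) for an angle range."""
    if ar < VLAR_MAX:
        return "VLAR"
    if ar > VHAR_MIN:
        return "VHAR"
    return "MID"


def classify_workunit_text(text: str) -> Optional[str]:
    """Classify workunit file contents; None if no angle range is present."""
    match = AR_PATTERN.search(text)
    if match is None:
        return None
    return classify_angle_range(float(match.group(1)))


print(classify_angle_range(0.05))  # VLAR
print(classify_angle_range(0.44))  # MID
print(classify_angle_range(2.5))   # VHAR
```

As in the script, the comparisons are strict, so a value of exactly 0.13 or 1.127 falls into the middle group.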
ID: 920841

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.