2 video cards in linux. Boinc sees them as same device!
Author | Message |
---|---|
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Have you noticed that sometimes 6.6.11 stops processing CUDA for no particular reason? It's done it a few times, I've just had to restart BOINC and it starts up again... very odd. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Did you know that 6.6.11 is, despite its numbering showing it to be a release version, in reality an ALPHA version? In other words that it is one of the development versions with test- and bug fixes going up to the nearest Public Release version 6.6.20? 6.6.12, the one following 6.6.11, has a possible fix for the problem you're talking about, namely, amongst others: - client: fix bug where if a GPU job is running, and a 2nd GPU job with an earlier deadline arrives, neither job is executed ever. Reorganized things so that scheduling of GPU jobs is done independently of CPU jobs. All changes from 6.6.11 onwards can be found in the Change Log thread. There, a version marked "This is a development version of BOINC." is indeed a development version, and one marked "Public release" is not. |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Did you know that 6.6.11 is, despite its numbering showing it to be a release version, in reality an ALPHA version? In other words that it is one of the development versions with test- and bug fixes going up to the nearest Public Release version 6.6.20? Right, we understand that, but 6.4.5 has issues and we're trying to find the best version that solves the issues we've mentioned in this thread. I've gone through the code changes from 6.4.5 to 6.6.11 to 6.6.36 and I *think* I've found where the problem (original issue in this thread) is, but every attempt to compile the code (per: http://boinc.berkeley.edu/trac/wiki/CompileClient) has failed on the make in sea directory: cp ../../../stage//usr/local/bin/boinc BOINC/boinc cp: cannot stat `../../../stage//usr/local/bin/boinc': No such file or directory make: *** [BOINC/boinc] Error 1 If I could get the compile to work I could run a few tests and hammer out the specific problem and report the fix back to the developers. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Did you know that 6.6.11 is, despite its numbering showing it to be a release version, in reality an ALPHA version? In other words that it is one of the development versions with test- and bug fixes going up to the nearest Public Release version 6.6.20? yes, but some tests were done and 6.6.12 is where the broken app_info.xml code begins. 6.6.11 is the newest version to properly support 2 devices. we understand there are irregularities in behavior since it is not a production release, but 6.6.20 and up are severely broken concerning multiple devices. it is something we are willing to put up with until a proper fix is done on the newest versions. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Did you know that 6.6.11 is, despite its numbering showing it to be a release version, in reality an ALPHA version? In other words that it is one of the development versions with test- and bug fixes going up to the nearest Public Release version 6.6.20? i had that too when i tried. i had to manually search out the compiled binaries in the source dirs and then move them where i wanted them. that was a while ago though, so unfortunately i don't remember a lot, but the above was a familiar error for me too. |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Tried that, errored right away looking for: cp: cannot stat `../../../stage//usr/local/bin/boincmgr': No such file or directory But boincmgr was never created, so can't copy it over... |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Ha, got it working... now to run some tests and see if I can get it fixed! |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
So... got past that, but can't recompile the boinc client. Keep getting: boinc_client-client_state.o: In function `CLIENT_STATE::init()': client_state.cpp:(.text+0x5f08): undefined reference to `curl_version' boinc_client-http_curl.o: In function `HTTP_OP::set_speed_limit(bool, double)': http_curl.cpp:(.text+0x10d): undefined reference to `curl_easy_setopt' http_curl.cpp:(.text+0x132): undefined reference to `curl_easy_setopt' But I have curl installed... |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
So... got past that, but can't recompile the boinc client. Keep getting: maybe your path to curl or to curl headers is different than what is in the boinc code? seems like a header is missing or maybe a different version. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Here's the modified script I use, it's pretty simple. Just run it (I've seen no harm in running while BOINC is, as it doesn't change anything) and it spits out something like: ok i'm still trying to comprehend things. this script above changes nothing it simply is a more detailed reporter. here is what i get when i run the V5 script it shows: Number of CPU tasks before rescheduling:223 Number of GPU tasks before rescheduling:194 Number of CPU tasks after rescheduling:223 Number of GPU tasks after rescheduling:194 there are no changes because i ran it a short while ago. when i run your reporting script it tells me: Number of CPU tasks:223 Number of GPU tasks:194 Number of VLAR tasks:197 Number of VHAR tasks:26 Total tasks: 417 so from this i can safely assume that at this present time before any more downloads, the gpu has no vlar or vhar workunits that need moving? |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Yup, if there were any VLAR or VHAR on the GPU it would spit out a line saying <WU name> VLAR on GPU. Since you have 223 on CPU and 197 VLAR/26 VHAR, that means only VHAR and VLAR are on CPU, so no mid-range ones that need to move either. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Yup, if there were any VLAR or VHAR on the GPU it would spit out a line saying <WU name> VLAR on GPU. Since you have 223 on CPU and 197 VLAR/26 VHAR, that means only VHAR and VLAR are on CPU, so no mid-range ones that need to move either. cool, so maybe then i won't piss people off with the vlar killer any more :) i guess the few computation error ones i am seeing are either vlars that are not caught by the script or true errors. i suspect it may be the tesla, since i reviewed a few of the errors and found all of them were on the tesla. might wind up replacing that. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
i noticed a download and ran the script and it had a large amount of both vlar and vhar listed for the gpu. ran the V5 and it fixed it :) now all i have to do is figure out a way to run the script automatically. it would be a really nice addition to have an option to have boinc run an external script after downloading, before starting any new tasks. (i know, dream on) :) i hate cycling boinc often, but since the downloads are asked for at random times it almost seems that to minimize problems i should run a cron job once an hour that will stop boinc, run the V5 script, replace the old state file and then restart boinc. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
wonder if i should change the topic name? maybe to something like 2 devices in linux and keeping gpu free of hassles or something :P |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
What I've done is set mine to 10 days cache, wait until I have a bunch and then set the cache back to 5 days. Then I shut down, run the script, check to make sure it looks good (I have another modified one that will move back X VLAR or VHAR to the GPU if I think the CPU has too much work) and then restart it. Once I get down to a couple days work I'll put the cache back up and download more. Right now I'm letting my cache clear so I can set my pflops right. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
What I've done is set mine to 10 days cache, wait until I have a bunch and then set the cache back to 5 days. Then I shut down, run the script, check to make sure it looks good (I have another modified one that will move back X VLAR or VHAR to the GPU if I think the CPU has too much work) and then restart it. Once I get down to a couple days work I'll put the cache back up and download more. that sounds like it could work however it requires manual intervention. :) i'm lazy when it comes to computers, i deal with them all day and the last thing i like is messing with my own computer so i try to automate mine as much as possible to keep it hassle free for me. so far this hourly script run is working ok. i only had 2 computation errors overnight which is better than the 10 or 15 i used to get. |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
We're looking good now. Steady work coming in (broke 7k RAC so far). Just upgraded to CUDA 2.3 and OC both my 260s. Thu 30 Jul 2009 12:47:25 AM KST CUDA devices: GeForce GTX 260 (driver version 0, CUDA version 1.3, 895MB, est. 125GFLOPS), GeForce GTX 260 (driver version 0, CUDA version 1.3, 896MB, est. 121GFLOPS) Couldn't get the 2nd card stable at the same clocks :( Going to run tests for a bit more, but so far it looks good. Just need to confirm long term stability at these clocks. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
We're looking good now. Steady work coming in (broke 7k RAC so far). Just upgraded to CUDA 2.3 and OC both my 260s. wow, if you can keep them there that's incredible. my gtx285, which is an xfx oc black edition, shows 127 gflops. |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Been going steady overnight, so think I've got it stable. Primary card has a core of 755 MHz, secondary couldn't handle that so it's at 725 MHz (I was too lazy to find the exact max for it). I'm very happy with these clocks, think I'll stick with this company for my next purchase. Heat is still barely over what it was before, which strikes me as odd but I guess those coolers work well. |
Chuck Gorish Send message Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Been going steady overnight, so think I've got it stable. Primary card has a core of 755 MHz, secondary couldn't handle that so it's at 725 MHz (I was too lazy to find the exact max for it). excellent! yes the coolers nvidia has standardized on are quite good. a bit close on the top end cards but still sufficient. you probably have the 260-216sp editions. they run considerably cooler than the standard 192sp version. see this chart: http://en.wikipedia.org/wiki/Comparison_of_NVIDIA_Graphics_Processing_Units#cite_note-22 |
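
The task-count reasoning from the reporting-script exchange in this thread (223 CPU tasks, 194 GPU tasks, 197 VLAR, 26 VHAR, 417 total) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual reporting or V5 rescheduling script, whose code does not appear here. The class `TaskReport`, its field names, and the `extremes_on_gpu` counter (standing in for the number of "<WU name> VLAR on GPU" lines the script would print) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskReport:
    """Counts as printed by the reporting script (field names are assumptions)."""
    cpu_tasks: int
    gpu_tasks: int
    vlar_tasks: int
    vhar_tasks: int
    # Hypothetical: how many "<WU name> VLAR/VHAR on GPU" lines were printed.
    extremes_on_gpu: int = 0

    @property
    def total(self) -> int:
        # Every task is queued for either the CPU or the GPU.
        return self.cpu_tasks + self.gpu_tasks

    @property
    def midrange_on_cpu(self) -> int:
        # VLAR/VHAR tasks not flagged on the GPU must be sitting on the CPU.
        extremes_on_cpu = self.vlar_tasks + self.vhar_tasks - self.extremes_on_gpu
        # Whatever else the CPU holds is mid-range work.
        return self.cpu_tasks - extremes_on_cpu

    def needs_rescheduling(self) -> bool:
        # Work should move if the GPU holds any VLAR/VHAR task,
        # or the CPU holds any mid-range task.
        return self.extremes_on_gpu > 0 or self.midrange_on_cpu > 0


# Numbers quoted in the thread; no "on GPU" lines were printed.
report = TaskReport(cpu_tasks=223, gpu_tasks=194, vlar_tasks=197, vhar_tasks=26)
print(report.total)                 # 417
print(report.midrange_on_cpu)       # 0
print(report.needs_rescheduling())  # False
```

With the quoted numbers, VLAR plus VHAR equals the CPU count exactly, so the CPU queue holds only VLAR/VHAR work and nothing needs to move until the next download.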
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.