Would need some help with app_info.xml under Linux (MB+AP+CUDA)


Message boards : Number crunching : Would need some help with app_info.xml under Linux (MB+AP+CUDA)

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351461 - Posted: 28 Mar 2013, 12:13:06 UTC
Last modified: 28 Mar 2013, 12:15:21 UTC

Greetings!

I have tried for days, but I just can't get it to work properly. Either my GPU crunches together with AstroPulse, or only the GPU crunches and AP gets reset, or I change a few bits in my app_info.xml and BOINC just restarts my CUDA work units on the CPU. What the hell?!

Also, I got some reeaaally weird behavior, where the CUDA worker seemingly crunches on the CPU! How is this even possible? Must be a wrong WU? GPU load drops to zero, 1 CPU core is constantly being used 100% by the CUDA (!!) worker and crunching speed is about that of a normal CPU worker. lol.

Ok, I just can't do this alone, even googling and searching around didn't really help.

What I tried to do is use several of Lunatics app_info.xml examples and adapt them to run the following worker applications together on one system:

    AK_V8_linux32_ssse3 (Multibeam, x86_32)
    setiathome-CUDA-6.08.x86_64-pc-linux-gnu (CUDA, x86_64)
    ap_6.01r546_sse3_linux64 (Astropulse V6, x86_64)


Also, lol, BOINC now deleted my "setiathome-CUDA-6.08.x86_64-pc-linux-gnu" binary after I removed app_info.xml and restarted BOINC. It also sometimes did that for AK_V8_linux32_ssse3; I really hate it when BOINC just deletes worker apps. I hadn't backed up setiathome-CUDA-6.08.x86_64-pc-linux-gnu, so I hope I can find it again.

I wouldn't want to try the new "setiathome_x41g_x86_64-pc-linux-gnu_cuda32" from Lunatics, as this seems to be 6.11 / Fermi-based.

My machine configuration:

    Core i7 950
    GeForce GTS 250 (driver: 310.32, CUDA Toolkit installed)
    CentOS 6.3 Linux, x86_64
    BOINC 6.4.4 (newer versions won't run because of the dynamic linking..)


And here is my totally broken app_info.xml. This is just one version; I tried many small changes, such as the <platform> specification, changes in ncpus and other minor stuff. Can somebody help me repair my app_info.xml so I can run SETI@home MB on CPU+GPU and AstroPulse on CPU, all together?

<app_info>

<app>
<name>setiathome_enhanced</name>
</app>

<file_info>
<name>AK_V8_linux32_ssse3</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>603</version_num>
<platform>i686-pc-linux-gnu</platform>
<plan_class>sse3</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<file_ref>
<file_name>AK_V8_linux32_ssse3</file_name>
<main_program/>
</file_ref>
</app_version>

<file_info>
<name>setiathome-CUDA-6.08.x86_64-pc-linux-gnu</name>
<executable/>
</file_info>
<file_info>
<name>libcudart.so.2</name>
<executable/>
</file_info>
<file_info>
<name>libcufft.so.2</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>528</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome-CUDA-6.08.x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>605</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome-CUDA-6.08.x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>606</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome-CUDA-6.08.x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome-CUDA-6.08.x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>609</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome-CUDA-6.08.x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app>
<name>astropulse_v6</name>
</app>
<file_info>
<name>ap_6.01r546_sse3_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v6</app_name>
<version_num>601</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>sse3</plan_class>
<file_ref>
<file_name>ap_6.01r546_sse3_linux64</file_name>
<main_program/>
</file_ref>
</app_version>

</app_info>

Help, please! I can't figure it out. :(
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4048
Credit: 32,693,315
RAC: 531
United Kingdom
Message 1351465 - Posted: 28 Mar 2013, 12:25:41 UTC - in response to Message 1351461.

I wouldn't want to try the new "setiathome_x41g_x86_64-pc-linux-gnu_cuda32" from Lunatics, as this seems to be 6.11 / Fermi-based.

It's a CUDA app for all CUDA GPUs, not just Fermi GPUs. Better to use x41g than the obsolete setiathome-CUDA-6.08.x86_64-pc-linux-gnu app; there have been hundreds of improvements and fixes in the x4* series of CUDA apps.

Claggy

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351483 - Posted: 28 Mar 2013, 13:28:51 UTC

Ok, will do! I thought the cuda_fermi might crash on an older G92b GPU or produce invalid results.

Still, that leaves the problem with my obviously buggy app_info.xml!

Oh, also: My BOINC is x86_32. And yes, I do have all the necessary 32-Bit libraries on my system to run both BOINC itself and the 32-Bit SETI MB worker.

Has to be something in app_info.xml, right?
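
A structural slip in app_info.xml is at least cheap to rule out. The sketch below is a minimal Python check, not an official BOINC tool: it parses an app_info.xml string and reports any <app_version> that names an undeclared <app>, references a file with no matching <file_info>, or does not have exactly one <main_program/>. The function name check_app_info is made up for illustration, and passing these checks does not mean the scheduler will accept the file.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of structural problems found in an app_info.xml string.

    Hypothetical helper for illustration only; it checks internal
    consistency, not whether BOINC or the project server accepts the file.
    """
    root = ET.fromstring(xml_text)
    problems = []

    # Names declared at the top level of <app_info>.
    declared_apps = {a.findtext("name", "").strip() for a in root.findall("app")}
    declared_files = {f.findtext("name", "").strip() for f in root.findall("file_info")}

    for av in root.findall("app_version"):
        app_name = av.findtext("app_name", "").strip()
        version = av.findtext("version_num", "").strip()
        label = f"{app_name} {version}"

        if app_name not in declared_apps:
            problems.append(f"{label}: no <app> entry named {app_name!r}")

        refs = av.findall("file_ref")
        for ref in refs:
            file_name = ref.findtext("file_name", "").strip()
            if file_name not in declared_files:
                problems.append(f"{label}: {file_name!r} has no <file_info>")

        # Each app_version should point at exactly one main executable.
        mains = [r for r in refs if r.find("main_program") is not None]
        if len(mains) != 1:
            problems.append(
                f"{label}: expected exactly one <main_program/>, found {len(mains)}"
            )

    return problems
```

Reading the file and printing the returned list is enough to use it; an empty list only means these three checks found nothing.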
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1668
Credit: 203,546,471
RAC: 24,923
Australia
Message 1351528 - Posted: 28 Mar 2013, 16:38:20 UTC

Have a look at This Thread. It's 2 years old but contains a lot of useful tips.

The app_info file I posted in it was a goer. The executable and *.so.3 file names will need updating and you'll also have to add a section for AP, but for MB and CUDA, once the correct file names are plugged in, it should be OK for test purposes. At the time I was also using GTS 250s.

Also check the info on "re-nicing". It certainly made a difference in the GPU crunching times.

HTH
T.A.

P.S. Does anyone know what happened to Aaron? He did some good work back then.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 41,563,119
RAC: 109,597
United States
Message 1351539 - Posted: 28 Mar 2013, 17:31:19 UTC
Last modified: 28 Mar 2013, 17:58:13 UTC

Here's my app_info from this Host:

<app_info>
<app>
<name>astropulse_v6</name>
</app>
<file_info>
<name>ap_6.01r546_sse3_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v6</app_name>
<version_num>603</version_num>
<flops>60000000000</flops>
<file_ref>
<file_name>ap_6.01r546_sse3_linux64</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</name>
<executable/>
</file_info>
<file_info>
<name>libcudart.so.3</name>
<executable/>
</file_info>
<file_info>
<name>libcufft.so.3</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>611</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda_fermi</plan_class>
<flops>79000000000</flops>
<avg_ncpus>0.2</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1.0</count>
</coproc>
<file_ref>
<file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.3</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.3</file_name>
</file_ref>
</app_version>
</app_info>


That Host was set up by installing BOINC .27 from the default Ubuntu repository, deleting the BOINC folder from the default Ubuntu location, and then installing the downloaded 7.0.28 in my home folder. I have to double-click the /home/tbar/BOINC/boincmgr to launch the program, but it works fine otherwise. I would like to update to a newer version of BOINC, but, I'm unsure how to accomplish that feat given the way it was installed. I'd really hate to disturb a perfectly working system. I had to free a CPU core to have my GTS 250 work at full speed with driver 304. I used the Apps from here, Linux Seti@Home apps

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351551 - Posted: 28 Mar 2013, 17:58:21 UTC
Last modified: 28 Mar 2013, 17:59:16 UTC

Thanks for the link @Terror Australis, I will study the thread!

@TBar: Looking at your app_info.xml, I can't help but notice you don't seem to have a S@H app for the CPU, only AP. Is this intentional? I'd like to run S@H both on CPU and GPU plus AP on CPU (AP WU's aren't always available it seems, plus they're harder to download, so the CPU should also do S@H WUs, when no AP WUs are there to crunch).

I wonder whether I could just add a CPU worker to that. Also I may try to get a x86_64 version of BOINC that still runs on my OS and also the x86_64 S@H/MB CPU worker, there are some linked in that other thread. Maybe it works better without bitness or "<platform></platform>" mixing.
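
As a rough illustration of what "adding a CPU worker" to an existing app_info.xml involves, here is a minimal Python sketch that appends a <file_info> and an <app_version> for a CPU application. The helper name add_cpu_app_version is hypothetical. The values used in the usage note (AK_V8_linux32_ssse3, version 603, i686-pc-linux-gnu) are simply the ones from the first post in this thread; whether that combination is accepted alongside a 64-bit CUDA app is an assumption, not something confirmed here.

```python
import xml.etree.ElementTree as ET


def add_cpu_app_version(app_info_xml, app_name, exe_name, version_num, platform=None):
    """Return app_info XML text with one extra CPU app_version appended.

    Hypothetical helper for illustration; it only edits the XML structure.
    """
    root = ET.fromstring(app_info_xml)

    # Declare the <app> only if it is not already present.
    existing = {a.findtext("name", "").strip() for a in root.findall("app")}
    if app_name not in existing:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name

    # The executable must be declared before an app_version can reference it.
    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe_name
    ET.SubElement(file_info, "executable")

    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "version_num").text = str(version_num)
    if platform:
        ET.SubElement(app_version, "platform").text = platform

    file_ref = ET.SubElement(app_version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe_name
    ET.SubElement(file_ref, "main_program")

    return ET.tostring(root, encoding="unicode")
```

Called as add_cpu_app_version(text, "setiathome_enhanced", "AK_V8_linux32_ssse3", 603, "i686-pc-linux-gnu"), it leaves the existing AP and CUDA sections untouched and appends the new entries at the end. BOINC should be stopped before the file is replaced.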
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 41,563,119
RAC: 109,597
United States
Message 1351559 - Posted: 28 Mar 2013, 18:15:05 UTC - in response to Message 1351551.
Last modified: 28 Mar 2013, 18:28:35 UTC

I'd rather crunch AstroPulses on the CPU. Even with the limit of 100, that has been more than enough to weather any recent AP outage. I have a CPU MB App in my folder, I'm just not using it. It would only take a couple minutes to add the App to my app_info, should I ever need it. AstroPulse files are a more efficient use of your limited number of tasks...they go a lot further. Why waste space with a file that only takes 26 minutes to complete when you can use one that takes 7.5 hours, and nets more points? I suppose it might make a difference if you have problems downloading an AP, I can download 8 APs in around 12 minutes, on all my Hosts.

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351736 - Posted: 29 Mar 2013, 8:44:50 UTC - in response to Message 1351559.
Last modified: 29 Mar 2013, 9:42:48 UTC

Wow, I'd love that. It takes me hours to download just 2-3. Even on my high bandwidth workplace (100Mbit up/down) and also on my "low" bandwidth home workstation with 8Mbit both up and down.

Ok, I had a hard time finding a proper 64-Bit BOINC build that would work on my system. There basically is NONE. Why the hell is everybody building for Ubuntu (ew..) as a target system instead of just linking statically? This makes me really angry...

So I went through the trouble of compiling my own custom version. To my surprise I got a pre-release BOINC 7.1.0 from git, and to my even bigger surprise it DID compile without trouble on CentOS 6.3. To my MASSIVE surprise it even runs without any linking trouble at runtime.

I just need to figure out where it puts stuff (data directory) and then I'm running the most modern BOINC and can try running 64-Bit only with your app_info-xml!

Edit: To minimize possible error sources, I used the same worker versions and your app_info.xml, TBar.

However, a new problem appears now. While it does work fine for AP, the CUDA WUs all go into "scheduler wait" mode, which seems to be a problem with my GPU load? There is actually no GPU load other than my X11 server running (no 3D stuff on my desktop).

Graphics card has 512MB VRAM.
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351750 - Posted: 29 Mar 2013, 10:30:17 UTC
Last modified: 29 Mar 2013, 10:39:46 UTC

Ew, where did the Edit button go? Am I blind? Weird.

Ok, the VRAM was actually my fault! I had the experimental hardware acceleration / WebGL active in my Opera browser on Linux. As I am running proprietary nVidia drivers again now (for CUDA..) instead of nv open source drivers (no nouveau for me), the setting in Opera kicked back in and ate almost all my VRAM according to the nvidia-smi cli tool. ;)

So that was entirely my fault.
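
For anyone wanting to check the same thing before starting a CUDA task, here is a minimal Python sketch that reports free video memory from nvidia-smi output. It assumes an nvidia-smi that supports the --query-gpu and --format=csv options, which current versions do; whether the tool shipped with driver 310.32 supports them is not verified here. The function names are made up for illustration.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (MiB, no units) into (used, total) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        gpus.append((used, total))
    return gpus


def free_mib(text):
    """Return free memory in MiB for each GPU listed in the CSV text."""
    return [total - used for used, total in parse_memory_csv(text)]


def query_nvidia_smi():
    """Run nvidia-smi and return its CSV output (one line per GPU).

    Assumes nvidia-smi is on PATH and accepts these query options.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

free_mib(query_nvidia_smi()) then gives one number per card. On a 512 MB card, a browser with hardware acceleration enabled can leave very little of that for the CUDA app.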

Now let's see if it works now...

Edit: I don't get it. SAME app_info.xml (copy & paste), project reset, same apps, and now:

Fri 29 Mar 2013 11:38:41 AM CET | SETI@home | Message from server: Your app_info.xml file doesn't have a usable version of SETI@home Enhanced.

____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351756 - Posted: 29 Mar 2013, 11:39:33 UTC - in response to Message 1351750.

Sorry for posting so much, but the Edit button seems to go away after some while, maybe to prevent stupid edits?

It seems that with all the playing around I may have triggered some kind of regret/punishment mechanism in BOINC or on the server. I hope the "Your app_info.xml file doesn't have a usable version of SETI@home Enhanced" message from the server is linked to that, as I also get this:

Fri 29 Mar 2013 12:34:18 PM CET | SETI@home | This computer has finished a daily quota of 1 tasks

Means I cannot continue testing I suppose.
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 41,563,119
RAC: 109,597
United States
Message 1351884 - Posted: 29 Mar 2013, 18:02:45 UTC - in response to Message 1351756.

You are close to running out of video RAM with a 512 MB card. My install says I'm running close to 500 MB with just the BOINC Manager running with one screen and one instance of CUDA. I'm not sure about the app_info message, I've never received that message. As for Ubuntu, it is sorta ugly, but it kinda grows on you after a while. It is probably the easiest Linux to run SETI on though. Good luck on finding what it installs. I found I have to use the repository install of BOINC or else the downloaded version in my Home folder doesn't work. The repository obviously installs something somewhere that the downloaded BOINC doesn't.

For the average person wanting to run SETI on Linux, it's not that bad. Use the nVidia driver from the System settings, install BOINC from the Repository, install the more recent BOINC in your Home folder, delete the old BOINC folder from /etc (you will need to use 'gksu nautilus' for that), and you are good to go. Having the BOINC folder in your Home folder is a major improvement, in my opinion. I used the same method for installing on the old Dell someone gave me. After I replaced the bad ram that was causing the problems on the Dell, I needed to test it. The Windows XP test was 'inconclusive', the BOINC manager kept freezing after a couple days. After installing Ubuntu on the Dell, no problems. The last SETI run on the Dell was over 28 days without a problem running Ubuntu.

I'm getting ready to install Ubuntu/SETI on my Mac, using the same method.

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1668
Credit: 203,546,471
RAC: 24,923
Australia
Message 1351892 - Posted: 29 Mar 2013, 18:34:42 UTC

The problem is that with all your testing you have errored out too many tasks.

Your computer No. 6121392 shows nearly 1000 errors.

BOINC has a protection mechanism to stop rogue hosts from trashing an unseemly number of units. However, for every valid unit you return, the number of units you can download increases (I forget the exact number) until you're back to normal.

It looks like it's the units you have lost due to swapping BOINC versions and having tasks abandoned because BOINC deleted your apps that is the problem. There are no units that actually computed and finished with an error.

I would guess there is definitely a problem with your app_info file.

The usual trick is to download some units, backup your BOINC directories and disconnect from the network. That way if the units get trashed you can restore from the backup and try again. When things are working, reconnect to your network and report them in.
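
The backup-and-restore part of that trick can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the location of the BOINC data directory depends on how BOINC was installed, and the BOINC client should be stopped before either step so that no files change mid-copy.

```python
import shutil
from pathlib import Path


def make_backup(data_dir, backup_path):
    """Copy the whole BOINC data directory to backup_path.

    Fails if backup_path already exists, so an older backup is never
    silently overwritten.
    """
    shutil.copytree(Path(data_dir), Path(backup_path), symlinks=True)


def restore_backup(backup_path, data_dir):
    """Replace data_dir with the contents of an earlier backup."""
    backup_path = Path(backup_path)
    data_dir = Path(data_dir)
    if not backup_path.is_dir():
        raise FileNotFoundError(f"no backup found at {backup_path}")
    # Remove the (possibly trashed) directory before copying the backup in.
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(backup_path, data_dir, symlinks=True)
```

The backup is left in place after a restore, so the same set of downloaded units can be retried several times while the network stays disconnected.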

T.A.

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1351907 - Posted: 29 Mar 2013, 19:12:08 UTC

I thought as much. So I'm just gonna wait a little bit and see if it works maybe tomorrow. I have to check that remotely though as I can't physically access the machine until Tuesday. But my custom compiled BOINC build is still running and it might fetch work tomorrow if it's just that protection mechanism kicking in.

I also did one hell of a lot of project resets which seems to have worsened the situation considerably..

I assume that the Opera WebGL/OpenGL acceleration was the culprit for the first attempt (where BOINC told me that CUDA WUs were in "scheduler wait" mode). And then I started messing everything up without noticing soon enough where the REAL problem was.

That means that TBar's app_info.xml might just work perfectly, so me blaming it might have been in error. Could have just been Opera filling up my VRAM with its hardware accelerated rendering crap (the browser is almost always open!), thus preventing CUDA SETI from doing its work.

We will see, on Tuesday at the latest.
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 41,563,119
RAC: 109,597
United States
Message 1351924 - Posted: 29 Mar 2013, 19:40:56 UTC - in response to Message 1351907.
Last modified: 29 Mar 2013, 19:48:51 UTC

Actually, it's the App_Info from the Download package 'Lunatics_x41g_linux64_cuda32'. There are app_infos in there. Look at the one titled 'app_info.xml-simple_cuda_only'. It's exactly the same CUDA section I posted, except for adding a little more cpu, '<avg_ncpus>0.05</avg_ncpus>', which apparently doesn't make any difference. There are also a couple others in there, 'app_info.xml-complex_cuda_and_cpu' & 'app_info.xml-cuda_and_stock_cpu'. The one I posted works fine for 'cuda_and_astropulse_cpu', which is missing from the included examples, probably because the AstroPulse App didn't exist when the x41g_linux package was released. Works for me.

Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 53
Credit: 22,263,601
RAC: 0
Austria
Message 1352093 - Posted: 30 Mar 2013, 12:04:48 UTC

So this is one from the Lunatics Package basically.. ok. Works for me. :)

I just logged in via SSH and found that the CUDA worker is currently running on the machine! And CPU load is minuscule, which looks really good! Maybe I'll try adding CPU MB on Tuesday, maybe not, we'll see.

But for today it seems that machine is only allowed to turn in 2 results, so I guess it may take some time before it's treated as a trusted cruncher again.

It seems to work so far though, thanks! :)
____________
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4048
Credit: 32,693,315
RAC: 531
United Kingdom
Message 1352102 - Posted: 30 Mar 2013, 12:35:02 UTC - in response to Message 1352093.

I just logged in via SSH and found that the CUDA worker is currently running on the machine! And CPU load is minuscule, which looks really good! Maybe I'll try adding CPU MB on Tuesday, maybe not, we'll see.

You've got a funny BOINC version running there; for some reason it's not reporting run time. Run time is used along with the APR for credit calculations, so you might find you get zero credit for some WUs.

You can add CPU MB on Tuesday, but you won't get any work: the servers are being shut down on Monday for three days while they are moved to a new location.

But for today it seems that machine is only allowed to turn in 2 results, so I guess it may take some time before it's treated as a trusted cruncher again.

You've received 100 GPU tasks today; that'll keep you going for a while. Whether you get any more today is another matter.

Claggy

Copyright © 2014 University of California