Posts by robl

1) Message boards : Number crunching : Linux/NVIDIA/AP questions (Message 1472492)
Posted 3 Feb 2014 by robl
Post:
OK, I reloaded with the bare minimum and I'm getting really close now. I think there's something wrong with my app_info.xml file, but I can't spot it; my eyes are crossing after staring at this monitor all day.

In my /var/lib/boinc/projects/setiathome.berkeley.edu directory, I have:

-rw-rw-r--. 1 guy   guy       2488 Feb 1 14:51 app_info.xml
-rw-rw-r--. 1 guy   guy       2438 Feb 1 14:46 app_info.xml~
-rw-r--r--. 1 boinc boinc    52779 Feb 1 11:21 arecibo_181.png
-rwxr-xr-x. 1 root  root    313872 Feb 1 10:42 libcudart.so.3
-rwxr-xr-x. 1 root  root  28996288 Feb 1 10:42 libcufft.so.3
-rw-r--r--. 1 boinc boinc     2536 Feb 1 11:21 sah_40.png
-rw-r--r--. 1 boinc boinc    25488 Feb 1 11:21 sah_banner_290.png
-rw-r--r--. 1 boinc boinc    35399 Feb 1 11:21 sah_ss_290.png
-rwxr-xr-x. 1 root  root   1564204 Feb 1 14:34 setiathome_7.01_x86_64-pc-linux-gnu
-rwxr-xr-x. 1 root  root   6373944 Feb 1 10:43 setiathome_x41g_x86_64-pc-linux-gnu_cuda32
-rw-r--r--. 1 boinc boinc       71 Feb 1 14:52 slideshow_setiathome_enhanced_00
-rw-r--r--. 1 boinc boinc       72 Feb 1 14:52 slideshow_setiathome_enhanced_01
-rw-r--r--. 1 boinc boinc       75 Feb 1 14:52 slideshow_setiathome_enhanced_02
-rw-r--r--. 1 boinc boinc       67 Feb 1 14:52 stat_icon

See anything wrong with my permissions (edit: or owners)? (I wish I could figure out a way to maintain the columns for you readers)
Yes, I'm running it as root. No big deal. Not concerned about security at this point.


I read your comment on security, but for future reference I believe all of these files should be owned by "boinc:boinc".
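
For anyone checking the same thing, here is a minimal Python sketch that scans an `ls -l` style listing like the one above and reports every entry not owned by boinc:boinc. The helper name `wrong_owner` is made up for illustration, and the expected owner and group are assumptions; the right account depends on how your distro packages BOINC.

```python
def wrong_owner(listing, user="boinc", group="boinc"):
    """Return (name, owner, group) for entries not owned by user:group.

    Assumes the usual `ls -l` columns: mode, links, owner, group, size,
    month, day, time, name.
    """
    flagged = []
    for line in listing.splitlines():
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # skip blank lines and "total N" headers
        owner, grp, name = fields[2], fields[3], fields[8]
        if (owner, grp) != (user, group):
            flagged.append((name, owner, grp))
    return flagged


# Three lines taken from the listing in the post above.
sample = """\
-rw-rw-r--. 1 guy guy 2488 Feb 1 14:51 app_info.xml
-rw-r--r--. 1 boinc boinc 2536 Feb 1 11:21 sah_40.png
-rwxr-xr-x. 1 root root 313872 Feb 1 10:42 libcudart.so.3
"""

for name, owner, grp in wrong_owner(sample):
    print(f"{name}: owned by {owner}:{grp}")
```

Paste the full output of `ls -l` into `sample` to check a whole project directory.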
2) Message boards : Number crunching : Linux/NVIDIA/AP questions (Message 1472489)
Posted 3 Feb 2014 by robl
Post:


I just may break down and install the latest version of Ubuntu if I throw in the towel on Fedora. Been with them since Redhat 6, so I probably have an unhealthy attachment to it. I may need to see a therapist to fix that.



If you do that, go with Mint instead of Ubuntu.
http://www.linuxmint.com/


About a week ago I downloaded Mint (the version based on Ubuntu 12.04 LTS, Maya I believe) to install it in a VM. It installed OK, but I could not update the software. Apparently the distribution ships with a broken sources.list. After googling and trying to fix the problem in sources.list, I gave up and went back to Ubuntu. Since Mint is based on Ubuntu anyway, why not just use Ubuntu? I have been extremely pleased with Ubuntu. I certainly can't say the same for Mint.

Google "mint sources.list" to get some idea.
3) Message boards : Number crunching : Is there a problem? I cannot receive any new tasks since 1:15pm EST on 11-27-13. (Message 1447489)
Posted 27 Nov 2013 by robl
Post:
I did a manual update on Seti and got nothing. The server status page is showing 59 seti@home results ready to send with 0 Astropulse. It's been "stuck" for quite a while.
4) Message boards : Number crunching : server status page shows no astropulse jobs (Message 1441704)
Posted 12 Nov 2013 by robl
Post:
Is there insufficient data or is the outage the result of administration? Not sure I understand why there is an outage.
5) Message boards : Number crunching : server status page shows no astropulse jobs (Message 1441671)
Posted 12 Nov 2013 by robl
Post:
How long do these "outages" last?
6) Message boards : Number crunching : server status page shows no astropulse jobs (Message 1441458)
Posted 12 Nov 2013 by robl
Post:
What is the status of Astropulse jobs? The server status page is showing 0 available.
7) Message boards : Number crunching : how to enforce "resource share" (Message 1438391)
Posted 5 Nov 2013 by robl
Post:
I am going to suspend Rosetta and "no more WUs" for S@H. When the S@H jobs are complete I will accept Astropulse jobs only and place a <max_concurrent>3</max_concurrent> statement in my app_config.xml file. My understanding is this will prevent me from receiving the astropulse cpu jobs as long as I have 3 gpu Astropulse jobs running. When I have worked off all of the current S@H jobs I would "resume" Rosetta and "allow more WUs" for S@H. I am thinking that I would now be processing CPU WUs for Rosetta and receiving/crunching Astropulse GPU jobs only for S@H.

Am I correct?

The max_concurrent setting won't impact what tasks you receive, it just impacts what tasks run. And the AP cpu and gpu applications have the same short name, so it could run cpu as well as gpu tasks within that max concurrent setting. If you don't want SETI cpu tasks, turn off cpu fetch in your website SETI@Home preferences.


Thanks. I turned off "cpu fetch" at the website. I am now waiting to complete the pending S@H WUs before accepting more S@H GPU WUs and before "resuming" Rosetta.
8) Message boards : Number crunching : how to enforce "resource share" (Message 1438371)
Posted 5 Nov 2013 by robl
Post:
WOW!! Thanks for the reply. Things are never as simple as you think. A great explanation for how resource sharing works.

I am going to suspend Rosetta and "no more WUs" for S@H. When the S@H jobs are complete I will accept Astropulse jobs only and place a <max_concurrent>3</max_concurrent> statement in my app_config.xml file. My understanding is this will prevent me from receiving the astropulse cpu jobs as long as I have 3 gpu Astropulse jobs running. When I have worked off all of the current S@H jobs I would "resume" Rosetta and "allow more WUs" for S@H. I am thinking that I would now be processing CPU WUs for Rosetta and receiving/crunching Astropulse GPU jobs only for S@H.

Am I correct?
9) Message boards : Number crunching : how to enforce "resource share" (Message 1438362)
Posted 5 Nov 2013 by robl
Post:
I am sharing a Linux box with an NVIDIA 650 Ti between S@H and Rosetta. Both projects have a resource share of 100. S@H is always running GPU WUs since Rosetta does not support GPUs. My concern is that Rosetta seems to be getting priority for the CPU WUs while S@H's V7 and Astropulse CPU WUs sit in a "Ready to Start" state. I suspended Rosetta, which let S@H get the CPUs, and then resumed Rosetta. Within an hour the S@H CPU WUs were put in "waiting" and Rosetta got the CPUs.

Is there a way to enforce the 50/50 share?
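
To spell out the arithmetic behind the "50/50" expectation, here is a small Python sketch. It only computes each project's nominal fraction from its resource share; it does not model how the BOINC scheduler behaves, and the function name `share_fractions` is hypothetical. As far as I understand it, the share is a long-term target for the host's overall work rather than a guaranteed split of CPU cores at any given moment, so treat the result as a target, not a prediction.

```python
def share_fractions(shares):
    """Map each project to its nominal fraction: share / sum of all shares."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# The two projects from the post, each with a resource share of 100.
for project, fraction in share_fractions({"S@H": 100, "Rosetta": 100}).items():
    print(f"{project}: {fraction:.0%}")
```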
10) Message boards : Number crunching : How to run more than one GPU WU on a Nvidia 650 Ti (Message 1436999)
Posted 2 Nov 2013 by robl
Post:
Fred,

Thanks for explaining the "max_concurrent" setting. I did remove it. I had an AP CPU job in a wait state and it never ran when CPUs became available. Now I know why.
11) Message boards : Number crunching : How to run more than one GPU WU on a Nvidia 650 Ti (Message 1436987)
Posted 2 Nov 2013 by robl
Post:
Thanks for the input all. I changed my app_config.xml file to look like this:

<app_config>
  <app>
    <name>astropulse_v6</name>
    <max_concurrent>3</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I now have 3 GPU WUs crunching away. The main confusion was over the version of boinc/boinc-manager. Also, the "max_concurrent" setting seems to be required to run more than one GPU WU at a time.

FYI for newbies: If you change the app_config.xml file you need to get BOINC to re-read the file. Do this by selecting "advanced" ---> "read config file".
No need to restart boinc-client.
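
As a rough illustration of what the numbers in that file mean, the Python sketch below reads the same app_config.xml text and works out how many tasks fit on one GPU (the whole number of tasks whose gpu_usage adds up to at most 1.0) and how many CPU cores those tasks reserve between them. This is only the arithmetic; it is not how the BOINC client itself is implemented, and the function name `gpu_plan` is made up for the example.

```python
import xml.etree.ElementTree as ET

APP_CONFIG = """\
<app_config>
  <app>
    <name>astropulse_v6</name>
    <max_concurrent>3</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def gpu_plan(xml_text):
    """Map app name -> (tasks that fit on one GPU, CPU cores they reserve)."""
    plan = {}
    for app in ET.fromstring(xml_text).findall("app"):
        name = app.findtext("name")
        gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
        cpu_usage = float(app.findtext("gpu_versions/cpu_usage"))
        # Whole tasks whose combined gpu_usage stays at or below one GPU.
        tasks = int(1.0 / gpu_usage + 1e-9)
        plan[name] = (tasks, round(tasks * cpu_usage, 3))
    return plan


print(gpu_plan(APP_CONFIG))
```

With gpu_usage at 0.33 this gives three tasks per GPU, which matches the three GPU WUs reported above; a value of 0.5 would give two.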
12) Message boards : Number crunching : How to run more than one GPU WU on a Nvidia 650 Ti (Message 1436673)
Posted 1 Nov 2013 by robl
Post:
I have now installed version 7.0.65. No GPU jobs are running.

The event log is showing:

"app astropulse_v6 not found in app_config.xml"

Here is my app_config.xml file's contents:

<app_config>
  <app>
    <name>astropulse_v6</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I have downloaded one AstroPulse WU but it remains in a "Ready to Start" state.

My app_config.xml file is located at:
/var/lib/boinc-client/projects/setiathome.berkeley.edu
13) Message boards : Number crunching : How to run more than one GPU WU on a Nvidia 650 Ti (Message 1436602)
Posted 1 Nov 2013 by robl
Post:
I had been sharing this node (Ubuntu 12.04 with NVIDIA drivers 304.88) between S@H and E@H. Both projects could run 1, 2, or 3 GPU tasks in any order/combination. I recently decided to "rearrange" a couple of nodes with respect to project assignments. This is one such node. It now supports Rosetta and S@H.

When it ran S@H and E@H I had a file called /var/lib/boinc-client/projects/setiathome.berkeley.edu/app_config.xml. Its contents looked like the following:

<app_config>
  <app>
    <name>astropulse_v6</name>
    <max_concurrent>6</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I reinstalled this same file but can only get one GPU WU going. It looks like this in boinc-manager:

seti@home some% Running (0.317 CPUs + 1 NVIDIA GPU)

I have looked around and tried different gpu_usage/cpu_usage values but none seem to make a difference. I did restart boinc-client after installing/changing these values. It seems that this file is not being read.

I tried the same app_config.xml file without the max_concurrent line. No difference.
14) Message boards : Number crunching : OpenCL Astropulse on Linux, high CPU usage (Message 1403603)
Posted 15 Aug 2013 by robl
Post:
I'm running a 'new' ATI card on Linux and its CPU usage is much higher than I think it should be. I'm about to try changing an XP host to Linux, so I'll see how the 6850 does in Linux.

BTW, I found an App that shows GPU load in Linux. I know it says Windows, but, it runs in Ubuntu with the MonoDevelop package. It's the first one I've found;

...
Open Hardware Monitor

Thanks for that hint. Do you know if it will also show GPU load for NV GPUs? It is (maybe) the first one with a GUI for Linux.

For ATI/AMD GPUs the GPU load can be displayed on the commandline using the driver configuration commandline tool aticonfig. The command

> aticonfig --adapter=all --odgc --odgt

will show actual frequencies, temperatures and GPU loads of all installed ATI/AMD GPUs working with that driver.


I will try tomorrow after I get off work on my 650Ti.


arkayn,

Did you ever get a chance to try the "open hardware monitor" on your 650Ti?
15) Message boards : Number crunching : parse_init_data_file: no end tag (Message 1383475)
Posted 21 Jun 2013 by robl
Post:
I had to migrate from FC17 to Ubuntu 12.04 LTS to get back into the game.

Thanks to everyone who responded.
16) Message boards : Number crunching : parse_init_data_file: no end tag (Message 1383078)
Posted 20 Jun 2013 by robl
Post:
Looks like folks at Einstein found the problem.

What they didn't tell you is that the bug was introduced in 7.0.30 and fixed in 7.0.31. (I don't know how Fedora managed to get it into 7.0.29.) Going back to an earlier release might be the solution for you.

This job was estimated at 6 hours but is now running at the 8:52 hour mark. boinc manager's "remaining estimated time" is showing as "---".

Since the app thinks it's running standalone it doesn't send any progress info to BOINC so BOINC can't compute any meaningful remaining estimated time.


Juha,

Thanks for your input, especially the part about "can't compute any meaningful estimated time".
17) Message boards : Number crunching : parse_init_data_file: no end tag (Message 1382902)
Posted 20 Jun 2013 by robl
Post:
You could try looking into what exactly it is that the apps don't like in the init data file.

The file is called init_data.xml. Each slot directory has its own file. Since the apps that show the error are from Seti and Einstein try to pick a slot that either one is using.

If you think it's best to copy-paste the file here do note that the file contains your authenticator which is something you'll want to blank first.

You may also want to make sure file and/or directory permissions are ok.
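
A minimal Python sketch of the "blank it first" step, for anyone who wants to paste the file: it replaces the contents of the authenticator element with a placeholder before printing. It assumes the key sits in an `<authenticator>` element inside init_data.xml; the function name `blank_authenticator` and the sample values are made up for illustration.

```python
import re


def blank_authenticator(xml_text, placeholder="REDACTED"):
    """Replace the contents of every <authenticator> element."""
    return re.sub(
        r"(<authenticator>).*?(</authenticator>)",
        lambda m: m.group(1) + placeholder + m.group(2),
        xml_text,
        flags=re.DOTALL,
    )


# Made-up sample; a real init_data.xml has many more fields.
sample = (
    "<app_init_data>\n"
    "<authenticator>abc123_secret</authenticator>\n"
    "<slot>0</slot>\n"
    "</app_init_data>\n"
)

print(blank_authenticator(sample))
```

Still read through the result by eye before posting, in case the file carries other account details.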


I checked the directory and file permissions and ownership. They seem to be fine.

I checked the init_data.xml file of a currently running E@H WU. Looking through it I did not see anything that indicated an error condition, but then again I am not sure what I would be looking for. This job was estimated at 6 hours but is now running at the 8:52 hour mark. boinc manager's "remaining estimated time" is showing as "---".

I will let it continue and see if it ever completes.
18) Message boards : Number crunching : parse_init_data_file: no end tag (Message 1382862)
Posted 19 Jun 2013 by robl
Post:
All I meant was IIRC, after 7.0.28 was placed as the recommended version back in the day the alpha builds progressed pretty rapidly at first while they worked towards the last recommended which was 7.0.44.

The Linux recommended through RPM sites has long time been 7.0.27, which had trouble on its own.

The whole problem with the older distros is that the libraries aren't updated to the versions BOINC needs. So in this case, even though the BOINC wiki states that you have to have certain library files, it doesn't mean that the library files in Fedora 17 are all up to date, even if their names are the same. Take libc.so.6: you can't see from the outside which version it is, but BOINC requires a specific one.

Building BOINC yourself is going to be problematic, although not impossible. Not something I've ever done on Linux, I must say. I did build for Windows. A long time ago. When Visual Studio 2005 was still the thing to use. ;-)

The biggest problem with Fedora 17 and present BOINC is the lack of compatible or up-to-date glibc libraries. The only way around it that I see is to update to Fedora 18 or 19.

The APP_INIT_DATA.xml file keeps track of the elapsed time that a task took. Seeing the BOINC Wiki, it looks to be a problem with the science application, or at least with the BOINC API. Not saying it still can't be a problem with BOINC, but problems with it were fixed in 6.13/7.0.3.


Ageless, I am not questioning your skillset, but before I migrate to Fedora 18/19 can you explain why those releases would eliminate the problem I am currently having, i.e., what makes them more BOINC friendly? I believe that FC 17 is coming to an end, but I would like to be reasonably sure that such an effort would produce better results than what I am getting with FC 17. I have never successfully upgraded a Fedora release. It has always been a fresh install, so there is time involved, and time seems to always be an elusive bird.
19) Message boards : Number crunching : parse_init_data_file: no end tag (Message 1382770)
Posted 19 Jun 2013 by robl
Post:
I saw the reply to your post over at EAH. I'm going to say that the problem is with the version of BOINC you are running, since 7.0.29 isn't even the suggested older version for any platform.


As posted on the EAH website, the binary downloaded from the BOINC site was built on an Ubuntu distro. The libraries it uses are not part of FC17, and my attempts at locating them have failed.

Even with this error jobs run to completion, and credits are granted. This is a bit confusing. I would think that all jobs would fail.

Earlier I killed the 24 hour seti WU in the hopes of installing a new BOINC but now I am back to the original. I will look further into locating the missing libs but until then ....

20) Message boards : Number crunching : parse_init_data_file: no end tag (Message 1382698)
Posted 19 Jun 2013 by robl
Post:
System is Linux with an NVIDIA 650 Ti GPU.

I am seeing the following in job outputs:
<core_client_version>7.0.29</core_client_version>
<![CDATA[
<stderr_txt>
parse_init_data_file: no end tag
22:14:16 (24029): Can't parse init data file - running in standalone mode
parse_init_data_file: no end tag
22:14:16 (24029): Can't parse init data file - running in standalone mode
setiathome_v7 7.00 Revision: 1782 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Any idea what is causing this?
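
One quick sanity check on the file the message refers to, sketched in Python: confirm that the text opens its root element and closes it again later. This is only a first look. It assumes the root element is named `app_init_data`, it does not reproduce BOINC's own parser, and a file that passes may still be rejected for other reasons. The function name `has_end_tag` is hypothetical.

```python
def has_end_tag(xml_text, root="app_init_data"):
    """True if the text opens <root> and later closes it with </root>."""
    start = xml_text.find(f"<{root}>")
    end = xml_text.rfind(f"</{root}>")
    return start != -1 and end > start


# Made-up samples: one complete, one cut off before the closing tag.
complete = "<app_init_data>\n<slot>0</slot>\n</app_init_data>\n"
truncated = "<app_init_data>\n<slot>0</slot>\n"

print(has_end_tag(complete))
print(has_end_tag(truncated))

# To check a real file, read init_data.xml from a slot directory first:
#     with open("init_data.xml") as f:
#         print(has_end_tag(f.read()))
```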




 