Stock AP app having an issue with a batch of work?

Message boards : Number crunching : Stock AP app having an issue with a batch of work?
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1165017 - Posted: 24 Oct 2011, 13:23:33 UTC

I noticed an invalid task pop up today, 837596801. That is not unexpected from time to time, but when I looked at the task I thought it looked funny.
Task        Computer  Sent                        Time reported or deadline   Status                     Run time (sec)  CPU time (sec)  Credit  Application
2113573897  4725574   14 Oct 2011 | 10:42:54 UTC  14 Oct 2011 | 15:45:45 UTC  Error while computing      0.00            0.00            ---     Astropulse v505 v5.06
2113573898  6191261   14 Oct 2011 | 10:42:55 UTC  14 Oct 2011 | 10:48:11 UTC  Error while computing      1.07            0.00            ---     Astropulse v505 v5.06
2114177255  6186874   14 Oct 2011 | 21:22:36 UTC  14 Oct 2011 | 21:27:47 UTC  Error while downloading    0.00            0.00            ---     Astropulse v505 v5.05
2114467297  6067658   15 Oct 2011 | 2:41:46 UTC   15 Oct 2011 | 2:46:52 UTC   Error while computing      0.00            0.00            ---     Astropulse v505 v5.06
2114779667  6185696   15 Oct 2011 | 9:02:55 UTC   15 Oct 2011 | 9:09:03 UTC   Error while computing      0.00            0.00            ---     Astropulse v505 v5.05
2115128329  5012752   15 Oct 2011 | 14:00:37 UTC  24 Oct 2011 | 5:57:12 UTC   Completed, can't validate  63,245.10       63,240.96       0.00    Astropulse v505 Anonymous platform (CPU)
2115428038  6180271   15 Oct 2011 | 20:23:45 UTC  16 Oct 2011 | 14:55:09 UTC  Error while computing      1.03            0.00            ---     Astropulse v505 v5.06


All of the stock app machines had an error with the task. So I looked at the 5 'Error while computing' tasks and found them all to have:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
</stderr_txt>
]]>


So I looked at the tasks for each of the machines and it seems that all of the AP tasks on those machines are ending with 'Error while computing'.

I then looked through my valid tasks and have found several where there is a 3rd result with the 'Error while computing' status. Some of them were "process got signal 8" instead of 11. Admittedly I don't have a clue what all the various application exit codes mean. These might be as relevant as the -9 overflow message, but I thought it seemed odd.
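For what it's worth, the numbers in "process got signal N" are ordinary POSIX signal numbers. On Linux, signal 11 is SIGSEGV (a segmentation fault, i.e. an invalid memory access) and signal 8 is SIGFPE (an arithmetic fault such as an integer divide by zero). Here is a minimal sketch for decoding such a message. The helper name `describe_signal_message` is made up for illustration, and the lookup uses whatever numbering the local platform defines, which may differ from the host that produced the error.

```python
import re
import signal


def describe_signal_message(message: str) -> str:
    """Turn a 'process got signal N' message into 'signal N = NAME'.

    Hypothetical helper: the signal name is looked up on the machine
    running this script, not on the host that reported the error.
    """
    match = re.search(r"process got signal (\d+)", message)
    if match is None:
        return "no signal number found"
    number = int(match.group(1))
    try:
        name = signal.Signals(number).name
    except ValueError:
        # The number is not a named signal on this platform.
        return f"signal {number} (unknown on this platform)"
    return f"signal {number} = {name}"


if __name__ == "__main__":
    for text in ("process got signal 11", "process got signal 8"):
        print(describe_signal_message(text))
```

Neither name says why the app crashed; it only narrows the failure down to a class of fault.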

Perhaps this is just an example of the lunatics code handling things better than the stock app? Maybe there is just some wonky data out there? Who knows.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1165017 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1165030 - Posted: 24 Oct 2011, 14:39:29 UTC - in response to Message 1165017.  
Last modified: 24 Oct 2011, 15:09:54 UTC

Haven't seen any AstroPulse tasks lately. Running 1 on 0.5 GPU or 1 GPU ended up in a -177 error.
ID: 1165030 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1165031 - Posted: 24 Oct 2011, 14:44:37 UTC - in response to Message 1165017.  

"process got signal ..." is a characteristic error code for Linux machines only.

If you look at the right-hand column of your screen-grab, four of the computing errors were with application v5.06 - the stock Linux app (the v5.05 error - on Windows Vista - was "too many exit(0)s", probably unrelated).
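That count can be checked with a quick tally. This is only a sketch: the (status, application) pairs are copied by hand from the task list in the first post, and the function name `computing_errors_by_app` is made up for illustration.

```python
from collections import Counter

# (status, application) pairs copied from the task list in the first post.
RESULTS = [
    ("Error while computing", "Astropulse v505 v5.06"),
    ("Error while computing", "Astropulse v505 v5.06"),
    ("Error while downloading", "Astropulse v505 v5.05"),
    ("Error while computing", "Astropulse v505 v5.06"),
    ("Error while computing", "Astropulse v505 v5.05"),
    ("Completed, can't validate", "Astropulse v505 Anonymous platform (CPU)"),
    ("Error while computing", "Astropulse v505 v5.06"),
]


def computing_errors_by_app(results):
    """Count 'Error while computing' results per application string."""
    return Counter(
        app for status, app in results if status == "Error while computing"
    )


if __name__ == "__main__":
    for app, count in computing_errors_by_app(RESULTS).most_common():
        print(count, app)
```

Four of the five computing errors land on v5.06 and one on v5.05; the download error is not counted.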

I suppose the two questions are:

1) Why is stock app v5.06 still in use, when it has been so problematic for so long? (I seem to remember Urs Echternacht and others having trouble with it in early AP beta testing)
2) Why did this WU end up being allocated to so many Linux hosts?
ID: 1165031 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1165039 - Posted: 24 Oct 2011, 15:49:57 UTC - in response to Message 1165031.  

At first I saw Linux box, Linux box, Linux box... and started to think "Linux machines are broken". Then I saw the Windows machine and thought "ah OK, it is more than just Linux".

The Windows box looks like it is just broken, as all of its tasks, MB & AP, are spitting out "too many exit(0)s". It was probably just a fluke that it happened to be in there.

I have often thought that some kind of platform mechanism should be used on the back end, so that if a task gets errors on a specific platform the server stops sending it to that platform. That might be in place to some extent, or it could be a total mess to do.
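To make the idea concrete, here is a rough sketch of that kind of rule. It is purely an illustration and not how the BOINC scheduler works: the function name `platforms_to_avoid`, the platform labels, the "error"/"success" outcome strings and the `min_errors` threshold are all assumptions made up for this example.

```python
from collections import defaultdict


def platforms_to_avoid(results, min_errors=2):
    """Return platforms that should get no more copies of a workunit.

    results: iterable of (platform, outcome) pairs, where outcome is
    "error" or "success" (hypothetical labels). A platform is excluded
    once it has at least min_errors errors and no successes.
    """
    errors = defaultdict(int)
    successes = defaultdict(int)
    for platform, outcome in results:
        if outcome == "error":
            errors[platform] += 1
        elif outcome == "success":
            successes[platform] += 1
    return {
        platform
        for platform, count in errors.items()
        if count >= min_errors and successes[platform] == 0
    }


if __name__ == "__main__":
    history = [
        ("linux", "error"),
        ("linux", "error"),
        ("windows", "error"),
        ("linux", "error"),
        ("windows", "success"),
    ]
    print(platforms_to_avoid(history))
```

With a history like the one above, only the Linux label would be excluded; a single error on a platform that also has a success is treated as a fluke.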

It is just the little things like this that point out the holes in the current system that could be worked on, where this potentially valid data is probably going to the bit bucket.
ID: 1165039 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1165070 - Posted: 24 Oct 2011, 17:15:44 UTC - in response to Message 1165031.  

Why were so many channels 'bad'? Channels ended in error: 0 MB, 19 AstroPulse, as of 24 Oct 2011 | 17:00:08 UTC, according to the server status page.

Too much RFI? Well, I'm guessing. RFI and radar blanking are present almost all the time, so maybe it is just a bad series of channels?

ID: 1165070 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1165099 - Posted: 24 Oct 2011, 19:38:50 UTC - in response to Message 1165031.  

I thought Urs Echternacht had a fix for the signal 11 problem, and that it was in the repository, but a new app was never built. I couldn't find a post about it the last time I looked.

Claggy
ID: 1165099 · Report as offensive
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1165148 - Posted: 24 Oct 2011, 23:57:07 UTC

I know Urs has built an AP app that does run just fine on Linux, but it is only available as an optimized app.

ID: 1165148 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1165183 - Posted: 25 Oct 2011, 2:31:12 UTC - in response to Message 1165148.  

I've also noticed that plenty of Linux hosts have been erroring out AP work.

Cheers.
ID: 1165183 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1165327 - Posted: 25 Oct 2011, 18:59:23 UTC - in response to Message 1165183.  

Same here: nearly all the AP errors I've seen from my wingmen are on hosts running Linux.
ID: 1165327 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1170112 - Posted: 11 Nov 2011, 15:52:03 UTC
Last modified: 11 Nov 2011, 15:57:58 UTC

Each time I upgrade to the latest Ubuntu Linux (at 11.10 now; there is a new release every 6 months) I turn on AP, get 4 or so AP failures with signal 11 (with no AP successes), and turn it off again.
I am running stock apps, not optimized apps, with no app_info.xml file.
MultiBeam works fine: I get an error or two (signal 11 or whatever) four or five times a year (between 2 machines).
One machine is x86, the other is x86_64.
ID: 1170112 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1170127 - Posted: 11 Nov 2011, 16:18:48 UTC - in response to Message 1165183.  
Last modified: 11 Nov 2011, 16:21:56 UTC

I've also noticed that plenty of Linux hosts have been erroring out AP work.

I'm noticing this a lot and noticed it on my machines, too, when I was still running stock.
The stock 64-bit AP application fails on newer Linux distributions (glibc incompatibility?) and the admins don't care enough to *finally* remove it. The 32-bit app works just fine on 64-bit distros.
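If anyone wants to compare hosts, the C library version on a machine can be reported from Python's standard library. This is just a way to collect the number, not a diagnosis: the glibc theory above is a guess, and the helper name `libc_summary` is made up for illustration.

```python
import platform


def libc_summary() -> str:
    """Report the C library the running Python interpreter is linked against."""
    lib, version = platform.libc_ver()
    if not lib:
        # libc_ver() returns empty strings when it cannot work this out,
        # for example on non-glibc systems.
        return "C library could not be determined"
    return f"{lib} {version}"


if __name__ == "__main__":
    print(libc_summary())
```

Comparing the output on a host where the stock app fails against one where it runs would at least show whether the glibc versions differ.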

I know Urs has built an AP app that does run just fine on Linux, but it is only available as an optimized app.

And it works awesomely and much faster. There is no reason to run stock :)
ID: 1170127 · Report as offensive
Woofie

Joined: 11 Jan 12
Posts: 4
Credit: 68,135
RAC: 0
Czech Republic
Message 1185889 - Posted: 17 Jan 2012, 8:29:56 UTC

Hi, I'm new here and I have a similar problem with AP on my PC running 32-bit Gentoo. Maybe I'm not so good at searching, but where can I find this app from Urs for AP? Can someone point me to it? Thanks
ID: 1185889 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1185894 - Posted: 17 Jan 2012, 9:13:07 UTC - in response to Message 1185889.  
Last modified: 17 Jan 2012, 9:14:06 UTC

At the top of this sub-forum are a number of sticky threads. One of them is the Optimised Apps Release News thread, which has links. Here are the links anyway:

Lunatics downloads: where the latest apps are available

Crunchers Anonymous: where the latest apps, installers (for Windows and Mac) and SSE-bitness packages are available

Claggy
ID: 1185894 · Report as offensive
Woofie

Joined: 11 Jan 12
Posts: 4
Credit: 68,135
RAC: 0
Czech Republic
Message 1187225 - Posted: 21 Jan 2012, 19:35:02 UTC - in response to Message 1185894.  

Thanks Claggy, and sorry for that noob question. I'll try searching better next time :)
ID: 1187225 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.