Stock AP app having an issue with a batch of work?


HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4167
Credit: 113,995,305
RAC: 142,962
United States
Message 1165017 - Posted: 24 Oct 2011, 13:23:33 UTC

I noticed an invalid task pop up today, 837596801. That is not unexpected from time to time, but when I looked at the task I thought it looked funny.

Task        Computer  Sent                        Time reported or deadline   Status                     Run time (sec)  CPU time (sec)  Credit  Application
2113573897  4725574   14 Oct 2011 | 10:42:54 UTC  14 Oct 2011 | 15:45:45 UTC  Error while computing      0.00            0.00            ---     Astropulse v505 v5.06
2113573898  6191261   14 Oct 2011 | 10:42:55 UTC  14 Oct 2011 | 10:48:11 UTC  Error while computing      1.07            0.00            ---     Astropulse v505 v5.06
2114177255  6186874   14 Oct 2011 | 21:22:36 UTC  14 Oct 2011 | 21:27:47 UTC  Error while downloading    0.00            0.00            ---     Astropulse v505 v5.05
2114467297  6067658   15 Oct 2011 | 2:41:46 UTC   15 Oct 2011 | 2:46:52 UTC   Error while computing      0.00            0.00            ---     Astropulse v505 v5.06
2114779667  6185696   15 Oct 2011 | 9:02:55 UTC   15 Oct 2011 | 9:09:03 UTC   Error while computing      0.00            0.00            ---     Astropulse v505 v5.05
2115128329  5012752   15 Oct 2011 | 14:00:37 UTC  24 Oct 2011 | 5:57:12 UTC   Completed, can't validate  63,245.10       63,240.96       0.00    Astropulse v505 Anonymous platform (CPU)
2115428038  6180271   15 Oct 2011 | 20:23:45 UTC  16 Oct 2011 | 14:55:09 UTC  Error while computing      1.03            0.00            ---     Astropulse v505 v5.06


All of the stock app machines had an error with the task. So I looked at the 5 'Error while computing' tasks and found them all to have:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
</stderr_txt>
]]>


So I looked at the tasks for each of the machines, and it seems that all of the AP tasks on those machines are ending with 'Error while computing'.

I then looked through my valid tasks and have found several where there is a 3rd result with the 'Error while computing' status. Some of them were "process got signal 8" instead of 11. Admittedly I don't have a clue what all the various application exit codes mean. These might be as relevant as the -9 overflow message, but I thought it seemed odd.

Perhaps this is just an example of the Lunatics code handling things better than the stock app? Maybe there is just some wonky data out there? Who knows.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3238
Credit: 31,757,650
RAC: 4,378
Netherlands
Message 1165030 - Posted: 24 Oct 2011, 14:39:29 UTC - in response to Message 1165017.
Last modified: 24 Oct 2011, 15:09:54 UTC

Haven't seen any AstroPulses lately. Running one on 0.5 GPU or 1 GPU ended up in a -177 error.
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8491
Credit: 49,753,367
RAC: 55,270
United Kingdom
Message 1165031 - Posted: 24 Oct 2011, 14:44:37 UTC - in response to Message 1165017.

"process got signal ..." is a characteristic error code for Linux machines only.

If you look at the right-hand column of your screen-grab, four of the computing errors were with application v5.06 - the stock Linux app (the v5.05 error - on Windows Vista - was "too many exit(0)s", probably unrelated).

I suppose the two questions are:

1) Why is stock app v5.06 still in use, when it has been so problematic for so long? (I seem to remember Urs Echternacht and others having trouble with it in early AP beta testing)
2) Why did this WU end up being allocated to so many Linux hosts?
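
For anyone unsure what the numbers in "process got signal ..." mean: on Linux they are POSIX signal numbers. Signal 11 is SIGSEGV (a segmentation fault, i.e. an invalid memory access) and signal 8 is SIGFPE (an arithmetic error such as an integer divide by zero). A minimal Python sketch for looking them up, assuming it is run on a Linux host; `describe_signal` is a hypothetical helper name and has nothing to do with BOINC itself:

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name for a signal number on this host."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal known to this platform.
        return f"unknown signal {number}"


# The two signal numbers mentioned in this thread.
for n in (8, 11):
    print(n, describe_signal(n))
```

The name only says how the process died, not why. A SIGSEGV in the stock app could still come from the app itself, from a library mismatch, or from the data.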

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4167
Credit: 113,995,305
RAC: 142,962
United States
Message 1165039 - Posted: 24 Oct 2011, 15:49:57 UTC - in response to Message 1165031.

"process got signal ..." is a characteristic error code for Linux machines only.

If you look at the right-hand column of your screen-grab, four of the computing errors were with application v5.06 - the stock Linux app (the v5.05 error - on Windows Vista - was "too many exit(0)s", probably unrelated).

I suppose the two questions are:

1) Why is stock app v5.06 still in use, when it has been so problematic for so long? (I seem to remember Urs Echternacht and others having trouble with it in early AP beta testing)
2) Why did this WU end up being allocated to so many Linux hosts?


At first I saw Linux box, Linux box, Linux box... and started to think "Linux machines are broken". Then I saw the Windows machine and thought "ah OK, it is more than just Linux".

The Windows box looks like it is just broken, as all of its tasks, MB & AP, are spitting out "too many exit(0)s". It was probably just a fluke that it happened to be in there.

I have often thought that some kind of platform mechanism should be used on the back end: if a task gets errors on a specific platform, stop sending it to that platform. That might be in place to some extent, or it could be a total mess to do.
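
The idea above of not resending a task to a platform where it has already errored could look roughly like the sketch below. This is purely illustrative: `PlatformErrorTracker`, its threshold, and the platform strings are all assumptions made up for the example, and nothing here reflects how the BOINC scheduler actually works.

```python
from collections import defaultdict


class PlatformErrorTracker:
    """Hypothetical per-workunit error bookkeeping (not BOINC code)."""

    def __init__(self, max_errors_per_platform: int = 2):
        self.max_errors = max_errors_per_platform
        # errors[workunit_id][platform] -> number of errored results
        self.errors = defaultdict(lambda: defaultdict(int))

    def record_error(self, workunit_id: int, platform: str) -> None:
        """Note that a result for this workunit errored on this platform."""
        self.errors[workunit_id][platform] += 1

    def may_send(self, workunit_id: int, platform: str) -> bool:
        """Allow sending only while the platform is below the error limit."""
        return self.errors[workunit_id][platform] < self.max_errors


tracker = PlatformErrorTracker(max_errors_per_platform=2)
tracker.record_error(837596801, "linux")
tracker.record_error(837596801, "linux")

print(tracker.may_send(837596801, "linux"))    # False: two errors already
print(tracker.may_send(837596801, "windows"))  # True: no errors recorded
```

With a rule like this, the workunit from the first post would have stopped going to Linux hosts after the second failure instead of using up its remaining replications on them.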

It's just the little things like this that point out the holes in the current system that could be worked on, where this potentially valid data is probably going to the bit bucket.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3238
Credit: 31,757,650
RAC: 4,378
Netherlands
Message 1165070 - Posted: 24 Oct 2011, 17:15:44 UTC - in response to Message 1165031.

"process got signal ..." is a characteristic error code for Linux machines only.

If you look at the right-hand column of your screen-grab, four of the computing errors were with application v5.06 - the stock Linux app (the v5.05 error - on Windows Vista - was "too many exit(0)s", probably unrelated).

I suppose the two questions are:

1) Why is stock app v5.06 still in use, when it has been so problematic for so long? (I seem to remember Urs Echternacht and others having trouble with it in early AP beta testing)
2) Why did this WU end up being allocated to so many Linux hosts?


Why were so many channels 'bad'? Channels ended in error: 0 MB, 19 AstroPulse,
as of 24 Oct 2011 | 17:00:08 UTC, according to the Server page.

Too much RFI? Well, I'm guessing. RFI and radar blanking are present almost all the time,
so maybe it is just a bad series of channels?

____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,133
RAC: 5,658
United Kingdom
Message 1165099 - Posted: 24 Oct 2011, 19:38:50 UTC - in response to Message 1165031.

"process got signal ..." is a characteristic error code for Linux machines only.

If you look at the right-hand column of your screen-grab, four of the computing errors were with application v5.06 - the stock Linux app (the v5.05 error - on Windows Vista - was "too many exit(0)s", probably unrelated).

I suppose the two questions are:

1) Why is stock app v5.06 still in use, when it has been so problematic for so long? (I seem to remember Urs Echternacht and others having trouble with it in early AP beta testing)
2) Why did this WU end up being allocated to so many Linux hosts?

I thought Urs Echternacht had a fix for the signal 11 problem and that it was in the repository, but a new app was never built. I couldn't find a post about it the last time I looked.

Claggy

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3636
Credit: 48,593,411
RAC: 7,957
United States
Message 1165148 - Posted: 24 Oct 2011, 23:57:07 UTC

I know Urs has built an AP app that does run just fine on Linux, but it is only available as an optimized app.
____________

Wiggo
Joined: 24 Jan 00
Posts: 6918
Credit: 94,266,069
RAC: 75,384
Australia
Message 1165183 - Posted: 25 Oct 2011, 2:31:12 UTC - in response to Message 1165148.

I've also noticed that plenty of Linux hosts have been erroring out AP work.

Cheers.
____________

JohnDK
Project donor
Volunteer tester
Joined: 28 May 00
Posts: 842
Credit: 44,165,502
RAC: 74,636
Denmark
Message 1165327 - Posted: 25 Oct 2011, 18:59:23 UTC - in response to Message 1165183.

I've also noticed that plenty of Linux hosts have been erroring out AP work.

Cheers.

Same here: nearly all the AP errors I've seen come from wingmen running Linux.

David Anderson (not *that* DA)
Project donor
Joined: 5 Dec 09
Posts: 108
Credit: 22,954,793
RAC: 5,829
United States
Message 1170112 - Posted: 11 Nov 2011, 15:52:03 UTC
Last modified: 11 Nov 2011, 15:57:58 UTC

Each time I upgrade to latest Ubuntu Linux (at 11.10 now,
every 6 months there is a new release) I turn on AP,
get 4 or so AP failures with signal 11
(with no AP successes), and turn it off again.
Running stock apps, not optimized apps. No app_info.xml file.
MultiBeam works fine -- I get an error or two (signal 11 or
whatever) four or five times a year (between 2 machines).
One machine x86, the other machine x86_64.

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1170127 - Posted: 11 Nov 2011, 16:18:48 UTC - in response to Message 1165183.
Last modified: 11 Nov 2011, 16:21:56 UTC

I've also noticed that plenty of Linux hosts have been erroring out AP work.

I'm noticing this a lot, and I noticed it on my own machines too when I was still running stock.
The stock 64-bit AP application fails on newer Linux distributions (glibc incompatibility?) and the admins don't care enough to *finally* remove it. The 32-bit app works just fine on 64-bit distros.

I know Urs has built an AP app that does run just fine on Linux, but it is only available as an optimized app.

And it works awesomely and much faster. There is no reason to run stock :)
____________

Woofie
Joined: 11 Jan 12
Posts: 4
Credit: 7,159
RAC: 0
Czech Republic
Message 1185889 - Posted: 17 Jan 2012, 8:29:56 UTC

Hi, I'm new here and I have a similar problem with AP on my PC running 32-bit Gentoo. Maybe I'm not so good at searching, but where can I find this AP app from Urs? Can someone point me to it? Thanks

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,133
RAC: 5,658
United Kingdom
Message 1185894 - Posted: 17 Jan 2012, 9:13:07 UTC - in response to Message 1185889.
Last modified: 17 Jan 2012, 9:14:06 UTC

Hi, I'm new here and I have a similar problem with AP on my PC running 32-bit Gentoo. Maybe I'm not so good at searching, but where can I find this AP app from Urs? Can someone point me to it? Thanks

At the top of this sub-forum there are a number of sticky threads. One of them is the Optimised Apps Release News thread, which has the links. Here they are anyway:

Lunatics downloads: where the latest apps are available

Crunchers Anonymous: where the latest apps, installers (for Windows and Mac) and SSE-bitness packages are available

Claggy

Woofie
Joined: 11 Jan 12
Posts: 4
Credit: 7,159
RAC: 0
Czech Republic
Message 1187225 - Posted: 21 Jan 2012, 19:35:02 UTC - in response to Message 1185894.

Thanks Claggy, and sorry for the noob question. I'll try searching better next time :)


Copyright © 2014 University of California