False "abandoned" tasks



Message boards : Number crunching : False "abandoned" tasks

SAHJ@H
Joined: 19 Mar 03
Posts: 5
Credit: 1,370,528
RAC: 0
France
Message 1339405 - Posted: 19 Feb 2013, 16:57:38 UTC

I have an issue with one computer for which 55 tasks were wrongly reported as "abandoned" on 18 Feb 2013, 15:06:59 UTC, though the completed ones were regularly reported during the blackout period. The others are still being executed. One example is:

http://setiathome.berkeley.edu/result.php?resultid=2836266935

Any way to correct the situation?

JL

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1339421 - Posted: 19 Feb 2013, 20:18:21 UTC - in response to Message 1339405.
Last modified: 19 Feb 2013, 20:22:26 UTC

Not your computer's fault, it's seti@home.
Nothing you can do on your side... except abort those tasks on your computer, because even if you complete them, they'll get ignored.
This happened to me 3 times already, starting after problems with the scheduler timing out began.
Not so much set & forget anymore... :(
____________

SAHJ@H
Joined: 19 Mar 03
Posts: 5
Credit: 1,370,528
RAC: 0
France
Message 1339626 - Posted: 20 Feb 2013, 8:10:28 UTC - in response to Message 1339421.

Thanks Khangollo.

JL

Vipin Palazhi
Joined: 29 Feb 08
Posts: 247
Credit: 103,872,601
RAC: 29,537
India
Message 1339695 - Posted: 20 Feb 2013, 15:38:47 UTC

This occurred on one of my machines two days back, but I only noticed it today. Since I access it via remote desktop, it is really painful to check individual tasks to see which have been abandoned but are still being processed by the machine. Is resetting a good option to clear this issue? And will the tasks be re-downloaded after the reset?
______________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,904,600
RAC: 7,843
United Kingdom
Message 1339753 - Posted: 20 Feb 2013, 18:28:55 UTC - in response to Message 1339695.

This occurred on one of my machines two days back, but I only noticed it today. Since I access it via remote desktop, it is really painful to check individual tasks to see which have been abandoned but are still being processed by the machine. Is resetting a good option to clear this issue? And will the tasks be re-downloaded after the reset?

Yes, reset the project. Only the tasks that haven't been abandoned will be resent, 20 at a time.

Claggy

Vipin Palazhi
Joined: 29 Feb 08
Posts: 247
Credit: 103,872,601
RAC: 29,537
India
Message 1339758 - Posted: 20 Feb 2013, 18:50:05 UTC - in response to Message 1339753.
Last modified: 20 Feb 2013, 19:05:18 UTC

Yes, reset the project. Only the tasks that haven't been abandoned will be resent, 20 at a time.

Claggy

Thanks for clarifying Claggy.

The machine is now downloading all the lost tasks except for those that are expired. Also noticed that all the MB tasks are now just labelled seti_enhanced 6.03 and there is no identification for the CUDA tasks. But the system does seem to recognize them and the WUs are being properly processed by the CPUs and GPUs.
______________

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38923
Credit: 578,830,087
RAC: 514,430
United States
Message 1339760 - Posted: 20 Feb 2013, 18:53:47 UTC

Anybody worried about whether they have such 'abandoned' tasks can check by clicking on the 'tasks' link on their main account page.
Then click on the 'error' link at the top of the page, if you have any showing. It will then show only the error tasks, and the reason they were errored.
Abandoned tasks should show up there.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,920,965
RAC: 77,099
Argentina
Message 1339793 - Posted: 20 Feb 2013, 20:03:56 UTC

All my hosts have been "abandoning" all their tasks in the last few days... one of them did it twice just today...

This is not funny... When the issue is that the project can't supply work, my hosts can help another project or, in the worst case, just sit idle and save electricity... But with this issue I'm wasting a lot of energy, which goes straight down the drain, as there is neither useful science nor credit for the electricity paid for doing those tasks...

Any idea why this is happening? Is there something we can do on the client side (other than hiring somebody to babysit the hosts or opting out of SETI)?

TBH, if it's something on the server side, I think it should be fixed ASAP... Besides me wasting money, I think that thousands of abandoned tasks and continuous re-downloading of already downloaded tasks is good neither for the servers' load nor for the saturated pipes...
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8465
Credit: 48,960,011
RAC: 75,230
United Kingdom
Message 1339805 - Posted: 20 Feb 2013, 20:27:47 UTC

When a task is marked 'abandoned', there's a server timestamp for the event on the website task list.

If you convert that to local time, can you match it against anything unusual in your boinc message/event log?
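
A minimal Python sketch of that check, assuming the website timestamp looks like `20 Feb 2013, 19:37:04 UTC` and the event log uses the `DD/MM/YYYY HH:MM:SS | project | message` layout quoted in the replies further down this thread. The UTC-3 offset, the five-minute window, the third sample log line, and the helper names are illustrative assumptions, not part of BOINC.

```python
from datetime import datetime, timedelta, timezone

def server_to_local(stamp, utc_offset_hours):
    """Convert a website timestamp such as '20 Feb 2013, 19:37:04 UTC' to local time."""
    utc_time = datetime.strptime(stamp, "%d %b %Y, %H:%M:%S UTC").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))

def log_lines_near(log_lines, local_time, window_minutes=5):
    """Return event-log lines stamped within +/- window_minutes of local_time."""
    window = timedelta(minutes=window_minutes)
    target = local_time.replace(tzinfo=None)  # log stamps carry no zone info
    nearby = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if abs(stamp - target) <= window:
            nearby.append(line)
    return nearby

# Timestamp of the "abandoned" event, converted for a host at UTC-3 (assumed offset).
local = server_to_local("20 Feb 2013, 19:37:04 UTC", utc_offset_hours=-3)

sample_log = [
    "20/02/2013 16:33:50 | | Internet access OK - project servers may be temporarily down.",
    "20/02/2013 16:37:37 | SETI@home | Sending scheduler request: To fetch work.",
    "20/02/2013 17:10:00 | | Hypothetical line well outside the window",
]
for line in log_lines_near(sample_log, local):
    print(line)
```

Anything the filter prints is a candidate for "what was the client doing when the server marked the tasks abandoned"; widen the window if the clocks on the two sides may disagree.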

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,920,965
RAC: 77,099
Argentina
Message 1339815 - Posted: 20 Feb 2013, 20:55:58 UTC - in response to Message 1339805.
Last modified: 20 Feb 2013, 20:57:15 UTC

On the host that did it twice today, the last batch of abandoned tasks was at 19:37:04 UTC. I'm at UTC-3, so it was 16:37:04 my time. This time matches (more or less) an RPC, and there is nothing unusual in the log... (There are a lot of "Project communication failed" entries, but those are related to the downloads and are a very common, frequent and consistent issue when downloading SETI WUs...)

20/02/2013 16:33:50 | | Internet access OK - project servers may be temporarily down.
20/02/2013 16:36:08 | | Project communication failed: attempting access to reference site
20/02/2013 16:36:08 | SETI@home | Backing off 1 min 0 sec on download of 02dc12ad.12608.4157.9.10.248
20/02/2013 16:36:10 | | Internet access OK - project servers may be temporarily down.
20/02/2013 16:37:37 | SETI@home | Sending scheduler request: To fetch work.
20/02/2013 16:37:37 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and GPU
20/02/2013 16:37:49 | SETI@home | Scheduler request completed: got 1 new tasks
20/02/2013 16:37:49 | SETI@home | Message from server: Resent lost task 02dc12ad.12608.366236.9.10.164_1
20/02/2013 16:40:43 | | Project communication failed: attempting access to reference site
20/02/2013 16:40:43 | SETI@home | Backing off 1 min 0 sec on download of 29dc12ae.9337.6611.13.10.144
20/02/2013 16:40:52 | | Internet access OK - project servers may be temporarily down.


Indeed, there is something weird: it says that it is resending a ghost, but the previous RPCs didn't fail...
____________

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,920,965
RAC: 77,099
Argentina
Message 1339856 - Posted: 20 Feb 2013, 22:10:08 UTC

It happened again, on another host, for the second time today.
As in my previous post, there is nothing unusual in the event log, but there is an RPC reply (just after the "abandoning event") saying that the last RPC was too recent: 66 seconds, when in fact the previous one was more than 5 minutes before (18:27:02 and 18:32:27 makes 325 seconds), and both were normal updates automatically triggered by BOINC...
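
The gap quoted above can be double-checked with a few lines of Python. This is only a sketch of the arithmetic; `seconds_between` is a hypothetical helper, and it assumes both clock times fall on the same day.

```python
from datetime import datetime

def seconds_between(earlier, later, fmt="%H:%M:%S"):
    """Seconds elapsed between two same-day clock times."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())

gap = seconds_between("18:27:02", "18:32:27")  # the two RPC times from the post
server_claimed = 66                            # seconds, per the server's "too recent" reply
print(gap, gap - server_claimed)
```

The difference between the two figures is the part of the interval the server reply does not account for.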

Something smells really bad... It seems like the scheduler is kind of confusing the RPCs of different hosts... (or having hallucinations... they should stop smoking things in the lab... :b )
____________

hbomber
Volunteer tester
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 15
Bulgaria
Message 1339861 - Posted: 20 Feb 2013, 22:34:49 UTC
Last modified: 20 Feb 2013, 22:36:45 UTC

OMG, got 150+ abandoned AP tasks, all but two on my machine... That's worth 1.2 GB of downloads.
And now:
"21/02/2013 00:17:23 SETI@home Message from server: This computer has reached a limit on tasks in progress"
Awesome.
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8465
Credit: 48,960,011
RAC: 75,230
United Kingdom
Message 1339862 - Posted: 20 Feb 2013, 22:35:36 UTC - in response to Message 1339856.

That could conceivably happen if two different machines were reporting in and claiming to have the same HostID.

You haven't ever set up a new machine by cloning a BOINC data directory from another machine, have you?

hbomber
Volunteer tester
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 15
Bulgaria
Message 1339863 - Posted: 20 Feb 2013, 22:39:01 UTC
Last modified: 20 Feb 2013, 22:51:34 UTC

Negative. At least not this particular BOINC installation. It has resided here, on this machine, for years.
But I did do some things which may have caused it, although I had done the same before and this never happened.
I experimented with different AMD OpenCL SDK installations. Before that, I copied the BOINC directory (the data directory is inside it) to another drive. When OpenCL failed and BOINC started to show "computation error" on tasks (it did so with 5-6 of them), I set network activity to none. In those few seconds BOINC perhaps did send some request, but it had nothing to report. I'm certain of that; I always report before doing crazy and dangerous stuff.
Then I deleted the faulty installation and restored the backup copy from the other drive, and everything continued happily.
I don't see any way this could cause the situation you describe.
By the way, it happened (I'm not sure of the exact order) around when I removed one of the 5850 cards and generally reordered the video devices in the motherboard slots. I doubt it is related, but it should be mentioned.

Suddenly, a few people got these false abandoned tasks at the same time, so it's likely not a local problem.

P.S. At least the refill with new tasks is going.
____________

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 71,920,965
RAC: 77,099
Argentina
Message 1339864 - Posted: 20 Feb 2013, 22:50:40 UTC - in response to Message 1339862.

That could conceivably happen if two different machines were reporting in and claiming to have the same HostID.

You haven't ever set up a new machine by cloning a BOINC data directory from another machine, have you?

Nope, never... My current hosts have not "suffered" any upgrade recently... The last upgrade was the addition of a new GPU about 4 months ago... Besides that, all of them have been working as they are now for almost a year... and all my hosts were installed the long way (installers, manual attach to projects, etc...)

By the way, I think that if two computers really had the same ID, one of them would have a wrong number in the RPC sequence, which should make BOINC assign a new ID... which is not happening on my hosts...
____________

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1339881 - Posted: 21 Feb 2013, 0:42:25 UTC - in response to Message 1339805.
Last modified: 21 Feb 2013, 0:51:24 UTC

When a task is marked 'abandoned', there's a server timestamp for the event on the website task list.

If you convert that to local time, can you match it against anything unusual in your boinc message/event log?

It happens during scheduler requests that time out (when the S@H scheduler is misbehaving).
The server accepts the request and sends a response, but it gets lost and never gets back to the client.
Then the next request attempt has a chance of triggering this task abandonment bug.

At least, this is how I observed it happening to me.
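
A toy Python model of the lost-reply step described above. It is not BOINC's scheduler code: `ToyScheduler`, `handle_request`, and the task names are all hypothetical, and the sketch only shows how a dropped reply leaves the server and client disagreeing about which tasks the host holds (the "ghost" / "Resent lost task" situation seen in the log earlier in the thread). It does not model the abandonment bug itself, whose cause is not established here.

```python
class ToyScheduler:
    """Toy stand-in for a scheduler: remembers which tasks it believes a host holds."""

    def __init__(self):
        self.in_progress = set()
        self._next_id = 0

    def handle_request(self, tasks_on_client, want):
        # Tasks the server recorded but the client does not report are "lost" (ghosts).
        lost = sorted(self.in_progress - set(tasks_on_client))
        new = []
        for _ in range(want):
            self._next_id += 1
            new.append(f"task_{self._next_id}")
        # The server commits the assignment whether or not the reply is delivered.
        self.in_progress.update(new)
        return {"resent_lost": lost, "new": new}

server = ToyScheduler()
client_tasks = set()

# Request 1: the server assigns two tasks, but the reply is dropped on the way back,
# so the client never learns about them.
server.handle_request(client_tasks, want=2)

# Request 2: the client still reports no tasks, so the server detects two ghosts.
reply = server.handle_request(client_tasks, want=0)
client_tasks.update(reply["resent_lost"], reply["new"])
print(reply["resent_lost"])
```

After the second request both sides agree again in this toy model; the thread's complaint is that the real server sometimes marks the tasks abandoned instead.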
____________


Copyright © 2014 University of California