False "abandoned" tasks
Author | Message |
---|---|
SAHJ@H Joined: 19 Mar 03 Posts: 5 Credit: 4,739,753 RAC: 13 | I have an issue with one computer: 55 of its tasks were wrongly reported as "abandoned" on 18 Feb 2013, 15:06:59 UTC, even though the completed ones were reported normally during the blackout period. The others are still being executed. One example is: http://setiathome.berkeley.edu/result.php?resultid=2836266935 Is there any way to correct the situation? JL |
Khangollo Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0 | Not your computer's fault, it's seti@home. Nothing you can do on your side... except abort those tasks on your computer, because even if you complete them, they'll get ignored. This has happened to me 3 times already, starting after the problems with the scheduler timing out began. Not so much set & forget anymore... :( |
SAHJ@H Joined: 19 Mar 03 Posts: 5 Credit: 4,739,753 RAC: 13 | Thanks, Khangollo. JL |
Vipin Palazhi Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 | This happened to one of my machines two days ago, but I only noticed it today. Since I access it via remote desktop, it is really painful to check individual tasks to see which have been abandoned but are still being processed by the machine. Is resetting the project a good option to clear this issue? And will the tasks be re-downloaded after the reset? |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 | Vipin Palazhi wrote: "This occurred to one of my machines two days back but I only noticed it today. Since I access it via remote desktop, it is really a painful task to check individual tasks to see which have been abandoned but still being processed by the machine. Is resetting a good option to clear this issue? And will the tasks be re-downloaded after the reset?" Yes, reset the project. Only the tasks that haven't been abandoned will be resent, 20 at a time. Claggy |
Vipin Palazhi Joined: 29 Feb 08 Posts: 286 Credit: 167,386,578 RAC: 0 | Claggy wrote: "Yes, reset the project, Only the tasks that haven't been abandoned will be resent, 20 at a time," Thanks for clarifying, Claggy. The machine is now downloading all the lost tasks except for those that have expired. I also noticed that all the MB tasks are now just labelled seti_enhanced 6.03, with no identification for the CUDA tasks. But the system does seem to recognize them, and the WUs are being processed properly by the CPUs and GPUs. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 | Anybody worried about whether they have such 'abandoned' tasks can check by clicking the 'tasks' link on their main account page. Then click the 'error' link at the top of the page, if any are showing. It will then show only the errored tasks and the reason they errored. Abandoned tasks should show up there. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 | All my hosts have been "abandoning" all their tasks over the last few days... one of them did it twice just today... This is not funny... When the problem is that the project can't supply work, my hosts can help another project or, in the worst case, just sit idle and save electricity... But with this issue I'm wasting a lot of energy, which goes straight down the drain, as there is neither useful science nor credit for the electricity spent on those tasks... Any idea why this is happening? Is there something we can do on the client side (other than hire somebody to babysit the hosts, or opt out of SETI)? TBH, if it's something on the server side, I think it should be fixed ASAP... besides me wasting money, I think that thousands of abandoned tasks and a continuous re-download of already downloaded tasks are good neither for the server's load nor for the saturated pipes... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 | When a task is marked 'abandoned', there's a server timestamp for the event in the task list on the website. If you convert that to local time, can you match it against anything unusual in your BOINC message/event log? |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 | On the host that did it twice today, the last batch of abandoned tasks was at 19:37:04 UTC. I'm at UTC-3, so it was 16:37:04 my time. This time matches (more or less) an RPC, and there is nothing unusual in the log... (there are a lot of "Project communication failed" entries, but those are related to the downloads and are a very common, frequent and consistent issue when downloading SETI WUs...)<br>20/02/2013 16:33:50 \| \| Internet access OK - project servers may be temporarily down.<br>20/02/2013 16:36:08 \| \| Project communication failed: attempting access to reference site<br>20/02/2013 16:36:08 \| SETI@home \| Backing off 1 min 0 sec on download of 02dc12ad.12608.4157.9.10.248<br>20/02/2013 16:36:10 \| \| Internet access OK - project servers may be temporarily down.<br>20/02/2013 16:37:37 \| SETI@home \| Sending scheduler request: To fetch work.<br>20/02/2013 16:37:37 \| SETI@home \| Reporting 1 completed tasks, requesting new tasks for CPU and GPU<br>20/02/2013 16:37:49 \| SETI@home \| Scheduler request completed: got 1 new tasks<br>20/02/2013 16:37:49 \| SETI@home \| Message from server: Resent lost task 02dc12ad.12608.366236.9.10.164_1<br>20/02/2013 16:40:43 \| \| Project communication failed: attempting access to reference site<br>20/02/2013 16:40:43 \| SETI@home \| Backing off 1 min 0 sec on download of 29dc12ae.9337.6611.13.10.144<br>20/02/2013 16:40:52 \| \| Internet access OK - project servers may be temporarily down.<br>Indeed, there is something weird: it says that it is resending a ghost, but the previous RPCs didn't fail... |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 | It happened again, on another host, for the second time today. As in my previous post, there is nothing unusual in the event log, but there is an RPC reply (just after the "abandoning" event) saying that the last RPC was too recent: 66 seconds ago, when in fact the previous one was more than 5 minutes earlier (18:27:02 and 18:32:27, which makes 325 seconds), and both were normal updates triggered automatically by BOINC... Something smells really bad... it seems like the scheduler is somehow confusing the RPCs of different hosts... (or having hallucinations... they should stop smoking things in the lab... :b ) |
hbomber Joined: 2 May 01 Posts: 437 Credit: 50,852,854 RAC: 0 | OMG, got 150+ abandoned AP tasks, all but two on my machine... That's worth 1.2 GB of downloads. And now: "21/02/2013 00:17:23 SETI@home Message from server: This computer has reached a limit on tasks in progress" Awesome. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 | That could conceivably happen if two different machines were reporting in and claiming to have the same HostID. You haven't ever set up a new machine by cloning a BOINC data directory from another machine, have you? |
hbomber Joined: 2 May 01 Posts: 437 Credit: 50,852,854 RAC: 0 | Negative. At least not this particular BOINC installation. It has resided here, on this machine, for years. But I did do some things that may have caused it, although I have done the same before and this never happened. I experimented with different AMD OpenCL SDK installations. Before that, I copied the BOINC directory (the data directory is inside it) to another drive. When OpenCL failed and BOINC started to show "computation error" on tasks (it did so with 5-6 of them), I set network activity to none. In those few seconds BOINC may have sent some request, but it had nothing to report, I'm certain of it; I always report before doing crazy and dangerous stuff. Then I deleted the faulty installation, restored the backup copy from the other drive, and everything continued happily. I don't see any way this could cause the situation you describe. Btw, it happened (I'm not sure about the exact order) when I removed one of the 5850 cards and generally reordered the video devices in the motherboard slots. I doubt it is related, but it should be mentioned. Since a few people suddenly got these false abandoned tasks at the same time, it is unlikely to be a local problem. P.S. At least the refill of new tasks is going. |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 | Richard Haselgrove wrote: "That could conceivably happen if two different machines were reporting in and claiming to have the same HostID." Nope, never... my current hosts have not "suffered" any upgrade recently... the last upgrade was the addition of a new GPU about 4 months ago... apart from that, all of them have been working as they are now for almost a year... and all my hosts were installed the long way (installers, manual attach to projects, etc.)... By the way, I think that if two computers really had the same ID, one of them would have a wrong number in the RPC sequence, which should make BOINC assign a new ID... which is not happening on my hosts... |
Khangollo Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0 | Richard Haselgrove wrote: "When a task is marked 'abandoned', there's a server timestamp for the event on the website task list." It happens during scheduler requests that time out (when the S@H scheduler is misbehaving). The server accepts the request and sends a response, but the response gets lost and never reaches the client. The next request attempt then has a chance of triggering this task abandonment bug. At least, this is how I observed it happening to me. |
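
The check Richard Haselgrove suggests in this thread (convert the server's UTC timestamp for the "abandoned" event to local time, then look for event-log entries around that moment) can be done by hand, but a small script makes it less tedious. What follows is only a minimal Python sketch, not anything shipped with BOINC: the function names and the 120-second window are made up for illustration, and it assumes event-log lines look like `DD/MM/YYYY HH:MM:SS | project | message`, as in Horacio's excerpt. Other BOINC versions or locales may format the log differently.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are formatted as in the log excerpts quoted in this thread.
LOG_TIME_FORMAT = "%d/%m/%Y %H:%M:%S"


def server_time_to_local(utc_text, utc_offset_hours):
    """Convert a website timestamp (UTC) to naive local wall-clock time."""
    utc_time = datetime.strptime(utc_text, LOG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # Drop the zone info so the result compares directly with parsed log lines.
    return utc_time.astimezone(local_zone).replace(tzinfo=None)


def parse_log_line(line):
    """Split 'date time | project | message' into (datetime, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, LOG_TIME_FORMAT), project, message


def entries_near(log_lines, local_time, window_seconds=120):
    """Return the parsed log entries within window_seconds of local_time."""
    window = timedelta(seconds=window_seconds)
    parsed = (parse_log_line(line) for line in log_lines if line.strip())
    return [entry for entry in parsed if abs(entry[0] - local_time) <= window]


def seconds_between(first, second):
    """Elapsed whole seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    return int(delta.total_seconds())
```

With Horacio's numbers, `server_time_to_local("20/02/2013 19:37:04", -3)` gives 16:37:04 local time, and `seconds_between("18:27:02", "18:32:27")` gives 325, which is the gap he compares against the 66 seconds claimed by the server. Passing the saved event-log lines to `entries_near` then lists whatever the client logged around the abandonment time.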