Abandoned tasks - Ongoing issue |
Message boards : Number crunching : Abandoned tasks - Ongoing issue
| Author | Message |
|---|---|
|
I don't know if I'm the only one suffering from this, but my 3 SETI hosts have been abandoning tasks constantly over the last 2 or 3 weeks... | |
| ID: 1341373 · | |
|
I forget the triggers that cause tasks to be abandoned. You can check your stdoutdae.txt or stdoutdae.old to see if the reason was stated locally in your log. | |
| ID: 1341639 · | |
|
Usually nothing weird or related to this appears in the logs, except the things noted in my previous post, but those might be just coincidences, as those things don't happen regularly... | |
| ID: 1341645 · | |
|
One of my hosts abandoned all its tasks on February 15 at approximately 12:06:35 local time (11:06:35 UTC). Here is the somewhat contradictory log fragment surrounding the abandonment:
15-Feb-2013 11:56:38 [SETI@home] Sending scheduler request: To fetch work.
15-Feb-2013 11:56:38 [SETI@home] Requesting new tasks for CPU and NVIDIA
15-Feb-2013 11:56:43 [SETI@home] Scheduler request completed: got 0 new tasks
15-Feb-2013 11:56:43 [SETI@home] No tasks sent
15-Feb-2013 11:56:43 [SETI@home] This computer has reached a limit on tasks in progress
15-Feb-2013 11:56:43 [SETI@home] Project has no tasks available
15-Feb-2013 12:06:52 [SETI@home] Sending scheduler request: To fetch work.
15-Feb-2013 12:06:52 [SETI@home] Requesting new tasks for CPU
15-Feb-2013 12:06:58 [SETI@home] Scheduler request completed: got 0 new tasks
15-Feb-2013 12:06:58 [SETI@home] Not sending work - last request too recent: 23 sec
Does that make any sense? I can't find any event log messages indicating a computer clock adjustment around that time, and even if the time had been adjusted it certainly would only have been by a few seconds, not ten minutes. Also, a too-recent request should only be ignored, not punished like this :-) It kind of points to the borderline paranoid assumption that someone else has called the scheduler, reporting that all tasks for this host have been abandoned, 23 seconds prior to my completely legitimate call at 12:06:52. ____________ | |
| ID: 1341647 · | |
|
And it happened again just a couple of minutes ago on another host:
28-Feb-2013 13:04:13 [SETI@home] Sending scheduler request: To fetch work.
28-Feb-2013 13:04:13 [SETI@home] Reporting 1 completed tasks, requesting new tasks for CPU and GPU
28-Feb-2013 13:04:49 [SETI@home] Scheduler request completed: got 1 new tasks
28-Feb-2013 13:09:52 [SETI@home] Sending scheduler request: To fetch work.
28-Feb-2013 13:09:52 [SETI@home] Reporting 5 completed tasks, requesting new tasks for CPU and GPU
28-Feb-2013 13:09:56 [SETI@home] Scheduler request completed: got 0 new tasks
28-Feb-2013 13:09:56 [SETI@home] Message from server: Not sending work - last request too recent: 66 sec
The tasks were abandoned at 16:09:00 UTC; as I'm at UTC-3, that is around 13:09:00 local time, which is close to the time at which the servers think the host made its last contact... I'm not the only one; it seems like the servers are mixing up different hosts' RPCs, or maybe there is a bug or error in the databases or in the queries... ____________ | |
| ID: 1341656 · | |
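The arithmetic used in the two posts above (subtracting the "too recent" interval from the time of the reply) can be sketched as a small script. This is only an illustration: it assumes the log line format quoted in this thread, an English locale for month names, and that the server counts the interval back from roughly the moment the reply was logged. The function name `implied_previous_contacts` is made up for this example.

```python
import re
from datetime import datetime, timedelta

# Matches event-log lines in the format quoted in this thread:
# "15-Feb-2013 12:06:58 [SETI@home] some message"
LINE_RE = re.compile(
    r"(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[[^\]]+\] (.*)"
)
TOO_RECENT_RE = re.compile(r"last request too recent: (\d+) sec")


def implied_previous_contacts(log_text):
    """Return (reply_time, implied_previous_request_time) pairs.

    For every "last request too recent: N sec" line, subtract N seconds
    from the line's timestamp to estimate when the server believes the
    previous request from this host ID arrived.
    """
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a timestamped event-log line
        # %b parses abbreviated month names; assumes an English locale.
        stamp = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
        recent = TOO_RECENT_RE.search(match.group(2))
        if recent:
            seconds = int(recent.group(1))
            results.append((stamp, stamp - timedelta(seconds=seconds)))
    return results
```

Applied to the two fragments above, this gives 12:06:35 for the February 15 log, which matches the reported abandonment time, and 13:08:50 for the February 28 log, which is within seconds of the reported 13:09:00. Local clock drift and network delay are not accounted for.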
|
The "last request too recent" messages are certainly suggestive of some other computer attempting to contact the scheduler with the same HostID number. For those of you afflicted with this problem - it doesn't seem to affect all of us - it just might be helpful to keep an eye on the IP addresses shown on the host details page for that host (available to logged-in users only): if there is a 'ghost host' contacting the scheduler, that should change, and the interloper's IP address might help the staff to track it down. | |
| ID: 1341666 · | |
|
| |
| ID: 1341678 · | |
|
On the host that failed most recently, the IP address shown is the right one, the one the host uses inside the local LAN, and it says it was the same the last 31565 times... | |
| ID: 1341681 · | |
Last week it failed without using a proxy; today it failed while using a proxy. ____________ | |
| ID: 1341684 · | |
I've seen the same, in opposite sequence; last November it failed twice while using a proxy, on February 14th and again on the 15th it failed without a proxy. ____________ | |
| ID: 1341727 · | |
|
It happened again last night in one of the hosts... | |
| ID: 1342071 · | |
|
I think nobody knows what is going on Horatio, that's why you are getting no answer. | |
| ID: 1342082 · | |
"I think nobody knows what is going on Horatio, that's why you are getting no answer." I appreciate you taking the time to bear with me, and I'm thankful for that, but: as stated, in the IP logs there are no other hosts contacting the servers as if they were mine. Indeed, if that were happening my hosts should get a new computer ID because the sequence number of the request would be lower than expected, and this is not happening. Anyway, if the issue is someone cracking into the servers and faking my ID in a way clever enough not to be noticed, then first, this should be addressed ASAP by the staff, as it is a big security issue that is not going to be good for the credibility of the project and could be a serious risk to the accuracy of the data processed. But even if they don't care about that, I don't see how changing the ID of my hosts would help in this case, as the IDs of the hosts are public, unless I made a completely new account with the hosts hidden from the start... but I've been crunching for SETI for more than 13 years with my current account and I don't want to lose it. If there is no solution to my issue right now, I'm sure that sooner or later this is going to explode in someone's face and they will fix it... Or in the best case they will upgrade the software for some other reason and that will fix this as a side effect... So if the only solutions are to stop crunching for a time or to lose my account, I'll certainly choose to stop crunching. Sorry if this sounds aggressive or wrong in any way; that's not the intention. It's just that I'm very frustrated, and since English is not my main language I'm not sure whether I'm using the wrong words. ____________ | |
| ID: 1342105 · | |
|
I have a (non-paranoid) conjecture about what is happening here, but it's based on certain things whose workings I don't know... | |
| ID: 1342166 · | |
|
Once the tasks are flagged as abandoned are they not removed from your client on that or the next update? | |
| ID: 1342174 · | |
"Once the tasks are flagged as abandoned are they not removed from your client on that or the next update?" Nope, they are kept on the hosts and they are crunched normally... My hosts are more or less synchronized now because I check the errors tabs on the web looking for recent "abandonments", and if there is a new one I reset the project to resync the tasks and stop crunching the abandoned ones. I do a reset because manually selecting the ones that were abandoned would take a lot of work and time :( If I miss the abandonment event (after all, I have to sleep or work sometimes) and I don't do a reset, then all the crunching is going to be wasted, along with the electricity used, which I'll have to pay for anyway. ____________ | |
| ID: 1342181 · | |
|
One of my hosts did have an issue with several abandoned tasks a day some time ago. However, the tasks were not retained on the system. I'm not sure if that was due to the way those tasks were flagged as abandoned or if the newer BOINC 6.12 client takes care of cleaning out abandoned tasks. | |
| ID: 1342188 · | |
"If a reset works you could automate running it on a schedule every 6 or 12 hours. The command line syntax would be: boinccmd --project http://setiathome.berkeley.edu/ reset" I've thought about something like that, but with the current issues downloading work it's not that simple: if I do a reset on a regular basis then I'll effectively not be crunching for SETI... I've thought about making a little app that polls the web page looking for abandoned tasks on each host, so the reset can be triggered only when it is really needed... But if a newer version takes care of cleaning out abandoned tasks then it might be worth trying that first... The only drawback is that, as I've said before, with the longer backoffs and retry times of the new versions added to my slow (and worse, highly unreliable) internet connection, I'm sure it will be the same as not crunching... Is anybody able to confirm that newer versions will take care of the cleaning? EDIT: Are you sure they were cleaned? Isn't it possible that by the time you noticed the issue, they had all already been crunched and reported?... Once a task is abandoned, even if it gets reported, the web page doesn't change, nor does it give any clue that the task was crunched anyway... ____________ | |
| ID: 1342192 · | |
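The "reset only when really needed" idea from the post above could look roughly like the following sketch. The `boinccmd` command line is the one quoted in the thread. Everything else is an assumption made for illustration: how the task page is fetched is left out entirely, the page is assumed to show the word "Abandoned" once per affected task, and the helper names (`count_abandoned`, `reset_if_needed`) are hypothetical.

```python
import subprocess

PROJECT_URL = "http://setiathome.berkeley.edu/"


def reset_command(project_url=PROJECT_URL):
    """Build the boinccmd call quoted in the thread."""
    return ["boinccmd", "--project", project_url, "reset"]


def count_abandoned(page_text):
    """Count occurrences of "abandoned" in the already-fetched page text.

    Assumption: the host's task page marks each abandoned task with
    that word; the real page layout is not verified here.
    """
    return page_text.lower().count("abandoned")


def reset_if_needed(page_text, previous_count, runner=subprocess.run):
    """Reset the project only if new abandoned tasks have appeared.

    Returns the current count so the caller can store it and pass it
    back in as previous_count on the next poll.
    """
    current = count_abandoned(page_text)
    if current > previous_count:
        # check=True raises CalledProcessError if boinccmd fails.
        runner(reset_command(), check=True)
    return current
```

Comparing against the previous count rather than against zero matters here because, as noted above, the web page keeps listing a task as abandoned even after it has been reported, so old entries would otherwise trigger a reset on every poll.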
"Is anybody able to confirm that newer versions will take care of the cleaning?" I can confirm that they don't. I was using 7.0.x when it happened to me (four times so far) and all tasks stayed, wasting time and electricity like there is no tomorrow. In my case, it always happened during constant scheduler timeouts. And no, I totally don't believe it's a security issue (as in someone trying to fake your host ID). Definitely another bug in the server software. It all started happening suddenly, at exactly the same time as the scheduler troubles, months ago. ____________ | |
| ID: 1342197 · | |
|
Thanks! | |
| ID: 1342201 · | |
| Copyright © 2013 University of California |