Suddenly BOINC Decides to Abandon 71 APs...WTH?
Author | Message |
---|---|
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Just had 273 APs abandoned due to too short a deadline. Shortest time to report was 12 minutes. Machine number 7248689. The column that you are looking at and calling the deadline time is dual use. At the top of the column it says "Time reported or deadline": before the task is reported it shows the deadline time; after it is reported it shows the time it was reported. Claggy |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
my client trigger just leaves a reportable task off the server request every so often (reporting it in a later request). Haven't tried it lately, because I haven't had ghost tasks, so your guess is as good as mine as to whether that logic is intact.
The double reporting of a task does still trigger a resend lost tasks event. Here I reported a task and got a task (without resend lost tasks occurring):
02-Jul-2015 08:33:42 [SETI@home] update requested by user
02-Jul-2015 08:33:42 [SETI@home] Sending scheduler request: Requested by user.
02-Jul-2015 08:33:42 [SETI@home] Reporting 1 completed tasks
02-Jul-2015 08:33:42 [SETI@home] Requesting new tasks for CPU
02-Jul-2015 08:33:43 [SETI@home] Scheduler request completed: got 1 new tasks
02-Jul-2015 08:35:16 [SETI@home] Started download of 31dc12ad.30467.464287.438086664204.12.215.vlar
02-Jul-2015 08:35:19 [SETI@home] Finished download of 31dc12ad.30467.464287.438086664204.12.215.vlar
After restoring the reported task back into the CS, and reporting it again (with suitably increased cache values), my four ghosts are resent:
Thu Jul 2 08:44:52 2015 | SETI@home | update requested by user
Thu Jul 2 08:44:55 2015 | SETI@home | sched RPC pending: Requested by user
Thu Jul 2 08:44:55 2015 | SETI@home | [sched_op] Starting scheduler request
Thu Jul 2 08:44:55 2015 | SETI@home | Sending scheduler request: Requested by user.
Thu Jul 2 08:44:55 2015 | SETI@home | Reporting 1 completed tasks
Thu Jul 2 08:44:55 2015 | SETI@home | Requesting new tasks for CPU
Thu Jul 2 08:44:55 2015 | SETI@home | [sched_op] CPU work request: 997588.31 seconds; 0.00 devices
Thu Jul 2 08:44:57 2015 | SETI@home | Scheduler request completed: got 4 new tasks
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] Server version 707
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 13se12af.32371.215837.438086664207.12.127.vlar_1
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 19se12af.753.16836.438086664199.12.153_1
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 19se12af.753.16836.438086664199.12.215_1
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 20au12ag.15429.18063.438086664201.12.128_0
Thu Jul 2 08:44:57 2015 | SETI@home | Project requested delay of 303 seconds
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] estimated total CPU task duration: 883795 seconds
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 11se12ab.7663.16427.438086664201.12.135_0
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] Deferring communication for 00:05:03
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] Reason: requested by project
Thu Jul 2 08:45:25 2015 | SETI@home | Started download of 13se12af.32371.215837.438086664207.12.127.vlar
Thu Jul 2 08:45:25 2015 | SETI@home | Started download of 19se12af.753.16836.438086664199.12.153
Thu Jul 2 08:45:28 2015 | SETI@home | Finished download of 13se12af.32371.215837.438086664207.12.127.vlar
Thu Jul 2 08:45:28 2015 | SETI@home | Finished download of 19se12af.753.16836.438086664199.12.153
Thu Jul 2 08:45:28 2015 | SETI@home | Started download of 19se12af.753.16836.438086664199.12.215
Thu Jul 2 08:45:28 2015 | SETI@home | Started download of 20au12ag.15429.18063.438086664201.12.128
Thu Jul 2 08:45:36 2015 | SETI@home | Finished download of 19se12af.753.16836.438086664199.12.215
Thu Jul 2 08:45:36 2015 | SETI@home | Finished download of 20au12ag.15429.18063.438086664201.12.128
http://setiathome.berkeley.edu/results.php?hostid=7506529 Claggy |
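For anyone who wants to check their own log for the same event, here is a minimal sketch that pulls the "Resent lost task" entries out of client log lines in the pipe-separated format shown in the post above. The function name `resent_tasks` and the regular expression are illustrative assumptions, not part of BOINC, and the pattern is only matched against the line layout quoted in this thread.

```python
import re

# Assumed line layout, taken from the log quoted above:
#   "Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task <task name>"
RESENT_RE = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*Resent lost task\s+(?P<task>\S+)"
)


def resent_tasks(lines):
    """Return (project, task_name) pairs for every 'Resent lost task' line."""
    found = []
    for line in lines:
        match = RESENT_RE.search(line)
        if match:
            found.append((match.group("project"), match.group("task")))
    return found


# Sample lines copied from the log in the post above.
SAMPLE = [
    "Thu Jul 2 08:44:57 2015 | SETI@home | Scheduler request completed: got 4 new tasks",
    "Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 13se12af.32371.215837.438086664207.12.127.vlar_1",
    "Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 20au12ag.15429.18063.438086664201.12.128_0",
]

if __name__ == "__main__":
    for project, task in resent_tasks(SAMPLE):
        print(project, task)
```

Lines in the other format quoted above (`02-Jul-2015 08:33:42 [SETI@home] ...`) are not matched by this pattern and would need their own expression.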
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
hoohoo great, feature stands "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
So, after a nice email we've had the fix applied. I had a quick test and it doesn't look like it has been deployed yet though. A person who won't read has no advantage over one who can't read. (Mark Twain) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
That's good news. It still doesn't look as if it's been deployed yet, though. This host has been suffering multiple Abandon Events a day, and he just received another: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714 So, I'll keep watching to see if it helps that host. Thanks for getting something on the table; hopefully it will work as expected. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
I just had 173 abandoned tasks on my Vista # 5065145 machine. Stderr just states outcome: abandoned. Client state: new. Whatever that means. I've had it happen before, but it still sucks. Old James |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I just had 173 abandoned tasks on my Vista # 5065145 machine. Were those tasks still being crunched on the host afterwards? What does the Event Log say for that time period? Claggy |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
I just had 173 abandoned tasks on my Vista # 5065145 machine. Like an idiot I rebooted the computer before I saw your message, so the Event Log just shows the last half hour. Sorry. Old James |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I just had 173 abandoned tasks on my Vista # 5065145 machine. You'll find the Event Log information stored in the stdoutdae.txt and stdoutdae.old files in the BOINC Data directory. Claggy |
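As a rough sketch of how to dig through those two files after a reboot, the snippet below searches `stdoutdae.old` and `stdoutdae.txt` for lines containing a given piece of text, such as a date or a task name. The function name `search_event_logs` is made up for illustration, and the location of the BOINC Data directory is an assumption you have to supply yourself, since it varies by operating system and install options.

```python
from pathlib import Path

# File names as given in the post above; the rotated (older) file is read first.
LOG_NAMES = ("stdoutdae.old", "stdoutdae.txt")


def search_event_logs(data_dir, needle):
    """Return log lines containing `needle` (case-insensitive), older file first."""
    hits = []
    wanted = needle.lower()
    for name in LOG_NAMES:
        path = Path(data_dir) / name
        if not path.is_file():
            continue  # a missing file is skipped rather than treated as an error
        # errors="replace" keeps the scan going if the log has odd bytes in it
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if wanted in line.lower():
                    hits.append(line.rstrip("\n"))
    return hits
```

For example, `search_event_logs(data_dir, "02-Jul-2015 08:3")` would return the lines logged around that time, where `data_dir` is whatever path your own BOINC Manager reports as its data directory.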
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It appears the problem with Abandoned tasks may be over. This host has gone almost 2 days without an Abandoned task, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714 Unfortunately he just suffered a horde of 5 minute Time-outs, and they don't appear to be VLARs. One down one to go? Well, at least his machine won't waste time working those Timed-out tasks... ;-) |
woohoo Send message Joined: 30 Oct 13 Posts: 972 Credit: 165,671,404 RAC: 5 |
I had that problem a few weeks ago, but I haven't had it since. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
It appears the problem with Abandoned tasks may be over. This host has gone almost 2 days without an Abandoned task, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714 That's certainly very encouraging, if not entirely definitive. Perhaps the fix got deployed during this week's outage. I guess if we hear no more anguished cries of abandonment, it'll be good news! |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
It appears the problem with Abandoned tasks may be over. This host has gone almost 2 days without an Abandoned task, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714 There was supposed to be a note in Eric's lunchbox on Monday, so a "final abandonment" time of 13 Jul 2015, 22:42:03 UTC (15:43 PDT) sounds about right for a late lunch. If anyone spots an abandonment significantly later than that (say 15 July onwards), could they draw it to our attention, please? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
At the risk of invoking Murphy, or gremlins lurking in that spaghetti-code, I'd say the same mode of failure can't happen now. There's a fair amount of DoS vulnerability in that authentication routine, and still chances of spontaneous rashes of unwanted new hostids and similar, but I suspect that particular piece of duct tape will hold. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.