Author | Message |
TBar Volunteer tester
Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
|
I've seen this with other hosts; now it has hit me. One host on Beta is constantly having tasks abandoned: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714&offset=40
It would be nice if this were fixed. Error tasks for computer 6796479:
Thu Jun 25 14:26:47 2015 | SETI@home | [sched_op] Starting scheduler request
Thu Jun 25 14:26:47 2015 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Jun 25 14:26:47 2015 | SETI@home | Reporting 1 completed tasks
Thu Jun 25 14:26:47 2015 | SETI@home | Requesting new tasks for AMD/ATI GPU
Thu Jun 25 14:26:47 2015 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Jun 25 14:26:47 2015 | SETI@home | [sched_op] AMD/ATI GPU work request: 659269.91 seconds; 0.00 devices
Thu Jun 25 14:32:01 2015 | SETI@home | Scheduler request failed: Timeout was reached
Thu Jun 25 14:32:01 2015 | SETI@home | [sched_op] Deferring communication for 00:01:25
Thu Jun 25 14:32:01 2015 | SETI@home | [sched_op] Reason: Scheduler request failed
Thu Jun 25 14:33:31 2015 | SETI@home | [sched_op] Starting scheduler request
Thu Jun 25 14:33:31 2015 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Jun 25 14:33:31 2015 | SETI@home | Reporting 1 completed tasks
Thu Jun 25 14:33:31 2015 | SETI@home | Requesting new tasks for AMD/ATI GPU
Thu Jun 25 14:33:31 2015 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Jun 25 14:33:31 2015 | SETI@home | [sched_op] AMD/ATI GPU work request: 660690.66 seconds; 0.00 devices
Thu Jun 25 14:33:34 2015 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Jun 25 14:33:34 2015 | SETI@home | [sched_op] Server version 707
Thu Jun 25 14:33:34 2015 | SETI@home | No tasks sent
Thu Jun 25 14:33:34 2015 | SETI@home | No tasks are available for AstroPulse v7
Thu Jun 25 14:33:34 2015 | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
Thu Jun 25 14:33:34 2015 | SETI@home | Project requested delay of 303 seconds
Thu Jun 25 14:33:34 2015 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task ap_07fe15aa_B2_P1_00161_20150624_15414.wu_1
Thu Jun 25 14:33:34 2015 | SETI@home | [sched_op] Deferring communication for 00:05:03
Thu Jun 25 14:33:34 2015 | SETI@home | [sched_op] Reason: requested by project
Thu Jun 25 14:38:19 2015 | SETI@home | Message from task: 0
Thu Jun 25 14:38:19 2015 | SETI@home | Computation for task ap_07fe15aa_B3_P0_00092_20150624_17213.wu_1 finished
Thu Jun 25 14:38:19 2015 | SETI@home | Starting task ap_06fe15aa_B6_P1_00291_20150624_15160.wu_1
Thu Jun 25 14:38:22 2015 | SETI@home | Started upload of ap_07fe15aa_B3_P0_00092_20150624_17213.wu_1_0
Thu Jun 25 14:38:25 2015 | SETI@home | Finished upload of ap_07fe15aa_B3_P0_00092_20150624_17213.wu_1_0
Thu Jun 25 14:38:40 2015 | SETI@home | [sched_op] Starting scheduler request
Thu Jun 25 14:38:40 2015 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Jun 25 14:38:40 2015 | SETI@home | Reporting 1 completed tasks
Thu Jun 25 14:38:40 2015 | SETI@home | Requesting new tasks for AMD/ATI GPU
Thu Jun 25 14:38:40 2015 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Jun 25 14:38:40 2015 | SETI@home | [sched_op] AMD/ATI GPU work request: 662485.94 seconds; 0.00 devices
Thu Jun 25 14:38:42 2015 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Jun 25 14:38:42 2015 | SETI@home | [sched_op] Server version 707
Thu Jun 25 14:38:42 2015 | SETI@home | Not sending work - last request too recent: 149 sec
Thu Jun 25 14:38:42 2015 | SETI@home | Project requested delay of 303 seconds
Thu Jun 25 14:38:42 2015 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task ap_07fe15aa_B3_P0_00092_20150624_17213.wu_1
Thu Jun 25 14:38:42 2015 | SETI@home | [sched_op] Deferring communication for 00:05:03
Thu Jun 25 14:38:42 2015 | SETI@home | [sched_op] Reason: requested by project
Thu Jun 25 14:41:21 2015 | SETI@home | Message from task: 0
Thu Jun 25 14:41:21 2015 | SETI@home | Computation for task ap_07fe15aa_B2_P0_00226_20150624_28439.wu_1 finished
Thu Jun 25 14:41:21 2015 | SETI@home | Starting task ap_07ja15aa_B1_P0_00189_20150624_29454.wu_1
Thu Jun 25 14:41:23 2015 | SETI@home | Started upload of ap_07fe15aa_B2_P0_00226_20150624_28439.wu_1_0
Thu Jun 25 14:41:25 2015 | SETI@home | Finished upload of ap_07fe15aa_B2_P0_00226_20150624_28439.wu_1_0
Thu Jun 25 14:43:47 2015 | SETI@home | [sched_op] Starting scheduler request
Thu Jun 25 14:43:47 2015 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Jun 25 14:43:47 2015 | SETI@home | Reporting 1 completed tasks
Thu Jun 25 14:43:47 2015 | SETI@home | Requesting new tasks for AMD/ATI GPU
Thu Jun 25 14:43:47 2015 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Jun 25 14:43:47 2015 | SETI@home | [sched_op] AMD/ATI GPU work request: 663512.99 seconds; 0.00 devices
Thu Jun 25 14:43:49 2015 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Jun 25 14:43:49 2015 | SETI@home | [sched_op] Server version 707
Thu Jun 25 14:43:49 2015 | SETI@home | No tasks sent
Thu Jun 25 14:43:49 2015 | SETI@home | No tasks are available for AstroPulse v7
Thu Jun 25 14:43:49 2015 | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
Thu Jun 25 14:43:49 2015 | SETI@home | Project requested delay of 303 seconds
It would also be nice if you were notified when BOINC trashes all your work, so you at least have a clue about what just happened.
Oh, and you can forget about any 'flaky' internet connection on this end. This machine is connected to Verizon FIOS with newish Cat5 cable, and my junction box is about 70 feet from a State highway. Not to mention that my other 3 machines didn't have any problems around the same time.
ID: 1695660 · |
|
TBar Volunteer tester
Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
|
This host had contact with the server mere seconds before the server decided to trash the tasks on the other machine:
25 Jun 2015, 18:36:13 UTC Abandoned
Thu 25 Jun 2015 02:35:47 PM EDT | SETI@home | Computation for task ap_04fe15ab_B3_P0_00067_20150624_07836.wu_0 finished
Thu 25 Jun 2015 02:35:47 PM EDT | SETI@home | Starting task ap_05fe15aa_B4_P0_00122_20150624_01881.wu_0 using astropulse_v7 version 701 in slot 1
Thu 25 Jun 2015 02:35:49 PM EDT | SETI@home | Started upload of ap_04fe15ab_B3_P0_00067_20150624_07836.wu_0_0
Thu 25 Jun 2015 02:35:53 PM EDT | SETI@home | Finished upload of ap_04fe15ab_B3_P0_00067_20150624_07836.wu_0_0
Thu 25 Jun 2015 02:35:54 PM EDT | SETI@home | [sched_op] Starting scheduler request
Thu 25 Jun 2015 02:35:54 PM EDT | SETI@home | Sending scheduler request: To fetch work.
Thu 25 Jun 2015 02:35:54 PM EDT | SETI@home | Reporting 1 completed tasks
Thu 25 Jun 2015 02:35:54 PM EDT | SETI@home | Requesting new tasks for CPU
Thu 25 Jun 2015 02:35:54 PM EDT | SETI@home | [sched_op] CPU work request: 1600230.31 seconds; 0.00 devices
Thu 25 Jun 2015 02:35:54 PM EDT | SETI@home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | Scheduler request completed: got 0 new tasks
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | [sched_op] Server version 707
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | No tasks sent
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | No tasks are available for AstroPulse v7
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | Project requested delay of 303 seconds
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task ap_04fe15ab_B3_P0_00067_20150624_07836.wu_0
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | [sched_op] Deferring communication for 00:05:03
Thu 25 Jun 2015 02:35:56 PM EDT | SETI@home | [sched_op] Reason: requested by project
Thu 25 Jun 2015 02:46:02 PM EDT | SETI@home | [sched_op] Starting scheduler request
Thu 25 Jun 2015 02:46:02 PM EDT | SETI@home | Sending scheduler request: To fetch work.
Both machines are connected to the same router.
ID: 1695687 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
The control mechanism behind task scheduling (and aborts etc.) is driven entirely by the task estimates, which are complete rubbish. On top of that, several failure modes become possible when 'resend lost tasks' is inactive.
Q: Given that the bounds are usually set to 10x the estimate, would you say they might have kicked in (falsely), or that your scheduler request somehow missed a bunch of tasks (such as by a dicey wifi connection etc)?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1695698 · |
|
Zalster Volunteer tester
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242
|
I've seen this happen when there is some disruption in the download of the data. I had about 100 MB error out the other day. I just happened to check the manager and saw "some downloads have stalled" in the event log.
ID: 1695699 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
Note that "such as by a dicey wifi connection etc" includes anything your client never received/downloaded.
ID: 1695700 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
I've seen this happen when there is some disruption in the download of the data. I had about 100 MB error out the other day. I just happened to check the manager and saw "some downloads have stalled" in the event log.
That's what I'm thinking. That shouldn't happen [but obviously does..].
ID: 1695701 · |
|
TBar Volunteer tester
Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
|
As noted previously, my machines are connected to FIOS with a straight connection to the main trunk on a State Highway.
This host had contact with the server while the other host was being timed out.
Can't happen that way...the hosts are connected to the same router.
http://setiathome.berkeley.edu/results.php?hostid=6797524
6/25/2015 2:23:48 PM | SETI@home | Project requested delay of 303 seconds
6/25/2015 2:23:48 PM | SETI@home | [sched_op] Deferring communication for 00:05:03
6/25/2015 2:23:48 PM | SETI@home | [sched_op] Reason: requested by project
6/25/2015 2:28:21 PM | SETI@home | Computation for task ap_04fe15ab_B6_P0_00266_20150624_23207.wu_1 finished
6/25/2015 2:28:21 PM | SETI@home | Starting task ap_06fe15aa_B4_P0_00012_20150624_06136.wu_1 using astropulse_v7 version 704 (opencl_ati_100) in slot 0
6/25/2015 2:28:23 PM | SETI@home | Started upload of ap_04fe15ab_B6_P0_00266_20150624_23207.wu_1_0
6/25/2015 2:28:27 PM | SETI@home | Finished upload of ap_04fe15ab_B6_P0_00266_20150624_23207.wu_1_0
6/25/2015 2:28:52 PM | SETI@home | [sched_op] Starting scheduler request
6/25/2015 2:28:52 PM | SETI@home | Sending scheduler request: To report completed tasks.
6/25/2015 2:28:52 PM | SETI@home | Reporting 1 completed tasks
6/25/2015 2:28:52 PM | SETI@home | Requesting new tasks for ATI
6/25/2015 2:28:52 PM | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
6/25/2015 2:28:52 PM | SETI@home | [sched_op] ATI work request: 89559.22 seconds; 0.00 devices
6/25/2015 2:28:54 PM | SETI@home | Scheduler request completed: got 0 new tasks
6/25/2015 2:28:54 PM | SETI@home | [sched_op] Server version 707
6/25/2015 2:28:54 PM | SETI@home | No tasks sent
6/25/2015 2:28:54 PM | SETI@home | No tasks are available for AstroPulse v7
6/25/2015 2:28:54 PM | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
6/25/2015 2:28:54 PM | SETI@home | Project requested delay of 303 seconds
6/25/2015 2:28:54 PM | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task ap_04fe15ab_B6_P0_00266_20150624_23207.wu_1
6/25/2015 2:28:54 PM | SETI@home | [sched_op] Deferring communication for 00:05:03
6/25/2015 2:28:54 PM | SETI@home | [sched_op] Reason: requested by project
6/25/2015 2:29:09 PM | SETI@home | Computation for task ap_07fe15aa_B1_P0_00348_20150624_24414.wu_2 finished
6/25/2015 2:29:09 PM | SETI@home | Starting task ap_04fe15ab_B6_P1_00320_20150624_24572.wu_0 using astropulse_v7 version 704 (opencl_ati_100) in slot 1
6/25/2015 2:29:11 PM | SETI@home | Started upload of ap_07fe15aa_B1_P0_00348_20150624_24414.wu_2_0
6/25/2015 2:29:15 PM | SETI@home | Finished upload of ap_07fe15aa_B1_P0_00348_20150624_24414.wu_2_0
6/25/2015 2:33:58 PM | SETI@home | [sched_op] Starting scheduler request
6/25/2015 2:33:58 PM | SETI@home | Sending scheduler request: To report completed tasks.
6/25/2015 2:33:58 PM | SETI@home | Reporting 1 completed tasks
6/25/2015 2:33:58 PM | SETI@home | Requesting new tasks for ATI
6/25/2015 2:33:58 PM | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
6/25/2015 2:33:58 PM | SETI@home | [sched_op] ATI work request: 103828.18 seconds; 0.00 devices
6/25/2015 2:34:00 PM | SETI@home | Scheduler request completed: got 0 new tasks
6/25/2015 2:34:00 PM | SETI@home | [sched_op] Server version 707
6/25/2015 2:34:00 PM | SETI@home | No tasks sent
6/25/2015 2:34:00 PM | SETI@home | No tasks are available for AstroPulse v7
6/25/2015 2:34:00 PM | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
6/25/2015 2:34:00 PM | SETI@home | Project requested delay of 303 seconds
6/25/2015 2:34:00 PM | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task ap_07fe15aa_B1_P0_00348_20150624_24414.wu_2
6/25/2015 2:34:00 PM | SETI@home | [sched_op] Deferring communication for 00:05:03
6/25/2015 2:34:00 PM | SETI@home | [sched_op] Reason: requested by project
6/25/2015 2:39:05 PM | SETI@home | [sched_op] Starting scheduler request
6/25/2015 2:39:05 PM | SETI@home | Sending scheduler request: To fetch work.
6/25/2015 2:39:05 PM | SETI@home | Requesting new tasks for ATI
6/25/2015 2:39:05 PM | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
6/25/2015 2:39:05 PM | SETI@home | [sched_op] ATI work request: 114553.69 seconds; 0.00 devices
6/25/2015 2:39:07 PM | SETI@home | Scheduler request completed: got 0 new tasks
6/25/2015 2:39:07 PM | SETI@home | [sched_op] Server version 707
6/25/2015 2:39:07 PM | SETI@home | No tasks sent
6/25/2015 2:39:07 PM | SETI@home | No tasks are available for AstroPulse v7
6/25/2015 2:39:07 PM | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
6/25/2015 2:39:07 PM | SETI@home | Project requested delay of 303 seconds
6/25/2015 2:39:07 PM | SETI@home | [sched_op] Deferring communication for 00:05:03
6/25/2015 2:39:07 PM | SETI@home | [sched_op] Reason: requested by project
6/25/2015 2:48:14 PM | SETI@home | [sched_op] Starting scheduler request
The host that had the Abandoned tasks was being timed out between 2:26:47 and 2:33:31 PM EDT.
This host was talking to the server during that time. Sorry, the bad internet ploy doesn't cut it this time. I think we can rule out 'bad internet'.
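For anyone who would rather check the overlap than eyeball it, here is a rough Python sketch. It assumes the `date | project | message` layout shown in the log above; the function name `completed_contacts` and the constants are made up for illustration, and the sample lines are copied from this host's log.

```python
from datetime import datetime

# Timeout window on the host that lost its tasks (times from its event log, EDT).
WINDOW_START = datetime(2015, 6, 25, 14, 26, 47)
WINDOW_END = datetime(2015, 6, 25, 14, 33, 31)

def completed_contacts(log_text, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Return the timestamps of every 'Scheduler request completed' line."""
    hits = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[2].startswith("Scheduler request completed"):
            hits.append(datetime.strptime(parts[0], fmt))
    return hits

# A few lines copied from the second host's log.
OTHER_HOST_LOG = """\
6/25/2015 2:28:52 PM | SETI@home | [sched_op] Starting scheduler request
6/25/2015 2:28:54 PM | SETI@home | Scheduler request completed: got 0 new tasks
6/25/2015 2:34:00 PM | SETI@home | Scheduler request completed: got 0 new tasks
"""

# Successful contacts by the second host while the first one was timing out.
in_window = [t for t in completed_contacts(OTHER_HOST_LOG)
             if WINDOW_START <= t <= WINDOW_END]
print(in_window)
```

Only the 2:28:54 PM contact falls inside the window; the 2:34:00 PM one comes 29 seconds after the other host got through again.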
ID: 1695704 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
While the 'bad internet ploy' might seem a reasonable first tackle at the situation, it's just that: a first stop. I'd have to agree there are more vagaries going on in there.
To be clear, I'm not a big fan of the BOINC codebase (client or server), of the methodologies employed, or of the attitudes of the people involved (toward users).
Ultimately I suspect you will see weird behaviour like this until the BOINC development team learns to speak to real developers in a civil manner.
ID: 1695709 · |
|
TBar Volunteer tester
Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
|
ID: 1695716 · |
|
Rasputin42 Volunteer tester
Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0
|
Maybe a fault in the router?
ID: 1695718 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
Well, there is a host at Beta suffering Abandoned tasks multiple times daily. I think it would be wise to put a watch on that machine;
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714&offset=60
I'll get my team on this. Do you know any background or details that may help them?
ID: 1695719 · |
|
TBar Volunteer tester
Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
|
Well, there is a host at Beta suffering Abandoned tasks multiple times daily. I think it would be wise to put a watch on that machine;
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71714&offset=60
I'll get my team on this. Do you know any background or details that may help them?
Go for it. I'll also mention it to Eric the next time I ask for one of my faulty apps on Beta to be replaced. For the past few weeks, that has been every Monday...
ID: 1695723 · |
|
Richard Haselgrove Volunteer tester
Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
|
"Abandonment" is done by the routine mark_results_over() - line 197 of handle_request.cpp (server code, in \sched\)
This is called from precisely two places - lines 403 and 426 of the same file. One or other of those two must have been triggered.
You might like to read the preceding comment:
// Called when there's evidence that the host has detached.
// Mark in-progress results for the given host
// as server state OVER, outcome CLIENT_DETACHED.
// This serves two purposes:
// 1) make sure we don't resend these results to the host
// (they may be the reason the user detached)
// 2) trigger the generation of new results for these WUs
//
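To make that comment concrete, here is a toy Python model of what it describes. This is not the BOINC source (the real routine is C++ and works on the project database); the classes, field names and task names are invented for illustration. Only the state names `OVER` and `CLIENT_DETACHED` and the host ID come from this thread.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ServerState(Enum):
    IN_PROGRESS = auto()
    OVER = auto()

class Outcome(Enum):
    NONE = auto()
    CLIENT_DETACHED = auto()

@dataclass
class Result:
    name: str
    host_id: int
    server_state: ServerState = ServerState.IN_PROGRESS
    outcome: Outcome = Outcome.NONE

def mark_results_over(results, host_id):
    """Toy model: abandon every in-progress result held by host_id."""
    abandoned = []
    for r in results:
        if r.host_id == host_id and r.server_state is ServerState.IN_PROGRESS:
            r.server_state = ServerState.OVER
            r.outcome = Outcome.CLIENT_DETACHED
            abandoned.append(r.name)
    return abandoned

# Hypothetical task names; 6796479 is the host that lost its work.
results = [
    Result("task_a", 6796479),
    Result("task_b", 6796479),
    Result("task_c", 6797524),
]
abandoned = mark_results_over(results, 6796479)
print(abandoned)
```

Note that nothing in this model checks whether the host is still working on those tasks, which is the point: once the server believes the host has detached, everything in progress for that host goes in one sweep. How replacement results get generated (purpose 2 in the comment) is left out.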
ID: 1695726 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
lol, thanks for the schedule. Yeah I have a personal affinity for Eric, but not in a homo way. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1695727 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
I find the emphasis on "the host has detached" interesting. The mind naturally wanders to the old spontaneous detach problem.
ID: 1695728 · |
|
Richard Haselgrove Volunteer tester
Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
|
I find the emphasis on "the host has detached" interesting. The mind naturally wanders to the old spontaneous detach problem.
I think the crucial (but probably false) word is 'evidence'.
ID: 1695730 · |
|
Zalster Volunteer tester
Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242
|
As noted previously, my machines are connected to FIOS with a straight connection to the main trunk on a State Highway.
This host had contact with the server while the other host was being timed out.
Can't happen that way...the hosts are connected to the same router.
http://setiathome.berkeley.edu/results.php?hostid=6797524
The host that had the Abandoned tasks was being timed out between 2:26:47 and 2:33:31 PM EDT.
This host was talking to the server during that time. Sorry, the bad internet ploy doesn't cut it this time. I think we can rule out 'bad internet'.
TBar, I never said the problem was on your end. I could say the same about my connection, but I know it happens. So if it's not on your end and it's not on my end... Isn't there another end???? just saying.....
ID: 1695734 · |
|
jason_gee Volunteer developer Volunteer tester
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
Isn't there another end???? just saying.....
Yeah there's lots of other ends.
ID: 1695735 · |
|
TBar Volunteer tester
Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
|
"Abandonment" is done by the routine mark_results_over() - line 197 of handle_request.cpp (server code, in \sched\)
This is called from precisely two places - lines 403 and 426 of the same file. One or other of those two must have been triggered.
You might like to read the preceding comment:
// Called when there's evidence that the host has detached.
// Mark in-progress results for the given host
// as server state OVER, outcome CLIENT_DETACHED.
// This serves two purposes:
// 1) make sure we don't resend these results to the host
// (they may be the reason the user detached)
// 2) trigger the generation of new results for these WUs
//
As I recall, you were the one investigating these Abandonments. I think you were having trouble because it was an intermittent problem. Well, the problem isn't intermittent for the host at Beta; he got whacked about an hour before I did, and he will probably be whacked again before too long. As for me, this was the first time I suffered this problem, although I fear it will not be the last. The problem does seem to be getting worse, and I don't think it has anything to do with 'bad internet'. What the problem could be is a mystery; I just know what it isn't.
ID: 1695736 · |
|
Richard Haselgrove Volunteer tester
Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
|
"Abandonment" is done by the routine mark_results_over() - line 197 of handle_request.cpp (server code, in \sched\)
This is called from precisely two places - lines 403 and 426 of the same file. One or other of those two must have been triggered.
You might like to read the preceding comment:
// Called when there's evidence that the host has detached.
// Mark in-progress results for the given host
// as server state OVER, outcome CLIENT_DETACHED.
// This serves two purposes:
// 1) make sure we don't resend these results to the host
// (they may be the reason the user detached)
// 2) trigger the generation of new results for these WUs
//
As I recall, you were the one investigating these Abandonments. I think you were having trouble because it was an intermittent problem. Well, the problem isn't intermittent for the host at Beta; he got whacked about an hour before I did, and he will probably be whacked again before too long. As for me, this was the first time I suffered this problem, although I fear it will not be the last. The problem does seem to be getting worse, and I don't think it has anything to do with 'bad internet'. What the problem could be is a mystery; I just know what it isn't.
Agreed. IIRC, one of the cases which triggers mark_results_over() is processing requests with non-monotonic RPC sequence numbers - in other words, processing scheduler requests in the wrong order. If that happens once in a blue moon, it's Eddie in the space-time continuum. If it happens consistently, then there's a cause, and it will show up in the logs - the server logs, that is.
One (highly speculative) suggestion as to a possible cause: a user has a working computer, buys a second similar machine, and shortcuts the installation/setup procedure by copying the existing BOINC data directory to the second machine. That creates two hosts with the same HostID, and of course the RPC sequence number gets de-synchronised. A scan of the server logs would reveal that by the alternating IP addresses.
Other theories are available...
We do have one member of the 'team' who has access to server logs, and I caught his attention with a possibly related case a couple of weeks ago: "Immediate timeout? Missing deadline?". But so far, as you can see, no diagnosis or resolution.
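For what it's worth, the kind of log scan I have in mind could look something like the Python sketch below. I don't know the actual layout of the server logs, so the `(host_id, rpc_seqno, ip)` tuples, the function name `suspicious_hosts`, the host IDs and the addresses are all invented for illustration. Treating only a strictly lower sequence number as a regression is also my assumption.

```python
from collections import defaultdict

def suspicious_hosts(requests):
    """requests: iterable of (host_id, rpc_seqno, ip) in server-arrival order.

    Reports hosts whose sequence number ever goes backwards, together with
    how often the source IP address changed for that host.
    """
    last_seqno = {}
    ips = defaultdict(list)          # per host: IPs seen, consecutive repeats collapsed
    regressions = defaultdict(int)   # per host: count of backwards steps
    for host_id, seqno, ip in requests:
        if host_id in last_seqno and seqno < last_seqno[host_id]:
            regressions[host_id] += 1
        last_seqno[host_id] = seqno
        if not ips[host_id] or ips[host_id][-1] != ip:
            ips[host_id].append(ip)
    return {
        host_id: {"seqno_regressions": count,
                  "ip_switches": len(ips[host_id]) - 1}
        for host_id, count in regressions.items()
    }

# Host 7 imitates two machines sharing one HostID; host 8 is well behaved.
requests = [
    (7, 100, "192.0.2.10"),
    (7, 101, "192.0.2.10"),
    (7, 102, "192.0.2.10"),
    (7, 101, "192.0.2.20"),   # the clone starts from an older sequence number
    (7, 103, "192.0.2.10"),
    (7, 102, "192.0.2.20"),
    (8, 5, "198.51.100.4"),
    (8, 6, "198.51.100.4"),
]
report = suspicious_hosts(requests)
print(report)
```

A host that shows regressions together with IP addresses flipping back and forth would fit the cloned-directory theory. Regressions from a single address would point somewhere else.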
ID: 1695741 · |
|