Panic Mode On (108) Server Problems?

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34394
Credit: 79,922,639
RAC: 80
Germany
Message 1900516 - Posted: 11 Nov 2017, 13:31:42 UTC

Uploads are working fine here, downloads just come once in a while.

11.11.2017 13:18:45 | SETI@home | [sched_op] CPU work request: 3878897.73 seconds; 0.00 devices
11.11.2017 13:18:45 | SETI@home | [sched_op] AMD/ATI GPU work request: 651031.34 seconds; 0.00 devices
11.11.2017 13:18:47 | SETI@home | Scheduler request completed: got 7 new tasks
11.11.2017 13:18:47 | SETI@home | [sched_op] Server version 707
11.11.2017 13:18:47 | SETI@home | Project requested delay of 303 seconds
11.11.2017 13:18:47 | SETI@home | [sched_op] estimated total CPU task duration: 18009 seconds
11.11.2017 13:18:47 | SETI@home | [sched_op] estimated total AMD/ATI GPU task duration: 1309 seconds
11.11.2017 13:18:47 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 14fe07aa.4256.10706.5.32.128_0
11.11.2017 13:18:47 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 14fe07aa.24680.890.9.36.104_0
11.11.2017 13:18:47 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 11mr07ac.30053.72.6.33.107_0
11.11.2017 13:18:47 | SETI@home | [sched_op] Deferring communication for 00:05:03
11.11.2017 13:18:47 | SETI@home | [sched_op] Reason: requested by project
11.11.2017 13:18:49 | SETI@home | Started download of 14fe07aa.16643.11115.12.39.126
11.11.2017 13:18:49 | SETI@home | Started download of 11ja07ab.2450.8252.3.30.249
11.11.2017 13:18:53 | SETI@home | Finished download of 14fe07aa.16643.11115.12.39.126
11.11.2017 13:18:53 | SETI@home | Finished download of 11ja07ab.2450.8252.3.30.249
11.11.2017 13:18:53 | SETI@home | Started download of 04ja07ab.16675.10706.9.36.114.vlar
11.11.2017 13:18:53 | SETI@home | Started download of 04ja07ab.16675.10706.9.36.107.vlar
11.11.2017 13:18:57 | SETI@home | Finished download of 04ja07ab.16675.10706.9.36.114.vlar
11.11.2017 13:18:57 | SETI@home | Finished download of 04ja07ab.16675.10706.9.36.107.vlar
11.11.2017 13:18:57 | SETI@home | Started download of 04ja07ab.16675.10706.9.36.104.vlar
11.11.2017 13:18:57 | SETI@home | Started download of 04ja07ab.16675.10706.9.36.103.vlar
11.11.2017 13:19:00 | SETI@home | Finished download of 04ja07ab.16675.10706.9.36.103.vlar
11.11.2017 13:19:00 | SETI@home | Started download of 04ja07ab.16675.10706.9.36.109.vlar
11.11.2017 13:19:02 | SETI@home | Finished download of 04ja07ab.16675.10706.9.36.104.vlar
11.11.2017 13:19:03 | SETI@home | Finished download of 04ja07ab.16675.10706.9.36.109.vlar
11.11.2017 13:20:27 | SETI@home | Computation for task 08ja07ad.4244.10706.10.37.135_0 finished
11.11.2017 13:20:27 | SETI@home | Starting task 08ja07ad.4244.10706.10.37.26_1
11.11.2017 13:20:28 | SETI@home | Started upload of 08ja07ad.4244.10706.10.37.135_0_r1032251672_0
11.11.2017 13:20:32 | SETI@home | Finished upload of 08ja07ad.4244.10706.10.37.135_0_r1032251672_0
11.11.2017 13:23:53 | SETI@home | [sched_op] Starting scheduler request
11.11.2017 13:23:53 | SETI@home | Sending scheduler request: To fetch work.
11.11.2017 13:23:53 | SETI@home | Reporting 1 completed tasks
11.11.2017 13:23:53 | SETI@home | Requesting new tasks for CPU and AMD/ATI GPU
11.11.2017 13:23:53 | SETI@home | [sched_op] CPU work request: 3862504.63 seconds; 0.00 devices
11.11.2017 13:23:53 | SETI@home | [sched_op] AMD/ATI GPU work request: 649975.19 seconds; 0.00 devices
11.11.2017 13:23:55 | SETI@home | Scheduler request completed: got 0 new tasks
11.11.2017 13:23:55 | SETI@home | [sched_op] Server version 707
11.11.2017 13:23:55 | SETI@home | Project has no tasks available
11.11.2017 13:23:55 | SETI@home | Project requested delay of 303 seconds
11.11.2017 13:23:55 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 08ja07ad.4244.10706.10.37.135_0
11.11.2017 13:23:55 | SETI@home | [sched_op] Deferring communication for 00:05:03
11.11.2017 13:23:55 | SETI@home | [sched_op] Reason: requested by project
11.11.2017 13:25:12 | SETI@home | Computation for task 08ja07ad.4244.10706.10.37.26_1 finished
11.11.2017 13:25:12 | SETI@home | Starting task 08ja07ad.4244.10706.10.37.22_1
11.11.2017 13:25:14 | SETI@home | Started upload of 08ja07ad.4244.10706.10.37.26_1_r1682938500_0
11.11.2017 13:25:17 | SETI@home | Finished upload of 08ja07ad.4244.10706.10.37.26_1_r1682938500_0
11.11.2017 13:25:26 | SETI@home | Computation for task 08ja07ad.4244.10706.10.37.22_1 finished
11.11.2017 13:25:26 | SETI@home | Starting task 14fe07aa.4256.10706.5.32.119_1
11.11.2017 13:25:29 | SETI@home | Started upload of 08ja07ad.4244.10706.10.37.22_1_r1149134663_0
11.11.2017 13:25:32 | SETI@home | Finished upload of 08ja07ad.4244.10706.10.37.22_1_r1149134663_0
11.11.2017 13:29:01 | SETI@home | [sched_op] Starting scheduler request
11.11.2017 13:29:01 | SETI@home | Sending scheduler request: To fetch work.
11.11.2017 13:29:01 | SETI@home | Reporting 2 completed tasks
11.11.2017 13:29:01 | SETI@home | Requesting new tasks for CPU and AMD/ATI GPU
11.11.2017 13:29:01 | SETI@home | [sched_op] CPU work request: 3864762.18 seconds; 0.00 devices
11.11.2017 13:29:01 | SETI@home | [sched_op] AMD/ATI GPU work request: 650496.87 seconds; 0.00 devices
11.11.2017 13:29:03 | SETI@home | Scheduler request completed: got 0 new tasks
11.11.2017 13:29:03 | SETI@home | [sched_op] Server version 707
11.11.2017 13:29:03 | SETI@home | Project has no tasks available
11.11.2017 13:29:03 | SETI@home | Project requested delay of 303 seconds
11.11.2017 13:29:03 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 08ja07ad.4244.10706.10.37.26_1
11.11.2017 13:29:03 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 08ja07ad.4244.10706.10.37.22_1
11.11.2017 13:29:03 | SETI@home | [sched_op] Deferring communication for 00:05:03
11.11.2017 13:29:03 | SETI@home | [sched_op] Reason: requested by project


With each crime and every kindness we birth our future.
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51484
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1900518 - Posted: 11 Nov 2017, 13:40:07 UTC

Something just broke loose.
Got a boatload of work on all crunchers in the last 20 minutes or so.

Meow!!!
"Time is simply the mechanism that keeps everything from happening all at once."

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1900532 - Posted: 11 Nov 2017, 15:54:35 UTC

My Windows machines seem to have received work during the night and are close to full. But the Linux machine is still way down in work, with about a quarter of what it is supposed to have. I think part of the issue is that with each 'no tasks available' message, it just keeps incrementing the Nvidia GPU backoff interval.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1900537 - Posted: 11 Nov 2017, 16:33:26 UTC

Well, I got this:

13755 SETI@home 11/11/2017 3:52:48 PM Scheduler request completed: got 71 new tasks

On my main machine, and caches are now full.

2nd machine has been downloading in 1's and 2's and also now has a full cache.
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1900538 - Posted: 11 Nov 2017, 16:34:17 UTC - in response to Message 1900532.  

My Windows machines seem to have received work during the night and are close to full. But the Linux machine is still way down in work, with about a quarter of what it is supposed to have. I think part of the issue is that with each 'no tasks available' message, it just keeps incrementing the Nvidia GPU backoff interval.
Interesting point. But I think we worked very hard on tweaking those backoffs, and as far as I know, the current one is still the compromise we reached at v6.11.8:

client: fix bug that cause wasted scheduler RPC.

Old: when a job finished, we cleared the backoffs for the resources it used. The idea was to get more jobs immediately in the case where the client was at a jobs-in-progress limit.
Problem: this resulted in an RPC immediately, typically before the output files were uploaded. So the client is still at the limit, and doesn't get jobs.

New: clear the backoffs at the point when output files have been uploaded and the job is ready to report.
So, if you have any NVidia tasks left, every time one of them completes (successfully), you should upload the result file and then do a scheduler contact. And every scheduler contact should combine reporting results (if any) with requesting new work (if needed).

That's the way I've always observed my Windows machines to work, and from the sound of it your Windows machines do the same. So, why should your Linux machine behave differently? It describes itself as v7.8.3, the same as the Windows machines, so it should be the same codebase and the same behaviour.
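
In client-code terms, that 2010 change amounts to something like the sketch below (my reconstruction of the logic the changelog describes, in C++ like the client itself; the names here are illustrative, not the real BOINC identifiers):

struct ResourceBackoff {
    double interval = 0;   // current wait before the next work fetch
    double inc = 0;        // baseline that doubles on each failure
    void clear() { interval = 0; inc = 0; }
};

// Old: backoffs were cleared as soon as computation finished. The RPC
// that followed usually fired before the output files had uploaded, so
// the host was still at its jobs-in-progress limit and got nothing.
void on_computation_finished(ResourceBackoff& rb) {
    (void)rb;        // nothing cleared here any more
}

// New: clear only once the output files are uploaded and the task is
// ready to report, so one RPC can report the result and fetch new work.
void on_ready_to_report(ResourceBackoff& rb) {
    rb.clear();      // new place to clear
}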

Could there be anything odd about the version numbering for your Linux build, or are we looking for a misplaced 'Windows only' wrapper round that 2010 tweak?
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1900541 - Posted: 11 Nov 2017, 16:48:50 UTC - in response to Message 1900538.  

So Richard, is there anything I can set in cc_config or logging options that can pinpoint why I keep getting larger and larger backoff intervals? What about the report_tasks_immediately flag in cc_config? Would that prevent the backoff?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1900543 - Posted: 11 Nov 2017, 17:06:29 UTC
Last modified: 11 Nov 2017, 17:08:29 UTC

I had my #1 and #2 machines shut down all night, but my #3 machine was left running (w/ a backup project) and by this morning had gradually filled the queue. So, about 45 minutes ago, I fired up my #1 machine. It got 3 tasks, all Arecibo VLARs, adding to the 16 of same that it already had. Nothing for the GPUs. After half a dozen non-productive scheduler requests, I finally decided to reschedule all non-running Arecibo VLARs to the GPUs, just to give them something to do. When BOINC restarted, the first scheduler request immediately snagged 111 new tasks. Go figure!

EDIT: And it looks like the second request got 135 more.
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1900544 - Posted: 11 Nov 2017, 17:15:27 UTC - in response to Message 1900541.  
Last modified: 11 Nov 2017, 18:04:35 UTC

So Richard, is there anything I can set in cc_config or logging options that can pinpoint why I keep getting larger and larger backoff intervals? What about the report_tasks_immediately flag in cc_config? Would that prevent the backoff?
You can see the backoffs using the work_fetch_debug Event Log flag, although you need your thinking head on - it's very dense and technical. I'd be more interested in doing that first, to find where the problem lies, rather than guess at potential fixes without fully understanding what's going on.
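
For reference, those switches live in cc_config.xml in the BOINC data directory; something like this should do it (a sketch only - and note the option you mention is spelled report_results_immediately; I'd leave it off until we've seen the logs):

<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
  <options>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>

Set the flags, then do 'Read config files' from the Manager (or restart the client) and the extra lines appear in the Event Log.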

I'll try and force a WFD log with backoffs, and annotate it.

Edit - here's a simple one, with all the other projects removed.

11/11/2017 17:43:10 |  | [work_fetch] ------- start work fetch state -------
11/11/2017 17:43:10 |  | [work_fetch] target work buffer: 108000.00 + 864.00 sec
11/11/2017 17:43:10 |  | [work_fetch] --- project states ---
11/11/2017 17:43:10 | SETI@home | [work_fetch] REC 392100.423 prio -0.019 can't request work: scheduler RPC backoff (297.81 sec)
11/11/2017 17:43:10 |  | [work_fetch] --- state for CPU ---
11/11/2017 17:43:10 |  | [work_fetch] shortfall 257739.36 nidle 0.00 saturated 41144.36 busy 0.00
11/11/2017 17:43:10 | SETI@home | [work_fetch] share 0.000 blocked by project preferences
11/11/2017 17:43:10 |  | [work_fetch] --- state for NVIDIA GPU ---
11/11/2017 17:43:10 |  | [work_fetch] shortfall 60371.40 nidle 0.00 saturated 78256.81 busy 0.00
11/11/2017 17:43:10 | SETI@home | [work_fetch] share 0.000 project is backed off  (resource backoff: 552.68, inc 600.00)
11/11/2017 17:43:10 |  | [work_fetch] --- state for Intel GPU ---
11/11/2017 17:43:10 |  | [work_fetch] shortfall 87875.71 nidle 0.00 saturated 20988.29 busy 0.00
11/11/2017 17:43:10 | SETI@home | [work_fetch] share 0.000 blocked by project preferences
11/11/2017 17:43:10 |  | [work_fetch] ------- end work fetch state -------
Took a while to force it, because every request got work, and I was in the middle of a batch of shorties which reset the backoffs as quickly as I could fetch work.

So, the data lines in the order they appear:

target work buffer - what you ask for. 1.25 days plus 0.01 days, in this case. No work request unless you're below the sum of these two.

Project state - still early in the 5:03 server backoff. Won't ask by itself (and no point in pressing 'update') until this is zero.

No CPU (or iGPU) requests for SETI on this machine - my preference.

state for NVIDIA GPU - the one we're interested in. Showing a shortfall, so it would fetch work if it could. But it's in resource backoff, because I've reached a quota limit in this case - the same would show for 'no tasks available'.

The two figures showing for backoff are:

First - the current 'how long to wait' - will count down by 60 seconds every minute.
Second (inc) - the current baseline for the backoff. Will double at each consecutive failure to get work until it reaches (I think) 4 hours / 14,400 seconds. The actual backoff will be set to a random number of roughly the same magnitude as 'inc', so the machines don't get into lockstep.

The theory is that resource backoff should be set to zero after successful task completion, and 'inc' should be set to zero after every task allocation. You're saying that the first half of that statement doesn't apply under Linux?
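
In pseudo-C++, my understanding of the resource backoff is roughly this (a sketch only: the 600-second starting value matches the 'inc 600.00' in the log above, but the 4-hour cap and the exact randomisation are my assumptions, not checked against the source):

#include <algorithm>
#include <cstdlib>

// Rough model of the per-resource work-fetch backoff in the WFD log.
struct ResourceBackoff {
    double inc = 0;        // baseline; doubles on each consecutive failure
    double remaining = 0;  // counts down in real time

    // 'no tasks available' (or quota reached) for this resource:
    void on_fetch_failed() {
        inc = std::min(inc == 0 ? 600.0 : inc * 2.0, 14400.0);
        // actual wait is random, same order of magnitude as inc,
        // so a farm of hosts doesn't fall into lockstep
        remaining = inc * (0.5 + std::rand() / (2.0 * RAND_MAX));
    }

    // scheduler actually allocated work: reset the baseline
    void on_work_received() { inc = 0; }

    // task completed, uploaded and ready to report (the v6.11.8 fix):
    void on_task_ready_to_report() { remaining = 0; }
};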
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1900550 - Posted: 11 Nov 2017, 17:52:03 UTC - in response to Message 1900544.  

I do that all the time. Sometimes it seems to help unclog the servers and make them recognize that my machines are in need of work. It wasn't working on the Linux machine at all. I got desperate as I was down to less than a dozen tasks and the cpu cores were going cold, so I decided to use the kick-the-server protocol. The next request after bringing BOINC back online snagged 119 tasks. And the next few requests got me back to full cache level. Now the Win 10 machine is getting low and I am doing the same with it. That procedure is the only thing that seems to regularly work in getting the servers to recognize a task cache deficiency for me.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1900556 - Posted: 11 Nov 2017, 18:14:41 UTC - in response to Message 1900550.  

Yes, it's documented that restarting the BOINC client should clear any backoffs. But they should be cleared during running, too. See big edit to my last post.
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1900557 - Posted: 11 Nov 2017, 18:29:15 UTC - in response to Message 1900544.  
Last modified: 11 Nov 2017, 19:07:26 UTC

Thanks for the detailed explanation of the WFD option. I understand what shortfall is. But what does "saturated" mean? Is that a stand-in for the quota you mention?

And another VERY interesting comment you make ....
Second (inc) - the current baseline for the backoff. Will double at each consecutive failure to get work until it reaches (I think) 4 hours / 14,400 seconds. The actual backoff will be set to a random number of roughly the same magnitude as 'inc', so the machines don't get into lockstep.


That doesn't seem to work on my machines. Or I am not understanding the comment. What I observe every day is that if I have all my machines initially staggered in their 303 second request intervals, within an hour or so all machines are "synched" up in their request interval countdown. I am sure that only aggravates getting work, since whichever machine beats the others by a few milliseconds to the RTS buffer depletes it for all the other machines next in the queue. I am constantly having to pause machines by stopping and restarting BOINC to get their request timings staggered apart.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1900561 - Posted: 11 Nov 2017, 18:45:20 UTC - in response to Message 1900557.  

Thanks for the detailed explanation of the WFD option. I understand what shortfall is. But what does "saturated" mean? Is that a stand-in for the quota you mention?
'saturated' would be the total estimated remaining runtime for all work cached for the resource. In my case, I've got two GPUs in that machine, so shortfall would be twice the target work buffer (217,728 seconds) if I had no NVidia work at all. But for that snapshot the 'saturated' number was the combined result of 200 SETI tasks and two-thirds of a GPUGrid task (edited out for clarity) with about 5 hours remaining.
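
As a toy model (my own, for illustration; the real client simulates the task schedule in more detail):

#include <algorithm>
#include <vector>

// Toy version of the per-resource WFD numbers.
struct WfdResource {
    double buffer;              // target work buffer, seconds
    std::vector<double> busy;   // estimated queued runtime per instance

    // shortfall: instance-seconds still needed to fill the buffer
    double shortfall() const {
        double s = 0;
        for (double b : busy) s += std::max(0.0, buffer - b);
        return s;
    }
    // saturated: how long every instance stays busy
    double saturated() const {
        return *std::min_element(busy.begin(), busy.end());
    }
};

With two GPUs and an empty queue, busy = {0, 0} gives shortfall = 2 x 108,864 = 217,728 seconds and saturated = 0 - the 'twice the target buffer' figure above.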

And another VERY interesting comment you make ....
Second (inc) - the current baseline for the backoff. Will double at each consecutive failure to get work until it reaches (I think) 4 hours / 14,400 seconds. The actual backoff will be set to a random number of roughly the same magnitude as 'inc', so the machines don't get into lockstep.
No, there's no randomising on the server-requested backoffs - "Project requested delay of 303 seconds" it says, and 303 seconds it gets. Randomisation only applies to the internally-generated resource backoff, which you only see if you have WFD active.
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1900566 - Posted: 11 Nov 2017, 19:05:31 UTC - in response to Message 1900561.  
Last modified: 11 Nov 2017, 19:08:37 UTC

So what is the cause of the sync that happens on my machines? If all machines are initially staggered as to when their 303 second interval ends, they should maintain their staggered countdown since the request interval is static and never changes. What causes my machines to eventually sync together so that they are no longer staggered out when they hit the scheduler and hit the scheduler at exactly the same time?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1900568 - Posted: 11 Nov 2017, 19:11:09 UTC - in response to Message 1900566.  

Pass. Could be something on your local network, could be the variable length of time it takes to connect to the server and process the request. I'm not interested in that: the question is - "Why does the Linux box request less often than the Windows boxes?", and I'm wondering if the answer might be "because the Windows version of v7.8.3 clears resource backoffs on task completion but the Linux version of v7.8.3 doesn't"?
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1900571 - Posted: 11 Nov 2017, 19:46:38 UTC - in response to Message 1900568.  

OK, I'll accept the pass and just live with the situation. Just don't like all the hand-holding I have to do on the machines to keep them fed.

Is the issue with the Linux box something I need to put to the BOINC-developers website as a new issue? What kind of data dumps would I need to do to show that the Linux sources differ from the Windows ones with regard to the backoff issue?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1900572 - Posted: 11 Nov 2017, 19:54:25 UTC - in response to Message 1900571.  

I would like to study a contiguous segment of message log from that machine, with WFD active, showing resource backoff at the beginning, a task completion and upload, and the next WFD afterwards. What we do next depends on what we see there - if I see anything suspicious, I'll have a dig through the source code before writing anything on github.

If this is a bug, it's existed for 7 years without anyone noticing. Another couple of days is preferable to going off half-cocked and making fools of both of us.
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1900581 - Posted: 11 Nov 2017, 20:40:01 UTC - in response to Message 1900512.  

Don't worry about it, Stephen, you are not missing much! I may still be running and have a reasonable amount of work, but now I can't report the tasks! Uploads are fine, but that's as far as it goes.


. . Hi Iona,

. . That was where it was at when I fired the rigs up last time. But ... this morning .. Eureka :)

. . Four work requests and this rig has gone from bone dry to a full fuel tank. I am taking that as a sign the problem has been found and kicked. Time to fire up the other two.

Stephen

:)
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1900584 - Posted: 11 Nov 2017, 20:46:51 UTC - in response to Message 1900532.  

My Windows machines seem to have received work during the night and are close to full. But the Linux machine is still way down in work, with about a quarter of what it is supposed to have. I think part of the issue is that with each 'no tasks available' message, it just keeps incrementing the Nvidia GPU backoff interval.


. . Hi Keith,

. . Oh it does that when the system is being worked on and the servers are not playing ball at all. When work requests are not being answered at all, or are answered with a 'system shut down' response, the backoff increases, and with each subsequent such response the increase gets longer and longer. Another reason I reach for the button with the funny symbol on it.

Stephen

:(

. . BUT! The work is now coming through AOK on this Linux rig so I will be firing up the other two again.

Stephen

:)
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1900585 - Posted: 11 Nov 2017, 20:51:32 UTC - in response to Message 1900537.  

Well I got this

13755 SETI@home 11/11/2017 3:52:48 PM Scheduler request completed: got 71 new tasks

On my main machine and caches are now full

2nd machine has been downloading in 1's and 2's and also now has a full cache.


. . Isn't it odd how one machine will get work in large batches when it is flowing, but another will only get dribbles. But dribbles are better than nothing ... :)

Stephen

:)
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1900592 - Posted: 11 Nov 2017, 21:12:02 UTC - in response to Message 1900566.  

So what is the cause of the sync that happens on my machines? If all machines are initially staggered as to when their 303 second interval ends, they should maintain their staggered countdown since the request interval is static and never changes. What causes my machines to eventually sync together so that they are no longer staggered out when they hit the scheduler and hit the scheduler at exactly the same time?


. . Perhaps because the task duration takes a work request past the 303 sec mark? So the steps are not equal on each machine and can eventually coincide?
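
. . A quick toy simulation of that idea (pure guesswork on my part, nothing from the client):

#include <cstdio>
#include <cstdlib>

// Two hosts each wait 303 s after the previous request completes, plus a
// variable task/connection delay. The gap between them random-walks
// instead of staying fixed, so they periodically coincide at the scheduler.
int main() {
    double t[2] = {0.0, 150.0};   // initial stagger, seconds
    std::srand(42);
    for (int step = 0; step < 50; ++step) {
        for (int h = 0; h < 2; ++h)
            t[h] += 303.0 + std::rand() % 60;  // jitter per request cycle
        std::printf("step %2d: gap = %6.1f s\n", step, t[1] - t[0]);
    }
    return 0;
}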

Stephen

?