Deferred communications and Resource share.

Message boards : Number crunching : Deferred communications and Resource share.

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993126 - Posted: 8 May 2019, 15:40:15 UTC

Because of the 5:00 "communication deferred" countdown after each scheduler contact, and because nearly all Seti GPU tasks take less than 5:00 to complete, BOINC goes to the other project for replacement work whenever a task completes, as Seti is blocked.

So far, in 9 days* on this new computer (RTX 2060 GPU), the secondary project has completed 1250 tasks, when it should have completed only 650 if the resource share had been respected. (* For the first 2 days it crunched only Seti.)

Should the project enforce the "communication deferred" timeout only for a limited period after outages?
ID: 1993126
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993150 - Posted: 8 May 2019, 18:12:49 UTC - in response to Message 1993126.  
Last modified: 8 May 2019, 18:13:28 UTC

When your client has issues contacting the Seti project, the normal "backoff" for communication is 60 minutes. If it fails again, the backoff increases by a nominal 40 - 60 minutes, in my observation. I'm not sure what your "5:00" refers to. Are you referring to the standard 305 second scheduler reply interval?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1993150
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993152 - Posted: 8 May 2019, 18:27:52 UTC - in response to Message 1993150.  

> Are you referring to the standard 305 second scheduler reply interval?

Probably; I can't say that I watched it that closely.
ID: 1993152
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993154 - Posted: 8 May 2019, 18:40:45 UTC

Until all attached projects have developed a steady-state REC (Recent Estimated Credit), the scheduler makes best-guess choices about scheduling and resource commitments. It sounds like at least one project (Seti) was just added and had to make up a lot of ground relative to your other, mature projects.

There have been a lot of changes to work fetch in the upcoming BOINC 7.16 client that should address some of the scheduler's deficiencies in allocating resources when multiple attached projects are running.
ID: 1993154
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993200 - Posted: 9 May 2019, 0:34:11 UTC - in response to Message 1993154.  
Last modified: 9 May 2019, 0:44:10 UTC

> Until all attached projects have a steady state developed REC, the scheduler makes best guess choices about scheduling and resource commitments. Sounds like at least one project (Seti) was just added and had to make up a lot of ground to your other mature projects.
>
> There have been a lot of changes made to work fetch in the upcoming BOINC release 7.16 client that will and should address some of the scheduler deficiencies with regard to resource allocation with multiple attached projects running.

It's a new computer (27th April). For the first two days Seti was the only project running, before the side panel was screwed on.
The delay was due to needing extension cables for the 12V 8-pin connector and two of the 4-pin fans. After that I added other projects: Einstein as a backup with 0 resource share, and Seti Beta with a 10% share. As expected, Beta ran for a reasonable period to catch up, but since then it has been downloading more tasks than required, because Seti is effectively disabled by the "communication deferred" timeout.

P.S. AKA WinterKnight (https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1023&postid=22983#22983); the ID 666 is from BOINC.
ID: 1993200
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993205 - Posted: 9 May 2019, 1:29:31 UTC - in response to Message 1993200.  

Did you dump a bunch of errors on the Seti account? That would account for the communication deferral: you got put into the penalty box and will have to return validated work before the project gives you any more.
ID: 1993205
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993208 - Posted: 9 May 2019, 2:52:19 UTC - in response to Message 1993205.  
Last modified: 9 May 2019, 3:05:32 UTC

> Did you dump a bunch of errors on the Seti account. That would account for the communication deferred as you got put into the penalty box and will have to return validated work before the project will give you some more.

I'm not sure why that happened. A few strange things happened while Win 10 Pro decided the latest version needed a big pile of updates. This caused multiple restarts, and restarts within restarts, while I was busy doing essentials like cooking and cleaning. I live alone.

As far as I can tell, Seti requested new work and almost immediately Win 10 decided to reboot or disable comms for a period, so the requested tasks never reached the computer. As you can see, they were timed out after about 5 mins.

[edit] I've PM'd the Event Log.
ID: 1993208
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993211 - Posted: 9 May 2019, 3:19:35 UTC - in response to Message 1993208.  

I don't see anything done by Windows. I see tasks suspended by the user and then NNT (No New Tasks) set by the user. Everything looks normal in the log. To see what the client was requesting and what the scheduler's responses were, work_fetch_debug, or at minimum sched_op_debug, would have had to be set beforehand.

I would recommend that you set the sched_op_debug log flag for the Event Log. It doesn't add much extra output, but it does show exactly how much work you are requesting at each scheduler contact. I would set work_fetch_debug for only one scheduler connection cycle, as it produces too much output to leave on permanently.
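For reference, these flags live in the <log_flags> section of cc_config.xml in the BOINC data directory. A minimal sketch in Python that produces such a file body; the helper name build_cc_config is hypothetical, and it builds a fresh file rather than merging with any existing cc_config.xml, so copy the flag lines into your own file instead of overwriting it.

```python
import xml.etree.ElementTree as ET

def build_cc_config(flags):
    """Return a minimal cc_config.xml body enabling the given Event Log flags.

    Hypothetical helper: builds a fresh file, does not merge with an existing one.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is a child element of <log_flags>; 1 switches it on.
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")

# Leave sched_op_debug on; add work_fetch_debug only for one scheduler cycle.
print(build_cc_config(["sched_op_debug"]))
```

After editing the file, the client has to re-read its config files (BOINC Manager has a menu item for this) or be restarted before the new flags take effect.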
ID: 1993211
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993221 - Posted: 9 May 2019, 4:29:51 UTC - in response to Message 1993211.  

> I don't see anything done by Windows. I see tasks suspended by user and then NNT set by user. Everything looks normal in the log. To get a better idea of scheduling, work_fetch_debug and at minimum sched_op_debug would have had to be set beforehand to get a better idea of what the client was requesting for work and what the scheduler's responses would have been.
>
> I would recommend that you set sched_op_debug flag option for the Event Log. It doesn't throw all that much extra output into the Event Log but does give you an indication of exactly how much work you are requesting at each scheduler contact. I would only set work_fetch_debug for one scheduler connection cycle as the amount of output it creates is too much for permanency in the Log.

I don't think Windows has had any effect on BOINC and the projects since the last restart.
What I am seeing, if I allow Beta to download, is that when Main completes a task,
BOINC decides it needs more work;
if the 'comms deferred' countdown has completed,
it asks for work from Main;
if the 'comms deferred' countdown has not completed,
it asks for work from Beta.
This work from Beta just piles up and is not processed, because Beta's resource share has been exceeded. I decided to get rid of it by suspending Main and setting Beta to 'no new work'. The Beta 'no new work' is still in place.
I suspect that if I allow Beta to download again, it will fetch approx. one Beta task for each Main task, as you can see from the first part of that log.
With a resource share of Main 90:10 Beta, this is not right.

If things stay as they are, with Main crunching Green Bank VLARs (~4m:30s) and Beta crunching Arecibo mid-range AR (~2m:00s), I would expect a task count of Main 20:5 Beta if the resource share is to be maintained.
This is NOT happening.
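The 20:5 figure follows because resource share divides GPU time, not task counts: each project's task rate is its share of the time divided by its task duration. A quick check in Python using the ~4m:30s and ~2m:00s run times above (the function name is just for illustration):

```python
def expected_count_ratio(share_a, share_b, secs_a, secs_b):
    """Tasks completed by A per task completed by B, if device time is split by resource share."""
    rate_a = share_a / secs_a  # A's share of the time per second of one A task
    rate_b = share_b / secs_b  # likewise for B
    return rate_a / rate_b

# Main 90 : Beta 10, Main VLAR ~270 s, Beta mid-range AR ~120 s
ratio = expected_count_ratio(90, 10, 270, 120)
print(f"Main:Beta = {ratio:.1f}:1")  # 4.0:1, i.e. 20:5
```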
ID: 1993221
rob smith Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1993222 - Posted: 9 May 2019, 4:53:13 UTC

The time constant for your 20:5 ratio is weeks, not days, hours or minutes.
With a fairly new machine that has had a rebooting issue like yours, it is perfectly normal to get a gross imbalance for a few days until BOINC settles down and sorts things out.
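One way to see why the time constant is weeks: the client weights a project's past work by an exponentially decaying recent estimated credit (REC), with a half-life set by the cc_config option rec_half_life_days (default 10 days). The sketch below assumes a pure exponential decay with that half-life, a simplification of the client's actual accounting, and asks how long an early burst of work keeps counting.

```python
import math

def remaining_weight(days_ago, half_life_days=10.0):
    """Fraction of a past burst of credit still counted, assuming pure exponential decay."""
    return 0.5 ** (days_ago / half_life_days)

def days_until_weight(target, half_life_days=10.0):
    """Days until a burst's weight falls to `target` (0 < target < 1)."""
    return half_life_days * math.log2(1.0 / target)

for d in (1, 10, 20, 30):
    print(f"{d:2d} days later: {remaining_weight(d):.2f} of the burst still counts")
print(f"Below 10% after ~{days_until_weight(0.1):.0f} days")
```

Under this model a burst of extra Beta work done in the first days still carries half its weight ten days later.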
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1993222
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993223 - Posted: 9 May 2019, 5:02:42 UTC - in response to Message 1993222.  

> The time constant for your 20:5 ratio is weeks, not days, hours or minutes.
> With a fairly new machine that has had a re-booting issue like yours then it is perfectly normal to get a gross imbalance for a few days until BOINC settles down and sorts things out.

We are talking 10 days now.
ID: 1993223
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993234 - Posted: 9 May 2019, 8:09:56 UTC
Last modified: 9 May 2019, 8:48:31 UTC

Try telling me that this is what is expected with a Main 90:10 Beta resource share.

09/05/2019 09:03:34 | SETI@home | Sending scheduler request: To fetch work.
09/05/2019 09:03:34 | SETI@home | Reporting 1 completed tasks
09/05/2019 09:03:34 | SETI@home | Requesting new tasks for NVIDIA GPU
09/05/2019 09:03:36 | SETI@home | Scheduler request completed: got 1 new tasks
09/05/2019 09:03:38 | SETI@home | Started download of blc33_2bit_guppi_58406_01258_HIP116971_0032.12731.818.21.44.231.vlar
09/05/2019 09:03:42 | SETI@home | Finished download of blc33_2bit_guppi_58406_01258_HIP116971_0032.12731.818.21.44.231.vlar
09/05/2019 09:04:52 | SETI@home | Computation for task blc33_2bit_guppi_58405_86306_HIP85612_0029.30056.818.22.45.189.vlar_1 finished
09/05/2019 09:04:52 | SETI@home | Starting task blc33_2bit_guppi_58406_01590_HIP116245_0033.30624.818.22.45.8.vlar_1
09/05/2019 09:04:54 | SETI@home | Started upload of blc33_2bit_guppi_58405_86306_HIP85612_0029.30056.818.22.45.189.vlar_1_r1562831619_0
09/05/2019 09:04:58 | SETI@home | Finished upload of blc33_2bit_guppi_58405_86306_HIP85612_0029.30056.818.22.45.189.vlar_1_r1562831619_0
09/05/2019 09:05:16 | SETI@home | Computation for task blc33_2bit_guppi_58406_01590_HIP116245_0033.30624.818.22.45.8.vlar_1 finished
09/05/2019 09:05:16 | SETI@home | Starting task blc33_2bit_guppi_58405_86306_HIP85612_0029.32743.0.21.44.46.vlar_1
09/05/2019 09:05:16 | SETI@home Beta Test | Sending scheduler request: To fetch work.
09/05/2019 09:05:16 | SETI@home Beta Test | Requesting new tasks for NVIDIA GPU
09/05/2019 09:05:18 | SETI@home | Started upload of blc33_2bit_guppi_58406_01590_HIP116245_0033.30624.818.22.45.8.vlar_1_r982591748_0
09/05/2019 09:05:18 | SETI@home Beta Test | Scheduler request completed: got 1 new tasks
09/05/2019 09:05:20 | SETI@home Beta Test | Started download of 30dc06ah.25134.12751.10.44.73
09/05/2019 09:05:21 | SETI@home | Finished upload of blc33_2bit_guppi_58406_01590_HIP116245_0033.30624.818.22.45.8.vlar_1_r982591748_0
09/05/2019 09:05:25 | SETI@home Beta Test | Finished download of 30dc06ah.25134.12751.10.44.73
09/05/2019 09:06:28 | SETI@home Beta Test | Sending scheduler request: To fetch work.
09/05/2019 09:06:28 | SETI@home Beta Test | Requesting new tasks for NVIDIA GPU
09/05/2019 09:06:30 | SETI@home Beta Test | Scheduler request completed: got 1 new tasks
09/05/2019 09:06:32 | SETI@home Beta Test | Started download of 30dc06ah.25134.12342.10.44.50
09/05/2019 09:06:36 | SETI@home Beta Test | Finished download of 30dc06ah.25134.12342.10.44.50
09/05/2019 09:07:40 | SETI@home Beta Test | Sending scheduler request: To fetch work.
09/05/2019 09:07:40 | SETI@home Beta Test | Requesting new tasks for NVIDIA GPU
09/05/2019 09:07:41 | SETI@home Beta Test | Scheduler request completed: got 1 new tasks
09/05/2019 09:07:43 | SETI@home Beta Test | Started download of 30dc06ah.25134.12342.10.44.193
09/05/2019 09:07:47 | SETI@home Beta Test | Finished download of 30dc06ah.25134.12342.10.44.193
ID: 1993234
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1993236 - Posted: 9 May 2019, 8:25:13 UTC - in response to Message 1993234.  

It can happen, yes. If that first request at 09:03:34 just topped you up to the 'limit of tasks in progress' (100 per GPU), you'd be backed off until 09:08:39, 303 seconds from the reply time. So you'd be prevented from fetching from the main project throughout the period of your log.
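The arithmetic behind that 09:08:39, as a quick Python check: the scheduler reply arrived at 09:03:36 (from the log above) and the backoff is taken as 303 seconds. It also confirms that all three Beta requests in the log fall inside Main's backoff window.

```python
from datetime import datetime, timedelta

reply_time = datetime(2019, 5, 9, 9, 3, 36)  # SETI@home "Scheduler request completed"
main_ok_at = reply_time + timedelta(seconds=303)
print(main_ok_at.strftime("%H:%M:%S"))  # 09:08:39

# Times of the three Beta "Sending scheduler request" lines in the log
beta_requests = [datetime(2019, 5, 9, 9, m, s) for m, s in ((5, 16), (6, 28), (7, 40))]
print(all(t < main_ok_at for t in beta_requests))  # True
```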

But if your cache length specification ('Store at least --- days of work') hadn't been used up already, BOINC would need to find extra work from somewhere else. Beta only enforces a limit of 7 seconds between requests, so it's an easy target.
ID: 1993236
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993237 - Posted: 9 May 2019, 8:40:14 UTC - in response to Message 1993236.  
Last modified: 9 May 2019, 8:45:14 UTC

The cache is virtually full; each request just tops it up.

At the moment, because Beta is not being processed, the cache is >60% Beta and <40% Main, by time remaining.

Beta is presumably not being processed because it has already done about 40 hrs since 29th April; with its resource share it should normally have done only 24 hrs of work, 1 day out of 10.
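The 24-hour figure is Beta's fraction of the total resource share multiplied by the elapsed time. A sketch, assuming Einstein's zero share (backup only) counts for nothing and taking 10 days as the elapsed period:

```python
def expected_hours(shares, project, elapsed_days):
    """Hours of device time a project should get if time is split by resource share."""
    total = sum(shares.values())
    return shares[project] / total * elapsed_days * 24

shares = {"Main": 90, "Beta": 10, "Einstein": 0}  # Einstein: 0 = backup project
beta_due = expected_hours(shares, "Beta", 10)
print(f"Beta due ~{beta_due:.0f} h, actually ran ~40 h")
```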
ID: 1993237
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1993238 - Posted: 9 May 2019, 8:46:45 UTC - in response to Message 1993237.  

But are your requests being inhibited by the 'limit on tasks in progress'? Main certainly has one, Beta probably does (though I'm not sure). If those are kicking in, BOINC will always reach out for the low-hanging fruit - whichever project is free to request work first. Most of the time, that will be Beta.
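To see how this "low-hanging fruit" behaviour can invert a 90:10 share, here is a toy model in Python. It is a deliberate simplification, not BOINC's work-fetch algorithm: it assumes every completed task triggers a request for exactly one replacement, that Main refuses contact for 303 s after each request and Beta for 7 s, and that resource share, cache size and task limits play no part. The 72 s completion interval is an assumption taken from the spacing of the Beta requests in the log (09:05:16, 09:06:28, 09:07:40).

```python
def simulate_fetches(duration_s, completion_interval_s, main_backoff_s=303, beta_backoff_s=7):
    """Toy model: on each task completion, ask whichever project is not backed off, Main first."""
    main_ok_at = 0  # earliest time Main may be contacted again
    beta_ok_at = 0  # earliest time Beta may be contacted again
    fetched = {"Main": 0, "Beta": 0}
    for t in range(0, duration_s, completion_interval_s):
        if t >= main_ok_at:
            fetched["Main"] += 1
            main_ok_at = t + main_backoff_s
        elif t >= beta_ok_at:
            fetched["Beta"] += 1
            beta_ok_at = t + beta_backoff_s
    return fetched

print(simulate_fetches(duration_s=3600, completion_interval_s=72))
```

In this model Beta receives four tasks for every one from Main, the reverse of what the 90:10 share would suggest. The real client also weighs resource share and the cache setting, so this only illustrates the direction of the effect.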
ID: 1993238
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993239 - Posted: 9 May 2019, 8:50:26 UTC - in response to Message 1993238.  
Last modified: 9 May 2019, 9:03:17 UTC

> But are your requests being inhibited by the 'limit on tasks in progress'? Main certainly has one, Beta probably does (though I'm not sure). If those are kicking in, BOINC will always reach out for the low-hanging fruit - whichever project is free to request work first. Most of the time, that will be Beta.

My cache is small enough that the 100-task limit is not a factor**; I'm trying to catch Astropulse.

[edit] ** on Main
ID: 1993239
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993249 - Posted: 9 May 2019, 10:57:41 UTC

Is there a 50-task limit at Beta? One request there got 0 tasks, and there have been none since.
ID: 1993249
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1993251 - Posted: 9 May 2019, 11:09:34 UTC - in response to Message 1993249.  

I think I've seen that number, yes, but I don't have logged evidence.

Hitting a limit wouldn't (by itself) stop BOINC asking, but after each attempt to fetch, you'd see a line saying

This computer has reached a limit on tasks in progress
ID: 1993251
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1993256 - Posted: 9 May 2019, 12:10:25 UTC - in response to Message 1993251.  

> I think I've seen that number, yes, but I don't have logged evidence.
>
> Hitting a limit wouldn't (by itself) stop BOINC asking, but after each attempt to fetch, you'd see a line saying
>
> This computer has reached a limit on tasks in progress

Didn't get that

09/05/2019 11:41:21 | SETI@home | Started upload of blc33_2bit_guppi_58406_02255_HIP116258_0035.1782.818.21.44.126.vlar_1_r1882011805_0
09/05/2019 11:41:24 | SETI@home | Finished upload of blc33_2bit_guppi_58406_02255_HIP116258_0035.1782.818.21.44.126.vlar_1_r1882011805_0
09/05/2019 11:42:25 | SETI@home Beta Test | Sending scheduler request: To fetch work.
09/05/2019 11:42:25 | SETI@home Beta Test | Requesting new tasks for NVIDIA GPU
09/05/2019 11:42:27 | SETI@home Beta Test | Scheduler request completed: got 0 new tasks
09/05/2019 11:43:33 | SETI@home | Sending scheduler request: To fetch work.
09/05/2019 11:43:33 | SETI@home | Reporting 1 completed tasks
09/05/2019 11:43:33 | SETI@home | Requesting new tasks for NVIDIA GPU
09/05/2019 11:43:35 | SETI@home | Scheduler request completed: got 1 new tasks

Since then there have been no more requests to Beta. Checking with Beta:
State: All (1301) · In progress (50) · Validation pending (4) · Validation inconclusive (1) · Valid (1246) · Invalid (0) · Error (0)
ID: 1993256
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24875
Credit: 3,081,182
RAC: 7
Ireland
Message 1993257 - Posted: 9 May 2019, 12:12:49 UTC - in response to Message 1993249.  

> Is there a 50 task limit at Beta, one request to there got 0 tasks and none since.
Yes.
ID: 1993257


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.