When should SETI start canceling Unneeded Tasks, to reduce Wasted Effort and electricity ?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13824
Credit: 208,696,464
RAC: 304
Australia
Message 2047847 - Posted: 1 May 2020, 22:57:39 UTC - in response to Message 2047797.  

Well, maybe the next step would be to increase the initial replication to 8? But don't forget to increase the total number of results and the number of errors too.
Why not send those resends to hosts with a fast turnaround time? Let's say <3 days?
Maybe it's just because such logic is absent from the project server settings? That is, they can't do it with just a "button click". And then the question arises whether it's worth spending the time on additional coding or not.
I think that's the reason.
Nope, those options are available in the BOINC project configuration options.

Accelerating retries
   The goal of this mechanism is to send timeout-generated retries to hosts that are likely to finish them fast. Here's how it works:

      Hosts are deemed "reliable" (a slight misnomer) if they satisfy turnaround time and error rate criteria.
      A job instance is deemed "need-reliable" if its priority is above a threshold.
      The scheduler tries to send need-reliable jobs to reliable hosts. When it does, it reduces the delay bound of the job.
      When job replicas are created in response to errors or timeouts, their priority is raised relative to the job's base priority.

   The configurable parameters are:

   <reliable_on_priority>X</reliable_on_priority>
      Results with priority at least reliable_on_priority are treated as "need-reliable". They'll be sent preferentially to reliable hosts.

   <reliable_max_avg_turnaround>secs</reliable_max_avg_turnaround>
      Hosts whose average turnaround is at most reliable_max_avg_turnaround and that have at least 10 consecutive valid results are considered 'reliable'. Make sure you set this low enough that a significant fraction (e.g. 25%) of your hosts qualify.

   <reliable_reduced_delay_bound>X</reliable_reduced_delay_bound>
      When a need-reliable result is sent to a reliable host, multiply the delay bound by reliable_reduced_delay_bound (typically 0.5 or so).

   <reliable_priority_on_over>X</reliable_priority_on_over>
   <reliable_priority_on_over_except_error>X</reliable_priority_on_over_except_error>
      If reliable_priority_on_over is nonzero, increase the priority of duplicate jobs by that amount over the job's base priority. Otherwise, if reliable_priority_on_over_except_error is nonzero, increase the priority of duplicates caused by timeout (not error) by that amount. (Typically only one of these is nonzero, and is equal to reliable_on_priority.)

      NOTE: this mechanism can be used to preferentially send ANY job, not just retries, to fast/reliable hosts. To do so, set the workunit's priority to reliable_on_priority or greater.
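As a rough illustration, the Accelerating retries options above might be set like this in a project's config.xml, assuming they go in the usual <config> section. The values are hypothetical: a priority threshold of 1, a 3-day average turnaround (259200 seconds, matching the "<3 days" suggestion earlier in the thread), and the "typical" 0.5 delay-bound multiplier from the documentation.

```xml
<boinc>
  <config>
    <!-- other <config> entries omitted; all values below are illustrative -->
    <!-- Results with priority >= 1 are treated as "need-reliable". -->
    <reliable_on_priority>1</reliable_on_priority>
    <!-- Hosts averaging at most 3 days (259200 s) turnaround, with at least
         10 consecutive valid results, count as "reliable". -->
    <reliable_max_avg_turnaround>259200</reliable_max_avg_turnaround>
    <!-- Halve the delay bound when a need-reliable job goes to a reliable host. -->
    <reliable_reduced_delay_bound>0.5</reliable_reduced_delay_bound>
    <!-- Raise retry priority by 1 over the base priority, so retries of
         base-priority-0 jobs reach the need-reliable threshold. -->
    <reliable_priority_on_over>1</reliable_priority_on_over>
  </config>
</boinc>
```

Whether 3 days is low enough depends on the host population; the documentation's rule of thumb is that around 25% of hosts should qualify.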



And given the low load on the servers, they could now do automatic ghost recovery without issue. They can abort jobs as well.
Job retransmission
   <resend_lost_results> 0|1 </resend_lost_results>
      If set, and an <other_results> list is present in the scheduler request, resend any in-progress results not in the list. This is recommended; it may increase the efficiency of your project. For reasons that are not well understood, a BOINC client sometimes fails to receive the scheduler reply. This flag addresses that issue: it causes the SAME results to be resent by the scheduler, if the client has failed to receive them. Note: this will increase the load on your DB server; you can minimize this by creating an index:
alter table result add index res_host_state (hostid, server_state);
   <send_result_abort>0|1</send_result_abort>
      If set, and the client is processing a result for a WU that has been canceled or is not in the DB (i.e. there's no chance of getting credit), tell the client to abort the result regardless of state. If the client is processing a result for a WU that has been assimilated or is overdue (i.e. there's a chance of not getting credit), tell the client to abort the result if it hasn't started yet. Note: this will increase the load on your DB server.
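A minimal sketch of turning on both retransmission options in config.xml, again assuming the standard <config> section. Since both options add DB load, the res_host_state index from the note above would normally be created first.

```xml
<boinc>
  <config>
    <!-- other <config> entries omitted -->
    <!-- Resend in-progress results missing from the client's <other_results> list (ghost recovery). -->
    <resend_lost_results>1</resend_lost_results>
    <!-- Abort results whose WU is canceled or missing; abort unstarted
         results whose WU is assimilated or overdue. -->
    <send_result_abort>1</send_result_abort>
  </config>
</boinc>
```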


And it would probably make sense to stop people from joining the project now that it's no longer producing new work.
   <disable_account_creation/>
      If present, disallow account creation via Web and RPC. See also <no_web_account_creation>.
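For a project that has stopped issuing work, this is a single empty element; shown here inside the usual <config> section of config.xml, with other entries omitted.

```xml
<boinc>
  <config>
    <!-- other <config> entries omitted -->
    <!-- Disallow account creation via both the web site and RPC. -->
    <disable_account_creation/>
  </config>
</boinc>
```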

Grant
Darwin NT
Link

Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2047946 - Posted: 3 May 2020, 8:13:43 UTC - in response to Message 2047847.  

Yes, all the needed options are available; I can't understand why they don't use them. They could probably be well below 100,000 results out in the field by now, would not be wasting other people's electricity, and in many cases would not be slowing down useful work for other BOINC projects. This is really not a clean shutdown anymore.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14667
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2047947 - Posted: 3 May 2020, 8:57:25 UTC

Despite occasional grumbles, David Anderson's BOINC actually handles the steady state (the middle of a run) quite well. But I've been concerned for some time that it doesn't handle startup very well, and this experience shows that it doesn't handle 'wrap up and close down' very gracefully, either.

We can't do much about the people who install BOINC, join a project and grab some tasks, and then decide that they don't like the extra noise/heat/power/whatever, and uninstall it. But there are things we could do something about.

* When detaching from a project, no attempt is made to abort/report uncompleted tasks.
* When attaching to a new project, Resource Share is out of balance, and the new project gets the lion's share of the effort.
* The REC half-life of 10 days means the new project stays dominant for too long.
* The assessment of 'reliable' hosts for the wrap-up is weak (although we didn't even try).
* We didn't try any of the other Accelerating retries tools either - perhaps they drop out of mind when you're used to the steady state.
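To put a rough number on the REC point: assuming REC behaves as a simple exponential average with half-life $T_{1/2}$ = 10 days, the fraction of a project's REC that remains after $t$ days with no new credit is

```latex
\frac{\mathrm{REC}(t)}{\mathrm{REC}(0)} = 2^{-t/T_{1/2}}, \qquad T_{1/2} = 10\ \text{days}
```

so it falls to 50% after 10 days, 25% after 20 and 12.5% after 30. On that reading, an imbalance created at attach time takes several weeks to wash out.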

I managed to say most of those to David last week, but there are probably more we could add to the list - if anyone can add them, we could write them up for a "How to close down your research cleanly" guide.
rob smith Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22404
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2047948 - Posted: 3 May 2020, 9:15:33 UTC
Last modified: 3 May 2020, 9:15:44 UTC

There is a bit of an issue with "reliable host", or at least with how one is defined.
Does one use the proportion of correct vs incorrect returns?
Does one use the turn-around time (for correct results)?
Does one use some combination of the above?
Does one use something else?

Each has its own set of advantages and disadvantages.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2048553 - Posted: 8 May 2020, 13:17:50 UTC - in response to Message 2047947.  


I managed to say most of those to David last week,


Another topic, but also a performance-degradation one: is there any chance to draw attention to the BOINC for Android issues? Namely, the very low share of time during which the client actually does work (though it could).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14667
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2048577 - Posted: 8 May 2020, 17:03:20 UTC - in response to Message 2048553.  

Another topic, but also a performance-degradation one: is there any chance to draw attention to the BOINC for Android issues? Namely, the very low share of time during which the client actually does work (though it could).
There's a huge effort being put into Android at the moment by Vitalii Koshura and Isira Seneviratne: basically, a complete rewrite from the ground up using modern development tools and libraries. I'm not an expert on Android, but if you think it's important, get yourself over to GitHub and get stuck in.
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.