When should SETI start canceling Unneeded Tasks, to reduce Wasted Effort and electricity?
Author | Message |
---|---|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13824 Credit: 208,696,464 RAC: 304 |
Nope, those options are available in the BOINC Project options.
Maybe just because such logic is absent from the project server settings? That is, they can't do it just "by button click". And then the question arises whether it is worth the time to do additional coding or not.
Well, maybe the next step would be to increase initial replication to 8? But don't forget to increase the total number and the number of errors too.
Why not send those resends to hosts with a fast turnaround time? Let's say <3 days?
Accelerating retries
The goal of this mechanism is to send timeout-generated retries to hosts that are likely to finish them fast. Here's how it works:
* Hosts are deemed "reliable" (a slight misnomer) if they satisfy turnaround time and error rate criteria.
* A job instance is deemed "need-reliable" if its priority is above a threshold.
* The scheduler tries to send need-reliable jobs to reliable hosts. When it does, it reduces the delay bound of the job.
* When job replicas are created in response to errors or timeouts, their priority is raised relative to the job's base priority.
The configurable parameters are:
<reliable_on_priority>X</reliable_on_priority>
Results with priority at least reliable_on_priority are treated as "need-reliable". They'll be sent preferentially to reliable hosts.
<reliable_max_avg_turnaround>secs</reliable_max_avg_turnaround>
Hosts whose average turnaround is at most reliable_max_avg_turnaround and that have at least 10 consecutive valid results are considered 'reliable'. Make sure you set this low enough that a significant fraction (e.g. 25%) of your hosts qualify.
<reliable_reduced_delay_bound>X</reliable_reduced_delay_bound>
When a need-reliable result is sent to a reliable host, multiply the delay bound by reliable_reduced_delay_bound (typically 0.5 or so).
<reliable_priority_on_over>X</reliable_priority_on_over>
<reliable_priority_on_over_except_error>X</reliable_priority_on_over_except_error>
If reliable_priority_on_over is nonzero, increase the priority of duplicate jobs by that amount over the job's base priority. Otherwise, if reliable_priority_on_over_except_error is nonzero, increase the priority of duplicates caused by timeout (not error) by that amount. (Typically only one of these is nonzero, and is equal to reliable_on_priority.)
NOTE: this mechanism can be used to preferentially send ANY job, not just retries, to fast/reliable hosts. To do so, set the workunit's priority to reliable_on_priority or greater.
And given the low load on the server they could do the automatic Ghost recovery without issue now. And they can abort jobs as well.
Job retransmission
<resend_lost_results> 0|1 </resend_lost_results>
If set, and an <other_results> list is present in the scheduler request, resend any in-progress results not in the list. This is recommended; it may increase the efficiency of your project. For reasons that are not well understood, a BOINC client sometimes fails to receive the scheduler reply. This flag addresses that issue: it causes the SAME results to be resent by the scheduler, if the client has failed to receive them. Note: this will increase the load on your DB server; you can minimize this by creating an index:
alter table result add index res_host_state (hostid, server_state);
<send_result_abort>0|1</send_result_abort>
If set, and the client is processing a result for a WU that has been canceled or is not in the DB (i.e. there's no chance of getting credit), tell the client to abort the result regardless of state. If the client is processing a result for a WU that has been assimilated or is overdue (i.e. there's a chance of not getting credit), tell the client to abort the result if it hasn't started yet. Note: this will increase the load on your DB server.
and it would probably make sense to stop people from being able to join the project now that it's no longer producing new work.
<disable_account_creation/>
If present, disallow account creation via Web and RPC. See also <no_web_account_creation>.
Grant Darwin NT |
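For anyone who finds code easier to follow than prose, here is a rough Python sketch of the retry-priority and delay-bound rules quoted in the post above. Only the parameter names come from the quoted BOINC text; the `Config` class, the function names and the example values are hypothetical, and this is not the scheduler's actual code.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Parameter names are from the quoted text; the values are made-up examples.
    reliable_on_priority: int = 1
    reliable_reduced_delay_bound: float = 0.5
    reliable_priority_on_over: int = 0
    reliable_priority_on_over_except_error: int = 1


def retry_priority(base_priority: int, cause: str, cfg: Config) -> int:
    """Priority of a replica created after an 'error' or a 'timeout'."""
    if cfg.reliable_priority_on_over != 0:
        # Every duplicate gets the boost, whatever caused it.
        return base_priority + cfg.reliable_priority_on_over
    if cfg.reliable_priority_on_over_except_error != 0 and cause == "timeout":
        # Only timeout-generated duplicates get the boost.
        return base_priority + cfg.reliable_priority_on_over_except_error
    return base_priority


def needs_reliable(priority: int, cfg: Config) -> bool:
    """A result is 'need-reliable' if its priority is at least the threshold."""
    return priority >= cfg.reliable_on_priority


def delay_bound_for(priority: int, host_is_reliable: bool,
                    delay_bound: float, cfg: Config) -> float:
    """Shorten the deadline only when a need-reliable result goes to a reliable host."""
    if needs_reliable(priority, cfg) and host_is_reliable:
        return delay_bound * cfg.reliable_reduced_delay_bound
    return delay_bound


cfg = Config()
timeout_retry = retry_priority(0, "timeout", cfg)   # boosted to the threshold
error_retry = retry_priority(0, "error", cfg)       # left at base priority
short_deadline = delay_bound_for(timeout_retry, True, 1000.0, cfg)
```

With these example values a timeout retry becomes need-reliable and, if it lands on a reliable host, gets half the original deadline; a retry caused by an error keeps its base priority and the full deadline.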
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Yes, all the needed options are available, and I can't understand why they don't use them. They could probably be way below 100,000 results out in the field by now, would not waste other people's electricity, and in many cases would not slow down useful work from other BOINC projects. This is really not a clean shutdown anymore. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14667 Credit: 200,643,578 RAC: 874 |
Despite occasional grumbles, David Anderson's BOINC actually handles the steady state, middle of a run, quite well. But I've been concerned for some time that it doesn't handle startup very well, and this experience shows that it doesn't handle 'wrap up and close down' very gracefully, either.
We can't do much about the people who install BOINC, join a project and grab some tasks, and then decide that they don't like the extra noise/heat/power/whatever, and uninstall it. But there are things we could do something about.
* When detaching from a project, no attempt is made to abort/report uncompleted tasks.
* When attaching to a new project, Resource Share is out of balance, and the new project gets the lion's share of the effort.
* The REC half-life of 10 days means the new project stays dominant for too long.
* The assessment of 'reliable' hosts for the wrap-up is weak (although we didn't even try).
* We didn't try any of the other Accelerating retries tools either - perhaps they drop out of mind when you're used to the steady state.
I managed to say most of those to David last week, but there are probably more we could add to the list - if anyone can add them, we could write them up for a "How to close down your research cleanly" guide. |
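To put a number on the "10 days" point above: assuming REC behaves like a plain exponential average with a 10-day half-life (an assumption about the model, not a copy of the BOINC client code), the gap between a newly attached project and its steady state halves every 10 days. The function names below are made up for illustration.

```python
import math

REC_HALF_LIFE_DAYS = 10.0  # the half-life mentioned in the post above


def fraction_converged(days: float, half_life: float = REC_HALF_LIFE_DAYS) -> float:
    """Share of the initial imbalance that has been worked off after `days`."""
    return 1.0 - 0.5 ** (days / half_life)


def days_to_converge(target: float, half_life: float = REC_HALF_LIFE_DAYS) -> float:
    """Days needed until `target` (between 0 and 1) of the imbalance is gone."""
    return half_life * math.log2(1.0 / (1.0 - target))


for d in (10, 20, 30):
    print(f"after {d} days: {fraction_converged(d):.1%} of the gap closed")
print(f"90% closed after about {days_to_converge(0.9):.0f} days")
```

Under that assumption it takes roughly a month before 90% of the initial imbalance has gone, which is one way to read "stays dominant for too long".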
rob smith Joined: 7 Mar 03 Posts: 22404 Credit: 416,307,556 RAC: 380 |
There is a bit of an issue with "reliable host", or at least how they are defined. Does one use the proportion of correct vs incorrect returns? Does one use the turn-around time (for correct results)? Does one use some combination of the above? Does one use something else? Each has its own set of advantages and disadvantages. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
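One way to read the BOINC text quoted earlier in the thread is that it combines two of the options listed above: a ceiling on average turnaround and a run of at least 10 consecutive valid results. A minimal Python sketch of that reading follows; `HostStats`, `is_reliable`, `reliable_fraction` and the sample figures are hypothetical, not the scheduler's real data structures.

```python
from dataclasses import dataclass

DAY = 86400  # seconds
MIN_CONSECUTIVE_VALID = 10  # figure taken from the quoted BOINC text


@dataclass
class HostStats:
    avg_turnaround_secs: float
    consecutive_valid: int


def is_reliable(host: HostStats, max_avg_turnaround_secs: float) -> bool:
    """Both criteria must hold: fast enough on average AND a clean recent record."""
    return (host.avg_turnaround_secs <= max_avg_turnaround_secs
            and host.consecutive_valid >= MIN_CONSECUTIVE_VALID)


def reliable_fraction(hosts: list[HostStats], max_avg_turnaround_secs: float) -> float:
    """Share of hosts that qualify, useful for tuning the turnaround threshold."""
    if not hosts:
        return 0.0
    qualifying = sum(is_reliable(h, max_avg_turnaround_secs) for h in hosts)
    return qualifying / len(hosts)


# Made-up sample hosts: (average turnaround, consecutive valid results)
hosts = [
    HostStats(1 * DAY, 12),   # fast and clean
    HostStats(1 * DAY, 3),    # fast, but too few valid results in a row
    HostStats(10 * DAY, 50),  # clean, but too slow
    HostStats(2 * DAY, 10),   # just meets both criteria
]
share = reliable_fraction(hosts, max_avg_turnaround_secs=3 * DAY)
```

With a 3-day ceiling, two of the four sample hosts qualify. Checking the qualifying share like this is one way to act on the quoted advice that a significant fraction of hosts (e.g. 25%) should count as reliable.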
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Another topic, but also a performance-degradation one: any chance of drawing attention to the BOINC for Android issues? Namely, the very low share of time in which the client actually does work (though it could). SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14667 Credit: 200,643,578 RAC: 874 |
"Another topic, but also a performance-degradation one: any chance of drawing attention to the BOINC for Android issues? Namely, the very low share of time in which the client actually does work (though it could)." There's a huge effort being put into Android at the moment by Vitalii Koshura and Isira Seneviratne - basically, a complete rewrite from the ground up using modern development tools and libraries. I'm not an expert on Android, but if you think it's important - get yourself over to GitHub and get stuck in. |