Question(s) on BOINC Client Design

Message boards : Number crunching : Question(s) on BOINC Client Design


Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 919531 - Posted: 19 Jul 2009, 23:16:24 UTC

1) Why can't the local BOINC client be made aware when the upload server is down or heavily stressed, so that it tries to upload only once every (say) 15 minutes? And add a message so the user is aware (if they look at their messages)? Then it wouldn't keep retrying uploads so often when there is a large number of completed tasks on the local machine, at least until Berkeley tells it that uploads are back up or the stress level is acceptable. That can't be worse than the recent experiences, can it?
1a) Similarly for downloads.
In both cases my machine gets SOMETHING back from the servers, so we might as well use that communication for load balancing, yes? No? Maybe?
--------------------------------
2) Why is there a limit on downloads when uploads are backed up? That doesn't seem useful to me - I have plenty of disk space to store not-yet-uploaded results and still maintain a cache of my desired size, so I can keep computing while uploads are down. I don't think deadlines are a big issue here. Or am I missing something?
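[For illustration only, the pacing proposed in (1) amounts to a loop like the sketch below. This is a minimal sketch of the suggestion, not anything from the real BOINC client; UploadStatus, try_upload(), the log message, and the 15-minute constant are hypothetical stand-ins.]

    #include <chrono>
    #include <iostream>
    #include <thread>

    // Hypothetical outcome of a single upload attempt.
    enum class UploadStatus { Ok, ServerBusyOrDown };

    // Stand-in for "try to send one completed result to the upload server".
    // Here it fails twice and then succeeds, purely for demonstration.
    UploadStatus try_upload() {
        static int attempts = 0;
        return (++attempts < 3) ? UploadStatus::ServerBusyOrDown : UploadStatus::Ok;
    }

    int main() {
        using namespace std::chrono_literals;
        const auto retry_interval = 15min;   // the "(say) 15 minutes" above

        while (try_upload() != UploadStatus::Ok) {
            // Tell the user what is happening, then wait before retrying,
            // instead of hammering the server with immediate retries.
            std::cout << "Upload server busy or down; retrying in 15 minutes\n";
            std::this_thread::sleep_for(retry_interval);
        }
        std::cout << "Upload succeeded\n";
    }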
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919535 - Posted: 19 Jul 2009, 23:25:03 UTC - in response to Message 919531.  

1) Why can't the local BOINC client be made aware when the upload server is down or heavily stressed, so that it tries to upload only once every (say) 15 minutes? And add a message so the user is aware (if they look at their messages)? Then it wouldn't keep retrying uploads so often when there is a large number of completed tasks on the local machine, at least until Berkeley tells it that uploads are back up or the stress level is acceptable. That can't be worse than the recent experiences, can it?
1a) Similarly for downloads.
In both cases my machine gets SOMETHING back from the servers, so we might as well use that communication for load balancing, yes? No? Maybe?
--------------------------------
2) Why is there a limit on downloads when uploads are backed up? That doesn't seem useful to me - I have plenty of disk space to store not-yet-uploaded results and still maintain a cache of my desired size, so I can keep computing while uploads are down. I don't think deadlines are a big issue here. Or am I missing something?

1) There are two reasons:

First, the problem starts with the client being unable to communicate with the server -- the server may or may not actually be down. If the client can't reach the server (or the project), it can't really know whether the server is up or down, or when it will come back.

Second, if the server has to tell clients to slow down, you've just moved the problem (uploads fail) to a new problem (the server is overloaded trying to tell people not to upload).

2) If downloads are allowed to continue, they'll eventually become uploads (and "eventually" can be just a few minutes on a CUDA machine). If the problem is "too many uploads", then adding more to an impossible situation just makes it worse.

A big part of the issue is what happens to a network connection as the load passes 90% of capacity: at 90%, throughput is good; at 95%, throughput is reduced; at 110%, useful throughput may be as low as 10% (90% of the effort wasted!), and even a simple "go away for a while" reply consumes part of that capacity.
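[What a client can do without hearing anything from the server is back off on its own and add randomness, so that thousands of clients recovering from the same outage don't all retry in the same instant. The BOINC client already backs off failed transfers individually, as noted later in this thread; the sketch below only illustrates the general shape of a randomized exponential backoff, and its constants and names are purely illustrative, not BOINC's.]

    #include <algorithm>
    #include <chrono>
    #include <cmath>
    #include <random>

    // Delay before the next retry of a failed transfer: exponential growth
    // with an upper bound, plus random jitter so that many clients do not
    // retry in lockstep after the server comes back.
    std::chrono::seconds next_backoff(int consecutive_failures) {
        const double base_s = 60.0;          // illustrative: first retry after ~1 minute
        const double max_s  = 4.0 * 3600.0;  // illustrative: cap the delay at 4 hours

        double nominal = std::min(base_s * std::pow(2.0, consecutive_failures), max_s);

        // Pick uniformly between 50% and 100% of the nominal delay.
        static std::mt19937 rng{std::random_device{}()};
        std::uniform_real_distribution<double> jitter(0.5 * nominal, nominal);

        return std::chrono::seconds(static_cast<long long>(jitter(rng)));
    }

[The point being that no status message from the server is required: each failed attempt is all the information the client needs to slow itself down, and the jitter spreads the retries out instead of concentrating them, which is exactly what a saturated link needs.]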
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 919539 - Posted: 19 Jul 2009, 23:33:17 UTC - in response to Message 919535.  
Last modified: 19 Jul 2009, 23:34:22 UTC


1) There are two reasons:

First, the problem starts with the client being unable to communicate with the server -- the server may or may not actually be down. If the client can't reach the server (or the project), it can't really know whether the server is up or down, or when it will come back.

Second, if the server has to tell clients to slow down, you've just moved the problem (uploads fail) to a new problem (the server is overloaded trying to tell people not to upload).

2) If downloads are allowed to continue, they'll eventually become uploads (and "eventually" can be just a few minutes on a CUDA machine). If the problem is "too many uploads", then adding more to an impossible situation just makes it worse.

A big part of the issue is what happens to a network connection as the load passes 90% of capacity: at 90%, throughput is good; at 95%, throughput is reduced; at 110%, useful throughput may be as low as 10% (90% of the effort wasted!), and even a simple "go away for a while" reply consumes part of that capacity.


1) Yup, but how is this worse than what we have now? And not being able to reach the upload server is not the same as not being able to reach BOINC at Berkeley, where (maybe) the upload request could be intercepted by a program, triggered by the upload server's status, that tells the sender to slow down?

2) Again, yup, but my "pacing" suggestion might make it go better? The upload-request interceptor could try to balance things in conjunction with a BOINC client that knows not to rush all of its uploads to Berkeley after an outage or slowdown.

Maybe I am nuts, but it doesn't strike me as too difficult a thing to do...
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919551 - Posted: 20 Jul 2009, 0:24:52 UTC - in response to Message 919539.  


1) Yup, but how is this worse than what we have now? And not being able to reach the upload server is not the same as not being able to reach BOINC at Berkeley, where (maybe) the upload request could be intercepted by a program, triggered by the upload server's status, that tells the sender to slow down?

2) Again, yup, but my "pacing" suggestion might make it go better? The upload-request interceptor could try to balance things in conjunction with a BOINC client that knows not to rush all of its uploads to Berkeley after an outage or slowdown.

Maybe I am nuts, but it doesn't strike me as too difficult a thing to do...

Have you read the book "Catch-22"?

Using communications with the servers to tell the clients not to communicate with the servers is a Catch-22.
Nicolas

Joined: 30 Mar 05
Posts: 161
Credit: 12,985
RAC: 0
Argentina
Message 919556 - Posted: 20 Jul 2009, 0:39:18 UTC - in response to Message 919539.  

1) Yup, but how is this worse than what we have now? And not being able to reach the upload server is not the same as not being able to reach BOINC at Berkeley, where (maybe) the upload request could be intercepted by a program, triggered by the upload server's status, that tells the sender to slow down?

There is nothing centralized in "BOINC in Berkeley" that the client communicates with, and there won't be.

Contribute to the Wiki!
Aurora Borealis
Volunteer tester

Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 919704 - Posted: 20 Jul 2009, 14:32:29 UTC
Last modified: 20 Jul 2009, 14:43:56 UTC

As others have said, the projects are independent entities and are not in any way connected to BOINC itself. BOINC is a decentralized system, and the developers do not want the BOINC servers to be a clearing house for the projects. If millions of computers started contacting the BOINC servers every time a project went down, their own system would get swamped.

One change to the BOINC software that is likely to be in future versions is that an upload failure will back off all of that project's uploads, instead of the 10, 50, or 500 independent requests that now occur. This should ease some of the DoS-like pounding the upload servers suffer when the pipes get clogged.

EDIT: Also, the WUs with shorter deadlines may be moved to the top of the project's upload queue when there are problems.
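[As a rough sketch of those two changes (one shared backoff per project instead of one per file, plus short-deadline results at the front of the upload queue), the bookkeeping could look something like this. The types and names below are hypothetical illustrations, not the actual BOINC client code:]

    #include <chrono>
    #include <queue>
    #include <string>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    struct PendingUpload {
        std::string file;
        Clock::time_point deadline;   // report deadline of the result
    };

    // Order uploads so the earliest deadline comes out first.
    struct ByDeadline {
        bool operator()(const PendingUpload& a, const PendingUpload& b) const {
            return a.deadline > b.deadline;   // min-heap on deadline
        }
    };

    struct ProjectTransfers {
        // One backoff shared by every upload for this project, instead of
        // 10, 50, or 500 files each retrying on their own schedule.
        Clock::time_point next_attempt = Clock::now();
        std::priority_queue<PendingUpload, std::vector<PendingUpload>, ByDeadline> uploads;

        void record_failure(std::chrono::seconds backoff) {
            next_attempt = Clock::now() + backoff;   // defer ALL uploads for this project
        }

        bool may_attempt() const { return Clock::now() >= next_attempt; }
    };

[One failed attempt then pushes back every pending upload for that project at once, and when transfers resume, the result closest to its deadline goes first.]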
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 919789 - Posted: 20 Jul 2009, 18:32:19 UTC - in response to Message 919704.  

As others have said, the projects are independent entities and are not in any way connected to BOINC itself. BOINC is a decentralized system, and the developers do not want the BOINC servers to be a clearing house for the projects. If millions of computers started contacting the BOINC servers every time a project went down, their own system would get swamped.

One change to the BOINC software that is likely to be in future versions is that an upload failure will back off all of that project's uploads, instead of the 10, 50, or 500 independent requests that now occur. This should ease some of the DoS-like pounding the upload servers suffer when the pipes get clogged.

EDIT: Also, the WUs with shorter deadlines may be moved to the top of the project's upload queue when there are problems.


Those both sound like sensible changes. I am surprised, in fact, that WUs are apparently not executed in deadline order by default...
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 919793 - Posted: 20 Jul 2009, 18:35:17 UTC - in response to Message 919789.  

Those both sound like sensible changes. I am surprised, in fact, that WUs are apparently not executed in deadline order by default...

It's been discussed before, but basically that would cause deadline problems.
Work units are processed first in, first out. If there is a chance of a missed deadline, BOINC selects that work unit to process; once it's done, it goes back to first in, first out.
Rinse & repeat.
Grant
Darwin NT
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919814 - Posted: 20 Jul 2009, 19:19:12 UTC - in response to Message 919789.  

Those both sound like sensible changes. I am surprised, in fact, that WUs are apparently not executed in deadline order by default...

Let's say you have a bunch of long work units, due in a month. Then BOINC tops off the cache with nothing but "shorties."

If BOINC worked in strict deadline order, it would do the shorties first, and you could go a whole month without touching the long-deadline work units.

... and miss deadlines.

FIFO covers every case except when some really short work units show up. BOINC runs a simulation, and if deadlines will be missed, it shifts to deadline order until the simulation shows that FIFO is safe.
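[In outline, that policy amounts to the sketch below: serve the queue first in, first out unless a quick feasibility check says a deadline would be missed, and in that case run the task with the nearest deadline. This is a simplified, single-core illustration of the idea, not the client's actual scheduler; Task, fifo_misses_deadline(), and pick_next() are made up for this example.]

    #include <algorithm>
    #include <vector>

    struct Task {
        double remaining_hours;   // estimated compute time left
        double deadline_hours;    // time until the report deadline
        int    arrival;           // position in FIFO order (lower = older)
    };

    // Rough feasibility check: run the tasks in FIFO order and see whether
    // any of them would finish after its deadline.
    bool fifo_misses_deadline(std::vector<Task> tasks) {
        std::sort(tasks.begin(), tasks.end(),
                  [](const Task& a, const Task& b) { return a.arrival < b.arrival; });
        double elapsed = 0.0;
        for (const Task& t : tasks) {
            elapsed += t.remaining_hours;
            if (elapsed > t.deadline_hours) return true;
        }
        return false;
    }

    // Pick the next task to run: FIFO normally, earliest-deadline-first only
    // when the check above says FIFO would miss a deadline.
    // Assumes tasks is non-empty.
    const Task& pick_next(const std::vector<Task>& tasks) {
        if (fifo_misses_deadline(tasks)) {
            return *std::min_element(tasks.begin(), tasks.end(),
                [](const Task& a, const Task& b) { return a.deadline_hours < b.deadline_hours; });
        }
        return *std::min_element(tasks.begin(), tasks.end(),
            [](const Task& a, const Task& b) { return a.arrival < b.arrival; });
    }

[With a check like that in place, FIFO order is kept whenever it is safe, which matches the behaviour described above.]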