how much GPU can the download server support?

Slavac Volunteer tester Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0	Message 1213349 - Posted: 2 Apr 2012, 21:21:33 UTC - in response to Message 1213346. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. Under normal circumstances it would actually work, i.e. when download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at SETI, where the bandwidth is maxed out all the time, it makes it impossible to download any work. If bandwidth is the bottleneck, then the new server will not improve the situation, right? Are there any plans to improve bandwidth? Yes and that's all I can really say at this point. The new server with attached JBOD should help smooth downloads and uploads as well. Executive Director GPU Users Group Inc. - brad@gpuug.org ID: 1213349 ·
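For readers unfamiliar with the backoff mechanism mentioned in the quoted text, here is a minimal sketch of a generic randomized exponential backoff that is cleared on a successful download. It is not BOINC's actual code: the class name `Backoff`, the 60-second base, and the 4-hour cap are all assumptions made for illustration.

```python
import random


class Backoff:
    """Generic randomized exponential backoff (illustrative, not BOINC's code)."""

    def __init__(self, base=60.0, cap=4 * 3600.0, rng=None):
        self.base = base          # shortest delay in seconds (assumed value)
        self.cap = cap            # longest delay in seconds (assumed value)
        self.failures = 0         # consecutive failed transfers
        self.rng = rng or random.Random()

    def on_failure(self):
        """Record a failed transfer and return the delay before the next retry."""
        self.failures += 1
        # The upper bound doubles with each consecutive failure, up to the cap.
        ceiling = min(self.cap, self.base * 2 ** (self.failures - 1))
        return self.rng.uniform(self.base, ceiling)

    def on_success(self):
        """A completed download clears the backoff, as described in the post."""
        self.failures = 0
```

Under this model, a link that is saturated only occasionally sees frequent successes, so delays stay short. If the link is saturated all the time, successes are rare and the delay drifts toward the cap, which matches the complaint in the quoted text.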

Slavac Volunteer tester Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0	Message 1213350 - Posted: 2 Apr 2012, 21:22:18 UTC - in response to Message 1213305. Don't forget we have a new download server coming online soon. It will be interesting to see if it does help with the download problems, although if it does, all it will be doing is working around the problem that the load sharing between the two servers, for whatever reason, just isn't working too well. One server appears to carry most of the load, hence trying to download from it results in frustration. Whereas the other server doesn't have much load & so downloads from it, even at the worst of times, are possible. Although presently, with all the AP work & the shorties going through the system, downloads are more difficult than they have been. And even Scheduler requests are taking a while to get a response, so there could be some other server issues lurking in the background at the moment. The plan right now is to replace all 3 of the current download/upload servers and compile them into one server. Executive Director GPU Users Group Inc. - brad@gpuug.org ID: 1213350 ·

Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0	Message 1213367 - Posted: 2 Apr 2012, 21:51:54 UTC - in response to Message 1213350. Last modified: 2 Apr 2012, 21:52:14 UTC The plan right now is to replace all 3 of the current download/upload servers and compile them into one server. hm... then it'll probably need the TCP/IP settings tweaking a lot to handle the number of sessions, and a shedload of CPU for connection tracking and massive disk concurrency. I think 4 servers, using iptables to route based on the least significant two bits over a 1G network, or 4x100 Mbps split between different providers, would get things running acceptably again. The current state of affairs is really really painful, with so much bandwidth wasted in retries. The lowest hanging fruit (knee level) is to change the WU data encoding for MB tasks from base64 to binary (à la Astropulse); that would win about 20% efficiency for a simple tweak. ID: 1213367 ·
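The base64 point can be checked with a few lines of Python. This is a sketch of the encoding overhead on the payload alone, not a measurement of real workunit files: base64 emits 4 characters for every 3 input bytes, so the encoded payload is about a third larger than the binary, and sending binary instead shrinks that part of the file by 25%. The function name `base64_overhead` is hypothetical, and the post's figure of about 20% is the poster's own estimate for the whole file, which this sketch does not derive.

```python
import base64
import os


def base64_overhead(n_bytes):
    """Return (encoded size, expansion over binary, saving from going binary)."""
    raw = os.urandom(n_bytes)            # stand-in for binary workunit data
    encoded = base64.b64encode(raw)      # no line breaks, payload only
    expansion = len(encoded) / len(raw) - 1
    saving = 1 - len(raw) / len(encoded)
    return len(encoded), expansion, saving
```

For a 3000-byte payload this gives 4000 encoded bytes, an expansion of one third, and a saving of one quarter when switching back to binary.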

Khangollo Send message Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0	Message 1213370 - Posted: 2 Apr 2012, 22:00:05 UTC The plan right now is to replace all 3 of the current download/upload servers and compile them into one server. If they really do that and keep using Apache, we are toast, I'm afraid. :( ID: 1213370 ·

Slavac Volunteer tester Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0	Message 1213374 - Posted: 2 Apr 2012, 22:04:24 UTC - in response to Message 1213370. The plan right now is to replace all 3 of the current download/upload servers and compile them into one server. If they really do that and keep using Apache, we are toast, I'm afraid. :( If it comes down to needing another server (assuming that our one new beast won't handle everything), we'll fire up the money begging machine and build another. Executive Director GPU Users Group Inc. - brad@gpuug.org ID: 1213374 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1213378 - Posted: 2 Apr 2012, 22:13:53 UTC - in response to Message 1213344. Last modified: 2 Apr 2012, 22:15:31 UTC After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. BOINC will also ask for work if NO downloads are backed off, so either use SIV on the BOINC status page, or manually retry all downloads. Claggy ID: 1213378 ·

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213394 - Posted: 2 Apr 2012, 22:49:20 UTC - in response to Message 1213378. After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. BOINC will also ask for work if NO downloads are backed off, so either use SIV on the BOINC status page, or manually retry all downloads. Claggy Yes, I am already using SIV, as I said. And I have also tried manually retrying the downloads. Still BOINC will not request more tasks until the queue is completely empty. And when that one task just refuses to download after 8+ retries, I call uncle, and abort it. Dublin, California Team: SETI.USA ID: 1213394 ·

Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0	Message 1213402 - Posted: 2 Apr 2012, 23:31:45 UTC - in response to Message 1213344. After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. I see you're using development versions of BOINC (7.x). Now, I have no idea how those versions handle downloads, but the version I use (6.10.58) will request new work and add to the download queue ad infinitum until the limits kick in. So, if I have 1000 in my D/L queue and BOINC does an update, and work is available, I will get more added to the D/L queue. As has been said many times over, from there, it's simply a matter of patience. But I'd consider downgrading my BOINC before I'd kill off work. Just my 2 cents and change. ID: 1213402 ·

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213416 - Posted: 3 Apr 2012, 0:08:36 UTC Yeah, I have to use 7.x for other projects. Looks like I am stuck with the new, bizarre work fetch rules. Dublin, California Team: SETI.USA ID: 1213416 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489	Message 1213419 - Posted: 3 Apr 2012, 0:34:31 UTC - in response to Message 1213416. Yeah, I have to use 7.x for other projects. Looks like I am stuck with the new, bizarre work fetch rules. Well if that's the case then I can see a lot of people giving all this up if we're all forced to adapt to 7.x and they can't provide a better solution than what is currently available in those versions. :( Cheers. ID: 1213419 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1213492 - Posted: 3 Apr 2012, 5:21:25 UTC - in response to Message 1213344. After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. Or just download the last 2 & let it process those while it then gets more work. Grant Darwin NT ID: 1213492 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1213496 - Posted: 3 Apr 2012, 6:05:59 UTC Last modified: 3 Apr 2012, 6:06:48 UTC BOINC has had the restriction against asking for more work while any downloads are stalled for a long time, stalled meaning they're counting down to a retry. But even the 7.0 series seems to be configured to allow asking for work when there are none in that state, IOW when downloads are in progress even if no actual data is flowing. The big change in 7.0 is of course its hysteresis work fetch. It doesn't ask for work until what's on hand is less than the amount set by the preference which formerly carried the advice that it should be set to 0 if you have a 24/7 internet connection. I think when a 7.0 version gets to recommended status there will be a lot of users asking why it doesn't ask for work like previous versions. And I think many will then go and change the web preference and ask why it didn't help, so will need advice about the local preference settings. As that feature of 7.0.x hasn't been mentioned previously in this thread, there's a small possibility it may be involved in zombie67 [MM]'s problems. Rom Walton had noted a week or so ago that he thought 7.0.23 might become a recommended version, and although there have been some quick bug fixes since, he's directed the Alpha testers not to do a full set of tests on those additional minor step revisions. So there may be a 7.0.x recommendation very soon. Joe ID: 1213496 ·
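The two rules Joe describes can be sketched in a few lines of Python. This is an illustration of the behaviour as stated in the post, not BOINC's source: the names `Download`, `should_request_work`, and `days_to_request` are hypothetical, work on hand is measured in days purely for convenience, and the refill target (minimum buffer plus an extra amount) is an assumption that the thread does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Download:
    name: str
    retry_in_seconds: float = 0.0   # > 0 means backed off, counting down to a retry


def any_stalled(downloads):
    """True if any download is counting down to a retry ('stalled' in the post)."""
    return any(d.retry_in_seconds > 0 for d in downloads)


def should_request_work(buffered_days, min_buffer_days, downloads):
    """Hysteresis rule as described: ask only once work on hand drops below the minimum."""
    if any_stalled(downloads):
        return False                # no work request while a download is stalled
    return buffered_days < min_buffer_days


def days_to_request(buffered_days, min_buffer_days, extra_days):
    """Assumed refill target: top up to the minimum buffer plus an extra amount."""
    return max(0.0, min_buffer_days + extra_days - buffered_days)
```

With the minimum buffer left at 0, `should_request_work(0.5, 0.0, [])` is false, so the client under this model lets its cache run down before asking for more. Raising the preference, for example to 3, makes the same call return true as long as nothing is stalled.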

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213556 - Posted: 3 Apr 2012, 11:51:46 UTC - in response to Message 1213492. After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. Or just download the last 2 & let it process those while it then gets more work. "Just download" those last two how exactly? The whole problem is that they won't download. They remain stuck. Dublin, California Team: SETI.USA ID: 1213556 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1213560 - Posted: 3 Apr 2012, 11:59:57 UTC - in response to Message 1213496. BOINC has had the restriction against asking for more work while any downloads are stalled for a long time, stalled meaning they're counting down to a retry. But even the 7.0 series seems to be configured to allow asking for work when there are none in that state, IOW when downloads are in progress even if no actual data is flowing. The big change in 7.0 is of course its hysteresis work fetch. It doesn't ask for work until what's on hand is less than the amount set by the preference which formerly carried the advice that it should be set to 0 if you have a 24/7 internet connection. I think when a 7.0 version gets to recommended status there will be a lot of users asking why it doesn't ask for work like previous versions. And I think many will then go and change the web preference and ask why it didn't help, so will need advice about the local preference settings. As that feature of 7.0.x hasn't been mentioned previously in this thread, there's a small possibility it may be involved in zombie67 [MM]'s problems. Rom Walton had noted a week or so ago that he thought 7.0.23 might become a recommended version, and although there have been some quick bug fixes since, he's directed the Alpha testers not to do a full set of tests on those additional minor step revisions. So there may be a 7.0.x recommendation very soon. Joe I noticed that same behavior in 6.12.33. As my computers were letting work run down to just what was running before requesting more work, I changed the web option to 3 from 0. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu ID: 1213560 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.