how much GPU can the download server support?

Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1213349 - Posted: 2 Apr 2012, 21:21:33 UTC - in response to Message 1213346.  

Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't.

Under normal circumstances it would actually work, i.e. when download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at SETI, where the bandwidth is maxed out all the time, it makes it impossible to download any work.
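To illustrate the point about backoffs, here is a minimal Python sketch of a retry loop with exponential backoff. It is not BOINC's code: the function names, the 60-second base, the 4-hour cap and the jitter range are all assumptions made for the example. It only shows why backoff clears a briefly overloaded link but piles up waiting time when almost every attempt fails.

```python
import random

def next_backoff(failures, base=60.0, cap=4 * 3600.0, rng=None):
    """Delay in seconds before the next retry after `failures` consecutive failures.

    Hypothetical parameters: base, cap and jitter range are illustrative,
    not the values the BOINC client actually uses.
    """
    rng = rng or random.Random()
    delay = min(cap, base * (2 ** failures))
    # Randomise so that clients do not all retry at the same moment.
    return delay * rng.uniform(0.5, 1.0)

def attempts_until_success(success_probability, rng, max_attempts=50):
    """Count attempts and total seconds waited until a download succeeds (or we give up)."""
    waited = 0.0
    for attempt in range(1, max_attempts + 1):
        if rng.random() < success_probability:
            return attempt, waited
        waited += next_backoff(attempt, rng=rng)
    return max_attempts, waited

rng = random.Random(42)
# Link rarely saturated: most attempts succeed straight away.
print(attempts_until_success(0.95, rng))
# Link saturated nearly all the time: failures and waits accumulate.
print(attempts_until_success(0.05, rng))
```

With a high success probability the loop usually returns after one or two attempts; with a low one the accumulated wait grows quickly, which matches the complaint in the post above.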


If bandwidth is the bottleneck, then the new server will not improve the situation, right? Are there any plans to improve bandwidth?


Yes and that's all I can really say at this point.

The new server with attached JBOD should help smooth downloads and uploads as well.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1213349 · Report as offensive
Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1213350 - Posted: 2 Apr 2012, 21:22:18 UTC - in response to Message 1213305.  

Don't forget we have a new download server coming online soon.

It will be interesting to see if it does help with the download problems, although if it does, all it will be doing is working around the problem that the load sharing between the two servers, for whatever reason, just isn't working too well. One server appears to carry most of the load, hence trying to download from it results in frustration. Whereas the other server doesn't have much load & so downloads from it, even at the worst of times, are possible.

Although presently, with all the AP work & the shorties going through the system, downloads are more difficult than they have been. And even Scheduler requests are taking a while to get a response, so there could be some other server issues lurking in the background at the moment.


The plan right now is to replace all 3 of the current download/upload servers and compile them into one server.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1213350 · Report as offensive
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 1213367 - Posted: 2 Apr 2012, 21:51:54 UTC - in response to Message 1213350.  
Last modified: 2 Apr 2012, 21:52:14 UTC

The plan right now is to replace all 3 of the current download/upload servers and compile them into one server.


hm... then it'll probably need the TCP/IP settings tweaking a lot to handle the number of sessions, plus a shedload of CPU for connection tracking and massive disk concurrency.

I think 4 servers, using iptables to route based on least significant two bits over a 1G network, or 4x100 Mbps split between different providers, would get things running acceptably again.
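As a rough illustration of the "least significant two bits" idea, the Python sketch below maps a client to one of four servers. It assumes the bits are taken from the client's IPv4 address, which the post does not actually say, and the server names are made up. It shows only the mapping, not the iptables rules that would implement it.

```python
import ipaddress

# Hypothetical server names; the thread does not name the machines.
SERVERS = ["dl0", "dl1", "dl2", "dl3"]

def pick_server(client_ip: str) -> str:
    """Choose a server from the two least significant bits of an IPv4 address."""
    low_bits = int(ipaddress.IPv4Address(client_ip)) & 0b11
    return SERVERS[low_bits]

for ip in ("192.0.2.4", "192.0.2.5", "192.0.2.6", "192.0.2.7"):
    print(ip, "->", pick_server(ip))
```

Because the choice depends only on the address, a given client always lands on the same server, and no shared state between the four machines is needed. How evenly the load spreads depends on how client addresses are distributed.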

The current state of affairs is really really painful, with so much bandwidth wasted in retries.

The lowest hanging fruit (knee level) is to change the WU data encoding for MB tasks from base64 to binary (à la Astropulse); that would win about 20% efficiency for a simple tweak.
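For a feel of the numbers, here is a small Python check of base64 overhead. Base64 writes every 3 bytes as 4 characters, so the encoded part of a file shrinks by a quarter when sent as raw binary. The 3 KiB payload is an arbitrary stand-in, not a real workunit size. The "about 20%" quoted above is in the same range; the exact saving for real workunits depends on how much of each file is encoded data, which the thread does not state.

```python
import base64
import os

payload = os.urandom(3 * 1024)  # 3 KiB of stand-in binary data (hypothetical size)
encoded = base64.b64encode(payload)

growth = len(encoded) / len(payload) - 1   # extra bytes relative to binary
saving = 1 - len(payload) / len(encoded)   # fraction saved by switching to binary

print(f"binary: {len(payload)} bytes, base64: {len(encoded)} bytes")
print(f"base64 is {growth:.0%} larger; binary would save {saving:.0%}")
```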
ID: 1213367 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1213370 - Posted: 2 Apr 2012, 22:00:05 UTC

The plan right now is to replace all 3 of the current download/upload servers and compile them into one server.

If they really do that and keep using Apache, we are toast, I'm afraid. :(
ID: 1213370 · Report as offensive
Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1213374 - Posted: 2 Apr 2012, 22:04:24 UTC - in response to Message 1213370.  

The plan right now is to replace all 3 of the current download/upload servers and compile them into one server.

If they really do that and keep using Apache, we are toast, I'm afraid. :(


If it comes down to needing another server (assuming that our one new beast won't handle everything) we'll fire up the money begging machine and build another.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1213374 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1213378 - Posted: 2 Apr 2012, 22:13:53 UTC - in response to Message 1213344.  
Last modified: 2 Apr 2012, 22:15:31 UTC

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.


Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again.

BOINC will also ask for work if NO downloads are backed off, so either use SIV on the BOINC status page, or manually retry all downloads.

Claggy
ID: 1213378 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213394 - Posted: 2 Apr 2012, 22:49:20 UTC - in response to Message 1213378.  

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.


Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again.

BOINC will also ask for work if NO downloads are backed off, so either use SIV on the BOINC status page, or manually retry all downloads.

Claggy


Yes, I am already using SIV, as I said. And I have also tried manually retrying the downloads. Still BOINC will not request more tasks until the queue is completely empty. And when that one task just refuses to download after 8+ retries, I cry uncle, and abort it.
Dublin, California
Team: SETI.USA
ID: 1213394 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1213402 - Posted: 2 Apr 2012, 23:31:45 UTC - in response to Message 1213344.  

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.


Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again.


I see you're using development versions of BOINC (7.x). Now, I have no idea how those versions handle downloads, but the version I use (6.10.58) will request new work and add to the download queue ad infinitum until the limits kick in. So, if I have 1000 in my D/L queue and BOINC does an update, if work is available, I will get more added to the D/L queue.

As has been said many times over, from there, it's simply a matter of patience. But I'd consider downgrading my BOINC before I'd kill off work. Just my 2 cents and change.
ID: 1213402 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213416 - Posted: 3 Apr 2012, 0:08:36 UTC

Yeah, I have to use 7.x for other projects. Looks like I am stuck with the new, bizarre work fetch rules.
Dublin, California
Team: SETI.USA
ID: 1213416 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1213419 - Posted: 3 Apr 2012, 0:34:31 UTC - in response to Message 1213416.  

Yeah, I have to use 7.x for other projects. Looks like I am stuck with the new, bizarre work fetch rules.

Well if that's the case then I can see a lot of people giving all this up if we're all forced to adapt to 7.x and they can't provide a better solution than what is currently available in those versions. :(

Cheers.
ID: 1213419 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1213492 - Posted: 3 Apr 2012, 5:21:25 UTC - in response to Message 1213344.  

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.


Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again.

Or just download the last 2 & let it process those while it then gets more work.
Grant
Darwin NT
ID: 1213492 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1213496 - Posted: 3 Apr 2012, 6:05:59 UTC
Last modified: 3 Apr 2012, 6:06:48 UTC

BOINC has long had the restriction against asking for more work while any downloads are stalled, stalled meaning they're counting down to a retry. But even the 7.0 series seems to be configured to allow asking for work when there are none in that state, IOW when downloads are in progress even if no actual data is flowing.
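The rule as described in this paragraph can be sketched in a few lines of Python. This is only a model of the description, not code from the BOINC client; the `Download` class, its field and the task names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Download:
    name: str
    retry_in_seconds: float  # > 0 means backed off, counting down to a retry

def may_request_work(downloads):
    """True when no download is stalled (counting down to a retry)."""
    return not any(d.retry_in_seconds > 0 for d in downloads)

queue = [Download("task_a", 0.0), Download("task_b", 1800.0)]
print(may_request_work(queue))    # False: task_b is backed off
queue[1].retry_in_seconds = 0.0   # e.g. after a manual retry
print(may_request_work(queue))    # True: transfers are active, even if slow
```

Under this model, manually retrying every transfer clears the stalled state and lets the client ask for work again, which is the workaround suggested earlier in the thread.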

The big change in 7.0 is of course its hysteresis work fetch. It doesn't ask for work until what's on hand is less than set by the preference which formerly had the advice saying it should be set to 0 if you have a 24/7 internet connection. I think when a 7.0 version gets to recommended status there will be a lot of users asking why it doesn't ask for work like previous versions. And I think many will then go and change the web preference and ask why it didn't help, so will need advice about the local preference settings. As that feature of 7.0.x hasn't been mentioned previously in this thread, there's a small possibility it may be involved in zombie67 [MM]'s problems.
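A minimal Python sketch of a hysteresis rule of this kind follows. It is a simplification under stated assumptions: the parameter names are invented, and the "refill to a high-water mark" step is the usual shape of a hysteresis scheme rather than something spelled out in this thread. It is not the BOINC client's actual algorithm.

```python
def work_to_request(on_hand_days, low_water_days, high_water_days):
    """Days of work to ask for under a simple hysteresis rule.

    Nothing is requested until the buffer falls below the low-water mark;
    then enough is requested to refill to the high-water mark.
    All three parameter names are hypothetical.
    """
    if on_hand_days >= low_water_days:
        return 0.0
    return high_water_days - on_hand_days

# Low-water mark close to 0: the host runs nearly dry before asking.
for on_hand in (2.0, 0.5, 0.0):
    print(on_hand, work_to_request(on_hand, low_water_days=0.01, high_water_days=3.0))

# Raising the low-water mark makes the host ask much earlier.
print(work_to_request(2.0, low_water_days=3.0, high_water_days=3.5))
```

With the low-water mark near zero, the host lets its cache drain almost completely before requesting anything, which is the behaviour users of earlier versions would find surprising.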

Rom Walton had noted a week or so ago that he thought 7.0.23 might become a recommended version, and although there have been some quick bug fixes since, he's directed the Alpha testers not to do a full set of tests on those additional minor step revisions. So there may be a 7.0.x recommendation very soon.
Joe
ID: 1213496 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213556 - Posted: 3 Apr 2012, 11:51:46 UTC - in response to Message 1213492.  

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.


Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again.

Or just download the last 2 & let it process those while it then gets more work.


"Just download" those last two how exactly? The whole problem is that they won't download. They remain stuck.
Dublin, California
Team: SETI.USA
ID: 1213556 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1213560 - Posted: 3 Apr 2012, 11:59:57 UTC - in response to Message 1213496.  

BOINC has long had the restriction against asking for more work while any downloads are stalled, stalled meaning they're counting down to a retry. But even the 7.0 series seems to be configured to allow asking for work when there are none in that state, IOW when downloads are in progress even if no actual data is flowing.

The big change in 7.0 is of course its hysteresis work fetch. It doesn't ask for work until what's on hand is less than set by the preference which formerly had the advice saying it should be set to 0 if you have a 24/7 internet connection. I think when a 7.0 version gets to recommended status there will be a lot of users asking why it doesn't ask for work like previous versions. And I think many will then go and change the web preference and ask why it didn't help, so will need advice about the local preference settings. As that feature of 7.0.x hasn't been mentioned previously in this thread, there's a small possibility it may be involved in zombie67 [MM]'s problems.

Rom Walton had noted a week or so ago that he thought 7.0.23 might become a recommended version, and although there have been some quick bug fixes since, he's directed the Alpha testers not to do a full set of tests on those additional minor step revisions. So there may be a 7.0.x recommendation very soon.
Joe

I noticed that same behavior in 6.12.33. As my computers were letting work run down to just what was running before requesting more work, I changed the web option to 3 from 0.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1213560 · Report as offensive


 