Panic Mode On (20) Server problems

Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 918107 - Posted: 15 Jul 2009, 17:20:16 UTC - in response to Message 918102.  
Last modified: 15 Jul 2009, 17:22:16 UTC


I have now maybe ~ 1,600 results ready for UL.

Your machine has 1,600 results, and it tries to connect 1600 times every four hours, more or less, depending on the backoff.

How many others out there have lots of uploads, and are retrying just as aggressively?


I hope you don't mean that negatively.


I have only one PC running 24/7.

Others have more PCs, with the same number of ULs or more?

I'm not the one doing it. I don't press the 'retry/update' button; BOINC does it automatically.

ID: 918107
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918111 - Posted: 15 Jul 2009, 17:24:38 UTC - in response to Message 918104.  

Ok. I give up. There seems to be no room for any creative ideas :)

There is room for creative ideas.

You asked a question: Why are downloads stopped when uploads are slow?

They're stopped because every download becomes an upload, and allowing downloads during tough times is like pouring gasoline on a fire.

If you have an idea why that logic is flawed, then support that position. Tell us why you think this should be an exception.

The best place for the creative thinking, IMHO, is around traffic flow.

If the average BOINC computer has ten uploads, and there are 180,000 active machines, then there are 1.8 million upload attempts every four hours, and a couple of servers are going to have a tough time getting 1.8 million uploads.

The 100 megabit pipe doesn't help, and distributing the upload servers to other sites is not a viable option with current funding and interdependencies.
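
For a rough sense of scale, the arithmetic above can be sketched in a few lines of Python. The figures (180,000 machines, ten uploads each, a four-hour retry window) are the ones quoted in this post. Spreading the attempts evenly over the window is an assumption made only for illustration; real retries would be burstier.

```python
# Back-of-the-envelope load estimate using the figures quoted in the post.
# The even spread over the retry window is an illustrative assumption.

def upload_attempts(machines: int, uploads_per_machine: int) -> int:
    """Total upload attempts if every pending upload is retried once per window."""
    return machines * uploads_per_machine


def attempts_per_second(total_attempts: int, window_hours: float) -> float:
    """Average attempt rate if retries were spread evenly over the window."""
    return total_attempts / (window_hours * 3600)


total = upload_attempts(machines=180_000, uploads_per_machine=10)
rate = attempts_per_second(total, window_hours=4)
print(f"{total:,} attempts per window, about {rate:.0f} per second")
```

Under those assumptions this prints `1,800,000 attempts per window, about 125 per second`, which is the load a couple of upload servers would have to absorb on average.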
ID: 918111
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 918113 - Posted: 15 Jul 2009, 17:29:42 UTC - in response to Message 918107.  


The server status page shows everything green (including the upload server); however, the increase in network traffic has been negligible.
So there might still be some problems with the upload server. I'd have expected uploads to max out as soon as the server came back up.
Grant
Darwin NT
ID: 918113
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 918115 - Posted: 15 Jul 2009, 17:31:39 UTC - in response to Message 918105.  

I know you are, Sutaru. Actually you're probably more patient than me :-)

Good news: since my last post I've had 3 uploads go through, so things have improved.
ID: 918115
rebest
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 918116 - Posted: 15 Jul 2009, 17:36:00 UTC - in response to Message 918106.  


NEWS FLASH - I just had one WU upload! Only about a dozen left now...

15/07/2009 1:04:50 PM SETI@home Finished upload of 24fe09ab.20328.11115.5.8.44_3_0


I've had a few upload, but they're not being acknowledged. In other words, the transfer shows 100%, but they're not coming off my transfer list. It's a start.



Join the PACK!
ID: 918116
Samdani
Joined: 21 Oct 00
Posts: 85
Credit: 13,480,553
RAC: 0
Pakistan
Message 918117 - Posted: 15 Jul 2009, 17:37:20 UTC - in response to Message 918111.  
Last modified: 15 Jul 2009, 17:51:38 UTC

Hmmm... so the answer would be either more funding for bigger pipes, or perhaps making work units more computationally intensive so that there is less traffic (which would also require funding in the form of research and programming).
ID: 918117
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918118 - Posted: 15 Jul 2009, 17:38:21 UTC - in response to Message 918107.  
Last modified: 15 Jul 2009, 17:39:17 UTC


I have now maybe ~ 1,600 results ready for UL.

Your machine has 1,600 results, and it tries to connect 1600 times every four hours, more or less, depending on the backoff.

How many others out there have lots of uploads, and are retrying just as aggressively?


I hope you don't mean that negatively.


I have only one PC running 24/7.

Others have more PCs, with the same number of ULs or more?

I'm not the one doing it. I don't press the 'retry/update' button; BOINC does it automatically.

I don't mean it in a positive or negative way. It's just a fact of life.

As you say, this is what BOINC does. It tries very hard to push the work to the servers, and until we get a version that is less aggressive, we'll see high loads and lots of trouble.

... and we can fret over it, or we can wait it out.

I don't think there are enough of us here in the forums to do anything that'd matter.

Waiting is.
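
As an illustration of what a "less aggressive" client could look like, here is a minimal sketch of exponential backoff with random jitter. This is not BOINC's actual retry algorithm. The function name `next_backoff`, the 60-second base, and the four-hour cap are assumptions, chosen only to echo the "1 min" first backoff seen in client logs and the "every four hours" figure mentioned above.

```python
import random
from typing import Optional


def next_backoff(failures: int,
                 base: float = 60.0,
                 cap: float = 4 * 3600.0,
                 rng: Optional[random.Random] = None) -> float:
    """Return a randomized delay in seconds before the next retry.

    The upper bound doubles with each consecutive failure (base, 2*base,
    4*base, ...) and is clamped to `cap`. The delay is drawn uniformly
    below that bound so that many clients do not retry in lockstep.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    rng = rng or random.Random()
    # Limit the exponent so the multiplication cannot overflow a float.
    ceiling = min(cap, base * 2 ** min(failures, 32))
    return rng.uniform(0.0, ceiling)


# Example: print the delay chosen after each of six consecutive failures.
demo_rng = random.Random(2009)
for n in range(6):
    print(n, round(next_backoff(n, rng=demo_rng)))
```

The randomization is the point of the design: if every client waited exactly the same time, they would all come back at once and recreate the overload.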
ID: 918118
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 918125 - Posted: 15 Jul 2009, 17:47:10 UTC - in response to Message 918117.  

Or just attach to other projects and let BOINC try and retry to upload your Seti tasks.

Another reason downloads are stopped when you have that many uploads pending is deadlines. If no uploads get through, there's a chance that all the work you've finished, plus whatever you still have left to crunch, will eventually pass its deadline. What good is that?
ID: 918125
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 918135 - Posted: 15 Jul 2009, 18:10:31 UTC
Last modified: 15 Jul 2009, 18:13:01 UTC

13-7-2009 20:06:00 SETI@home Started upload of 05dc08ag.1874.11933.13.8.11_2_0
13-7-2009 20:06:22 Project communication failed: attempting access to reference site
13-7-2009 20:06:22 SETI@home Temporarily failed upload of 05dc08ag.1874.11933.13.8.11_2_0: connect() failed
13-7-2009 20:06:22 SETI@home Backing off 1 min 0 sec on upload of 05dc08ag.1874.11933.13.8.11_2_0
14-7-2009 13:28:48 SETI@home Temporarily failed upload of 10oc08ab.9949.257801.10.8.195_0_0: connect() failed
14-7-2009 13:28:48 SETI@home Backing off 1 hr 4 min 21 sec on upload of 10oc08ab.9949.257801.10.8.195_0_0
14-7-2009 13:28:50 Internet access OK - project servers may be temporarily down.
14-7-2009 13:29:24 SETI@home Started upload of 10oc08ab.9949.11115.10.8.96_0_0
14-7-2009 13:29:27 Project communication failed: attempting access to reference site
15-7-2009 20:04:35 SETI@home Temporarily failed upload of 17oc08ac.23107.6616.6.8.98_1_0: connect() failed
15-7-2009 20:04:35 SETI@home Backing off 1 hr 42 min 55 sec on upload of 17oc08ac.23107.6616.6.8.98_1_0
15-7-2009 20:04:37 Internet access OK - project servers may be temporarily down.
15-7-2009 20:05:22 SETI@home Started upload of 06dc08ac.22025.9888.10.8.71_0_0
15-7-2009 20:05:26 SETI@home Started upload of 21no08ab.23122.9888.6.8.135_0_0
15-7-2009 20:05:44 Project communication failed: attempting access to reference site
15-7-2009 20:05:44 SETI@home Temporarily failed upload of 06dc08ac.22025.9888.10.8.71_0_0: connect() failed


Hi, I noticed the upload server is enabled, but I still can't upload. What's up?
ID: 918135
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 918138 - Posted: 15 Jul 2009, 18:12:55 UTC - in response to Message 918135.  

Everyone else is trying to get through too. This creates a situation where only so many connections can be acknowledged before others get dropped. Hopefully the situation will improve as clients' uploads taper off.
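
A toy model of that situation, as a sketch only: assume every pending upload retries once per round, the server accepts at most a fixed number of connections per round, and everything else is dropped and retried next time. The function name `simulate_uploads` and the numbers in the example are hypothetical, not measurements from the SETI@home servers.

```python
from typing import List, Tuple


def simulate_uploads(backlog: int, capacity: int) -> List[Tuple[int, int, int]]:
    """Return (attempts, accepted, dropped) for each round until the backlog clears.

    Assumes every pending upload is attempted once per round and that an
    accepted connection always completes its upload.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    history = []
    while backlog > 0:
        accepted = min(backlog, capacity)
        dropped = backlog - accepted
        history.append((backlog, accepted, dropped))
        backlog = dropped  # dropped uploads retry in the next round
    return history


for attempts, accepted, dropped in simulate_uploads(backlog=10, capacity=4):
    print(f"attempts={attempts} accepted={accepted} dropped={dropped}")
```

With a backlog of 10 and a capacity of 4, the dropped count falls from 6 to 2 to 0 over three rounds, which is the "taper off" effect in miniature.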
ID: 918138
TerryG
Joined: 11 Mar 01
Posts: 16
Credit: 15,351,703
RAC: 37
United Kingdom
Message 918139 - Posted: 15 Jul 2009, 18:13:12 UTC
Last modified: 15 Jul 2009, 18:14:42 UTC

Could someone explain why the uploads aren't getting through quickly even though the Cricket graphs are showing (generally) less than 50 Mbit/s? Not complaining, just wondering.

Link to graphs:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d%3Aw;view=octets
ID: 918139
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918143 - Posted: 15 Jul 2009, 18:14:58 UTC

Some of mine have got through - about 5% of my backlog is now 'ready to report', and I think some of them have reported already. Just give them time, and don't try to rush things.
ID: 918143
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 918145 - Posted: 15 Jul 2009, 18:16:08 UTC - in response to Message 918138.  
Last modified: 15 Jul 2009, 18:18:10 UTC

Thanks for your swift reply, OzzFan. You are right: as everyone tries to upload, the server gets clogged up again!
And Richard, too, you are quick.
ID: 918145
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 918148 - Posted: 15 Jul 2009, 18:17:51 UTC - in response to Message 918139.  

Just a theory, based on looking at Cricket graphs for the last week or month.

It looks to me like maybe the Berkeley Boys have been playing with different ways of throttling downloads and uploads, in order to keep the 100 meg pipe full of mostly data, instead of clogging it with ACKs and other overhead when we try to cram too much into it all at once.

ID: 918148
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 918151 - Posted: 15 Jul 2009, 18:21:26 UTC - in response to Message 918148.  

Just a theory, based on looking at Cricket graphs for the last week or month.

It looks to me like maybe the Berkeley Boys have been playing with different ways of throttling downloads and uploads, in order to keep the 100 meg pipe full of mostly data, instead of clogging it with ACKs and other overhead when we try to cram too much into it all at once.

Possibly, but I still think something's borked.
The problem has generally been downloads killing uploads, not the other way around. At the present rate it'll take a couple of days for the uploads to clear.
Grant
Darwin NT
ID: 918151
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918156 - Posted: 15 Jul 2009, 18:31:52 UTC - in response to Message 918151.  

Just a theory, based on looking at Cricket graphs for the last week or month.

It looks to me like maybe the Berkeley Boys have been playing with different ways of throttling downloads and uploads, in order to keep the 100 meg pipe full of mostly data, instead of clogging it with ACKs and other overhead when we try to cram too much into it all at once.

Possibly, but I still think something's borked.
The problem has generally been downloads killing uploads, not the other way around. At the present rate it'll take a couple of days for the uploads to clear.

Bloody moderators moving things around while I'm typing - got a 'Wrong Thread' error.

Anyway, to repeat what you didn't see me say:

I'm going to turn networking off, and go out to the pub. When you lot have finished crowding round the 'upload' door, could you leave a little space when I get back to get mine through? :-)
ID: 918156
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 918159 - Posted: 15 Jul 2009, 18:34:21 UTC - in response to Message 918156.  

I'm going to turn networking off,

Already done that myself. Tried for 30 min and not one got through, so I decided to wait till things start to clear a bit.
Being this far from the servers, getting through all those other attempts just isn't possible.

Grant
Darwin NT
ID: 918159
jay_e
Joined: 6 Apr 03
Posts: 62
Credit: 1,072,112
RAC: 0
United States
Message 918164 - Posted: 15 Jul 2009, 18:38:06 UTC - in response to Message 918151.  

Hi Grant,

I posted an hour ago, asking what a 'good' wait is.
So far, I've waited since Sunday.

Yes, I understand about traffic. I just want to know the average number of days one should wait for a job to upload.

Any guess?

Thanks,
Jay

ID: 918164
cliff west
Joined: 7 May 01
Posts: 211
Credit: 16,180,728
RAC: 15
United States
Message 918165 - Posted: 15 Jul 2009, 18:42:36 UTC - in response to Message 918164.  

It looks like they know and are working on the issue... and they hope it will level out soon. See the new post at the top of the board.
ID: 918165
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 918166 - Posted: 15 Jul 2009, 18:42:48 UTC

no problemos here - reattached project today - got work . . . crunchin' [issues have been fixed]

btw - Thanks to All @ Berkeley . . .

< nEXt


BOINC Wiki . . .

Science Status Page . . .
ID: 918166


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.