Panic Mode On (79) Server Problems?


zoom314
Joined: 30 Nov 03
Posts: 45810
Credit: 36,431,404
RAC: 6,720
Message 1321155 - Posted: 29 Dec 2012, 6:23:40 UTC - in response to Message 1321153.

Uploads are still working, but they take a while and can be very temperamental. However, I can't get a scheduler request through without a timeout, and without that I can't see if downloads are equally temperamental.

Something is stopping up the pipes. Cricket now shows a nasty spike of incoming packets with a similar spike on the outgoing packets. Incoming packets are now around 1/3 higher than they were yesterday, and outgoing packets are at nearly 10 kpkt/s.

Looks as if the chart started to go off "normal" about 24 hours ago. I imagine we are seeing some kind of cascading failure condition: the pipes clog with retries, which causes more hosts to fail to connect, which triggers more retries that clog the pipes even more, etc.
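As a rough illustration of that feedback loop, here is a toy model in Python. Everything in it is an assumption made for the sake of the sketch: the function name, the numbers, and especially the rule that useful throughput shrinks as `capacity**2 / offered` once the server is overloaded. It is not measured from, or based on, the actual SETI@home servers.

```python
def simulate_retry_backlog(new_per_tick, capacity, ticks, initial_backlog=0.0):
    """Toy congestion-collapse model (hypothetical, for illustration only).

    Each tick, clients offer the fresh requests plus every request that
    failed earlier (the backlog). Below capacity, everything is served.
    Above capacity, we ASSUME goodput degrades as capacity**2 / offered,
    standing in for time wasted on connections that never complete.
    Returns the backlog after each tick.
    """
    backlog = float(initial_backlog)
    history = []
    for _ in range(ticks):
        offered = backlog + new_per_tick
        if offered <= capacity:
            served = offered
        else:
            served = capacity * (capacity / offered)
        backlog = offered - served
        history.append(backlog)
    return history


# A small burst of retries drains away; a large one never does,
# even though the steady load (80) is below capacity (100).
small = simulate_retry_backlog(80, 100, 10, initial_backlog=30)
large = simulate_retry_backlog(80, 100, 10, initial_backlog=200)
print("small burst, final backlog:", small[-1])
print("large burst, final backlog:", round(large[-1], 1))
```

Under these assumptions the same server that copes with normal load stays stuck once the backlog passes a tipping point, which is one way to read "retries that clog the pipes even more".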

This is what I'm seeing in my log

12/29/2012 1:09:16 AM | SETI@home | Reporting 6 completed tasks, requesting new tasks for CPU and ATI
12/29/2012 1:09:18 AM | | Project communication failed: attempting access to reference site
12/29/2012 1:09:18 AM | SETI@home | Scheduler request failed: Server returned nothing (no headers, no data)
12/29/2012 1:09:20 AM | | Internet access OK - project servers may be temporarily down.
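For anyone who wants to tally these failures rather than eyeball them, the lines above appear to be three `|`-separated fields: timestamp, project, message. A minimal Python sketch, assuming that layout holds for every line and that the timestamps use English AM/PM markers; `LogEntry`, `parse_log_line` and `is_scheduler_failure` are made-up names for illustration, not part of BOINC:

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    project: str  # empty string for client-wide messages
    message: str


def parse_log_line(line: str) -> LogEntry:
    """Split one 'timestamp | project | message' line into its fields."""
    # maxsplit=2 keeps any later '|' characters inside the message.
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, message)


def is_scheduler_failure(entry: LogEntry) -> bool:
    """True for the 'Scheduler request failed' lines seen in the log above."""
    return entry.message.startswith("Scheduler request failed")
```

Feeding a saved log through `parse_log_line` and counting the entries where `is_scheduler_failure` is true would give a failure count per hour to set against the Cricket graphs.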


Question is, can anybody report?
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321158 - Posted: 29 Dec 2012, 6:30:08 UTC - in response to Message 1321155.
Last modified: 29 Dec 2012, 6:30:59 UTC

Question is, can anybody report?

Very, very, very occasionally.
Very.


And that's with No New Tasks Set.
Even less often when trying for more work.
____________
Grant
Darwin NT.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,709,025
RAC: 2,306
United States
Message 1321160 - Posted: 29 Dec 2012, 6:31:04 UTC - in response to Message 1321155.
Last modified: 29 Dec 2012, 6:32:47 UTC

Honestly it's becoming SOP that when I complain it starts working. Well sort of.

12/29/2012 1:23:30 AM | SETI@home | Reporting 7 completed tasks, requesting new tasks for CPU and ATI
12/29/2012 1:24:54 AM | SETI@home | Scheduler request completed: got 7 new tasks
12/29/2012 1:24:56 AM | SETI@home | Started download of 22dc09ac.3555.167212.140733193388041.10.92
12/29/2012 1:24:56 AM | SETI@home | Started download of 08oc12aa.26644.119686.4.10.1.vlar
12/29/2012 1:25:05 AM | SETI@home | Computation for task 26no12ad.24534.17654.140733193388043.10.56_1 finished
12/29/2012 1:25:05 AM | SETI@home | Starting task 22no12ad.3861.7429.140733193388038.10.76.vlar_1 using setiathome_enhanced version 603 in slot 0
12/29/2012 1:25:07 AM | SETI@home | Started upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:25:29 AM | | Project communication failed: attempting access to reference site
12/29/2012 1:25:29 AM | SETI@home | Temporarily failed upload of 26no12ad.24534.17654.140733193388043.10.56_1_0: connect() failed
12/29/2012 1:25:29 AM | SETI@home | Backing off 2 min 13 sec on upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:25:31 AM | | Internet access OK - project servers may be temporarily down.
12/29/2012 1:27:43 AM | SETI@home | Started upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:28:58 AM | | Project communication failed: attempting access to reference site
12/29/2012 1:28:58 AM | SETI@home | Temporarily failed upload of 26no12ad.24534.17654.140733193388043.10.56_1_0: connect() failed
12/29/2012 1:28:58 AM | SETI@home | Backing off 4 min 50 sec on upload of 26no12ad.24534.17654.140733193388043.10.56_1_0
12/29/2012 1:28:59 AM | | Internet access OK - project servers may be temporarily down.
12/29/2012 1:29:59 AM | | Project communication failed: attempting access to reference site
12/29/2012 1:29:59 AM | SETI@home | Temporarily failed download of 08oc12aa.26644.119686.4.10.1.vlar: transient HTTP error
12/29/2012 1:29:59 AM | SETI@home | Backing off 3 min 5 sec on download of 08oc12aa.26644.119686.4.10.1.vlar
12/29/2012 1:29:59 AM | SETI@home | Started download of 22dc09ac.3555.167212.140733193388041.10.98
12/29/2012 1:30:00 AM | | Internet access OK - project servers may be temporarily down.


And the answer to my previous post is yes, downloads are equally cranky.
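The growing waits in that log (2 min 13 sec, then 4 min 50 sec for the same upload) look like randomized exponential backoff. The sketch below shows the general technique in Python. The base delay, the cap and the jitter range are assumptions picked for illustration, and `backoff_delay` is a hypothetical helper; this is not the formula the BOINC client actually uses.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed attempts.

    The ceiling doubles with each failure (base * 2**failures) up to `cap`;
    the actual delay is drawn uniformly from the upper half of that range
    so that many clients do not all retry at the same instant.
    All default values are assumed, not taken from BOINC.
    """
    ceiling = min(cap, base * (2 ** failures))
    return rng.uniform(ceiling / 2, ceiling)


# Example: delays for the first five failures, using a seeded generator
# so the run is repeatable.
rng = random.Random(2012)
for n in range(1, 6):
    print(n, "failures ->", round(backoff_delay(n, rng=rng)), "sec")
```

The randomness matters as much as the doubling: if every host waited exactly the same time, they would all come back together and hit the servers in one wave.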
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321162 - Posted: 29 Dec 2012, 6:32:35 UTC - in response to Message 1321150.

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Still takes multiple clicks of the Retry button for me.

____________
Grant
Darwin NT.

Wiggo
Joined: 24 Jan 00
Posts: 6542
Credit: 90,757,790
RAC: 75,175
Australia
Message 1321163 - Posted: 29 Dec 2012, 6:36:05 UTC - in response to Message 1321150.

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Cheers.

Well that only lasted long enough to clear 2 rigs. :(

Cheers.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321170 - Posted: 29 Dec 2012, 6:48:15 UTC - in response to Message 1321162.

Seems as if something has received a fix as my uploads are suddenly starting to clear. :)

Still takes multiple clicks of the Retry button for me.


Although it's not taking as many clicks as it was.
However this upload improvement (minor as it is) appears to have come at the expense of Scheduler contact. As difficult as it has been, it's even more borked than it was before.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321180 - Posted: 29 Dec 2012, 7:38:43 UTC - in response to Message 1321170.


Not getting many Scheduler timeouts now.
Now they just fail.

I can upload, but not contact the Scheduler, or it times out. Or I can upload, but the Scheduler just fails.
Either way, getting work just isn't possible.
____________
Grant
Darwin NT.

zoom314
Joined: 30 Nov 03
Posts: 45810
Credit: 36,431,404
RAC: 6,720
Message 1321183 - Posted: 29 Dec 2012, 7:43:15 UTC

I got 33 WUs; of course I can't work on them for about 12 hours and 19 minutes... Einstein is in the way...
____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38342
Credit: 561,921,067
RAC: 640,903
United States
Message 1321184 - Posted: 29 Dec 2012, 7:43:49 UTC

It's just the aliens' way of trying to know that not too many of us shall find them too soon.

____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321187 - Posted: 29 Dec 2012, 7:59:38 UTC - in response to Message 1321184.


And now uploads have gone from bad to worse, but Scheduler contacts haven't improved at all.
It'll probably all come crashing down before morning.
____________
Grant
Darwin NT.

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7029
Credit: 59,333,165
RAC: 21,465
Germany
Message 1321192 - Posted: 29 Dec 2012, 8:25:29 UTC

:-(

Secondary projects in my BOINC are 'happy' about ..

I guess (worry) S@h is down, until after the first power outage ..

After a fresh start .. - around Jan 7 or 8, 'everything' should run well again.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR



>Das Deutsche Cafe. The German Cafe.<

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,136,801
RAC: 2,076
United States
Message 1321217 - Posted: 29 Dec 2012, 10:16:30 UTC

It's not completely dead. I got 3 or 4 uploads through and got one scheduler contact to report some completions in the last half hour. Can't get other uploads through or get new work, so gpu is crunching Einstein on 0 resource share to keep the house warm.

It looks like a long weekend + holiday. Wish they'd loosen the limits if they are even needed - I'm not convinced of that.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38342
Credit: 561,921,067
RAC: 640,903
United States
Message 1321221 - Posted: 29 Dec 2012, 10:32:06 UTC - in response to Message 1321187.


And now uploads have gone from bad to worse, but Scheduler contacts haven't improved at all.
It'll probably all come crashing down before morning.

Every once in a while, the kitty claws get through to the servers letting them know they still want work......
It's still all basically fukayed.
Mostly by the server problems.
But also by the underpinnings of Boinc's 'don't bother the servers' attitude.
Server contacts borked, don't try any more.
Too many uploads pending, don't ask for work.
Even ONE download backed off, don't ask for work.
Things not running smoothly, don't even ask.
A fart in the carload...back off.

Any excuse to back off.
And, if things work, stop at 100 tasks for your GPUs.
Even if they can return 100 tasks in a few hours or less.

This is rather maddening when one has 9 rigs online begging to do as much work as possible for this project. THIS project, not Einstein or some other backup. THIS project. The one I care about.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321236 - Posted: 29 Dec 2012, 11:00:36 UTC - in response to Message 1321221.
Last modified: 29 Dec 2012, 11:01:34 UTC

Whatever has stuffed things up, it's an odd one. There have been a couple of spikes in inbound traffic, but uploads & Scheduler requests have still been stuffed even during those periods. There's been the odd very slight dip in outbound traffic, yet most Scheduler requests result in an error, and those that don't error out time out. It'd be 5% or less of the requests that actually get a result.
Still the network traffic (inbound & outbound) is right up there.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38342
Credit: 561,921,067
RAC: 640,903
United States
Message 1321237 - Posted: 29 Dec 2012, 11:05:39 UTC - in response to Message 1321236.

Whatever has stuffed things up, it's an odd one. There have been a couple of spikes in inbound traffic, but uploads & Scheduler requests have still been stuffed even during those periods. There's been the odd very slight dip in outbound traffic, yet most Scheduler requests result in an error, and those that don't error out time out. It'd be 5% or less of the requests that actually get a result.
Still the network traffic (inbound & outbound) is right up there.

Two very odd upticks in received traffic shown by the crickets.

Dunno.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,522,651
RAC: 48,511
Australia
Message 1321238 - Posted: 29 Dec 2012, 11:13:10 UTC - in response to Message 1321237.

Two very odd upticks in received traffic shown by the crickets.

Yeah, those 2 (and even the smaller more recent one)- yet even at those times uploads & Scheduler requests were still screwed.


It's bed time.
Will be interesting to see if either system still has any work in the morning.
____________
Grant
Darwin NT.

Ageless
Joined: 9 Jun 99
Posts: 12260
Credit: 2,554,961
RAC: 754
Netherlands
Message 1321248 - Posted: 29 Dec 2012, 12:53:01 UTC - in response to Message 1321122.

I had to resort to Einstein

... Yes, it's their way of saying "come over to us, we're still trying to reach 1 Petaflop (1,000 Teraflop), so we're wrecking Seti on purpose". ;-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,136,801
RAC: 2,076
United States
Message 1321272 - Posted: 29 Dec 2012, 13:59:24 UTC
Last modified: 29 Dec 2012, 14:02:34 UTC

Just got a limit's worth of ghosts, and then the first 20 started downloading. Downloads look as tough as uploads.

Edit: had received only 21 tasks in the preceding 24 hours (per BOINCTasks), and the ghosts are new.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Tom
Joined: 12 Aug 11
Posts: 114
Credit: 4,566,097
RAC: 0
United States
Message 1321303 - Posted: 29 Dec 2012, 15:21:50 UTC
Last modified: 29 Dec 2012, 15:23:52 UTC

and it all started when they mowed the green grass,
when I started receiving shorties.

They used gum and baling wire to fix the combo of normal MBs and APs, but they need to lengthen shorties by turning them into VLAR CPU jobs or by some other means.

That's the way CSMA/CD works. It's fine up to 85% to 90% of the rated bandwidth, then it falls over itself from that point on to 100%. Especially if they do not prioritize the control traffic!!! ACK ACK ACK ACK.

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6828
Credit: 24,676,829
RAC: 26,688
United Kingdom
Message 1321305 - Posted: 29 Dec 2012, 15:30:15 UTC
Last modified: 29 Dec 2012, 15:30:39 UTC

Seems no uploads, no downloads and no reports! Other than that, fine!! :-)
____________


Today is life, the only life we're sure of. Make the most of today.


Copyright © 2014 University of California