Panic Mode On (58) Server problems?

Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1161073 - Posted: 10 Oct 2011, 22:31:16 UTC

Looks like Cricket is taking a nose dive. I cannot connect at present. (from So Cal)
Dave

ID: 1161073 · Report as offensive
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1161088 - Posted: 10 Oct 2011, 23:06:11 UTC

Upload server and scheduler are both offline. They must be tweaking something.
ID: 1161088 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1161093 - Posted: 10 Oct 2011, 23:13:34 UTC

I can ping all four servers despite the dip in the graph. I don't know if that means things are still responding to HTTP requests or database queries though.

Regarding the "not solid" maxed-out time period of the cricket graph, that's due to the round-robin DNS still pointing to the frozen download server. Those who haven't put the entry in their HOSTS file to point at .13 would have seen downloads time out whenever they tried during the 5-minute periods where .18 is listed as the primary. That has the potential to cause those dreaded, ridiculous back-offs after a single failed transfer, and when the client tries again, there's a 50/50 chance it will get .18 again.
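To put a rough number on that 50/50 chance, here is a minimal sketch. It assumes each retry independently lands on one of the two download servers with equal probability, which is a simplification: the real rotation follows the DNS record order, which changes on a timer. The function name is made up for illustration.

```python
def chance_all_retries_fail(retries: int, frozen_share: float = 0.5) -> float:
    """Probability that every one of `retries` attempts lands on the frozen server.

    Assumes independent attempts; `frozen_share` is the fraction of lookups
    that return the frozen server (0.5 for one frozen server out of two).
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return frozen_share ** retries


for n in (1, 2, 3, 5):
    print(f"{n} attempt(s): {chance_all_retries_fail(n):.1%} chance every one hits the frozen server")
```

Under that assumption a single retry fails half the time, which is why one unlucky transfer followed by a long back-off hurts so much.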
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1161093 · Report as offensive
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1161097 - Posted: 10 Oct 2011, 23:29:04 UTC
Last modified: 10 Oct 2011, 23:29:35 UTC

Time to panic!
11/10/2011 00:21:57 | SETI@home | Scheduler request failed: Couldn't connect to server

The end is nigh, THE END IS NIGH I SAY! :)
ID: 1161097 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1161104 - Posted: 11 Oct 2011, 0:16:48 UTC - in response to Message 1161097.  

"This is 'Crunch Alert.' Is everything ok?"

"My cache has fallen, and it can't fill-up!!"
ID: 1161104 · Report as offensive
Jim_S
Joined: 23 Feb 00
Posts: 4705
Credit: 64,560,357
RAC: 31
United States
Message 1161115 - Posted: 11 Oct 2011, 1:05:45 UTC

Can't Report Again.

I Desire Peace and Justice, Jim Scott (Mod-Ret.)
ID: 1161115 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1161135 - Posted: 11 Oct 2011, 2:13:02 UTC

The grass is growing on the Cricket field again....
Joe
ID: 1161135 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13797
Credit: 208,696,464
RAC: 304
Australia
Message 1161163 - Posted: 11 Oct 2011, 7:16:22 UTC - in response to Message 1161043.  

Any chance that this temporary fix could be made permanent?

I would hope not as most of my requests for work result in none. Prior to the temporary bodge that wasn't the case.

Grant
Darwin NT
ID: 1161163 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13797
Credit: 208,696,464
RAC: 304
Australia
Message 1161164 - Posted: 11 Oct 2011, 7:30:30 UTC


The task-limit logic appears somewhat scrambled at the moment. I've reached my limit for CPU tasks, and that resulted in one system running out of GPU work before it was able to get any more.
Now that it has got some GPU work, it has downloaded another CPU task. Once again, even when requesting only GPU work, it's told it has reached the limit for tasks, even though it has only a bit over a dozen. Given the present mix of work, it might last for an hour before running dry again.
Grant
Darwin NT
ID: 1161164 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19219
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1161166 - Posted: 11 Oct 2011, 7:37:07 UTC - in response to Message 1161164.  


The task-limit logic appears somewhat scrambled at the moment. I've reached my limit for CPU tasks, and that resulted in one system running out of GPU work before it was able to get any more.
Now that it has got some GPU work, it has downloaded another CPU task. Once again, even when requesting only GPU work, it's told it has reached the limit for tasks, even though it has only a bit over a dozen. Given the present mix of work, it might last for an hour before running dry again.

I ran into that problem. Then the other day I had 4 APs running on the CPU, and the GPU tasks, which have very high initial completion-time estimates, drove the DCF very low. Once the DCF was below about 0.6, this caused a request for new GPU (and CPU) tasks each time a GPU task completed.
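A rough sketch of why a falling DCF triggers extra work requests. This is a simplified model, not BOINC's actual work-fetch code: it assumes each task's runtime estimate is its initial estimate multiplied by the DCF, and that the client asks for more work whenever the total estimate drops below a cache target. The function names and all figures are invented for illustration.

```python
def estimated_queue_seconds(raw_estimates, dcf):
    """Sum of per-task runtime estimates after scaling by the DCF."""
    return sum(estimate * dcf for estimate in raw_estimates)


def would_request_work(raw_estimates, dcf, cache_target_seconds):
    """True when the scaled queue estimate falls short of the cache target."""
    return estimated_queue_seconds(raw_estimates, dcf) < cache_target_seconds


# Invented figures: 12 queued GPU tasks, each initially estimated at 2 hours,
# measured against a cache target of 1 day.
queue = [2 * 3600] * 12
one_day = 24 * 3600

print(would_request_work(queue, dcf=1.0, cache_target_seconds=one_day))  # False
print(would_request_work(queue, dcf=0.6, cache_target_seconds=one_day))  # True
```

In this model the same dozen tasks look like a full day of work at a DCF of 1.0 but only about 14 hours at 0.6, so the client believes its cache is short and keeps asking.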
ID: 1161166 · Report as offensive
S@NL - John van Gorsel
Volunteer tester
Joined: 5 Jul 99
Posts: 193
Credit: 139,673,078
RAC: 0
Netherlands
Message 1161192 - Posted: 11 Oct 2011, 10:13:24 UTC - in response to Message 1160753.  

For those that like to fiddle with these things, the working download server - bane this time - is 208.68.240.13

That's the other way round from usual, so anyone who still has .18 in their hosts file will be stuck until Tuesday.


It looks as if it is back to .18 now
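For anyone unsure which address their override currently pins, a small sketch that scans hosts-file text for a given name. The hostname `download.example.org` is a placeholder, not the project's real download host, and the function name is invented here; only the two addresses come from this thread.

```python
from typing import List


def pinned_addresses(hosts_text: str, hostname: str) -> List[str]:
    """Return the IP addresses that a hosts-file text maps `hostname` to."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname.lower() in (name.lower() for name in names):
            found.append(ip)
    return found


# Hypothetical hosts-file content; "download.example.org" is a placeholder name.
sample = """
127.0.0.1       localhost
208.68.240.18   download.example.org   # manual override
"""

print(pinned_addresses(sample, "download.example.org"))  # ['208.68.240.18']
```

If the address printed is not the server that is currently working, the override needs changing or removing.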


Seti@Netherlands website
ID: 1161192 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22324
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1161198 - Posted: 11 Oct 2011, 11:18:49 UTC

A few moments ago I noticed a drop in activity on the crickets. Looking at the server status page, it would appear that everything on Vader has come to a standstill. Has Vader met the light side of The Force...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1161198 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22324
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1161199 - Posted: 11 Oct 2011, 11:33:42 UTC

And now the crickets are on the floor of their cage, hardly twitching :-(

That was fun while it lasted. However, it is Tuesday, so there should be living bodies in the lab in a few hours to sort things out...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1161199 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22324
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1161200 - Posted: 11 Oct 2011, 11:35:02 UTC

As if by magic, as I typed the servers came back to life, sort of...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1161200 · Report as offensive
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1161219 - Posted: 11 Oct 2011, 13:14:36 UTC - in response to Message 1161200.  

It would be superfantastic if ap_validate3 would come back to life sometime soon.

I have over 200 pending AP results at the moment :/
ID: 1161219 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1161246 - Posted: 11 Oct 2011, 15:14:07 UTC - in response to Message 1161219.  
Last modified: 11 Oct 2011, 15:25:17 UTC

Sometimes, as I noticed again today, a few extra retries can help a host actually get work: BBA (BOINC Button Abuse).

Isn't pending always good, as the results would have to be VALID and are just waiting for your wingmen/women to report a valid result and form a canonical result?
ID: 1161246 · Report as offensive
S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1161251 - Posted: 11 Oct 2011, 20:42:06 UTC - in response to Message 1161219.  

It would be superfantastic if ap_validate3 would come back to life sometime soon.

I have over 200 pending AP results at the moment :/


Well, I think everybody's hoping for that... But all we can do is wait and see if the outage helps this week. Off to bed now in Europe; I'll see if my RAC is back up a little in the morning.
ID: 1161251 · Report as offensive
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1161280 - Posted: 11 Oct 2011, 22:39:42 UTC

Since things came back online after the weekly outage today, I have been having trouble getting completed tasks uploaded.

57735 10/11/2011 5:23:59 PM Internet access OK - project servers may be temporarily down.
57736 SETI@home 10/11/2011 5:25:01 PM Started upload of 16jl11ab.1185.67397.12.10.19_1_0
57737 SETI@home 10/11/2011 5:25:01 PM Started upload of 10jl11aa.29866.2112.7.10.212_1_0
57738 SETI@home 10/11/2011 5:25:01 PM Started upload of 16jl11ab.1185.67397.12.10.20_1_0
57739 SETI@home 10/11/2011 5:25:01 PM Started upload of 10jl11aa.29866.2112.7.10.217_1_0
57740 SETI@home 10/11/2011 5:25:01 PM Started upload of 10jl11aa.29866.2112.7.10.210_0_0
57741 SETI@home 10/11/2011 5:25:01 PM Started upload of 25jn11ab.28755.5793.16.10.242_0_0
57742 SETI@home 10/11/2011 5:25:01 PM Started upload of 25jn11ab.28755.5793.16.10.240_0_0
57743 10/11/2011 5:25:21 PM Project communication failed: attempting access to reference site
57744 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 16jl11ab.1185.67397.12.10.19_1_0: connect() failed
57745 SETI@home 10/11/2011 5:25:21 PM Backing off 4 hr 13 min 59 sec on upload of 16jl11ab.1185.67397.12.10.19_1_0
57746 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 10jl11aa.29866.2112.7.10.212_1_0: HTTP error
57747 SETI@home 10/11/2011 5:25:21 PM Backing off 1 hr 38 min 46 sec on upload of 10jl11aa.29866.2112.7.10.212_1_0
57748 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 16jl11ab.1185.67397.12.10.20_1_0: HTTP error
57749 SETI@home 10/11/2011 5:25:21 PM Backing off 1 hr 30 min 58 sec on upload of 16jl11ab.1185.67397.12.10.20_1_0
57750 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 10jl11aa.29866.2112.7.10.217_1_0: HTTP error
57751 SETI@home 10/11/2011 5:25:21 PM Backing off 1 hr 6 min 50 sec on upload of 10jl11aa.29866.2112.7.10.217_1_0
57752 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 10jl11aa.29866.2112.7.10.210_0_0: HTTP error
57753 SETI@home 10/11/2011 5:25:21 PM Backing off 1 hr 1 min 17 sec on upload of 10jl11aa.29866.2112.7.10.210_0_0
57754 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 25jn11ab.28755.5793.16.10.242_0_0: HTTP error
57755 SETI@home 10/11/2011 5:25:21 PM Backing off 1 hr 16 min 49 sec on upload of 25jn11ab.28755.5793.16.10.242_0_0
57756 SETI@home 10/11/2011 5:25:21 PM Temporarily failed upload of 25jn11ab.28755.5793.16.10.240_0_0: HTTP error
57757 SETI@home 10/11/2011 5:25:21 PM Backing off 26 min 23 sec on upload of 25jn11ab.28755.5793.16.10.240_0_0
57758 10/11/2011 5:25:23 PM Internet access OK - project servers may be temporarily down.
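To compare back-offs like these at a glance, a short sketch that pulls the duration and file name out of each "Backing off" line and converts it to seconds. The pattern is written against the message wording shown in this log only; other client versions may phrase it differently, and the function name is invented here.

```python
import re

# Matches e.g. "Backing off 4 hr 13 min 59 sec on upload of <file>";
# the hr, min and sec parts are each optional.
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)


def backoff_seconds(line):
    """Return (file_name, seconds) for a back-off log line, or None if it is not one."""
    match = BACKOFF.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups()[:3])
    return match.group(4), hours * 3600 + minutes * 60 + seconds


line = ("57745 SETI@home 10/11/2011 5:25:21 PM Backing off 4 hr 13 min 59 sec "
        "on upload of 16jl11ab.1185.67397.12.10.19_1_0")
print(backoff_seconds(line))  # ('16jl11ab.1185.67397.12.10.19_1_0', 15239)
```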


I have a hosts-file entry forcing these nodes to use .18 for downloads, but there is only one upload server, correct? Is anyone else experiencing upload issues?
ID: 1161280 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14660
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1161281 - Posted: 11 Oct 2011, 22:43:31 UTC - in response to Message 1161280.  

Is anyone else experiencing upload issues?

Yes, getting lots of timeouts.

And the server status page hasn't updated for over an hour. Things are often sticky for a while after maintenance, but this is stickier than usual.
ID: 1161281 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1161283 - Posted: 11 Oct 2011, 22:46:53 UTC - in response to Message 1161246.  

...
Isn't pending always good, as they would have to be VALID and just wait for your wingmen/women, to report a valid result and make a canonical result?

No, pending merely means the science application did not know of any problem doing the work, so it is reported as a "success". The Validator must check it against a wingmate's result before it can become VALID.

Those who maintain their systems well and don't go too far with an overclock or similar can expect pending tasks to become valid. But they may be more like a stock market investment than like money in the bank.
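A toy illustration of the pending-versus-valid distinction. This is not the project's validator: it assumes each result boils down to a single number and that two results agree when they differ by no more than a tolerance, whereas the real comparison works on the full result files. The function name and the "inconclusive" and "invalid" labels are assumptions made for this sketch.

```python
from typing import Dict


def classify(outputs: Dict[str, float], tolerance: float = 1e-6) -> Dict[str, str]:
    """Label each host's result given every result reported so far for one workunit."""
    if len(outputs) < 2:
        # Nothing to compare against yet: a reported "success" stays pending.
        return {host: "pending" for host in outputs}
    agrees = {
        host: any(abs(value - other) <= tolerance
                  for peer, other in outputs.items() if peer != host)
        for host, value in outputs.items()
    }
    if not any(agrees.values()):
        # No two results match, so another wingmate's result would be needed.
        return {host: "inconclusive" for host in outputs}
    return {host: "valid" if ok else "invalid" for host, ok in agrees.items()}


print(classify({"mine": 1.0}))
print(classify({"mine": 1.0, "wingmate": 1.0}))
print(classify({"mine": 1.0, "wingmate": 2.0, "third": 1.0}))
```

The first call leaves the lone result pending, the second validates both, and the third marks the odd one out as invalid, which is the sense in which pending is an investment and not yet money in the bank.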
Joe
ID: 1161283 · Report as offensive


 