Panic Mode On (91) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13739
Credit: 208,696,464
RAC: 304
Australia
Message 1597112 - Posted: 5 Nov 2014, 8:55:04 UTC - in response to Message 1597110.  

All uploads are going into "pending" though, my RAC hasn't changed since early afternoon on Monday. ;-)

Uploads always go "in to pending". It's not until they have been validated that you will get credit.
Given the length & severity of this outage, I expect it will take several hours at least before the validators will start to even put a dent in pendings. And as things stand at present- all the MB validators are offline.
Also, while the "Fix AP assimilator" is running, the AP assimilators aren't, so that will cause some congestion as well.
Grant
Darwin NT
ID: 1597112 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34816
Credit: 261,360,520
RAC: 489
Australia
Message 1597122 - Posted: 5 Nov 2014, 9:26:17 UTC

It looks like 28mr11ab is stuck again splitting APs, but at least this time it's only got 3 splitters tied up.

Other than that, my caches are filling nicely.

Cheers.
ID: 1597122 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13739
Credit: 208,696,464
RAC: 304
Australia
Message 1597129 - Posted: 5 Nov 2014, 9:57:12 UTC - in response to Message 1597122.  
Last modified: 5 Nov 2014, 9:57:47 UTC

It looks like 28mr11ab is stuck again splitting APs, but at least this time it's only got 3 splitters tied up.

I'm thinking there may be a stuck MB tape or 2 as well. WUs are being pumped out at a very good rate, 35/s although with the extra splitter 40-45 wasn't unusual.

But it shows just how great the demand is: when the system came back to life there were 500,000 WUs ready to send; it's now down to 100,000 and still falling like a stone, even with 35 WU/s being split. Usually 30/s is enough to supply the demand for work and refill the ready-to-send buffer.
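
For anyone who wants to put a rough number on that, here is a minimal sketch of the arithmetic in Python. The buffer sizes and the split rate are the figures quoted above; the elapsed time since the servers came back is not stated anywhere in this thread, so `ELAPSED_S` is purely a hypothetical placeholder, and the function names are made up for illustration.

```python
def implied_demand(start_buffer, end_buffer, elapsed_s, split_rate):
    """Estimate the rate (WU/s) at which hosts are fetching work.

    The buffer shrinks by (demand - split_rate) each second, so
    demand = split_rate + (start_buffer - end_buffer) / elapsed_s.
    """
    return split_rate + (start_buffer - end_buffer) / elapsed_s


def seconds_until_empty(buffer, demand, split_rate):
    """Time until the buffer runs dry, or None if it is not shrinking."""
    net_drain = demand - split_rate
    if net_drain <= 0:
        return None
    return buffer / net_drain


# Figures quoted in the post.
START_BUFFER = 500_000    # WUs ready to send when the system came back
CURRENT_BUFFER = 100_000  # WUs ready to send now
SPLIT_RATE = 35           # WU/s currently being split

# ASSUMPTION: the thread does not say how long the drop took.
# Two hours is a made-up value; substitute the real interval.
ELAPSED_S = 2 * 3600

demand = implied_demand(START_BUFFER, CURRENT_BUFFER, ELAPSED_S, SPLIT_RATE)
remaining = seconds_until_empty(CURRENT_BUFFER, demand, SPLIT_RATE)

print(f"implied demand: {demand:.1f} WU/s")
print(f"buffer empty in about {remaining / 60:.0f} minutes if nothing changes")
```

In practice demand should tail off as caches fill, so the "time until empty" is a worst case rather than a prediction. With demand back at the usual 30 WU/s against 35 WU/s of splitting, `seconds_until_empty` returns `None` because the buffer would be refilling.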
Grant
Darwin NT
ID: 1597129 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34816
Credit: 261,360,520
RAC: 489
Australia
Message 1597146 - Posted: 5 Nov 2014, 10:35:48 UTC

1 rig is full again and the other isn't far off.

Cheers.
ID: 1597146 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1597225 - Posted: 5 Nov 2014, 16:16:29 UTC

Uploads started flowing before anything else did. So you may have been able to upload everything and try to report, only to still get a "down for maintenance" message. (In the case of Beta, my phone finally uploaded the two tasks it had, but still can't report them, and the web site is still down.)

The Cricket graph shows the predictable spike, but it has ramped down and most people should not have server-caused problems now (IMHO).

I hope Beta comes back before my phone finishes the two Einsteins it's working on. I figure they've got about 2 to 2.5 hours of run time to go (which they won't get all at once).
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1597225 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1597234 - Posted: 5 Nov 2014, 16:48:22 UTC - in response to Message 1597225.  

Uploads started flowing before anything else did. So you may have been able to upload everything and try to report, only to still get a "down for maintenance" message. (In the case of Beta, my phone finally uploaded the two tasks it had, but still can't report them, and the web site is still down.)

The Cricket graph shows the predictable spike, but it has ramped down and most people should not have server-caused problems now (IMHO).

I hope Beta comes back before my phone finishes the two Einsteins it's working on. I figure they've got about 2 to 2.5 hours of run time to go (which they won't get all at once).

I'm actually finding that new work is flowing more freely than it has been doing for the last few days - although I haven't been watching closely, I don't think there have been any cases of 'project has no tasks available' when I've asked for work (MB only - AP has different problems, as we all know).

They've also re-started the validators which were shut down this morning, and already got through more than half the backlog (down to 400K tasks waiting, from a peak of 850K). Altogether, a good day's work.
ID: 1597234 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1597243 - Posted: 5 Nov 2014, 17:07:19 UTC

Still no Beta. Does it all run on bruno?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1597243 · Report as offensive
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1597246 - Posted: 5 Nov 2014, 17:16:06 UTC - in response to Message 1597243.  
Last modified: 5 Nov 2014, 17:21:49 UTC

Still no Beta. Does it all run on bruno?

I'm not sure which servers Beta uses. But, it's not the first time that after a full day of getting main working, they forget or run out of time to get Beta operational again.
EDIT:
They did note that not everything on Bruno was working.
Upload Server Problem
Our upload server (bruno) is currently unstable. We are working to fix the problem. UPDATE: uploads are now working although some bruno based services are still off. 5 Nov 2014, 3:24:51 UTC


Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 1597246 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 1597250 - Posted: 5 Nov 2014, 17:25:44 UTC

Just re-enabled network activity after suspending it yesterday, & all uploads went through at 1st attempt. Also got downloads no problem.
ID: 1597250 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1597286 - Posted: 5 Nov 2014, 19:09:59 UTC - in response to Message 1597234.  

...
They've also re-started the validators which were shut down this morning, and already got through more than half the backlog (down to 400K tasks waiting, from a peak of 850K). Altogether, a good day's work.

The sah_validate processes for v7 have been moved from bruno to synergy, so somewhat more than just restarting. Down to 170 WUs waiting for validation as of 19:00:06 UTC.
                                                                   Joe
ID: 1597286 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1597294 - Posted: 5 Nov 2014, 19:31:39 UTC - in response to Message 1597286.  

...
They've also re-started the validators which were shut down this morning, and already got through more than half the backlog (down to 400K tasks waiting, from a peak of 850K). Altogether, a good day's work.

The sah_validate processes for v7 have been moved from bruno to synergy, so somewhat more than just restarting. Down to 170 WUs waiting for validation as of 19:00:06 UTC.
                                                                   Joe

Well, now it's a case of finding another server to replace Bruno and/or act as a standby in case of future problems.
I can't see the team wanting to have to swap processes from server to server repeatedly.

And AFAIK Bruno is an old device that's made up of bits of even older components from scrapped servers.

Maybe there's another whip round due.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1597294 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1597296 - Posted: 5 Nov 2014, 19:34:22 UTC - in response to Message 1597286.  

...
They've also re-started the validators which were shut down this morning, and already got through more than half the backlog (down to 400K tasks waiting, from a peak of 850K). Altogether, a good day's work.

The sah_validate processes for v7 have been moved from bruno to synergy, so somewhat more than just restarting. Down to 170 WUs waiting for validation as of 19:00:06 UTC.
                                                                   Joe

Perhaps it is nearing time to retire bruno.
I imagine setting up a new box & taking it down to the colo is on the guys' list of favorite things to do.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1597296 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1597304 - Posted: 5 Nov 2014, 19:47:33 UTC - in response to Message 1597296.  

Perhaps it is nearing time to retire bruno.


Maybe the replacement could be Muarae? #1 is running web pages, #3 is "running" (disabled) ntpckr and rfi. One is running (maybe) Eric's GALFA project.

What about the rest of this "donated box - a 3U monster" as Matt calls it?
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1597304 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1597317 - Posted: 5 Nov 2014, 20:01:20 UTC - in response to Message 1597295.  

I think the AP splitters are just sucking air. Four AP files have started now, and it doesn't seem as if any WUs are being created.

Having 28mr11ab stuck again doesn't help either.

End of whining for now :-)

A couple of those files have been loaded for hours, yet they are still showing (1).
Still no AstroPulse spike on the Cricket graph.
Only a few resends have made it to my machines all day...
ID: 1597317 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1597318 - Posted: 5 Nov 2014, 20:01:57 UTC - in response to Message 1597296.  

Perhaps it is nearing time to retire bruno.
I imagine setting up a new box & taking it down to the colo is on the guys' list of favorite things to do.

8GB of RAM seems rather meager compared to all the rest of the servers. Perhaps it got tired of running as many processes as it was.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1597318 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1597320 - Posted: 5 Nov 2014, 20:02:53 UTC

Still no Beta.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1597320 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1597325 - Posted: 5 Nov 2014, 20:13:47 UTC - in response to Message 1597304.  

Perhaps it is nearing time to retire bruno.


Maybe the replacement could be Muarae? #1 is running web pages, #3 is "running" (disabled) ntpckr and rfi. One is running (maybe) Eric's GALFA project.

What about the rest of this "donated box - a 3U monster" as Matt calls it?

Hopefully they finished debating how to utilize that box fully long ago.
If required I'm sure a fundraiser to get a new system could be thrown together.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1597325 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1597335 - Posted: 5 Nov 2014, 20:40:23 UTC - in response to Message 1597318.  

Perhaps it is nearing time to retire bruno.
I imagine setting up a new box & taking it down to the colo is on the guys' list of favorite things to do.

8GB of RAM seems rather meager compared to all the rest of the servers. Perhaps it got tired of running as many processes as it was.

I suspect those are the specs for the original bruno, a Frankenstein machine built up from a motherboard and CPUs donated by Intel, with a case, RAM, an interface to an existing storage array and such added to make it useful. IIRC that machine did not recover after a power outage, and there were so many processes on it that they decided to keep the name when they substituted another system.
                                                                   Joe
ID: 1597335 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1597344 - Posted: 5 Nov 2014, 20:45:58 UTC

I want Beta!

My phone is going to finish its current pair of Einsteins in the next 20 minutes or so. It would be really nice if it could get Betas to work on instead.

(Okay, I'll stop whining now.)
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1597344 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34816
Credit: 261,360,520
RAC: 489
Australia
Message 1597351 - Posted: 5 Nov 2014, 21:04:02 UTC - in response to Message 1597344.  

I want Beta!

My phone is going to finish its current pair of Einsteins in the next 20 minutes or so. It would be really nice if it could get Betas to work on instead.

(Okay, I'll stop whining now.)

Here is the Beta SSP, David, if you're interested.

http://setiweb.ssl.berkeley.edu/beta/status.php

Cheers.
ID: 1597351 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.