Panic Mode On (113) Server Problems?

Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1956821 - Posted: 22 Sep 2018, 16:56:16 UTC

We got more data files. We should be good for the weekend now!!! Thank you.
ID: 1956821
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1956853 - Posted: 22 Sep 2018, 22:03:09 UTC - in response to Message 1956821.  
Last modified: 22 Sep 2018, 22:04:09 UTC

. . Yep, thanks to whoever got the job and did it. Much appreciated :) Especially since these Blc14 units are of the slower variety as well so they should last a bit longer ...

Stephen

:)
ID: 1956853
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1956898 - Posted: 23 Sep 2018, 8:06:45 UTC

Noise bomb files I have:

blc06_2bit_guppi_58227_20801_HIP66479_0057.29470.818.21.44.xx
blc06_2bit_guppi_58227_20801_HIP66479_0057.29482.818.22.45.xx
ID: 1956898
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1957263 - Posted: 26 Sep 2018, 2:29:38 UTC

Wow, it's back already ...
ID: 1957263
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1957265 - Posted: 26 Sep 2018, 2:36:32 UTC - in response to Message 1957263.  

And with no work. Very strange outage. The servers came back early, only to dump the RTS buffer immediately to those lucky enough to report work and get more, and then the splitters went on strike. But no message boards until now.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1957265
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1957271 - Posted: 26 Sep 2018, 3:08:44 UTC

The SSP says RTS = 0. This is bad.
ID: 1957271
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1957279 - Posted: 26 Sep 2018, 3:37:10 UTC - in response to Message 1957271.  

One of my Windows machines was able to grab 40 or so tasks about 15 minutes ago. Ready to send is just over 600 tasks, so it looks like things are slowly getting back to normal.
~Chris

ID: 1957279
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1957283 - Posted: 26 Sep 2018, 4:06:31 UTC
Last modified: 26 Sep 2018, 4:15:13 UTC

Thanks to those who stayed late to get seti up and going again!!!

The results out in the field are around 4.2 million, and the usual figure is 4.6 million, so it will be a while before the system can start building up an RTS queue.
ID: 1957283
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1957285 - Posted: 26 Sep 2018, 4:23:08 UTC - in response to Message 1957263.  

Wow, it's back already ...


. . Yeah, I blinked and missed the outage completely ...

. . A shame that the message boards, the system and user stats weren't back with the main part of the system though.

. . And while there seemed to be work when it first came back it fell into a severe work famine.

Stephen

:(
ID: 1957285
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1957297 - Posted: 26 Sep 2018, 8:14:35 UTC

Thanks to those who stayed late to get seti up and going again!!!


I am surprised that many people think in this day and age it requires "hands on server" to sort out problems.

I am pretty sure that all it requires is a laptop and internet connection.
ID: 1957297
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1957298 - Posted: 26 Sep 2018, 8:17:33 UTC - in response to Message 1957297.  

Thanks to those who stayed late to get seti up and going again!!!


I am surprised that many people think in this day and age it requires "hands on server" to sort out problems.

I am pretty sure that all it requires is a laptop and internet connection.

That's when things are working as intended. Around here, that's not always the case.
And even for remote work, someone has to be awake & fully functional at the time to do what's necessary.
Grant
Darwin NT
ID: 1957298
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1957313 - Posted: 26 Sep 2018, 12:08:12 UTC - in response to Message 1957297.  

Thanks to those who stayed late to get seti up and going again!!!


I am surprised that many people think in this day and age it requires "hands on server" to sort out problems.

I am pretty sure that all it requires is a laptop and internet connection.


. . True for some things. But I am sure there are still some things that require a manual activity to complete.

. . And respect is still a good thing ...

Stephen

:)
ID: 1957313
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1957439 - Posted: 27 Sep 2018, 8:56:05 UTC

Lots of noise with the blc15_2bit_guppi_58329_19913_HIP91145_0023.4996.818.20.29.XX WUs.
ID: 1957439
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1957459 - Posted: 27 Sep 2018, 15:47:27 UTC

The system seems able to handle a certain number of noise bombs. A file here and there isn't a problem. When someone posts about noise bomb files, I usually check the status page to see what the "Results received in the last hour" figure is, and also the queries/second level.

The results received in the last hour is under 100k, so the system is running well. I grow concerned when that number is over 120k, as it usually means the system is too busy to do its clean-up (db purging) as well as usual.

We have a good amount of data, so hopefully we will have no reason to panic until next Friday when more data will need to be added. I'm hoping Tuesday's outage is short :-).
ID: 1957459
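
The rule of thumb in the post above can be written down as a small check. This is only a sketch: the function name and the "borderline" label for the 100k to 120k band are assumptions (the post does not say how to treat that range), the thresholds are the poster's informal ones rather than project-defined limits, and the sample values at the bottom are hypothetical.

```python
def classify_received_rate(results_last_hour: int) -> str:
    """Classify the 'Results received in last hour' figure.

    Thresholds follow the informal rule of thumb from the post:
    under 100k is fine, over 120k is a worry.
    """
    if results_last_hour < 100_000:
        return "ok"
    if results_last_hour > 120_000:
        return "concern"
    # Assumption: the post does not cover 100k-120k, so label it separately.
    return "borderline"


# Hypothetical sample values, not taken from the status page.
for sample in (90_000, 110_000, 125_000):
    print(sample, classify_received_rate(sample))
```

A three-way result keeps the unspecified middle band visible instead of silently folding it into either side.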
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1957982 - Posted: 1 Oct 2018, 2:55:28 UTC

Seems the Server is stuck.
Splitters are out to supper and the RTS is falling.
Data Distribution State         SETI@home v7 #  Astropulse #  SETI@home v8 #  As of*
Results ready to send           0               0             366,231         0m
Current result creation rate    0/sec           0/sec         0.9691/sec      6m
Results out in the field        0               21,076        4,557,661       1m
Results received in last hour   0               541           92,556          2h
ID: 1957982
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1957984 - Posted: 1 Oct 2018, 3:34:55 UTC

I'll echo TBar's "something's wrong".
There are a bunch of WUs stuck in "waiting for assimilation", and the "Results received in the last hour" figure is 3 hours stale.

Data Distribution State                        SETI@home v7 #  Astropulse #  SETI@home v8 #  As of*
Results ready to send                          0               0             324,145         1m
Current result creation rate **                0/sec           0.0115/sec    0.7770/sec      7m
Results out in the field                       0               20,688        4,556,169       1m
Results received in last hour **               0               541           92,556          3h
Result turnaround time (last hour average) **  0.00 hours      28.60 hours   34.04 hours     3h
Results returned and awaiting validation       0               17,044        3,881,077       1m
Workunits waiting for validation               0               0             27              1m
Workunits waiting for assimilation             0               4             142,876         1m
Workunit files waiting for deletion            0               0             0               1m
Result files waiting for deletion              0               0             0               1m
Workunits waiting for db purging               0               5,272         982,623         1m
Results waiting for db purging                 71              11,265        2,028,550       1m
ID: 1957984
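
Spotting the stale rows by eye, as in the post above, amounts to reading the "As of" column. A minimal sketch of that check follows. The helper name, the 60-minute cutoff, and the assumption that ages only ever appear as `<n>m` or `<n>h` are illustrative choices, not anything the status page defines; the row data is copied from the first five rows of the table above.

```python
import re


def age_minutes(as_of: str) -> int:
    """Convert an 'As of' age such as '7m' or '3h' to minutes."""
    match = re.fullmatch(r"(\d+)([mh])", as_of.strip())
    if match is None:
        raise ValueError(f"unrecognised age: {as_of!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "h" else value


# (row name, "As of" value) pairs copied from the table above.
rows = [
    ("Results ready to send", "1m"),
    ("Current result creation rate", "7m"),
    ("Results out in the field", "1m"),
    ("Results received in last hour", "3h"),
    ("Result turnaround time (last hour average)", "3h"),
]

STALE_AFTER_MINUTES = 60  # assumed cutoff, chosen only for illustration

stale = [name for name, as_of in rows if age_minutes(as_of) > STALE_AFTER_MINUTES]
print(stale)
```

With these rows the two hourly statistics are the ones flagged, which matches what the post describes.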
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1957991 - Posted: 1 Oct 2018, 4:29:14 UTC

It's borked.
Awaiting Validation & Assimilation are going through the roof, and the splitters are no longer splitting so the Ready-to-send buffer is emptying fast. Received in the last hour has frozen.
At the return rate when it froze, we'll be out of work in just over 2 hours.
Grant
Darwin NT
ID: 1957991
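
An estimate like the one above is a buffer size divided by a drain rate. The sketch below uses the figures from the status snapshot posted at 03:34 UTC (324,145 results ready to send, 92,556 results received in the last hour). It assumes that work goes out at roughly the rate results come back, and the function name is made up for illustration. It is a rough cross-check, not necessarily how the estimate in the post was made, and it starts from an earlier snapshot than the post does.

```python
def hours_until_empty(ready_to_send: int,
                      sent_per_hour: float,
                      created_per_sec: float = 0.0) -> float:
    """Hours until the ready-to-send buffer drains.

    Assumption: tasks leave the buffer at about the rate results are
    returned, and splitters add `created_per_sec` new results per second.
    """
    net_drain_per_hour = sent_per_hour - created_per_sec * 3600
    if net_drain_per_hour <= 0:
        raise ValueError("buffer is not draining at these rates")
    return ready_to_send / net_drain_per_hour


# Splitters stopped: nothing is being created.
print(round(hours_until_empty(324_145, 92_556), 1))  # 3.5
```

Passing the snapshot's creation rate (0.7770/sec) as `created_per_sec` stretches the estimate only slightly, since that rate is small next to the return rate.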
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1957992 - Posted: 1 Oct 2018, 4:30:27 UTC
Last modified: 1 Oct 2018, 4:45:00 UTC

Panic!
Guessing we have about 2 more hours of WUs in the RTS and then there will only be a stray AP resend here and there until someone performs some percussive maintenance on the system.

If the splitters have stopped, then are all those 200K+ WUs waiting for assimilation resends?
ID: 1957992
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1957999 - Posted: 1 Oct 2018, 6:15:33 UTC

I sent Eric an email message 4-1/2 hours ago, but no word back from him yet.
Although I remember a similar outage where he said the system went down in a way that stopped email forwarding as well ...
ID: 1957999
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1958000 - Posted: 1 Oct 2018, 6:25:00 UTC

It is after 11pm on a Sunday night in California. The team seems to have some early risers, so hopefully this can be solved in the morning. Good night.
ID: 1958000


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.