Panic Mode On (114) Server Problems?

Message boards : Number crunching : Panic Mode On (114) Server Problems?

JohnDK · Crowdfunding Project Donor · Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1973900 - Posted: 6 Jan 2019, 22:51:03 UTC

Yes, I got work on all 3 hosts the first time, but now it's "no work available" on all hosts. Hope it's a demand problem and not more trouble.
ID: 1973900
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973910 - Posted: 6 Jan 2019, 23:48:41 UTC

I think they need to turn off task viewing again. Last time the database recovered after 3-4 hours and was able to service work requests.

I too think all the issues lately can be attributed to bringing the replica database back online.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973910
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1973926 - Posted: 7 Jan 2019, 0:38:53 UTC

Eric posted they had a science db crash. I do like it when they tell us what is going on. https://setiathome.berkeley.edu/forum_thread.php?id=83778#1973919
ID: 1973926
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1973995 - Posted: 7 Jan 2019, 6:51:37 UTC

… and the recovery from this latest outage has been (and continues to be) exceptionally poor, about on par with the recovery after last week's weekly outage.
Grant
Darwin NT
ID: 1973995
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1974059 - Posted: 7 Jan 2019, 17:26:45 UTC
Last modified: 7 Jan 2019, 18:17:23 UTC

I'm not inclined to panic as the system hasn't been running well lately. I'm assuming that someone is at work today and already knows.

Splitting isn't happening. We have 380K in RTS (good for 3 more hours).

Edit: now we are down to 280K in RTS (18:10 UTC).
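
A rough sketch of the arithmetic behind those two RTS readings. It assumes the 380K figure was taken at about 17:26 UTC (when the post went up), that nothing is being split in the meantime, and that the queue drains at a steady rate. The function name and the linear model are illustrative only.

```python
from datetime import datetime

def hours_until_empty(t0, n0, t1, n1):
    """Linearly extrapolate how long the ready-to-send queue lasts after t1.

    t0, t1: times of the two readings; n0, n1: queue sizes at those times.
    Returns infinity if the queue is not shrinking.
    """
    elapsed_h = (t1 - t0).total_seconds() / 3600
    drain_per_h = (n0 - n1) / elapsed_h
    if drain_per_h <= 0:
        return float("inf")
    return n1 / drain_per_h

# Readings from the post; the 17:26 timestamp is an assumption.
t0 = datetime(2019, 1, 7, 17, 26)
t1 = datetime(2019, 1, 7, 18, 10)
remaining = hours_until_empty(t0, 380_000, t1, 280_000)
print(f"about {remaining:.1f} hours of work left at this rate")
```

Under those assumptions the 280K left at 18:10 UTC would last roughly two more hours.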
ID: 1974059
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1974064 - Posted: 7 Jan 2019, 18:23:48 UTC - in response to Message 1974059.  

. . Hi,

. . Yes, I noticed that too. The GBT splitters all show as running OK but nothing is splitting, and we have seen that before ...

. . Time to kick the server. It seems we need a server bot that has one job, to kick the other servers from time to time.

. . And of course, this is with the regular outage less than 24 hours away.

Stephen

:(
ID: 1974064
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1974075 - Posted: 7 Jan 2019, 20:51:05 UTC

Well, we're still running through the loaded tapes at much the same speed as we have been all day, but now we have a positive result creation rate of 20+, 30+ per second.

My guess (and it is only a guess) is that yesterday tapes were being split and workunits were being put into the database, but something went wrong with the housekeeping when the science database went belly up. So, to be on the safe side, they rewound the tapes and ran through them again, but most of the resulting WUs turned out to be in the database already and didn't need to be (or couldn't be) added again as duplicates. So, no new work creation until we reached the point where it broke yesterday.
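
If that guess is right, the effect is what you'd get from any insert keyed on a unique workunit name. A minimal sketch, using an in-memory SQLite table as a stand-in; the table layout, the tape name, and the `split_tape` helper are all hypothetical, not the project's actual schema or splitter code:

```python
import sqlite3

def count_workunits(conn):
    """Number of rows currently in the (hypothetical) workunit table."""
    return conn.execute("SELECT COUNT(*) FROM workunit").fetchone()[0]

def split_tape(conn, tape, chunk_ids):
    """Try to insert one workunit per chunk; return how many were new.

    Names that already exist are skipped silently by INSERT OR IGNORE.
    """
    before = count_workunits(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO workunit (name) VALUES (?)",
        [(f"{tape}.{i}",) for i in chunk_ids],
    )
    conn.commit()
    return count_workunits(conn) - before

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workunit (name TEXT PRIMARY KEY)")

first_pass = split_tape(conn, "tape_a", range(60))       # run that was interrupted
rerun_old = split_tape(conn, "tape_a", range(60))        # rewound: all duplicates
rerun_new = split_tape(conn, "tape_a", range(60, 100))   # past the break point
print(first_pass, rerun_old, rerun_new)
```

The rerun over already-split chunks creates nothing, which would look like "splitters running but no new work" until the tape position passes the point of the crash.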
ID: 1974075
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1974076 - Posted: 7 Jan 2019, 21:07:24 UTC

Is it only here, or are the forum pages extremely slow to open?
ID: 1974076
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1974078 - Posted: 7 Jan 2019, 21:11:01 UTC - in response to Message 1974076.  

Nope, it is slow on the upper left coast
ID: 1974078
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1974097 - Posted: 7 Jan 2019, 23:55:20 UTC - in response to Message 1974075.  

... So, to be on the safe side, they rewound the tapes and ran through them again, but most of the resulting WUs turned out to be in the database already and didn't need to be (or couldn't be) added again as a duplicate. So, no new work creation until we reached the point where it broke yesterday.


. . Right or wrong it sounds like a good hypothesis ...

Stephen

:)
ID: 1974097
JLDun
Volunteer tester
Joined: 21 Apr 06
Posts: 573
Credit: 196,101
RAC: 0
United States
Message 1974099 - Posted: 8 Jan 2019, 0:45:46 UTC - in response to Message 1974064.  
Last modified: 8 Jan 2019, 0:46:12 UTC

. . Time to kick the server. It seems we need a server bot that has one job, to kick the other servers from time to time.


Officially, isn't that what the Transitioner is for: to transition things along?

Maybe it needs bigger boots.
ID: 1974099
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1974111 - Posted: 8 Jan 2019, 1:16:55 UTC - in response to Message 1974099.  

Actually it's just pissed off that it had 2 Tuesday outages in a row on Wednesdays.
ID: 1974111
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974112 - Posted: 8 Jan 2019, 1:29:11 UTC - in response to Message 1974111.  

Actually it's just pissed off that it had 2 Tuesday outages in a row on Wednesdays.

HA ha. LOL
+1
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974112
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 1974120 - Posted: 8 Jan 2019, 2:33:34 UTC

Uh oh, getting the dreaded "project has no tasks available".
ID: 1974120
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974125 - Posted: 8 Jan 2019, 3:01:03 UTC

Been dealing with that for the past hour. Think the servers are getting over their daily glitch now as I am getting decent replacement work quantities.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974125
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1974141 - Posted: 8 Jan 2019, 4:06:06 UTC
Last modified: 8 Jan 2019, 4:25:14 UTC

I hope this week's shutdown will give them a chance to fix some things. The RTS keeps falling as the splitters stop, and then I'm assuming someone gives it a kick and it starts up again. So far we haven't emptied the RTS queue, but the system is still in a bad way, and I'm not sure why.

edit: I wonder if they have set the "start the splitters, refill the rts queue" at a lower point. It used to fire up when it fell to 550k or around that...and maybe they have set this number lower to give the system a longer time to do something else ??
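
The kind of start/stop rule being speculated about here can be sketched as a simple low-water/high-water check. The 550k start point is the figure remembered above; the stop point and the function itself are made up for illustration and say nothing about how the servers are actually configured.

```python
def splitters_should_run(rts, running, low_water=550_000, high_water=600_000):
    """Decide whether splitters should be running for a given RTS queue size.

    low_water:  start splitting when the queue falls to this level
                (550k is the value recalled in the post).
    high_water: stop splitting once the queue is refilled to this level
                (hypothetical value, chosen only for the example).
    Between the two marks, keep doing whatever was already happening.
    """
    if rts <= low_water:
        return True
    if rts >= high_water:
        return False
    return running

# Queue falling with splitters idle: nothing happens until the low mark.
print(splitters_should_run(580_000, running=False))
print(splitters_should_run(540_000, running=False))
```

Lowering `low_water` in a rule like this would leave the splitters idle for longer between refills, which is the behaviour the post is wondering about.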
ID: 1974141
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1974172 - Posted: 8 Jan 2019, 7:30:28 UTC - in response to Message 1974141.  

edit: I wonder if they have set the "start the splitters, refill the rts queue" at a lower point. It used to fire up when it fell to 550k or around that...and maybe they have set this number lower to give the system a longer time to do something else ??

There's plenty of other weirdness & poor performance going on, so this is most likely just one more symptom of the underlying issues; no changes to start/stop points needed.
Grant
Darwin NT
ID: 1974172
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1974207 - Posted: 9 Jan 2019, 3:08:36 UTC

We are back! After 13 hours this is going to be a bumpy recovery.
ID: 1974207
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1974208 - Posted: 9 Jan 2019, 3:08:51 UTC

Seti is coming alive again. One of my machines was able to get 113 tasks. Another machine wasn't able to get any. Hopefully the defrag of the dbase will help in the days to come.
~Chris

ID: 1974208
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1974222 - Posted: 9 Jan 2019, 3:53:14 UTC - in response to Message 1974209.  

Seti is coming alive again. One of my machines was able to get 113 tasks. Another machine wasn't able to get any. Hopefully the defrag of the dbase will help in the days to come.


. . I have a feeling of deja vu ...

:)
ID: 1974222


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.