Panic Mode On (97) Server Problems?

Message boards : Number crunching : Panic Mode On (97) Server Problems?
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2775
Credit: 622,510,725
RAC: 889,609
Canada
Message 1668230 - Posted: 22 Apr 2015, 1:41:39 UTC

Way to go on this week's maintenance!

RTS never hit 0 and graphs indicate everything is returning to normal.

Job well done Matt and the team!
ID: 1668230
betreger
Joined: 29 Jun 99
Posts: 9694
Credit: 27,188,735
RAC: 23,137
United States
Message 1668256 - Posted: 22 Apr 2015, 3:01:25 UTC - in response to Message 1668230.  

Way to go on this week's maintenance!

RTS never hit 0 and graphs indicate everything is returning to normal.

Job well done Matt and the team!

And yet no APs, sigh.
ID: 1668256
Wild6-NJ
Volunteer tester

Joined: 4 Aug 99
Posts: 38
Credit: 94,357,272
RAC: 31,021
United States
Message 1668445 - Posted: 22 Apr 2015, 14:08:53 UTC

SAH v7 assimilators are offline and the assimilation queue is building.
Hope we don't have database issues again.
ID: 1668445
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2775
Credit: 622,510,725
RAC: 889,609
Canada
Message 1668451 - Posted: 22 Apr 2015, 14:16:10 UTC - in response to Message 1668445.  

SAH v7 assimilators are offline and the assimilation queue is building.
Hope we don't have database issues again.



Yeah, it's a bit strange to see that they recovered nicely after maintenance and then shut down.
ID: 1668451
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50378
Credit: 983,787,227
RAC: 26
United States
Message 1668453 - Posted: 22 Apr 2015, 14:18:17 UTC - in response to Message 1668451.  

SAH v7 assimilators are offline and the assimilation queue is building.
Hope we don't have database issues again.



Yeah, it's a bit strange to see that they recovered nicely after maintenance and then shut down.

Never a dull moment around here....LOL.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1668453
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 195,570,019
RAC: 11,347
United States
Message 1668457 - Posted: 22 Apr 2015, 14:33:54 UTC

So there are 12 sah assimilator (v7) processes now? I don't recall there being so many previously. I wonder if that is related.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1668457
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50378
Credit: 983,787,227
RAC: 26
United States
Message 1668458 - Posted: 22 Apr 2015, 14:35:45 UTC - in response to Message 1668457.  
Last modified: 22 Apr 2015, 14:40:40 UTC

So there are 12 sah assimilator (v7) processes now? I don't recall there being so many previously. I wonder if that is related.

Maybe Matt fired some extra ones up when working on that huge backlog we had....

Meanwhile, the fact that they are all shut down could point to another DB problem, as they are on both vader and georgem, so it's not just one server locking up.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1668458
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8237
Credit: 13,466,256
RAC: 11,408
United States
Message 1668511 - Posted: 22 Apr 2015, 17:10:12 UTC - in response to Message 1668458.  

So there are 12 sah assimilator (v7) processes now? I don't recall there being so many previously. I wonder if that is related.

Maybe Matt fired some extra ones up when working on that huge backlog we had....

Meanwhile, the fact that they are all shut down could point to another DB problem, as they are on both vader and georgem, so it's not just one server locking up.

I noticed last week, when Matt brought the Science DB back up, that there were new SAH assimilators on georgem. Since they are all disabled right now, I would presume that Matt or Jeff is doing something with the Science Database. Maybe we'll get an update from them at the end of the day...
Donald
Infernal Optimist / Submariner, retired
ID: 1668511
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1668532 - Posted: 22 Apr 2015, 17:45:35 UTC

While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1668532
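Matt mentions reconfiguring things to run 12 assimilators so backlogs clear faster. One common way to split a single backlog among several workers without them stepping on each other is to give each worker only the IDs where `id % n_workers == worker_index`. A minimal Python sketch, assuming integer result IDs; `partition`, `assimilate`, `run_worker`, and `drain_backlog` are hypothetical names, and whether the project's assimilators divide work exactly this way is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def partition(ids: Iterable[int], n_workers: int) -> List[List[int]]:
    """Assign each result ID to exactly one worker: worker k gets IDs with id % n == k."""
    buckets: List[List[int]] = [[] for _ in range(n_workers)]
    for result_id in ids:
        buckets[result_id % n_workers].append(result_id)
    return buckets


def assimilate(result_id: int) -> int:
    # Hypothetical placeholder for the real per-result work
    # (e.g. copying a validated result into the science database).
    return result_id


def run_worker(bucket: List[int]) -> List[int]:
    """One assimilator instance: handles only the IDs in its own bucket."""
    return [assimilate(r) for r in bucket]


def drain_backlog(ids: List[int], n_workers: int = 12) -> List[int]:
    """Run one worker per bucket in parallel and return every processed ID."""
    buckets = partition(ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        finished = [r for batch in pool.map(run_worker, buckets) for r in batch]
    return sorted(finished)
```

Because every ID lands in exactly one bucket, no two workers ever claim the same result, so the claim step needs no cross-worker locking; adding workers simply shrinks each bucket.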
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 50378
Credit: 983,787,227
RAC: 26
United States
Message 1668533 - Posted: 22 Apr 2015, 17:47:40 UTC - in response to Message 1668532.  

Thank you for the continued updates, Matt!

Hope the massages go well.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1668533
Cherokee150

Joined: 11 Nov 99
Posts: 175
Credit: 55,290,673
RAC: 26,262
United States
Message 1668543 - Posted: 22 Apr 2015, 18:00:48 UTC

Just a thought...
Has anyone considered the possibility that someone is damaging the SETI database by deliberately introducing specifically engineered results?

I could think of many different (albeit bad) reasons to motivate some unscrupulous person to sabotage a project like SETI. It could be one possible scenario for the persistent problems our SETI scientists are experiencing, in spite of their continuing efforts to restore and repair the affected databases.

If Matt has not already, perhaps he might want to consider whether a model based on sabotage might fit the specific details he has observed in the current ongoing problems.
ID: 1668543
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 195,570,019
RAC: 11,347
United States
Message 1668552 - Posted: 22 Apr 2015, 18:19:29 UTC - in response to Message 1668532.  

While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt

It is good to hear that the automated checks & balances kicked in & did their thing, rather than a catastrophic issue that brings everything to a sudden halt.

Does everyone have "Professional Database Masseur" on their resume around there at this point? :P
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1668552
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4928
Credit: 666,733,569
RAC: 1,417,651
United States
Message 1668557 - Posted: 22 Apr 2015, 19:04:48 UTC - in response to Message 1668532.  

While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt

One item you might investigate is the continuing problem with the MB work at Beta. Although it's difficult to imagine how autocorrelation problems at Beta could be linked to the database problems, it is interesting that the Beta problem has existed for almost as long as the database problems. At this point it is difficult to conduct any MB testing at Beta. My last 100 tasks at Beta have produced only a handful of usable tasks, as seen here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959
ID: 1668557
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2775
Credit: 622,510,725
RAC: 889,609
Canada
Message 1668710 - Posted: 23 Apr 2015, 1:07:04 UTC - in response to Message 1668543.  
Last modified: 23 Apr 2015, 1:20:23 UTC

My thought on the database issues: when maintenance mode starts, the validators and assimilators are in the middle of a task, but then the transitioners shut down.

Then these processes can't complete their task because they don't know what to do (or where to get data), and they hold a database entry open trying to finish what they were doing. (EDIT: because the transitioner left them hanging with only half the data)

Then the backups start and HARD close any open database entries. And poof, you get errors, since they weren't completed.

I may be completely wrong, but brainstorming is never a bad idea. Even silly things might make one think: that's wrong, but you know, it might apply to this process over here...
ID: 1668710
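Brent's theory hinges on what a hard close does to an entry that is mid-update. In a transactional database, work that was never committed is normally rolled back as a unit rather than left half-written. A minimal sketch using Python's built-in `sqlite3` purely as a stand-in; the project's actual database engine, the `result` table, and the `rows_after_interrupted_write` helper are all assumptions for illustration:

```python
import os
import sqlite3
import tempfile


def rows_after_interrupted_write(commit: bool) -> int:
    """Write two rows in one transaction, optionally commit, close, then count survivors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # Hypothetical table standing in for the project's real schema.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, state TEXT)")
        setup.commit()
        setup.close()

        worker = sqlite3.connect(path)
        # In its default mode, Python's sqlite3 opens a transaction implicitly
        # before the first INSERT, so both rows belong to the same transaction.
        worker.execute("INSERT INTO result (state) VALUES ('in progress')")
        worker.execute("INSERT INTO result (state) VALUES ('in progress')")
        if commit:
            worker.commit()
        worker.close()  # close() does not commit; uncommitted rows are discarded

        reader = sqlite3.connect(path)
        (count,) = reader.execute("SELECT COUNT(*) FROM result").fetchone()
        reader.close()
        return count
```

Here `rows_after_interrupted_write(False)` returns 0 and `rows_after_interrupted_write(True)` returns 2: the uncommitted write disappears as a whole instead of leaving a half-written entry. The sketch does not model lower-level failures such as storage damage.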
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2775
Credit: 622,510,725
RAC: 889,609
Canada
Message 1668713 - Posted: 23 Apr 2015, 1:28:09 UTC - in response to Message 1668710.  

Is there a database command like "who has active entries" that you could log before shutting down, to see which computer is still trying to complete a task?
ID: 1668713
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8237
Credit: 13,466,256
RAC: 11,408
United States
Message 1668734 - Posted: 23 Apr 2015, 3:01:02 UTC - in response to Message 1668710.  

One would hope that the transitioner, validator, assimilator, and file deleter processes are written so that when they receive a "Disable" or "Shutdown" command, they complete the task in progress and close the database entry before shutting down.
Donald
Infernal Optimist / Submariner, retired
ID: 1668734
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1668995 - Posted: 23 Apr 2015, 16:48:52 UTC - in response to Message 1668734.  

One would hope that the transitioner, validator, assimilator, and file deleter processes are written so that when they receive a "Disable" or "Shutdown" command, they complete the task in progress and close the database entry before shutting down.


This is exactly the case.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1668995
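Donald's expectation, which Matt confirms, is the usual graceful-shutdown pattern: a stop request is acted on only between units of work, never in the middle of one. A minimal Python sketch, assuming a simple in-process queue of task IDs and a `threading.Event` as the stop signal; `run_daemon` and `handle` are hypothetical names, not the project's actual daemon code:

```python
import queue
import threading
from typing import Callable, List


def run_daemon(tasks: "queue.Queue[int]", stop: threading.Event,
               handle: Callable[[int], None]) -> List[int]:
    """Process tasks until `stop` is set, checking the flag only between tasks."""
    finished: List[int] = []
    while not stop.is_set():
        try:
            task = tasks.get(timeout=0.05)
        except queue.Empty:
            continue  # idle: loop around and re-check the stop flag
        handle(task)           # the task in progress always runs to the end...
        finished.append(task)  # ...and is recorded before `stop` is checked again
        tasks.task_done()
    return finished


if __name__ == "__main__":
    q: "queue.Queue[int]" = queue.Queue()
    for i in range(5):
        q.put(i)
    stop = threading.Event()

    def handle(task: int) -> None:
        if task == 2:
            stop.set()  # simulate a shutdown request arriving mid-task

    print(run_daemon(q, stop, handle))  # [0, 1, 2]
```

Task 2 still finishes even though the shutdown request arrives while it is running; tasks 3 and 4 stay queued for the next start instead of being abandoned halfway.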
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2775
Credit: 622,510,725
RAC: 889,609
Canada
Message 1669011 - Posted: 23 Apr 2015, 17:20:56 UTC - in response to Message 1668995.  

That's good to hear, Matt. I was just wondering why things always seem to go nuts after maintenance.
ID: 1669011
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1669080 - Posted: 23 Apr 2015, 19:54:51 UTC - in response to Message 1669011.  

That's good to hear, Matt. I was just wondering why things always seem to go nuts after maintenance.


Things tend to go nuts after maintenance, usually for any number of the following reasons:

1. we're doing maintenance, and it's taking longer than the usual span of the outage, so we only bring parts of the project up at first, then others later, thus giving the impression things are going nuts.

2. we're doing the sort of maintenance where stuff might actually break, and sometimes not noticeably until after the project is back up.

3. the dam breaking after an outage might overwhelm some systems/servers enough to cause problems.

4. we actually make things go nuts on purpose (add more processes/listeners) in order to find the weak spots in our whole system.

There are others, but the general problem is that, given our lack of servers/manpower and the rather dynamic/chaotic nature of a global project such as this, the only way we can truly fix and test most things is live, and the period just after an outage is particularly sensitive.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1669080
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2775
Credit: 622,510,725
RAC: 889,609
Canada
Message 1669090 - Posted: 23 Apr 2015, 20:07:31 UTC - in response to Message 1669080.  

Thanks for the info, Matt. And yes, it's a bit hard for us out here to even begin to understand what is happening in the lab with only a few graphs to look at.
ID: 1669090
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.