Panic Mode On (97) Server Problems?

Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1668511 - Posted: 22 Apr 2015, 17:10:12 UTC - in response to Message 1668458.  

So there are 12 sah assimilator (v7) processes now? I don't recall there being so many previously. I wonder if that is related.

Maybe Matt fired some extra ones up when working on that huge backlog we had....

Meanwhile, the fact that they are all shut down could point to another DB problem, as they are on both vader and georgem, so it's not just one server locking up.

I noticed last week when Matt brought the Science DB back up that there were new Sah assimilators on georgem. Since they are all disabled right now, I would presume that Matt or Jeff are doing something with the Science Database. Maybe we'll get an update from them at the end of the day......
Donald
Infernal Optimist / Submariner, retired
ID: 1668511 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1668532 - Posted: 22 Apr 2015, 17:45:35 UTC

While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1668532 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1668533 - Posted: 22 Apr 2015, 17:47:40 UTC - in response to Message 1668532.  

Thank you for the continued updates, Matt!

Hope the massages go well.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1668533 · Report as offensive
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1668543 - Posted: 22 Apr 2015, 18:00:48 UTC

Just a thought...
Has anyone considered the possibility that someone is damaging the SETI database by deliberately introducing specifically engineered results?

I could think of many different (albeit bad) reasons to motivate some unscrupulous person to sabotage a project like SETI. It could be one possible scenario for the persistent problems our SETI scientists are experiencing, in spite of their continuing efforts to restore and repair the affected databases.

If Matt has not already, perhaps he might want to consider whether a model based on sabotage might fit the specific details he has observed in the current ongoing problems.
ID: 1668543 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1668552 - Posted: 22 Apr 2015, 18:19:29 UTC - in response to Message 1668532.  

While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt

It is good to hear that the automated checks & balances kicked in & did their thing, rather than a catastrophic issue bringing everything to a sudden halt.

Does everyone have "Professional Database Masseur" on their resume around there at this point? :P
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1668552 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1668557 - Posted: 22 Apr 2015, 19:04:48 UTC - in response to Message 1668532.  

While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt

One item you might investigate is the continued problems with the MB work at Beta. Although it's difficult to imagine how Autocorrelation problems at Beta could be linked to the database problems, it is interesting that the Beta problem has existed for almost as long as the database problems. At this point it is difficult to conduct any MB testing at Beta. My last 100 tasks at Beta have only produced a handful of usable results, as seen here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959
ID: 1668557 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1668710 - Posted: 23 Apr 2015, 1:07:04 UTC - in response to Message 1668543.  
Last modified: 23 Apr 2015, 1:20:23 UTC

My thought on the database issues ... when maintenance mode starts, the validators and assimilators are in the middle of a task, but then the transitioners shut down.

Then these processes can't complete their task because they don't know what to do (or where to get data), and they hold a database entry open trying to finish what they were doing. (EDIT: because the transitioner left them hanging with only half the data)

Then the backups start and HARD close any open database entries. And poof, you get errors, since they weren't completed.

I may be completely wrong, but brainstorming is never a bad idea. Even silly things might make one think ... that's wrong, but you know, it might apply to this process over here ...
ID: 1668710 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1668713 - Posted: 23 Apr 2015, 1:28:09 UTC - in response to Message 1668710.  

Is there a database command like ... "who has active entries"? You could log its output before you close down and see which computer is still trying to complete a task.
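
On MySQL, at least, the SHOW FULL PROCESSLIST statement comes close to this: it lists every open connection with its host and current state. A rough sketch of the idea in Python follows, assuming a MySQL-backed database and a standard DB-API cursor. The function names and the logger name are made up for illustration, and nothing here is taken from the project's actual tooling; other database engines would need a different query.

```python
import logging

logger = logging.getLogger("pre_shutdown_audit")  # hypothetical logger name

# Column order returned by MySQL's SHOW FULL PROCESSLIST.
COLUMNS = ("Id", "User", "Host", "db", "Command", "Time", "State", "Info")


def active_sessions(cursor):
    """Return the sessions that are doing something other than idling.

    `cursor` is any DB-API 2.0 cursor connected to a MySQL server.
    """
    cursor.execute("SHOW FULL PROCESSLIST")
    sessions = [dict(zip(COLUMNS, row)) for row in cursor.fetchall()]
    # "Sleep" means the connection is open but idle; anything else is busy.
    # The session running this query shows up too, with Command "Query".
    return [s for s in sessions if s["Command"] != "Sleep"]


def log_active_sessions(cursor):
    """Log each busy session so stragglers can be identified before shutdown."""
    busy = active_sessions(cursor)
    for s in busy:
        logger.warning(
            "still active: host=%s user=%s state=%s query=%s",
            s["Host"], s["User"], s["State"], s["Info"],
        )
    return busy
```

Calling log_active_sessions just before the outage starts would leave a record of which host still had work in flight.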
ID: 1668713 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1668734 - Posted: 23 Apr 2015, 3:01:02 UTC - in response to Message 1668710.  

One would hope that the transitioner, validator, assimilator, and file deleter processes are written so that when they receive a "Disable" or "Shutdown" command, they complete the task in progress and close the database entry before shutting down.
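
The usual way to get that behaviour is to have the stop command only set a flag, which the process checks between tasks and never in the middle of one. A minimal sketch in Python, purely as an illustration of the pattern: the Worker class, its method names, and the use of SIGTERM as the "Shutdown" command are all assumptions, not the actual BOINC daemon code.

```python
import signal


class Worker:
    """Processes queued items one at a time and stops only between items."""

    def __init__(self, items):
        self.items = list(items)
        self.done = []
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Only record the request; never abort the item in progress.
        self.stop_requested = True

    def process(self, item):
        # Stand-in for "do the work and close the database entry".
        self.done.append(item)

    def run(self):
        # The flag is checked before starting an item, so each item that
        # has been started always runs to completion.
        while self.items and not self.stop_requested:
            item = self.items.pop(0)
            self.process(item)
        return self.done


def install_stop_handler(worker):
    """Treat SIGTERM as the shutdown command (an assumption for this sketch)."""
    signal.signal(signal.SIGTERM, worker.request_stop)
```

With this structure a shutdown request can arrive at any moment, but the worker exits only after the current item is finished, leaving the remaining items untouched in the queue.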
Donald
Infernal Optimist / Submariner, retired
ID: 1668734 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1668995 - Posted: 23 Apr 2015, 16:48:52 UTC - in response to Message 1668734.  

One would hope that the transitioner, validator, assimilator, and file deleter processes are written so that when they receive a "Disable" or "Shutdown" command, they complete the task in progress and close the database entry before shutting down.


This is exactly the case.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1668995 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1669011 - Posted: 23 Apr 2015, 17:20:56 UTC - in response to Message 1668995.  

That's good to hear Matt, was just thinking as to why things always seem to go nuts after maintenance.
ID: 1669011 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1669080 - Posted: 23 Apr 2015, 19:54:51 UTC - in response to Message 1669011.  

That's good to hear Matt, was just thinking as to why things always seem to go nuts after maintenance.


Things tend to go nuts after maintenance usually due to any number of the following reasons:

1. we're doing maintenance, and it's taking longer than the usual span of the outage, so we only bring parts of the project up at first, then others later, thus giving the impression things are going nuts.

2. we're doing the sort of maintenance where stuff might actually break, and sometimes not noticeably until after the project is back up.

3. the dam breaking after an outage might overwhelm some systems/servers enough to cause problems.

4. we actually make things go nuts on purpose (add more processes/listeners) in order to find the weak spots in our whole system.

There are others, but the general problem is that, given our lack of servers/manpower and the rather dynamic/chaotic nature of a global project such as this, the only way we can truly fix and test most things is live, and the period just after an outage is particularly sensitive.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1669080 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1669090 - Posted: 23 Apr 2015, 20:07:31 UTC - in response to Message 1669080.  

Thanks for the info, Matt, and yes, it's a bit hard for us out here to even begin to understand what is happening in the lab with only a few graphs to look at.
ID: 1669090 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1669213 - Posted: 24 Apr 2015, 3:06:36 UTC - in response to Message 1669090.  

Hmm... Didn't even see them but evidently some APs were split just recently and are gone again.
ID: 1669213 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1669263 - Posted: 24 Apr 2015, 6:08:15 UTC - in response to Message 1669213.  

Hmm... Didn't even see them but evidently some APs were split just recently and are gone again.

I didn't see any new ones, but got a few resends (-2, -3, etc) recently.
Donald
Infernal Optimist / Submariner, retired
ID: 1669263 · Report as offensive
BetelgeuseFive Project Donor
Volunteer tester

Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1669273 - Posted: 24 Apr 2015, 6:57:24 UTC - in response to Message 1669263.  

Hmm... Didn't even see them but evidently some APs were split just recently and are gone again.

I didn't see any new ones, but got a few resends (-2, -3, etc) recently.


There were some new ones. I must have been very lucky because I got (exactly) one.
It is a new version (AP v7.10).

Tom
ID: 1669273 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1669288 - Posted: 24 Apr 2015, 7:37:05 UTC

Ahhh....
Judging from the cricket graphs and my task list we had a small AP run whilst I was at work today.
The kitties managed to sniff out and catch 164 of the beasties. That's with 9 rigs requesting same....

Not much, but happy to have 'em.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1669288 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1669324 - Posted: 24 Apr 2015, 8:53:48 UTC - in response to Message 1669323.  
Last modified: 24 Apr 2015, 8:57:10 UTC

If everything goes as I plan, I will have my ASUS R9 290X Matrix Platinum, up and running late Friday evening my time.

If I don't get any APs then, I will probably put myself into a fetal position and cry the whole weekend :-)

Best invest in a box of soft tissues, buddy.
'Cuz it looks like MB all the way down for a while.

Meow.

EDIT..
And, BTW, good luck with the new build.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1669324 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1670645 - Posted: 27 Apr 2015, 8:41:58 UTC

If someone still collects this type of error - this 'Notice' shows at the top if you red-x any post:

Notice: Undefined variable: config in /disks/carolyn/b/home/boincadm/projects/sah/html/user/forum_report_post.php on line 64
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1670645 · Report as offensive
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1670694 - Posted: 27 Apr 2015, 12:09:33 UTC - in response to Message 1670645.  

If someone still collects this type of error - this 'Notice' shows at the top if you red-x any post:

Notice: Undefined variable: config in /disks/carolyn/b/home/boincadm/projects/sah/html/user/forum_report_post.php on line 64

Thanks. I relayed the bugrep. Luckily you can still report ;)
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1670694 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.