Panic Mode On (97) Server Problems?
Author | Message |
---|---|
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
So there are 12 sah assimilator (v7) processes now? I don't recall there being so many previously. I wonder if that is related. I noticed last week when Matt brought the Science DB back up that there were new Sah assimilators on georgem. Since they are all disabled right now, I would presume that Matt or Jeff are doing something with the Science Database. Maybe we'll get an update from them at the end of the day...... Donald Infernal Optimist / Submariner, retired |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
While it seems rather chaotic things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations. And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come. The assimilators might be off for a while again (more index rebuilding) but no worries (yet). - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Thank you for the continued updates, Matt! Hope the massages go well. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Cherokee150 Send message Joined: 11 Nov 99 Posts: 192 Credit: 58,513,758 RAC: 74 |
Just a thought... Has anyone considered the possibility that someone is damaging the SETI database by deliberately introducing specifically engineered results? I could think of many different (albeit bad) reasons to motivate some unscrupulous person to sabotage a project like SETI. It could be one possible scenario for the persistent problems our SETI scientists are experiencing, in spite of their continuing efforts to restore and repair the affected databases. If Matt has not already, perhaps he might want to consider whether a model based on sabotage might fit the specific details he has observed in the current ongoing problems. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
While it seems rather chaotic things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations. It is good to hear that the automated checks & balances kicked in & did their thing, rather than a catastrophic issue bringing everything to a sudden halt. Does everyone have "Professional Database Masseur" on their resume around there at this point? :P SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[ |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
While it seems rather chaotic things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations. One item you might investigate is the continued problems with the MB work at Beta. Although it's difficult to imagine how Autocorrelation problems at Beta could be linked to the database problems, it is interesting that the Beta problem has existed for almost as long as the database problems. At this point it is difficult to conduct any MB testing at Beta. My last 100 tasks at Beta have only produced a handful of usable tasks, as seen here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959 |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
My thought on the database issues ... when maintenance mode starts, the validators and assimilators are in the middle of a task, but then the transitioners shut down. Then these processes can't complete their task because they don't know what to do (or where to get data) and hold a database entry open trying to finish what they were doing. (EDIT: because the transitioner left them hanging with only half the data) Then the backups start, and HARD close any open database entries. And poof, you get errors, since they weren't completed. I may be completely wrong, but brainstorming is never a bad idea. Even silly things might make one think .. that's wrong, but you know it might apply to this process over here ... |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Is there a database command like ... "who has active entries" log it, before you close down and see which computer is still trying to complete a task? |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
One would hope, that the transitioner, validator, assimilator, and file deleter processes are written so that when they receive a "Disable" or "Shutdown" command, they complete the task in progress and close the database entry before shutting down. Donald Infernal Optimist / Submariner, retired |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
One would hope, that the transitioner, validator, assimilator, and file deleter processes are written so that when they receive a "Disable" or "Shutdown" command, they complete the task in progress and close the database entry before shutting down. This is exactly the case. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
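For anyone wondering what "complete the task in progress, then shut down" can look like in practice, here is a minimal Python sketch. It is not BOINC or SETI@home code: the `Worker` class, the `process` function, and the `request_stop` method are hypothetical names made up for illustration, and a plain in-memory queue stands in for the database. The only point is the pattern: the stop request is checked between items, never in the middle of one, so no item is left half-written.

```python
import queue
import threading
import time


def process(item):
    """Hypothetical stand-in for real work (e.g. assimilating one result)."""
    time.sleep(0.001)  # simulate a small amount of work
    return item * 2


class Worker(threading.Thread):
    """Illustrative daemon-style worker; not taken from the BOINC source."""

    def __init__(self, work, done):
        super().__init__()
        self.work = work              # queue.Queue of pending items
        self.done = done              # list that receives finished results
        self._stop_requested = threading.Event()

    def request_stop(self):
        # Analogous to a "Disable"/"Shutdown" command: only sets a flag.
        self._stop_requested.set()

    def run(self):
        # The flag is checked only between items, so an item that has been
        # taken from the queue is always processed and recorded.
        while not self._stop_requested.is_set():
            try:
                item = self.work.get(timeout=0.05)
            except queue.Empty:
                continue
            self.done.append(process(item))
            self.work.task_done()


def demo(total=50):
    """Start a worker, ask it to stop almost immediately, and report state."""
    work = queue.Queue()
    for i in range(total):
        work.put(i)
    done = []
    worker = Worker(work, done)
    worker.start()
    worker.request_stop()
    worker.join(timeout=5)
    return done, work.qsize(), worker.is_alive()
```

Calling `demo()` returns the finished results, the number of items still waiting, and whether the worker is still alive. Every item is either fully finished or still untouched in the queue; nothing is left half done.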
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
That's good to hear Matt, was just thinking as to why things always seem to go nuts after maintenance. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
That's good to hear Matt, was just thinking as to why things always seem to go nuts after maintenance. Things tend to go nuts after maintenance usually due to any number of the following reasons: 1. we're doing maintenance, and it's taking longer than the usual span of the outage, so we only bring parts of the project up at first, then others later, thus giving the impression things are going nuts. 2. we're doing the sort of maintenance where stuff might actually break, and sometimes not noticeably until after the project is back up. 3. the dam breaking after an outage might overwhelm some systems/servers enough to cause problems. 4. we actually make things go nuts on purpose (add more processes/listeners) in order to find the weak spots in our whole system. There are others, but the general problem is that, given our lack of servers/manpower and the rather dynamic/chaotic nature of a global project such as this, the only way we can truly fix and test most things is live, and the period just after an outage is a particularly sensitive one. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Thanks for the info Matt, and yes, it's a bit hard for us out here to even begin to understand what is happening in the lab with only a few graphs to look at. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hmm... Didn't even see them but evidently some APs were split just recently and are gone again. |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Hmm... Didn't even see them but evidently some APs were split just recently and are gone again. I didn't see any new ones, but got a few resends (-2, -3, etc) recently. Donald Infernal Optimist / Submariner, retired |
BetelgeuseFive Send message Joined: 6 Jul 99 Posts: 158 Credit: 17,117,787 RAC: 19 |
Hmm... Didn't even see them but evidently some APs were split just recently and are gone again. There were some new ones. I must have been very lucky because I got (exactly) one. It is a new version (AP v7.10). Tom |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Ahhh.... Judging from the cricket graphs and my task list we had a small AP run whilst I was at work today. The kitties managed to sniff out and catch 164 of the beasties. That's with 9 rigs requesting same.... Not much, but happy to have 'em. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
If everything goes as I plan, I will have my ASUS R9 290X Matrix Platinum, up and running late Friday evening my time. Best invest in a box of soft tissues, buddy. 'Cuz it looks like MB all the way down for a while. Meow. EDIT.. And, BTW, good luck with the new build. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
If someone still collects this type of errors - this 'Notice' shows at the top if you red-x any post: Notice: Undefined variable: config in /disks/carolyn/b/home/boincadm/projects/sah/html/user/forum_report_post.php on line 64 - ALF - "Find out what you don't do well ..... then don't do it!" :) |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
If someone still collects this type of errors - this 'Notice' shows at the top if you red-x any post: Thanks. I relayed the bugrep. Luckily you can still report ;) A person who won't read has no advantage over one who can't read. (Mark Twain) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.