Panic Mode On (88) Server Problems?

Message boards : Number crunching : Panic Mode On (88) Server Problems?

Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542266 - Posted: 15 Jul 2014, 22:50:58 UTC - in response to Message 1542254.  

At least we will finally see the end of files 29my13ac and 29no13aa after all these weeks.

Cheers.

or not... :/ A tape just showed up and it's splitting AP. :)
With how the client operates on a FIFO basis, one would think the splitters would also work that way with the data.

The splitters are working on those 2 files as we speak. ;-)

Cheers.
ID: 1542266 · Report as offensive
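
For anyone skimming past the FIFO remark quoted in the post above: a minimal sketch, in Python, of what first-in, first-out handling of data files would look like. The queue contents and the processing step are illustrative only (the file names are taken from this thread), not the project's actual splitter code.

[code]
from collections import deque

# Data files ("tapes") in a hypothetical arrival order; names are taken from the thread.
arrivals = ["29my13ac", "29no13aa", "31au13aa", "31mr13ae"]

queue = deque(arrivals)         # FIFO queue: the oldest file sits at the front

while queue:
    tape = queue.popleft()      # always take the oldest file first
    print(f"splitting {tape}")  # stand-in for handing the file to a splitter
[/code]

The behaviour described above, where a freshly arrived tape starts splitting while older files linger, suggests the real splitters don't order their work this strictly.
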
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1542278 - Posted: 15 Jul 2014, 23:07:44 UTC - in response to Message 1542266.  

At least we will finally see the end of files 29my13ac and 29no13aa after all these weeks.

Cheers.

or not... :/ A tape just showed up and it's splitting AP. :)
With how the client operates on a FIFO basis, one would think the splitters would also work that way with the data.

The splitters are working on those 2 files as we speak. ;-)

Cheers.

Oh, I was thinking of 31au13aa, 31au13ac, & 31mr13ae, which I am pretty sure have been there for months.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1542278 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1542304 - Posted: 15 Jul 2014, 23:33:47 UTC - in response to Message 1542232.  
Last modified: 15 Jul 2014, 23:35:41 UTC

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1542304 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542307 - Posted: 15 Jul 2014, 23:36:42 UTC - in response to Message 1542304.  

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.

It does make 1 inquisitive, doesn't it?

Cheers.
ID: 1542307 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1542310 - Posted: 15 Jul 2014, 23:39:26 UTC - in response to Message 1542304.  
Last modified: 15 Jul 2014, 23:41:28 UTC

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.

I only noticed them because they showed up on one of the stat overtake pages for me with over a million RAC & I thought "that can't be right", but it seems it is.
Their first post did mention something about load testing their data centers IIRC.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1542310 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542320 - Posted: 15 Jul 2014, 23:57:33 UTC

Yes there would certainly be plenty of CPU cores available in a data centre. ;-)

I wouldn't want to be the 1 paying the power bill. :-O

Cheers.
ID: 1542320 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542347 - Posted: 16 Jul 2014, 1:15:48 UTC
Last modified: 16 Jul 2014, 1:16:51 UTC

Damn, the splitter let go of 29no13aa. :-(

It did progress from 2 to 3 though.

Cheers.
ID: 1542347 · Report as offensive
Profile Jeff Buck · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1542380 - Posted: 16 Jul 2014, 2:51:34 UTC - in response to Message 1542310.  

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.

I only noticed them because they showed up on one of the stat overtake pages for me with over a million RAC & I thought "that can't be right", but it seems it is.
Their first post did mention something about load testing their data centers IIRC.

Here's one of his machines, 7309756, that got into my database before he hid them. Looks like he ran S@H on it for about 4 weeks, then stopped cold on July 6. A lot of WUs successfully processed, which is terrific, but he might have left 100 in limbo if that machine doesn't connect again. Hope he doesn't do it that way for his whole data center.
ID: 1542380 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1542456 - Posted: 16 Jul 2014, 6:01:24 UTC - in response to Message 1542232.  

The MB assimilator backlog has gone from rather large to seriously huge: the second biggest backlog this year, at 300,000 & growing.
Probably related to that is the steadily rising number of MB results waiting on validation, which is usually around 2.6 million but has also spiked to the second biggest backlog this year: 3.53 million & climbing.

It does look like the assimilators & the validators haven't recovered after the server barfed on Friday.

Even after the weekly outage, the backlogs continue to grow.
Grant
Darwin NT
ID: 1542456 · Report as offensive
Profile ivan
Volunteer tester
Avatar

Send message
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1542532 - Posted: 16 Jul 2014, 10:15:19 UTC - in response to Message 1542380.  

Here's one of his machines, 7309756, that got into my database before he hid them. Looks like he ran S@H on it for about 4 weeks, then stopped cold on July 6. A lot of WUs successfully processed, which is terrific, but he might have left 100 in limbo if that machine doesn't connect again. Hope he doesn't do it that way for his whole data center.

Hmm, I just had a dual-node machine with those processors (tho' @ 2.5 GHz) ordered for me. I'll probably not run hyperthreading though, so 2x 20-core machines. :-)
ID: 1542532 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1542970 - Posted: 17 Jul 2014, 5:57:52 UTC - in response to Message 1542456.  

Even after the weekly outage, the backlogs continue to grow.

Interestingly, the backlogs are now clearing, shortly after AP WUs started flowing again. Cause & effect, or just high correlation?
Grant
Darwin NT
ID: 1542970 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1542983 - Posted: 17 Jul 2014, 6:59:55 UTC

This is on the front page:

Planned Power Outage - July 23rd-24th, 2014
Due to necessary electrical repairs at the data center, we need to shut down the servers on the evening of Wednesday, July 23rd. All the SETI projects and web sites will be unreachable for most of the time during this outage. The work should be complete by the following afternoon, and all services back on line shortly afterward.
ID: 1542983 · Report as offensive
Darth Beaver · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Avatar

Send message
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1542984 - Posted: 17 Jul 2014, 7:02:01 UTC - in response to Message 1542983.  

Thanks Richard, I'll keep that in mind. Thanks for the heads up.
ID: 1542984 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1543336 - Posted: 17 Jul 2014, 18:58:46 UTC - in response to Message 1542983.  

And so it appears that is also the date and time that the BOINC server goes down. It wasn't today.
ID: 1543336 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1543879 - Posted: 18 Jul 2014, 15:49:01 UTC

These tapes, 29my13ac & 29no13aa, have appeared as splitting on the server status page for weeks and never finish. Is something wrong with them?
ID: 1543879 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1543882 - Posted: 18 Jul 2014, 15:52:18 UTC - in response to Message 1543879.  
Last modified: 18 Jul 2014, 15:52:41 UTC

These tapes, 29my13ac & 29no13aa, have appeared as splitting on the server status page for weeks and never finish. Is something wrong with them?

From time to time there is a dataset or two with a problem that locks up the splitter working on it. The staff usually try restarting them a number of times before giving up on the data; if they continually stall, they are eventually pulled from the splitting queue.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1543882 · Report as offensive
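
A rough sketch of the retry-then-remove flow described above, assuming hypothetical hooks restart_splitter, is_still_stalled and remove_from_queue and an illustrative retry limit; none of these names or numbers come from the project's actual admin tooling.

[code]
MAX_RESTARTS = 3  # illustrative; the real number of retries isn't stated in the thread

def handle_stalled_tape(tape, restart_splitter, is_still_stalled, remove_from_queue):
    """Restart a stalled data file a few times, then pull it from the splitting queue."""
    for attempt in range(1, MAX_RESTARTS + 1):
        restart_splitter(tape)        # hypothetical hook: kick the splitter working on this file
        if not is_still_stalled(tape):
            return f"{tape}: splitting resumed after restart {attempt}"
    remove_from_queue(tape)           # give up on the data after repeated stalls
    return f"{tape}: pulled from the splitting queue"

# Dummy hooks for illustration: a tape that never recovers ends up pulled from the queue.
print(handle_stalled_tape("29no13aa",
                          restart_splitter=lambda t: None,
                          is_still_stalled=lambda t: True,
                          remove_from_queue=lambda t: None))
[/code]
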
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1543999 - Posted: 18 Jul 2014, 20:41:22 UTC

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1543999 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1544001 - Posted: 18 Jul 2014, 20:46:20 UTC - in response to Message 1543999.  

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.

I am going to guess that it is just an error in the Cricket collection of the SNMP data.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1544001 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1544004 - Posted: 18 Jul 2014, 20:54:01 UTC - in response to Message 1544001.  

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.

I am going to guess that it is just an error in the Cricket collection of the SNMP data.

I suspect so. The Crickets live on a server too, and it has gone down once in a long while.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1544004 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1544008 - Posted: 18 Jul 2014, 21:09:08 UTC - in response to Message 1544004.  

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.

I am going to guess that it is just an error in the Cricket collection of the SNMP data.

I suspect so. The Crickets live on a server too, and it has gone down once in a long while.

Actually it looks like we got moved to the other router.
http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-210/gigabitethernet6_17&ranges=d%3Aw&view=Octets
That port is also named srb-ssi-seti-net and it looks like it picks up exactly where the other one ended.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1544008 · Report as offensive
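
For context on what those graphs plot: a Cricket-style grapher periodically reads a router port's octet counters over SNMP (the "Octets" view in the link above) and turns successive samples into a bits-per-second rate. Below is a minimal sketch of that conversion with made-up numbers; the SNMP poll itself is left out and counter wrap-around is ignored for brevity.

[code]
def octets_to_bps(prev_count, prev_time, curr_count, curr_time):
    """Turn two successive octet-counter samples into an average bits-per-second rate."""
    delta_bytes = curr_count - prev_count   # counter wrap handling omitted in this sketch
    delta_secs = curr_time - prev_time
    return (delta_bytes * 8) / delta_secs

# Made-up example: 750 MB moved during a 60-second polling interval -> ~100 Mbit/s.
print(f"{octets_to_bps(0, 0.0, 750_000_000, 60.0) / 1e6:.1f} Mbit/s")
[/code]

Seen that way, a graph can go flat either because the poller stops getting samples or, as it turned out here, because the traffic simply moved to a different monitored port; either way the work itself keeps flowing.
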