Panic Mode On (88) Server Problems?

Message boards : Number crunching : Panic Mode On (88) Server Problems?

Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542266 - Posted: 15 Jul 2014, 22:50:58 UTC - in response to Message 1542254.  

At least we will finally see the end of files 29my13ac and 29no13aa after all these weeks.

Cheers.

or not... :/ A tape just showed up and it's splitting AP. :)
With how the client operates on a FIFO basis, one would think the splitters would also work that way with the data.

The splitters are working on those 2 files as we speak. ;-)

Cheers.
ID: 1542266 · Report as offensive
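
For anyone skimming past the FIFO remark quoted in the post above: a minimal sketch, in Python, of what first-in, first-out handling of data files would look like. The queue contents and the processing step are illustrative only (the file names are taken from this thread), not the project's actual splitter code.

[code]
from collections import deque

# Data files ("tapes") in a hypothetical arrival order; names are taken from the thread.
arrivals = ["29my13ac", "29no13aa", "31au13aa", "31mr13ae"]

queue = deque(arrivals)         # FIFO queue: the oldest file sits at the front

while queue:
    tape = queue.popleft()      # always take the oldest file first
    print(f"splitting {tape}")  # stand-in for handing the file to a splitter
[/code]

The behaviour described above, where a freshly arrived tape starts splitting while older files linger, suggests the real splitters don't order their work this strictly.
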
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1542278 - Posted: 15 Jul 2014, 23:07:44 UTC - in response to Message 1542266.  

At least we will finally see the end of files 29my13ac and 29no13aa after all these weeks.

Cheers.

or not... :/ A tape just showed up and it's splitting AP. :)
With how the client operates on a FIFO basis, one would think the splitters would also work that way with the data.

The splitters are working on those 2 files as we speak. ;-)

Cheers.

Oh, I was thinking of 31au13aa, 31au13ac, & 31mr13ae, which I am pretty sure have been there for months.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1542278 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1542304 - Posted: 15 Jul 2014, 23:33:47 UTC - in response to Message 1542232.  
Last modified: 15 Jul 2014, 23:35:41 UTC

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1542304 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542307 - Posted: 15 Jul 2014, 23:36:42 UTC - in response to Message 1542304.  

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.

It does make 1 inquisitive, doesn't it?

Cheers.
ID: 1542307 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1542310 - Posted: 15 Jul 2014, 23:39:26 UTC - in response to Message 1542304.  
Last modified: 15 Jul 2014, 23:41:28 UTC

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.

I only noticed them because they showed up on one of the stat overtake pages for me with over a million RAC & I thought "that can't be right", but it seems it is.
Their first post did mention something about load testing their data centers IIRC.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1542310 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542320 - Posted: 15 Jul 2014, 23:57:33 UTC

Yes there would certainly be plenty of CPU cores available in a data centre. ;-)

I wouldn't want to be the 1 paying the power bill. :-O

Cheers.
ID: 1542320 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1542347 - Posted: 16 Jul 2014, 1:15:48 UTC
Last modified: 16 Jul 2014, 1:16:51 UTC

Damn, the splitter let go of 29no13aa. :-(

It did progress from 2 to 3 though.

Cheers.
ID: 1542347 · Report as offensive
Profile Jeff Buck · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1542380 - Posted: 16 Jul 2014, 2:51:34 UTC - in response to Message 1542310.  

However, part of the increase could be from this user, who seems to be stress testing a freaking data center & chucking out 18,000,000 credits' worth of work a day.

Okay, now I'm intrigued. And of course the computer list is hidden.

And at least one of their systems is an OSX machine with more than 8 cores, because all three posts they've ever made were in a thread asking how to get more than 8 tasks to run at a time in OSX.

I only noticed them because they showed up on one of the stat overtake pages for me with over a million RAC & I thought "that can't be right", but it seems it is.
Their first post did mention something about load testing their data centers IIRC.

Here's one of his machines, 7309756, that got into my database before he hid them. Looks like he ran S@H on it for about 4 weeks, then stopped cold on July 6. A lot of WUs successfully processed, which is terrific, but he might have left 100 in limbo if that machine doesn't connect again. Hope he doesn't do it that way for his whole data center.
ID: 1542380 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1542456 - Posted: 16 Jul 2014, 6:01:24 UTC - in response to Message 1542232.  

The MB assimilator backlog has gone from rather large to seriously huge: the second biggest backlog this year, at 300,000 & growing.
Probably related to that is the steadily rising number of MB results waiting on validation, which is usually around 2.6 million but has also spiked to the second biggest backlog this year: 3.53 million & climbing.

It does look like the assimilators & the validators haven't recovered after the server barfed on Friday.

Even after the weekly outage, the backlogs continue to grow.
Grant
Darwin NT
ID: 1542456 · Report as offensive
Profile ivan
Volunteer tester
Avatar

Send message
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1542532 - Posted: 16 Jul 2014, 10:15:19 UTC - in response to Message 1542380.  

Here's one of his machines, 7309756, that got into my database before he hid them. Looks like he ran S@H on it for about 4 weeks, then stopped cold on July 6. A lot of WUs successfully processed, which is terrific, but he might have left 100 in limbo if that machine doesn't connect again. Hope he doesn't do it that way for his whole data center.

Hmm, I just had a dual-node machine with those processors (tho' @ 2.5 GHz) ordered for me. I'll probably not run hyperthreading though, so 2x 20-core machines. :-)
ID: 1542532 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1542970 - Posted: 17 Jul 2014, 5:57:52 UTC - in response to Message 1542456.  

Even after the weekly outage, the backlogs continue to grow.

Interestingly, the backlogs are now clearing, shortly after AP WUs started flowing again. Cause & effect, or just high correlation?
Grant
Darwin NT
ID: 1542970 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1542983 - Posted: 17 Jul 2014, 6:59:55 UTC

This is on the front page:

Planned Power Outage - July 23rd-24th, 2014
Due to necessary electrical repairs at the data center, we need to shut down the servers on the evening of Wednesday, July 23rd. All the SETI projects and web sites will be unreachable for most of the time during this outage. The work should be complete by the following afternoon, and all services back on line shortly afterward.
ID: 1542983 · Report as offensive
Darth Beaver · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Avatar

Send message
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1542984 - Posted: 17 Jul 2014, 7:02:01 UTC - in response to Message 1542983.  

Thanks Richard, I'll keep that in mind. Thanks for the heads up.
ID: 1542984 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1543336 - Posted: 17 Jul 2014, 18:58:46 UTC - in response to Message 1542983.  

And so it appears that is also the date and time that the BOINC server goes down. It wasn't today.
ID: 1543336 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1543879 - Posted: 18 Jul 2014, 15:49:01 UTC

These tapes, 29my13ac & 29no13aa, have appeared as splitting on the server status page for weeks and never finish. Is something wrong with them?
ID: 1543879 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1543882 - Posted: 18 Jul 2014, 15:52:18 UTC - in response to Message 1543879.  
Last modified: 18 Jul 2014, 15:52:41 UTC

These tapes, 29my13ac & 29no13aa, have appeared as splitting on the server status page for weeks and never finish. Is something wrong with them?

From time to time there is a dataset or two with a problem that locks up the splitter working on it. The staff usually try restarting them a number of times before giving up on the data; if they continually stall, they are eventually pulled from the splitting queue.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1543882 · Report as offensive
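
A rough sketch of the retry-then-remove flow described above, assuming hypothetical hooks restart_splitter, is_still_stalled and remove_from_queue and an illustrative retry limit; none of these names or numbers come from the project's actual admin tooling.

[code]
MAX_RESTARTS = 3  # illustrative; the real number of retries isn't stated in the thread

def handle_stalled_tape(tape, restart_splitter, is_still_stalled, remove_from_queue):
    """Restart a stalled data file a few times, then pull it from the splitting queue."""
    for attempt in range(1, MAX_RESTARTS + 1):
        restart_splitter(tape)        # hypothetical hook: kick the splitter working on this file
        if not is_still_stalled(tape):
            return f"{tape}: splitting resumed after restart {attempt}"
    remove_from_queue(tape)           # give up on the data after repeated stalls
    return f"{tape}: pulled from the splitting queue"

# Dummy hooks for illustration: a tape that never recovers ends up pulled from the queue.
print(handle_stalled_tape("29no13aa",
                          restart_splitter=lambda t: None,
                          is_still_stalled=lambda t: True,
                          remove_from_queue=lambda t: None))
[/code]
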
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1543999 - Posted: 18 Jul 2014, 20:41:22 UTC

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1543999 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1544001 - Posted: 18 Jul 2014, 20:46:20 UTC - in response to Message 1543999.  

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.

I am going to guess that it is just an error in the Cricket collection of the SNMP data.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1544001 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1544004 - Posted: 18 Jul 2014, 20:54:01 UTC - in response to Message 1544001.  

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.

I am going to guess that it is just an error in the Cricket collection of the SNMP data.

I suspect so. The Crickets live on a server too, and it has gone down once in a long while.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1544004 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1544008 - Posted: 18 Jul 2014, 21:09:08 UTC - in response to Message 1544004.  

Now THIS is mighty odd....
The Cricket graphs appear to have dropped dead about an hour and a half ago, but work is still flowing.
Can't really recall that scenario before.

I am going to guess that it is just an error in the Cricket collection of the SNMP data.

I suspect so. The Crickets live on a server too, and it has gone down once in a long while.

Actually it looks like we got moved to the other router.
http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-210/gigabitethernet6_17&ranges=d%3Aw&view=Octets
That port is also named srb-ssi-seti-net and it looks like it picks up exactly where the other one ended.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1544008 · Report as offensive
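
For context on what those graphs plot: a Cricket-style grapher periodically reads a router port's octet counters over SNMP (the "Octets" view in the link above) and turns successive samples into a bits-per-second rate. Below is a minimal sketch of that conversion with made-up numbers; the SNMP poll itself is left out and counter wrap-around is ignored for brevity.

[code]
def octets_to_bps(prev_count, prev_time, curr_count, curr_time):
    """Turn two successive octet-counter samples into an average bits-per-second rate."""
    delta_bytes = curr_count - prev_count   # counter wrap handling omitted in this sketch
    delta_secs = curr_time - prev_time
    return (delta_bytes * 8) / delta_secs

# Made-up example: 750 MB moved during a 60-second polling interval -> ~100 Mbit/s.
print(f"{octets_to_bps(0, 0.0, 750_000_000, 60.0) / 1e6:.1f} Mbit/s")
[/code]

Seen that way, a graph can go flat either because the poller stops getting samples or, as it turned out here, because the traffic simply moved to a different monitored port; either way the work itself keeps flowing.
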