Panic Mode On (88) Server Problems?

Message boards : Number crunching : Panic Mode On (88) Server Problems?

Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1552362 - Posted: 4 Aug 2014, 21:54:34 UTC - in response to Message 1552358.  

23ja09aa Again?

It looks that way. :-(

Cheers.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1552365 - Posted: 4 Aug 2014, 22:00:35 UTC - in response to Message 1552245.  

23ja09aa looks to be running normally again & only has one splitter on it with 3 channels complete now. Creation rate is now at ~26/sec. In a few hours the RTS will probably be back at full capacity.

Splitter output has increased, but only slightly.
Ready-to-send buffer remains at 0.
Grant
Darwin NT
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1552451 - Posted: 5 Aug 2014, 2:49:27 UTC - in response to Message 1552365.  

23ja09aa looks to be running normally again & only has one splitter on it with 3 channels complete now. Creation rate is now at ~26/sec. In a few hours the RTS will probably be back at full capacity.

Splitter output has increased, but only slightly.
Ready-to-send buffer remains at 0.

Yeah, that tape is only nerfing 1 splitter instead of 2 now. The number of tasks in progress is rising, so once the tapes full of shorties are done we might build up a send buffer again.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1552579 - Posted: 5 Aug 2014, 12:26:47 UTC

Seems like our friend 23ja09aa is still having problems.
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1552653 - Posted: 5 Aug 2014, 19:52:33 UTC

Straight back from the outage and 23ja09aa has tied up 2 splitters again. :-(

Cheers.
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1552671 - Posted: 5 Aug 2014, 20:15:32 UTC

One of the Berkeley campus core routers has a problem, so services have failed over to a backup. An interesting side effect is that the usual Cricket graph from the data center's inr-211 router shows only the blue line (our uploads, etc.); the green line (our downloads, etc.) is shown on the alternate Cricket graph from data center router inr-210.
Joe
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1552672 - Posted: 5 Aug 2014, 20:17:00 UTC

The bandwidth graphs show that outbound traffic is on router inr-210 again, but inbound still seems to be on inr-211:
http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets
http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-210/gigabitethernet6_17&ranges=d%3Aw&view=Octets

So data is flying out normally after the outage.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1552714 - Posted: 5 Aug 2014, 22:06:40 UTC - in response to Message 1552653.  

Straight back from the outage and 23ja09aa has tied up 2 splitters again. :-(

Joy.
Splitter output is less than it was before the outage, and before the outage it wasn't enough. Probably one in 15-20 requests results in work, but nowhere near enough to fill up the cache again. Before the outage, about 1 in 7-10 requests got work, usually enough to keep the cache close to full.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1552739 - Posted: 5 Aug 2014, 23:09:00 UTC - in response to Message 1552714.  

EDIT: Well, the work going out should help offset how little of it there is. Everything I've got so far for the CPU is VLAR, and all the GPU work is long-running as well; no shorties or even mid-range work in sight. Still a good 50+ short of a full cache.
Grant
Darwin NT
Darth Beaver (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1552780 - Posted: 6 Aug 2014, 2:26:55 UTC

I hope someone is looking into it; it seems to be getting worse.
Darth Beaver (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1552781 - Posted: 6 Aug 2014, 2:28:01 UTC

Thanks, HAL9000, for the Cricket links. I've been trying to find them, so thank you very much.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1552825 - Posted: 6 Aug 2014, 5:59:44 UTC - in response to Message 1552781.  

Splitter output has dropped off even further.
While not full, my caches have remained close to it, but now they are starting to drain down. Even more requests for work are resulting in none, and those that do get some work are getting less than before.
Grant
Darwin NT
Darth Beaver (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1552828 - Posted: 6 Aug 2014, 6:47:05 UTC - in response to Message 1552825.  

Same here, Grant, I'm fast running out. If I run out, I'll switch to a backup project until it's fixed.
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1552836 - Posted: 6 Aug 2014, 9:00:49 UTC

23ja09aa is still having problems.
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1552868 - Posted: 6 Aug 2014, 12:46:08 UTC

Confirmed splitter problem(s): one or more of them needs a reboot!

Hello, from Albany, CA!...
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1552877 - Posted: 6 Aug 2014, 13:01:55 UTC

Another night passes and another splitter problem appears; it seems this is the new normal.
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1552878 - Posted: 6 Aug 2014, 13:04:48 UTC - in response to Message 1552877.  

Another night passes and another splitter problem appears; it seems this is the new normal.

It's a bum dataset, 23ja09aa, that keeps getting stuck and ties up a splitter or two working on it when that happens.

Have sent off another message to headquarters.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1552881 - Posted: 6 Aug 2014, 13:17:47 UTC

Thanks Mark.

Something else caught my attention: don't they have a "watchdog" on the splitters precisely to avoid that?
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1552888 - Posted: 6 Aug 2014, 14:01:10 UTC - in response to Message 1552881.  

Thanks Mark.

Something else caught my attention: don't they have a "watchdog" on the splitters precisely to avoid that?

I think the main problem is the complexity of the dependencies between the various bits of the back end. Adding another process to the mix to watch for errors may introduce more errors.

I believe there is/was a system in place to prevent the db from getting too large, but it was not working correctly, so the db server would crash instead.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1552891 - Posted: 6 Aug 2014, 14:09:22 UTC
Last modified: 6 Aug 2014, 14:20:22 UTC

Maybe they could do it in a simpler way: watch the splitting process, and if it hangs, just stop splitting that particular tape until someone can look into the problem.
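The per-tape idea above can be sketched as a simple stall detector. This is a minimal Python sketch under stated assumptions, not how the SETI@home back end actually works: it assumes each splitter can report progress on the tape it is working on and can ask before picking up a tape. `TapeWatchdog`, `report_progress`, `check`, `may_split`, `release`, `stall_timeout` and the tape name `other_tape` are all hypothetical, and the 600-second timeout is arbitrary.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class TapeWatchdog:
    """Hypothetical watchdog: stop splitting a tape whose splitter stops making progress.

    All names here are illustrative assumptions, not real SETI@home/BOINC APIs.
    """
    stall_timeout: float  # seconds without progress before a tape is treated as hung
    clock: Callable[[], float] = time.monotonic  # injectable so the logic can be tested
    last_progress: Dict[str, float] = field(default_factory=dict)
    quarantined: Set[str] = field(default_factory=set)

    def report_progress(self, tape: str) -> None:
        """Called by a splitter each time it finishes a chunk of work on `tape`."""
        self.last_progress[tape] = self.clock()

    def check(self) -> Set[str]:
        """Quarantine every tape that has been silent too long; return the newly quarantined ones."""
        now = self.clock()
        newly = {
            tape
            for tape, seen in self.last_progress.items()
            if tape not in self.quarantined and now - seen > self.stall_timeout
        }
        self.quarantined |= newly
        return newly

    def may_split(self, tape: str) -> bool:
        """Splitters only pick up tapes that are not quarantined."""
        return tape not in self.quarantined

    def release(self, tape: str) -> None:
        """Someone has looked at the tape: allow splitting again and restart its timer."""
        self.quarantined.discard(tape)
        self.last_progress[tape] = self.clock()


if __name__ == "__main__":
    fake_now = [0.0]  # fake clock so the example is deterministic
    dog = TapeWatchdog(stall_timeout=600, clock=lambda: fake_now[0])
    dog.report_progress("23ja09aa")
    dog.report_progress("other_tape")
    fake_now[0] = 300.0
    dog.report_progress("other_tape")  # this tape is still advancing
    fake_now[0] = 700.0
    print(dog.check())                 # {'23ja09aa'}
    print(dog.may_split("23ja09aa"))   # False
    print(dog.may_split("other_tape")) # True
```

In this sketch the watchdog never kills a splitter; it only stops new splitting on a tape that has gone quiet and leaves the rest to a person, which is one way to keep the extra moving parts HAL9000 is wary of to a minimum.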
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.