Panic Mode On (88) Server Problems?

Message boards : Number crunching : Panic Mode On (88) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1554057 - Posted: 9 Aug 2014, 1:55:26 UTC - in response to Message 1554051.  

MB Current result creation rate 0.6014/sec, Houston, do we have a problem?

The splitters were probably just taking a cat nap.

Yep.
It looks like they're working again now; output is in the mid to high 20s. For the last few days it's been high teens at best, mid teens most of the time. Generally 22/s is the minimum needed to meet demand & rebuild the Ready-to-send buffer. 20/s is enough when there's a lot of VLAR work about, 25 or even 28/s when it's mostly shorties.
When the Ready-to-send buffer is full, they shut down till it drops below 300,000 & then they crank up again.
Grant
Darwin NT
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1554059 - Posted: 9 Aug 2014, 1:55:34 UTC - in response to Message 1554056.  

No panic at present.

So all is working; then it's time for a beer. Thanks for the info.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1554060 - Posted: 9 Aug 2014, 1:56:16 UTC - in response to Message 1554056.  

MB Current result creation rate 0.6014/sec, Houston, do we have a problem?

The splitters were probably just taking a cat nap.

Yes, they do that now and again until they hit a certain number to crank up again.

23ja09aa is up to 13 done now, so we may finally see the last of it.

Cheers.

Yup... the current update shows 26.7324/sec, and RTS went from about 290k to 302k.
No panic at present.

Yeah, it looks like the normal on/off mode of operation they use as the RTS count goes up/down.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1554247 - Posted: 9 Aug 2014, 13:23:19 UTC - in response to Message 1553710.  

23ja09aa needs another kick. :(

Looks like 23ja09aa has been kicked right off the status page :)

Member of the People Encouraging Niceness In Society club.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1554750 - Posted: 10 Aug 2014, 22:09:25 UTC - in response to Message 1554247.  

Yet another stuck tape?
MB splitter output once again has taken a hit. From high 20s/low 30s it's back down to barely hitting 20/s.
Grant
Darwin NT
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1554772 - Posted: 10 Aug 2014, 22:56:37 UTC - in response to Message 1554750.  
Last modified: 10 Aug 2014, 23:26:57 UTC

Yet another stuck tape?
MB splitter output once again has taken a hit. From high 20s/low 30s it's back down to barely hitting 20/s.

I wonder if they are just not creating shorties, as the splitters are running nearly constantly instead of in the normal on/off mode. Average output looks to be about the same whether they run constantly at a lower rate or switch on and off with periods of greater output.
http://setistats.haveland.com/sah_creation.html

I think 18fe09ag might actually have been at 1 for a while, so perhaps we should keep an eye on that one.
If it gets stuck too, does that point to there being an issue with the recorder in the Jan-Feb '09 time frame?
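A rough Python sketch of the averaging point above: a splitter that runs faster part of the time can average the same as one running steadily at a lower rate. The rates and on-time fraction are illustrative numbers, not readings from the status page, and `average_rate` is a hypothetical helper.

```python
def average_rate(peak_rate, on_fraction):
    """Long-run average output (results/s) of a splitter that runs at
    peak_rate for on_fraction (0..1) of the time and is idle otherwise."""
    if not 0.0 <= on_fraction <= 1.0:
        raise ValueError("on_fraction must be between 0 and 1")
    return peak_rate * on_fraction


# Illustrative numbers only: on/off at 28/s for 75% of the time...
bursty = average_rate(28.0, 0.75)
# ...versus running constantly at 21/s.
steady = average_rate(21.0, 1.0)
print(f"on/off: {bursty}/s, constant: {steady}/s")
```

Since the on-time fraction can never exceed 1, the peak rate is also the ceiling on average output, which is why a lower peak leaves less headroom for refilling caches after an outage.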
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1554801 - Posted: 11 Aug 2014, 0:20:46 UTC - in response to Message 1554772.  

Yet another stuck tape?
MB splitter output once again has taken a hit. From high 20s/low 30s it's back down to barely hitting 20/s.

I wonder if they are just not creating shorties, as the splitters are running nearly constantly instead of in the normal on/off mode. Average output looks to be about the same whether they run constantly at a lower rate or switch on and off with periods of greater output.

The fact that the splitters are running at the lower rate indicates there is a problem. As long as they can put out 30/s (or, better yet, even more) when they are running, it's possible to meet demand as well as build up the Ready-to-send buffer, no matter how many shorties are going out at the time & even if there is no AP work to keep those machines happy.
When the output drops to the mid 20s & below, it's barely enough to meet demand. Drop below 20/s & it often isn't enough to meet demand.

At the moment there are very few shorties in the mix; around 65,000/hr (just over 18/s) are being returned, so roughly 19/s is needed just to meet demand. When it's mostly shorties you're looking at 23/s just to meet demand.
Any less than that and work tends to (eventually) run out.
As long as they can split 30/s, people can refill their caches after outages, and the Ready-to-send buffer can be refilled.
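The arithmetic above as a small Python sketch. It assumes, as the post does, that demand roughly equals the rate at which results are being returned; the 65,000/hr figure is from the post, while the split rates tried are just examples.

```python
SECONDS_PER_HOUR = 3600


def per_second(per_hour):
    """Convert a results-per-hour figure to results per second."""
    return per_hour / SECONDS_PER_HOUR


def buffer_change_per_hour(split_rate, returned_per_hour):
    """Net change in the Ready-to-send buffer over an hour, assuming each
    returned result is replaced by one new result sent out (demand = returns)."""
    return split_rate * SECONDS_PER_HOUR - returned_per_hour


returned = 65_000  # results/hr being returned, from the post above
print(f"demand: {per_second(returned):.2f}/s")
for rate in (18, 20, 30):  # example split rates, results/s
    change = buffer_change_per_hour(rate, returned)
    print(f"splitting {rate}/s -> buffer changes by {change:+,} per hour")
```

Anything below about 18/s drains the buffer even with few shorties about, while 30/s adds tens of thousands per hour, which is what lets caches and the Ready-to-send buffer refill after an outage.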
Grant
Darwin NT
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1554910 - Posted: 11 Aug 2014, 5:12:54 UTC

Grant - you are beyond belief...
There are about 300k in the cache, so the splitters will be slowing down or even stopping until the cache drops to about 275k; they will then kick in again, running at about 25-30/s until the cache has reached just over 300k. They've been doing this sort of thing for many months; it is their normal behaviour. Abnormal is when they are running at 30/s with 400k in the cache, or at 5/s with 50k in the cache.
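For anyone unfamiliar with this kind of on/off control, here's a minimal Python simulation of it (a simple hysteresis loop). The 275k/300k thresholds are the figures quoted in this thread; the 28/s split rate and 18/s demand are assumed for illustration, and this is not the project's actual splitter code.

```python
# Thresholds as quoted in this thread; split and demand rates below are assumptions.
LOW_WATER = 275_000   # splitters restart once Ready-to-send drops below this
HIGH_WATER = 300_000  # splitters stop once Ready-to-send reaches this


def simulate(split_rate, demand_rate, start, seconds):
    """Step the Ready-to-send buffer once per second under on/off control.

    Returns the buffer level after each second and how many seconds the
    splitters were running.
    """
    level = start
    running = False
    levels = []
    on_seconds = 0
    for _ in range(seconds):
        # State only changes at the thresholds; in between, the splitters
        # keep doing whatever they were doing. That gap is the hysteresis.
        if running and level >= HIGH_WATER:
            running = False
        elif not running and level < LOW_WATER:
            running = True
        if running:
            level += split_rate
            on_seconds += 1
        level = max(level - demand_rate, 0)
        levels.append(level)
    return levels, on_seconds


levels, on_seconds = simulate(split_rate=28, demand_rate=18, start=290_000, seconds=4 * 3600)
print(f"buffer range: {min(levels):,} to {max(levels):,}")
print(f"splitters on {on_seconds / len(levels):.0%} of the time, "
      f"averaging {28 * on_seconds / len(levels):.1f}/s")
```

With these numbers the buffer saws between roughly 275k and 300k and the splitters end up running about two thirds of the time, so their average output lands close to demand even though they split faster than that while running.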
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1554916 - Posted: 11 Aug 2014, 5:24:25 UTC

Yep, 18fe09ag certainly needs a kick seeing that it hasn't shown any progress all day.

Cheers.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1554922 - Posted: 11 Aug 2014, 5:32:34 UTC - in response to Message 1554910.  

Grant - you are beyond belief...

No more than you.
If you look at the graphs you will see that the splitters aren't working as they should be. By your own description of how the splitters operate, you should be able to see there is an issue.
Look at the Ready-to-send buffer graph, look at the splitter output graph.
It's right there, starting about 12.5 hours ago.
Grant
Darwin NT
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1554927 - Posted: 11 Aug 2014, 5:39:20 UTC - in response to Message 1554910.  

I am not so sure about that. The behaviour you describe, on at 275K then off at 300+K is definitely what has been happening but the munin graphs (http://setistats.haveland.com/sah_results.html) show a different pattern. Perhaps the staff is trying for better control of the RTS cache - or maybe generation and consumption have hit a precarious balance - we shall see.
Member of the 20 Year Club
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1554930 - Posted: 11 Aug 2014, 5:50:18 UTC - in response to Message 1554927.  

I am not so sure about that. The behaviour you describe, on at 275K then off at 300+K is definitely what has been happening but the munin graphs (http://setistats.haveland.com/sah_results.html) show a different pattern. Perhaps the staff is trying for better control of the RTS cache - or maybe generation and consumption have hit a precarious balance - we shall see.

Or as I suggested, yet another stuck tape.
It's happened before, it'll happen again. It would appear it's occurred now.
Grant
Darwin NT
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1555097 - Posted: 11 Aug 2014, 15:48:17 UTC

MB average production is currently still greater than the average return rate. About the time AP in process drops below 80K, demand for MB will increase, & if a 2nd splitter gets stuck then we are really smurfed.
I would watch 22fe09ah to see if it gets stuck. It seems to be from about the same time frame as the past 2 that have done so.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1555122 - Posted: 11 Aug 2014, 16:42:05 UTC

Grant, why don't you learn to read a graph, or maybe read the correct one (the 24 hour one, not the monthly one)? During the last 24 hours the ready to send has varied between about 280k and 320k, with a periodicity of about an hour and a half. The exception was a few hours just after midnight, when it was pretty flat at about 290-310k; I guess that's when the USA were sleeping, so we didn't get the "day time only" crunchers making demands. That looks like a well behaved buffer/cache to me, running nicely within its constraints.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1555132 - Posted: 11 Aug 2014, 16:59:25 UTC - in response to Message 1554930.  

Or as I suggested, yet another stuck tape.
It's happened before, it'll happen again. It would appear it's occurred now.

18fe09ag apparently has been stuck at 1 for about a day or more.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1555205 - Posted: 11 Aug 2014, 18:52:07 UTC - in response to Message 1555132.  

Or as I suggested, yet another stuck tape.
It's happened before, it'll happen again. It would appear it's occurred now.

18fe09ag apparently has been stuck at 1 for about a day or more.

Yeah that is the one we have been pointing at for the past 20 hours or so.

Has 17oc08ab been sitting on 2 splitters for a long time? I saw a tape using 2 before I went to lunch, but I did not note the tape name.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1555262 - Posted: 11 Aug 2014, 20:43:50 UTC

18fe09ag has claimed its 2nd splitter, I see this morning, so we'll just have to make do running on 3 splitters until someone kicks that file loose again.

Cheers.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1555277 - Posted: 11 Aug 2014, 21:00:13 UTC - in response to Message 1555262.  

18fe09ag has claimed its 2nd splitter, I see this morning, so we'll just have to make do running on 3 splitters until someone kicks that file loose again.

Cheers.

I think 22fe09ah might be stuck as well, or it could just be running very slowly. Other tapes have advanced while it has not.
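The check described here (a tape whose count hasn't moved while other tapes have) is easy to write down. A rough Python sketch, assuming two snapshots of the status page's tape progress numbers have been copied into dicts by hand; the function name, data format and tape names are made up for illustration.

```python
def possibly_stuck(before, after):
    """Return tapes whose progress didn't change between two snapshots
    while at least one other tape did advance.

    before/after: dict of tape name -> progress count (hypothetical format).
    """
    common = before.keys() & after.keys()
    advanced = {tape for tape in common if after[tape] > before[tape]}
    if not advanced:
        # Nothing moved at all: more likely the splitters are simply idle
        # (e.g. Ready-to-send is full) than that every tape is stuck.
        return []
    return sorted(tape for tape in common if after[tape] == before[tape])


# Made-up tape names and counts, purely to show the call.
earlier = {"tape_a": 1, "tape_b": 5, "tape_c": 3}
later = {"tape_a": 1, "tape_b": 5, "tape_c": 6}
print(possibly_stuck(earlier, later))
```

As noted above, a tape that hasn't advanced may just be running very slowly, so one pair of snapshots is a hint rather than proof.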
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1555296 - Posted: 11 Aug 2014, 21:33:06 UTC - in response to Message 1555277.  

18fe09ag has claimed its 2nd splitter, I see this morning, so we'll just have to make do running on 3 splitters until someone kicks that file loose again.

Cheers.

I think 22fe09ah might be stuck as well, or it could just be running very slowly. Other tapes have advanced while it has not.

If that is the case then we'll just have to get by on 2 splitters.

Cheers.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1555338 - Posted: 11 Aug 2014, 23:19:22 UTC - in response to Message 1555122.  

Grant, why don't you learn to read a graph, or maybe read the correct one (the 24 hour one, not the monthly one)? During the last 24 hours the ready to send has varied between about 280k and 320k,

Are you serious? Honestly?
If you are unable to read a graph, and make comparisons between daily, weekly, monthly & yearly numbers then there is nothing I can do to help you. But I will try.
Suffice it to say, just by looking at the daily Ready-to-send buffer or Result creation rate graphs, without even comparing them to the other graphs, it is glaringly obvious there is an issue with the splitters.

Since the issue has been going for over 24hrs, you now need to compare the current 24hr graph to the weekly or monthly graph to see what is happening now, and what usually happens.


If you want to educate yourself, look at the Result creation rate weekly graph & compare the 4th-7th, with the 8th-10th. Now look at what happened at the end of the 10th, and in to the 11th.
Also look at the Ready-to-send buffer & compare the same dates.
If you're unsure what you're looking for, it's the maximum & minimum values. On the 24 hour graph you can also check how long it takes to go between those maximum & minimum values.
In case you're still unsure: the maximum & minimum values at present are lower than they usually are. The time it takes to go between those much lower minimum & maximum values is much shorter for the splitter output, & for running down the Ready-to-send buffer, but much longer for refilling the Ready-to-send buffer. That indicates problems with the splitters.
Of course the Ready-to-send buffer depends on not just the rate at which work is split, but also the rate at which it is processed, but explaining the factors involved in that is beyond the scope of this lesson.
Grant
Darwin NT


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.