Panic Mode On (89) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1556670 - Posted: 14 Aug 2014, 10:48:54 UTC - in response to Message 1556669.  

On the contrary, when we have AP splitting, the intensity of the problem is smaller because fewer hosts ask for MB WUs, relieving some of the remaining MB splitters' workload.

Yep, that's what Richard & I were suggesting.
The only reason we have a Ready-to-send buffer building up at the moment is all the machines that are now running AP WUs. If they were still chewing on MB, there wouldn't be enough to meet demand, let alone build up the Ready-to-send buffer.

The problem with the splitters producing work is just because of the sticking tapes, not the amount of work being downloaded.
Grant
Darwin NT
ID: 1556670
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1556680 - Posted: 14 Aug 2014, 11:52:04 UTC - in response to Message 1556670.  

On the contrary, when we have AP splitting, the intensity of the problem is smaller because fewer hosts ask for MB WUs, relieving some of the remaining MB splitters' workload.

Yep, that's what Richard & I were suggesting.
The only reason we have a Ready-to-send buffer building up at the moment is all the machines that are now running AP WUs. If they were still chewing on MB, there wouldn't be enough to meet demand, let alone build up the Ready-to-send buffer.

The problem with the splitters producing work is just because of the sticking tapes, not the amount of work being downloaded.

The amount of MB work being downloaded varies by a factor of at least 2, possibly nearer 3, depending on whether they're shorties or not. (The uncertainty is because I'm not sure what the ratio of CPU to CUDA is these days. CPUs have nearer to the 3:1 ratio for shorties, compared to the 2:1 ratio for the currently inefficient (at VHAR) CUDA apps.)

The number of straws draining beer from the bottom of the barrel affects the ready-to-drink level, just as the number of barmaids pouring buckets of beer into the top of the barrel affects it.
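
To make the barrel picture concrete, here is a minimal Python sketch of a buffer that is filled by the splitters and drained by hosts. The function name and all of the numbers are illustrative assumptions (a starting level of 300,000, a hypothetical 65,000 WUs split per hour, and demand doubling for shorties, in line with the factor of 2 mentioned above); it is not a model of the real servers.

```python
def simulate_rts(initial, split_rate, demand_rate, hours):
    """Track a ready-to-send (RTS) buffer hour by hour.

    split_rate and demand_rate are workunits per hour. The buffer
    cannot go below zero: once it is empty, requests go unanswered.
    """
    level = initial
    history = [level]
    for _ in range(hours):
        level = max(0, level + split_rate - demand_rate)
        history.append(level)
    return history


# Illustrative numbers only: same splitter output in both runs,
# but demand doubles when the work on offer is shorties.
normal = simulate_rts(initial=300_000, split_rate=65_000,
                      demand_rate=64_000, hours=24)
shorties = simulate_rts(initial=300_000, split_rate=65_000,
                        demand_rate=128_000, hours=24)

print("after 24 h, normal demand:  ", normal[-1])
print("after 24 h, doubled demand: ", shorties[-1])
```

With the pouring rate unchanged, doubling the number of straws is enough to turn a slowly rising level into an empty barrel within a few hours, which is the point of the analogy.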
ID: 1556680
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1556681 - Posted: 14 Aug 2014, 11:53:23 UTC - in response to Message 1556669.  

I'm not sure about that. Why? Simply because the slowdown of the splitting process happened before the AP frenzy started; the problem starts whenever a tape gets stuck. Three days ago there were no AP splitters running; just remember last Monday.

On the contrary, when we have AP splitting, the intensity of the problem is smaller because fewer hosts ask for MB WUs, relieving some of the remaining MB splitters' workload.

A simple kick of the stuck WU normally clears the problem. What I can't understand is why that is not automatic.

Before maintenance, 4 splitters were keeping RTS around 300,000, with it falling and rising a bit. However, once the stuck tape had a 2nd splitter on it, RTS dropped quickly. There were around 40K RTS by the time maintenance started. For whatever reason, splitting shorties also seemed to slow down the process, and the average output has also picked up a little.
Of course the AP going out is also helpful for building the MB queue up, but I wouldn't put all of my money on it alone, as AP RTS started growing after MB RTS was already doing so.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
ID: 1556681
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1556702 - Posted: 14 Aug 2014, 13:28:56 UTC

The answer to all our problems is simple: kick the damn 18fe09ag tape and you will all see the MB cache fill very fast.

Can anyone ping the lab people to at least verify why this splitter is stuck at the #3 channel after 3 days?

The WOW event starts tomorrow, and in about a day we will be running out of new AP WUs, so the MB cache will need to be ready to handle all the requests.
ID: 1556702
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1556707 - Posted: 14 Aug 2014, 13:51:43 UTC - in response to Message 1556702.  

The answer to all our problems is simple: kick the damn 18fe09ag tape and you will all see the MB cache fill very fast.

Can anyone ping the lab people to at least verify why this splitter is stuck at the #3 channel after 3 days?

The WOW event starts tomorrow, and in about a day we will be running out of new AP WUs, so the MB cache will need to be ready to handle all the requests.

With the WOW event, demand may be greater than normal. It would certainly make things interesting.

It looks like the tapes are being processed in FIFO order instead of alphanumerical order. So after sitting in the queue since at least May (maybe April), 31au13aa, 31au13ac, & 31mr13ae may actually get processed!
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
ID: 1556707
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22204
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1556794 - Posted: 14 Aug 2014, 16:51:09 UTC

There is a throttle on the size of the RTS queue, and APs are 8 MB each compared to 300 kB for an MB WU.
It is a shame they don't throttle back AP production somewhat, which would allow a larger number of MBs in the RTS queue.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1556794
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1556803 - Posted: 14 Aug 2014, 17:00:55 UTC - in response to Message 1556794.  

There is a throttle on the size of the RTS queue, and APs are 8 MB each compared to 300 kB for an MB WU.
It is a shame they don't throttle back AP production somewhat, which would allow a larger number of MBs in the RTS queue.

The problem is not with the size of the RTS queue. It is with the splitting rate. MB RTS when all is well usually hangs around 300k +/- 20k or so, which is plenty, even in times of shorty storms. UNLESS we have a hung splitter or two, which unfortunately seems to have happened all too frequently in the last few weeks.

If I understand correctly, there IS a mechanism in place that is supposed to be able to detect a splitting process that is hung up. It apparently is not robust enough, or needs some more tweaking, to be able to detect and restart things automatically under whatever conditions have been causing the recent problems.
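
How the project's own detection works is not described in this thread, so the following is only a generic sketch of one common approach: watch a progress counter rather than whether the process is alive, and flag the worker if the counter has not moved within a timeout. The class name `StallDetector`, the `timeout_s` parameter, and the idea of reporting "workunits produced so far" are all hypothetical choices for illustration.

```python
import time


class StallDetector:
    """Flags a worker whose progress counter has not advanced recently."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so it can be tested
        self.last_value = None
        self.last_change = clock()

    def report(self, progress):
        """Record the latest progress value (e.g. workunits produced so far)."""
        if progress != self.last_value:
            self.last_value = progress
            self.last_change = self.clock()

    def is_stalled(self):
        """True if progress has not changed for longer than the timeout."""
        return self.clock() - self.last_change > self.timeout_s


# Example with a fake clock: progress advances, then freezes.
now = [0.0]
detector = StallDetector(timeout_s=600, clock=lambda: now[0])
detector.report(100)
now[0] = 300
detector.report(150)      # still making progress
now[0] = 1000
detector.report(150)      # same value 700 s later
print("stalled:", detector.is_stalled())
```

A check like this would catch a splitter that is still running but stuck on one channel, which a plain "is the process alive" test would miss. Whether that matches the failure seen here is a guess.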
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1556803
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1556936 - Posted: 14 Aug 2014, 20:03:16 UTC

Was anybody able to contact the lab and talk about the damn 18fe09ag tape?
ID: 1556936
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 1557035 - Posted: 14 Aug 2014, 22:58:38 UTC - in response to Message 1556936.  

Was anybody able to contact the lab and talk about the damn 18fe09ag tape?


Patience, Juan... Eric is out of town. It may need to wait until he gets back.


ID: 1557035
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1557044 - Posted: 14 Aug 2014, 23:19:23 UTC - in response to Message 1557035.  

Was anybody able to contact the lab and talk about the damn 18fe09ag tape?


Patience, Juan... Eric is out of town. It may need to wait until he gets back.

Seriously? He is the only one at the lab? I don't believe it...
ID: 1557044
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1557064 - Posted: 14 Aug 2014, 23:56:52 UTC - in response to Message 1557044.  

No need to panic until all of the present AP work is gone. Then whatever Ready-to-send MB buffer we might have by then will start to feel the demand.
Then we can panic.
Grant
Darwin NT
ID: 1557064
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1557075 - Posted: 15 Aug 2014, 0:23:03 UTC - in response to Message 1557044.  

Was anybody able to contact the lab and talk about the damn 18fe09ag tape?


Patience, Juan... Eric is out of town. It may need to wait until he gets back.

Seriously? He is the only one at the lab? I don't believe it...

There is a very small number of people that work at the lab, & I believe Matt and Eric are the only 2 that know how to correct the issue at hand. Please correct me if I am wrong.
ID: 1557075
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1557085 - Posted: 15 Aug 2014, 1:00:18 UTC - in response to Message 1557075.  
Last modified: 15 Aug 2014, 1:01:23 UTC

Was anybody able to contact the lab and talk about the damn 18fe09ag tape?


Patience, Juan... Eric is out of town. It may need to wait until he gets back.

Seriously? He is the only one at the lab? I don't believe it...

There is a very small number of people that work at the lab, & I believe Matt and Eric are the only 2 that know how to correct the issue at hand. Please correct me if I am wrong.

I'm sure Dr. Anderson could figure it out, but I expect he wouldn't touch it unless everything had crashed. Even if the RTS buffer gets depleted from reduced splitter capacity, the project is still up and running, just not at full capacity. Some requests for new work would be answered and some would not.

As far as I know, Eric, Matt, and Jeff are the only people that do work for the SETI@home project, which is only part of their focus. Jeff seems to be the least talkative of them, so I'm not sure if he is less involved or is there even less than the other guys.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
ID: 1557085
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1557258 - Posted: 15 Aug 2014, 11:11:08 UTC
Last modified: 15 Aug 2014, 11:14:23 UTC

OK guys, very soon we will see whether it's time to panic or not.

This round of tapes for AP splitting is almost at the end (just 7 channels to do).

Soon (in a few hours at most; there are still about 20k AP WUs in the buffer) we will be out of new AP WUs for a while, and all the duty will be in the hands of the MB splitters and our 5-day-old stuck tape.

And since the WOW event starts today, I imagine a lot of users will transfer their processing power from other projects to SETI, so that will be an interesting moment to follow.
ID: 1557258
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1557314 - Posted: 15 Aug 2014, 14:21:27 UTC - in response to Message 1557258.  
Last modified: 15 Aug 2014, 14:25:06 UTC


And since the WOW event starts today, I imagine a lot of users will transfer their processing power from other projects to SETI, so that will be an interesting moment to follow.



You are right, there may be problems; I will be watching the graphs as well. But so far fewer than 700 users have signed up for WOW. I am guessing most of them are "heavy hitters", so it should be interesting.

I have committed my modest resources to WOW as well, just for fun!
Member of the 20 Year Club



ID: 1557314
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1557317 - Posted: 15 Aug 2014, 14:28:44 UTC
Last modified: 15 Aug 2014, 14:44:38 UTC

AP splitting just stopped a few moments ago, and the AP buffer is down from 24K to about 18K; at this rate it will be empty tomorrow.

The WOW event is going to start in about 1 1/2 hrs. That's a hell of a coincidence, of course; we will see the fun very soon.

I hope I'm wrong and the MB splitters can handle it, because the damn tape is still stuck at the #3 channel and apparently nothing will be done until next week.
ID: 1557317
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1557318 - Posted: 15 Aug 2014, 14:33:57 UTC - in response to Message 1557258.  

OK guys, very soon we will see whether it's time to panic or not.

This round of tapes for AP splitting is almost at the end (just 7 channels to do).

Soon (in a few hours at most; there are still about 20k AP WUs in the buffer) we will be out of new AP WUs for a while, and all the duty will be in the hands of the MB splitters and our 5-day-old stuck tape.

And since the WOW event starts today, I imagine a lot of users will transfer their processing power from other projects to SETI, so that will be an interesting moment to follow.

If we use Results returned in the last hour as an indicator of demand, with the average returned for the past month being 64,100, then average demand is about 17.8055 per second, meaning an average splitter output of as little as 18 per second is more than enough. With 4 splitters the average output seems to be running 18.75 and rising.
Splitting VHAR seems to slow the splitting down for some reason. So if we get a bunch of tapes full of shorties, which would also drive up demand, or another splitter goes down we should be alright.
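
The arithmetic above can be checked with a few lines of Python. This is just a sketch using the two figures quoted in this post (64,100 results returned per hour and 18.75 WUs split per second); treating results returned as a stand-in for demand is the post's assumption, and the variable names are made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def per_second(results_per_hour):
    """Convert an hourly count into an average per-second rate."""
    return results_per_hour / SECONDS_PER_HOUR


avg_returned_per_hour = 64_100   # monthly average quoted in the post
splitter_output = 18.75          # quoted output with 4 splitters, per second

demand = per_second(avg_returned_per_hour)
surplus = splitter_output - demand
headroom = splitter_output / demand - 1   # fractional margin over demand

print(f"demand   = {demand:.4f} per second")
print(f"surplus  = {surplus:.4f} per second "
      f"(about {surplus * SECONDS_PER_HOUR:.0f} per hour)")
print(f"headroom = {headroom:.1%}")
```

On these figures the margin over average demand is only a few percent, so it is worth keeping an eye on anything that pushes demand up or splitter output down.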
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
ID: 1557318
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1557424 - Posted: 15 Aug 2014, 17:57:47 UTC

OK, kitties.
AP RTS just ran out.
I hope somebody's in the lab to babysit the servers.
We're gonna need more power splitting MB, I suspect.

Meow and away!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1557424
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1557437 - Posted: 15 Aug 2014, 18:13:55 UTC

Panic Mode: ON

Somebody please kick the damn stuck tape splitter.
ID: 1557437
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1557547 - Posted: 15 Aug 2014, 22:11:11 UTC - in response to Message 1557437.  

Once again, demand exceeds supply. MB Ready-to-send buffer slowly draining.
Grant
Darwin NT
ID: 1557547


 