Kosh Validators Need a Kick Start

PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 144822 - Posted: 30 Jul 2005, 12:23:12 UTC

Matt is away and the validators on Kosh seem to have stopped. How can they be restarted? The validation queue is rather large as it is.
May this Farce be with You
ID: 144822 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 144832 - Posted: 30 Jul 2005, 13:02:45 UTC

here's what the "status page" says:

Database status
State                     Approximate #results
Ready to send             434,469
In progress               989,416
Waiting for validation    47,612
Transitioner backlog      0 hours


I see nothing out of the ordinary here. Nothing to be concerned about. It appears they don't need "kosh" to keep up. Now if the "Waiting to Validate" number is over 300K AND nothing appears to be being done, then a gentle reminder may be appropriate. IMO Seti doesn't need to tell us each move they make. I don't wanna see news like "Sorry, I need to use the bathroom I'll be back in 10 minutes", or "I'm going to throw this switch now, but will put it back in 5 minutes", or pretty much anything they are doing that doesn't adversely affect our ability to get and crunch numbers. The Validators don't affect anyone, except you may have to wait another hour for your credit. Heck, you've waited hours and days to return it.

What's the big deal?

tony

2cents inserted
ID: 144832 · Report as offensive
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 144835 - Posted: 30 Jul 2005, 13:14:15 UTC - in response to Message 144832.  

here's what the "status page" says:

Database status
State                     Approximate #results
Ready to send             434,469
In progress               989,416
Waiting for validation    47,612
Transitioner backlog      0 hours



I'll admit to having my beer goggles on about 10 hours ago when I last looked at the status page, but if my remaining brain cells recall correctly, the WFV queue has actually gained ground with kosh in the red.
ID: 144835 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 144836 - Posted: 30 Jul 2005, 13:21:55 UTC
Last modified: 30 Jul 2005, 13:23:14 UTC

I agree the WFV number was larger an hour ago, but I have to assume there is a reason command central assigned four validation processes. Beyond that, this community seems to think that the target WFV for nominal operations is zero, not 5e4.

Maybe the reason the two extra processes don't seem to be needed is that there is another bottleneck in the system we are not aware of (yet). In any event, the implication is that there is a problem somewhere because they could not have run out of work and they were not disabled.


May this Farce be with You
ID: 144836 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 144843 - Posted: 30 Jul 2005, 13:55:48 UTC
Last modified: 30 Jul 2005, 13:56:56 UTC

That isn't a solid stationary number, it's a floating number.

Imagine you and a friend sell apples on the street. Each hour you count the number of apples in the bushel basket. Your friend continues to get more apples for your basket, which adds to the count. However, you have a great product at a great price and they are selling fast. Counting the apples in the basket each hour doesn't tell you the important stuff, like the total number of apples sold, the profit from your venture, or even how many apples you might need tomorrow. It's only if you look down and are out of apples OR have too many apples that it means ANYTHING.

Sometimes you'll look down and have too many apples and turn on the sales effort, or give your friend "KOSH" a break and let him go get some coffee. If you start running out then Kosh better get back to work.

The validators under a hundred K really mean nothing, it's only when they are at zero, or above 250/300K that it means anything.
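To put that rule of thumb in concrete terms, here is a minimal sketch. The 100K and 250K cut-offs come from the paragraph above; the function name, the labels, and the "watch" band between the two cut-offs are my own assumptions, not anything the project actually uses.

```python
# Rule-of-thumb reading of the "Waiting for validation" (WFV) count.
# Thresholds are the ones quoted in the post; everything else is hypothetical.
QUIET_BELOW = 100_000   # "under a hundred K really mean nothing"
ALERT_ABOVE = 250_000   # "above 250/300K" is when it starts to matter


def classify_wfv(waiting: int) -> str:
    """Return a rough label for a WFV queue length."""
    if waiting < 0:
        raise ValueError("queue length cannot be negative")
    if waiting == 0:
        return "empty"      # validators have nothing to do (or nothing is arriving)
    if waiting < QUIET_BELOW:
        return "normal"     # ordinary fluctuation, nothing to report
    if waiting <= ALERT_ABOVE:
        return "watch"      # assumed middle band, not defined in the post
    return "backlog"        # large enough that a gentle reminder may be fair


if __name__ == "__main__":
    for count in (0, 47_612, 300_000):
        print(count, classify_wfv(count))
```

By this reading, the 47,612 shown on the status page falls in the "normal" band.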

I gotta go, I have a 12 hour drive ahead of me, so I'll leave you all with this thought. I'll be back tomorrow to check your responses.

tony
ID: 144843 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 144921 - Posted: 30 Jul 2005, 17:19:02 UTC

I understand, to some extent, dynamic queuing.

However, I was just trying to point out that two processes crashed and that one of the queue lengths was out of target. It may mean nothing, which is your point, and I agree. But if it did mean something, like imminent disk-full problems, maybe the rumor would spread to command central so they could find the root cause of the crash and fix it.

Have a nice 12h drive.

May this Farce be with You
ID: 144921 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 145307 - Posted: 31 Jul 2005, 13:02:32 UTC

The server status now shows:

Database status
State                     Approximate #results
Ready to send             500,134
In progress               986,348
Waiting for validation    22,846
Transitioner backlog      0 hours

The WFV number has halved from earlier so perhaps the Kosh processes are not so vital for keeping up after all.

All part of the good fun for a still developing system.

Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 145307 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 145424 - Posted: 31 Jul 2005, 19:39:45 UTC

Looks like I agree with you regarding the Kosh validator. However, I noticed Kosh transitioner was down a little while ago (red, not out of work). Since then it was restarted and we are 2 hours behind on the transitioner.

I wonder if command central would show us both the number of units waiting and the approximate time to clear, at least for the transitioner and the wfv queues. It would be interesting.
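A rough version of the "time to clear" figure can be worked out from two status page readings. The sketch below is only an illustration: the function name is made up, it assumes the queue drains at a constant net rate, and it uses the posting times of the two status snapshots quoted in this thread as stand-ins for when the counts were actually taken.

```python
from datetime import datetime
from typing import Optional


def hours_to_clear(t0: datetime, n0: int, t1: datetime, n1: int) -> Optional[float]:
    """Estimate hours until the queue reaches zero from two (time, count) readings.

    Returns None when the queue is not shrinking, since no clear time exists then.
    """
    elapsed_hours = (t1 - t0).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("second reading must be later than the first")
    net_drain_per_hour = (n0 - n1) / elapsed_hours
    if net_drain_per_hour <= 0:
        return None
    return n1 / net_drain_per_hour


if __name__ == "__main__":
    # WFV counts quoted earlier in the thread; timestamps are the post times (assumption).
    first = (datetime(2005, 7, 30, 13, 2, 45), 47_612)
    second = (datetime(2005, 7, 31, 13, 2, 32), 22_846)
    estimate = hours_to_clear(first[0], first[1], second[0], second[1])
    print(f"estimated hours to clear: {estimate:.1f}")
```

With those two readings the net drain is a little over 1,000 results per hour, which puts the estimate at roughly 22 hours. It is only as good as the constant-rate assumption, so a single restart or outage would throw it off.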
May this Farce be with You
ID: 145424 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 145611 - Posted: 1 Aug 2005, 12:06:34 UTC

Looking this morning, the WFV queue has slipped considerably, to more than 60K and growing. I think this is evidence that two processes are not enough to tread water, and my original concern was justified: Taser Kosh and get those processes running again!
May this Farce be with You
ID: 145611 · Report as offensive
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 145625 - Posted: 1 Aug 2005, 13:34:01 UTC - in response to Message 145611.  
Last modified: 1 Aug 2005, 14:06:19 UTC

Taser Kosh and get those processes running again!


As of 1 Aug 2005 13:20:10 UTC
sah_validate1 koloth Not Running
sah_validate2 koloth Not Running
sah_validate3 kosh Not Running
sah_validate4 kosh Not Running

Now we should worry :)

[edit]
As of 1 Aug 2005 14:00:07 UTC
sah_validate1 penguin Running
sah_validate2 penguin Running
sah_validate3 penguin Running
sah_validate4 penguin Running

Now we can stop.
[/edit]
ID: 145625 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 145670 - Posted: 1 Aug 2005, 16:23:36 UTC - in response to Message 145625.  

As of 1 Aug 2005 13:20:10 UTC
sah_validate1 koloth Not Running
sah_validate2 koloth Not Running
sah_validate3 kosh Not Running
sah_validate4 kosh Not Running

Now we should worry :)

[edit]
As of 1 Aug 2005 14:00:07 UTC
sah_validate1 penguin Running
sah_validate2 penguin Running
sah_validate3 penguin Running
sah_validate4 penguin Running

Now we can stop.
[/edit]

OK, so Koloth & Kosh transmute into the Penguin. I think we can allow Berkeley 40 mins for such a swap-over!

(Batman not needed on this occasion :) )

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 145670 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 146050 - Posted: 2 Aug 2005, 11:51:04 UTC - in response to Message 145670.  
Last modified: 2 Aug 2005, 11:52:48 UTC

OK, so Koloth & Kosh transmute into the Penguin. I think we can allow Berkeley 40 mins for such a swap-over!

...And it looks like the Penguin is maxed out:

Database status
State                     Approximate #results
Ready to send             502,886
In progress               986,023
Waiting for validation    111,882
Transitioner backlog      0 hours

And WFV is slowly creeping higher.

I hope they get the next shuffle-around done before they run out of disk space. Galileo and Kryten are suspiciously lightly listed for Boinc tasks...

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 146050 · Report as offensive
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 146082 - Posted: 2 Aug 2005, 14:08:47 UTC - in response to Message 146050.  

OK, so Koloth & Kosh transmute into the Penguin. I think we can allow Berkeley 40 mins for such a swap-over!

...And it looks like the Penguin is maxed out:

Database status
State                     Approximate #results
Ready to send             502,886
In progress               986,023
Waiting for validation    111,882
Transitioner backlog      0 hours

And WFV is slowly creeping higher.

I hope they get the next shuffle-around done before they run out of disk space. Galileo and Kryten are suspiciously lightly listed for Boinc tasks...

Cheers,
Martin


Kryten is the upload/download server for SETI@Home/BOINC, and with the expected increase in load as more users migrate from "classic", it's not certain that adding other processes is advisable.

Galileo is the scheduling server for SETI@Home/BOINC, and will also get increased load as more users migrate. Additionally, it's the old master science database for "classic". They've been migrating this db to another server, but AFAIK that still isn't finished...
ID: 146082 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 146232 - Posted: 2 Aug 2005, 21:48:08 UTC - in response to Message 146082.  
Last modified: 2 Aug 2005, 21:48:24 UTC

...Galileo and Kryten are suspiciously lightly listed for Boinc tasks...

Kryten is the upload/download server for SETI@Home/BOINC, and with the expected increase in load as more users migrate from "classic", it's not certain that adding other processes is advisable.

Galileo is the scheduling server for SETI@Home/BOINC, and will also get increased load as more users migrate. Additionally, it's the old master science database for "classic". They've been migrating this db to another server, but AFAIK that still isn't finished...

Thanks for the heads-up on the bits we can't see from the status page.

... Could Matt (or whomever) perhaps add yet another few status page boxes? ;)

Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 146232 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 146250 - Posted: 2 Aug 2005, 22:48:47 UTC

I vote to bring Sagan back for an encore. He seems like a pretty lazy lout, with 4 CPUs being wasted on seti-classic.

Penguin is burning his feathers off with 7 processes running, notably including all the validation processes. It must be that the network topology is still 'non-optimal' at command central.

We junkies need a fix, I mean an explanatory post.
May this Farce be with You
ID: 146250 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 146279 - Posted: 2 Aug 2005, 23:54:18 UTC - in response to Message 146250.  

... We junkies need a fix, I mean an explanatory post.

I'll second that!

:)
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 146279 · Report as offensive



 