Kosh Validators Need a Kick Start
Author | Message |
---|---|
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Matt is away and the validators on Kosh seem to have stopped. How can they be restarted? The validation queue is rather large as it is. May this Farce be with You |
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
here's what the "status page" says: Database status (state: approximate #results): Ready to send: 434,469; In progress: 989,416; Waiting for validation: 47,612; Transitioner backlog: 0 hours. I see nothing out of the ordinary here. Nothing to be concerned about. It appears they don't need "kosh" to keep up. Now if the "Waiting to Validate" number is over 300K AND nothing appears to be being done, then a gentle reminder may be appropriate. IMO Seti doesn't need to tell us each move they make. I don't wanna see news like "Sorry, I need to use the bathroom I'll be back in 10 minutes", or "I'm going to throw this switch now, but will put it back in 5 minutes", or pretty much anything they are doing that doesn't adversely affect our ability to get and crunch numbers. The Validators don't affect anyone, except you may have to wait another hour for your credit. Heck, you've waited hours and days to return it. What's the big deal? tony 2cents inserted |
Scarecrow Joined: 15 Jul 00 Posts: 4520 Credit: 486,601 RAC: 0 |
here's what the "status page" says: I'll admit to having my beer goggles on about 10 hours ago when I last looked at the status page, but if my remaining brain cells recall correctly, the WFV queue has actually gained ground with kosh in the red. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I agree the WFV number was larger an hour ago, but I have to assume there is a reason command central assigned four validation processes. Beyond that, this community seems to think that the target WFV for nominal operations is zero, not 5e5. Maybe the reason the two extra processes don't seem to be needed is that there is another bottleneck in the system we are not aware of (yet). In any event, the implication is that there is a problem somewhere because they could not have run out of work and they were not disabled. May this Farce be with You |
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
That isn't a solid stationary number, it's a floating number. Imagine you and a friend sell apples on the street. Each hour you count the number of apples in the bushel basket. Your friend continues to get more apples for your basket, and that adds to the count. However, you have a great product at a great price and they are selling fast. Counting the apples in the basket each hour doesn't tell you the important stuff, like the total number of apples sold, the profit from your venture, or even how many apples you might need tomorrow. It's only if you look down and are out of apples OR have too many apples that it means ANYTHING. Sometimes you'll look down and have too many apples and turn on the sales effort, or give your friend "KOSH" a break and let him go get some coffee. If you start running out then Kosh better get back to work. The validators under a hundred K really mean nothing, it's only when they are at zero, or above 250/300K that it means anything. I gotta go, I have a 12 hour drive ahead of me, so I'll leave you all with this thought. I'll be back tomorrow to check your responses. tony |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I understand, to some extent, dynamic queuing. However, I was just trying to point out that two processes crashed and that one of the queue lengths was out of target. It may mean nothing, which is your point, and I agree. But if it did mean something, like imminent disk full problems, maybe the rumor would spread to command central so they could find the root cause of the crash and fix it. Have a nice 12h drive. May this Farce be with You |
ML1 Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20 |
The server status now shows: Database status (state: approximate #results): Ready to send: 500,134; In progress: 986,348; Waiting for validation: 22,846; Transitioner backlog: 0 hours. The WFV number has halved from earlier so perhaps the Kosh processes are not so vital for keeping up after all. All part of the good fun for a still developing system. Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Looks like I agree with you regarding the Kosh validator. However, I noticed Kosh transitioner was down a little while ago (red, not out of work). Since then it was restarted and we are 2 hours behind on the transitioner. I wonder if command central would show us both the number of units waiting and the approximate time to clear, at least for the transitioner and the wfv queues. It would be interesting. May this Farce be with You |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Looking this morning, the WFV queue has slipped considerably, to more than 60K and growing. I think this is evidence that two processes are not enough to tread water, and my original concern was justified: Taser Kosh and get those processes running again! May this Farce be with You |
Scarecrow Joined: 15 Jul 00 Posts: 4520 Credit: 486,601 RAC: 0 |
Taser Kosh and get those processes running again! As of 1 Aug 2005 13:20:10 UTC: sah_validate1 (koloth): Not Running; sah_validate2 (koloth): Not Running; sah_validate3 (kosh): Not Running; sah_validate4 (kosh): Not Running. Now we should worry :) [edit] As of 1 Aug 2005 14:00:07 UTC: sah_validate1 (penguin): Running; sah_validate2 (penguin): Running; sah_validate3 (penguin): Running; sah_validate4 (penguin): Running. Now we can stop. [/edit] |
ML1 Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20 |
As of 1 Aug 2005 13:20:10 UTC OK, so Koloth & Kosh transmute into the Penguin. I think we can allow Berkeley 40 mins for such a swap-over! (Batman not needed on this occasion :) ) Cheers, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
ML1 Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20 |
OK, so Koloth & Kosh transmute into the Penguin. I think we can allow Berkeley 40 mins for such a swap-over! ...And it looks like the Penguin is maxed out: Database status (state: approximate #results): Ready to send: 502,886; In progress: 986,023; Waiting for validation: 111,882; Transitioner backlog: 0 hours. And WFV is slowly creeping higher. I hope they get the next shuffle-around done before they run out of disk space. Galileo and Kryten are suspiciously lightly listed for Boinc tasks... Cheers, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
OK, so Koloth & Kosh transmute into the Penguin. I think we can allow Berkeley 40 mins for such a swap-over! Kryten is the upload/download server for SETI@Home/BOINC, and with the expected increase in load as more users migrate from "classic", it's not certain that adding other processes is advisable. Galileo is the scheduling server for SETI@Home/BOINC, and will also get increased load as more users migrate. Additionally, it's the old master science database for "classic". They've been migrating this db to another server, but AFAIK that still isn't finished... |
ML1 Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20 |
...Galileo and Kryten are suspiciously lightly listed for Boinc tasks... Thanks for the heads-up on the bits we can't see from the status page. ... Could Matt (or whomever) perhaps add yet another few status page boxes? ;) Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I vote to bring Sagan back for an encore. He seems like a pretty lazy lout, with 4 CPUs being wasted on seti-classic. Penguin is burning his feathers off with 7 processes running, and notably all the validation processes. It must be that the network topology is still 'non-optimal' at command central. We junkies need a fix, I mean an explanatory post. May this Farce be with You |
ML1 Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20 |
... We junkies need a fix, I mean an explanatory post. I'll second that! :) Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
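
Two ideas from the thread lend themselves to a small sketch: Astro's rule of thumb that the "Waiting for validation" (WFV) count only matters when it hits zero or climbs past roughly 300K with nothing draining it, and PhonAcq's wish for an approximate time-to-clear figure. The Python below is only an illustration under stated assumptions. The function names, the exact thresholds, and the elapsed time between snapshots are hypothetical and are not taken from the SETI@home status page or any BOINC tool; only the WFV counts come from the posts above.

```python
from typing import Optional

# Thresholds loosely based on Astro's rule of thumb in the thread.
# The exact values and all names here are hypothetical.
LOW_WATERMARK = 0
HIGH_WATERMARK = 300_000


def queue_needs_attention(waiting: int, validators_running: bool) -> bool:
    """Flag the queue only when it is empty, or large with nothing draining it."""
    if waiting <= LOW_WATERMARK:
        return True
    return waiting > HIGH_WATERMARK and not validators_running


def hours_to_clear(earlier: int, later: int, elapsed_hours: float) -> Optional[float]:
    """Estimate hours until the queue empties, from two snapshots.

    Returns None when the queue is flat or growing, because it will
    never clear at the observed net rate.
    """
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    net_drain_per_hour = (earlier - later) / elapsed_hours
    if net_drain_per_hour <= 0:
        return None
    return later / net_drain_per_hour


if __name__ == "__main__":
    # WFV counts quoted in the thread; the 4-hour gap is an assumption,
    # since the posts do not say how far apart the snapshots were taken.
    print(queue_needs_attention(47_612, validators_running=False))
    print(hours_to_clear(47_612, 22_846, elapsed_hours=4.0))
    print(hours_to_clear(60_000, 111_882, elapsed_hours=4.0))
```

A single snapshot says little, which is the point of the apple-basket analogy: the estimate needs at least two readings and the time between them, and it only holds while the net drain rate stays roughly constant.
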
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.