Message boards :
Number crunching :
Panic Mode On (8) Server problems
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Seems to be a problem Yep. Ready to Send buffer is down to zero, and although the Server Status page says the Splitters are running, they're not producing any work. Grant Darwin NT |
Bert Send message Joined: 12 Oct 06 Posts: 84 Credit: 813,295 RAC: 0 |
Seems to be a problem Friday just before a long weekend. We should be getting used to it. I upped my queue to 6 extra days. Should keep me going if we gotta wait until Wednesday. |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Oooopssss... it looks like no new work until Tuesday... Technical News : Sort-of Weekend Wrapup (Aug 28 2008) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looks like someone gave it a kick- the splitters are splitting again & have been doing so for a while now. The Ready to Send buffer is slowly growing again. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
And now the Ready to Send buffer has dropped to 0 again. Grant Darwin NT |
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
We just hit another patch of short-duration workunits that fast optimized systems can crunch in 7-10 minutes each. Due to their short duration they also tend to go to the head of the queue, so it doesn't matter how many tens or hundreds of workunits the fast systems already have queued. The splitters simply can't keep up with the demand. Server status shows the creation rate barely keeping ahead of the return rate at the 31 Aug 0Z snapshot. "Life is just nature's way of keeping meat fresh." - The Doctor |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
The splitters simply can't keep up with the demand. Which indicates a problem somewhere in the system. The splitters can (and have) churned out 35+ Work Units per second for sustained periods. But for some reason they were stuck at around 10 or so for several hours & have only been doing 15 or so for the last hour & a bit. Usually once the Ready to Send buffer drops by a few thousand they'll crank up the pace to 20 or more & maintain the buffer. That hasn't happened today. Grant Darwin NT |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
Which brings up the strange conundrum with having work unit queues. On one hand, the client, seeing a work unit with a short deadline, will prioritize it if you are running with a queue of even modest length. This adds to the turnaround time for the work units already in your queue, which means the work unit information and any already-returned results take up space longer on the servers. The lack of space on the servers caps new work unit creation. However, if you have a small queue, you will most likely run out of work, which encourages members to increase their queue size to avoid running out of work. Which leads to more space being used up on the servers. Which caps the creation rate of new work units. Rinse and repeat. Oh, and I forgot another thing. Since validated work units and their results hang around for, I think, 24 hours before deletion, a run of short-duration workunits that are processed immediately will still take up more space due to the sheer number of work units that can get done in 24 hours. So even if everyone was running with virtually no queue, the servers would still clog up. It's thermodynamics class all over again. You can't win, you can't break even, you can't get out of the game. "Life is just nature's way of keeping meat fresh." - The Doctor |
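Keith's "sheer number" point can be put as a back-of-the-envelope calculation: if results come back at some rate and validated rows are retained for roughly 24 hours before purging, then by Little's law the database holds (return rate × retention time) rows at steady state, no matter how small everyone's queue is. A sketch with made-up figures (illustrative Python; the rates are assumptions, not measurements from the project):

```python
def rows_awaiting_purge(returns_per_hour: float, retention_hours: float) -> float:
    """Little's law: items in the system = arrival rate * time in system.

    Here: database rows awaiting purge = results returned per hour,
    multiplied by how long a validated row is retained before purging.
    """
    return returns_per_hour * retention_hours

# Made-up but plausible figures: ~50,000 results returned per hour,
# retained ~24 hours before being purged from the database.
# rows_awaiting_purge(50_000, 24) == 1_200_000 rows at steady state
```

The arrival rate, not the client-side queue length, dominates this term, which is why a run of fast-turnaround shorties can clog the database even with everyone on tiny caches.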
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I must be a bit slow here, but how exactly does Boinc decide in which order to do work units? On my quad I keep a 1-day queue, but currently work units dated the 6th or 7th of September are being left in favour of ones dated the 16th, 17th, 22nd and 23rd of September. Admittedly the earlier ones are 20 mins and the later ones about an hour; however, if I were to go on holiday today till the 8th of September (not a long time), then 54 units would have to be re-sent. Doesn't seem an efficient way of doing things. I only run S@H by the way. Bernie |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I must be a bit slow here, but how exactly does Boinc decide in which order to do work units? On my quad I keep a 1-day queue, but currently work units dated the 6th or 7th of September are being left in favour of ones dated the 16th, 17th, 22nd and 23rd of September. Admittedly the earlier ones are 20 mins and the later ones about an hour; however, if I were to go on holiday today till the 8th of September (not a long time), then 54 units would have to be re-sent. Doesn't seem an efficient way of doing things. In the normal course of events, BOINC does the work in the order it was issued by Berkeley. You've identified yourself as being in the UK, so you must be familiar with the concept of a queue - first one to the bus-stop is the first to get on the bus, that sort of thing? That's how BOINC works. Except - if tail-end Charlie is in danger of missing an important appointment (a deadline), she's allowed to jump the queue and get on the bus first. And how much danger she's in depends on how long the queue is. If the queue is one day, and the deadline is seven days, then there's no risk of missing the deadline and no queue-jumping is allowed. If you actually tell BOINC that you're going to go away on holiday... well, you can't exactly, but you can say "I'm not going to connect to the internet again for another 10 days", and BOINC will then rush through the tasks which have a shorter deadline than that. It'll also assume that you're not going to take your computer on holiday with you, and try to stock up on work it can do without contacting the internet for that length of time. |
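Richard's bus-queue analogy maps onto a simple rule: run tasks first-in-first-out, but let any task that would otherwise miss its deadline jump to the front, earliest deadline first. A rough sketch of that rule (illustrative Python only, not actual BOINC client code; the Task fields and numbers are made up):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    est_hours: float       # estimated crunch time
    deadline_hours: float  # hours from now until the deadline

def schedule(tasks):
    """Return task names in run order: FIFO, except that any task at
    risk of missing its deadline jumps the queue, earliest deadline first."""
    elapsed = 0.0
    remaining = list(tasks)
    order = []
    while remaining:
        # A task is "at risk" if crunching everything queued ahead of it
        # FIFO-style would push its finish time past its deadline.
        at_risk = []
        cursor = elapsed
        for t in remaining:
            cursor += t.est_hours
            if cursor > t.deadline_hours:
                at_risk.append(t)
        if at_risk:
            nxt = min(at_risk, key=lambda t: t.deadline_hours)
        else:
            nxt = remaining[0]  # plain FIFO: first downloaded, first crunched
        remaining.remove(nxt)
        order.append(nxt)
        elapsed += nxt.est_hours
    return [t.name for t in order]
```

With a one-day cache and seven-day deadlines nothing is ever "at risk", so the order stays pure FIFO, matching Richard's description; a tight-deadline shorty in a long queue triggers the bus-jumping behaviour Bernie is seeing.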
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I must be a bit slow here, but how exactly does Boinc decide in which order to do work units? It does them in the order in which they were downloaded. If there is a chance of missing a deadline (due to an early deadline, or due to your connection settings, the amount of time Seti gets to run while the computer is on, or the number of hours the computer is actually on) then it will do those Work Units first, then go back to processing them in the order in which they were downloaded. Grant Darwin NT |
[B^S] madmac Send message Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I realise I can tell Boinc I am "going away for a few days", but I actively participate - I have 9 machines in various places and I keep tabs on what they are all doing. However, suppose my Quad belonged to "Joe Public", who just happened to think S@H was a good idea but took no active part. If today at midday UTC they shut down their PC till the 8th of September, 54 work units would default and have to be re-issued. Still doesn't seem the most efficient way to use resources. Most of my pending credit is exactly this: units that have not been returned by deadline and have had to be re-issued. I am not normally a "numbers hound", however I just noticed I am nearly at the 500,000 milestone and you start to notice things. Bernie PS. Oh dear: |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
Oh, and I forgot another thing. Since validated work units and their results hang around for I think 24 hours before deletion, a run of short duration workunits that are processed immediately will still take up more space due to the sheer number of the work units that can get done in 24 hours. So even if everyone was running with virtually no queues, the servers will still clog up. Not quite correct. While there's a 24-hour delay so users can see the outcome of their results, this delay is 24 hours after the wu/result files have been deleted from disk. The way through the system is basically: 2nd result reported -> Transitioner -> Validator -> Assimilator -> file_deleter -> wait 24 hours before the wu/result info is purged from the database. If there's no backlog, it normally takes only a couple of seconds from when the 2nd result is reported till the wu-file and result-files have been deleted. Well, as long as it passes validation, that is. ;) "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
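The pipeline Ingleside describes can be written down as a small sketch (illustrative Python, not real BOINC server code; the stage names are from the post above, and the 24-hour figure is the post-deletion delay he describes):

```python
import datetime

# Server-side stages a completed result moves through, per the post
# above. If there's no backlog, everything up to file deletion takes
# only seconds; the database row then lingers for roughly a day.
STAGES = [
    "reported",      # 2nd result reported by a client
    "transitioner",  # marks the workunit ready for validation
    "validator",     # compares the results, grants credit
    "assimilator",   # stores the science result
    "file_deleter",  # removes wu/result files from disk
    "db_purge",      # row purged ~24h AFTER file deletion
]

def purge_time(deleted_at: datetime.datetime) -> datetime.datetime:
    """Rows stay visible so users can see outcomes: they are purged
    about 24 hours after the files were deleted, not after validation."""
    return deleted_at + datetime.timedelta(hours=24)
```

The key correction to Keith's picture is that the 24-hour clock starts at file deletion, which itself happens within seconds of the second result being reported when there is no backlog.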
[B^S] madmac Send message Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Database/file status:
Results ready to send: 0 (40m)
Current result creation rate: 14.23/sec (0m)
Results out in the field: 3,452,221 (40m)
Results received in last hour: 56,568 (0m)
Result turnaround time (last hour average): 54.59 hours (0m)
Results returned and awaiting validation: 2,675,848 (40m)
Workunits waiting for validation: 30 (40m)
Workunits waiting for assimilation: 331 (40m)
Workunit files waiting for deletion: 32 (40m)
Result files waiting for deletion: 88 (40m)
Workunits waiting for db purging: 592,515 (40m)
Results waiting for db purging: 1,250,952 (40m)
Transitioner backlog (hours): 0 (0m)
> 'ave plenty to crunch 'till somebody kicks the server again . . . BOINC Wiki . . . Science Status Page . . . |
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
|
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
I swear I'm cursed. I'm on dial-up so I have to manually contact the server to return results. And more times than I can count the servers are down or go wonky just as I try to upload my results. "Life is just nature's way of keeping meat fresh." - The Doctor |
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
|
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.