Message boards :
Number crunching :
Panic Mode On (8) Server problems
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Seems to be a problem Yep. Ready to Send buffer is down to zero, and although the Server Status page says the Splitters are running, they're not producing any work. Grant Darwin NT |
Bert Send message Joined: 12 Oct 06 Posts: 84 Credit: 813,295 RAC: 0 |
Seems to be a problem Friday just before a long weekend. We should be getting used to it. I upped my queue to 6 extra days. Should keep me going if we gotta wait until Wednesday. |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Oooopssss... it looks like no new work until Tuesday... Technical News : Sort-of Weekend Wrapup (Aug 28 2008) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looks like someone gave it a kick- the splitters are splitting again & have been doing so for a while now. The Ready to Send buffer is slowly growing again. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
And now the Ready to Send buffer has dropped to 0 again. Grant Darwin NT |
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
We just hit another patch of short-duration workunits that fast optimized systems can crunch in 7-10 minutes each. Due to their short duration they also tend to go to the head of the queue, so it doesn't matter how many tens or hundreds of workunits the fast systems already have queued. The splitters simply can't keep up with the demand. Server status shows the creation rate barely keeping ahead of the return rate at the 31 Aug 0Z snapshot. "Life is just nature's way of keeping meat fresh." - The Doctor |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
The splitters simply can't keep up with the demand. Which indicates a problem somewhere in the system. The splitters can (and have) churned out 35+ Work Units per second for sustained periods. But for some reason they were stuck at around 10 or so for several hours & have only been doing 15 or so for the last hour & a bit. Usually once the Ready to Send buffer drops by a few thousand they'll crank up the pace to 20 or more & maintain the buffer. That hasn't happened today. Grant Darwin NT |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
Which brings up the strange conundrum with having work unit queues. On one hand, the client, seeing a work unit with a short deadline, will prioritize it if you are running with a queue of even modest length. This adds to the turnaround time for the work units already in your queue, which means the work unit information and any already-returned results take up space longer on the servers. The lack of space on the servers caps new work unit creation. However, if you have a small queue, you will most likely run out of work, which encourages members to increase their queue size to avoid running out of work. Which leads to more space being used up on the servers. Which caps the creation rate of new work units. Rinse and repeat. Oh, and I forgot another thing. Since validated work units and their results hang around for, I think, 24 hours before deletion, a run of short-duration workunits that are processed immediately will still take up more space due to the sheer number of work units that can get done in 24 hours. So even if everyone was running with virtually no queue, the servers would still clog up. It's thermodynamics class all over again. You can't win, you can't break even, you can't get out of the game. "Life is just nature's way of keeping meat fresh." - The Doctor |
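Keith's "sheer number" point can be put as a back-of-the-envelope calculation: if results come back at some rate and validated rows are retained for roughly 24 hours before purging, then by Little's law the database holds (return rate × retention time) rows at steady state, no matter how small everyone's queue is. A sketch with made-up figures (illustrative Python; the rates are assumptions, not measurements from the project):

```python
def rows_awaiting_purge(returns_per_hour: float, retention_hours: float) -> float:
    """Little's law: items in the system = arrival rate * time in system.

    Here: database rows awaiting purge = results returned per hour,
    multiplied by how long a validated row is retained before purging.
    """
    return returns_per_hour * retention_hours

# Made-up but plausible figures: ~50,000 results returned per hour,
# retained ~24 hours before being purged from the database.
# rows_awaiting_purge(50_000, 24) == 1_200_000 rows at steady state
```

The arrival rate, not the client-side queue length, dominates this term, which is why a run of fast-turnaround shorties can clog the database even with everyone on tiny caches.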
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I must be a bit slow here, but how exactly does Boinc decide in which order to do work units? On my quad I keep a 1-day queue, but currently work units dated the 6th or 7th of September are being left in favour of ones dated the 16th, 17th, 22nd and 23rd of September. Admittedly the earlier ones are 20 mins and the later ones about an hour; however, if I were to go on holiday today till the 8th of September (not a long time), then 54 units would have to be re-sent. Doesn't seem an efficient way of doing things. I only run S@H by the way. Bernie |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I must be a bit slow here, but how exactly does Boinc decide in which order to do work units? On my quad I keep a 1-day queue, but currently work units dated the 6th or 7th of September are being left in favour of ones dated the 16th, 17th, 22nd and 23rd of September. Admittedly the earlier ones are 20 mins and the later ones about an hour; however, if I were to go on holiday today till the 8th of September (not a long time), then 54 units would have to be re-sent. Doesn't seem an efficient way of doing things. In the normal course of events, BOINC does the work in the order it was issued by Berkeley. You've identified yourself as being in the UK, so you must be familiar with the concept of a queue - first one to the bus-stop is the first to get on the bus, that sort of thing? That's how BOINC works. Except - if tail-end Charlie is in danger of missing an important appointment (a deadline), she's allowed to jump the queue and get on the bus first. And how much danger she's in depends on how long the queue is. If the queue is one day, and the deadline is seven days, then there's no risk of missing the deadline and no queue-jumping is allowed. If you actually tell BOINC that you're going to go away on holiday... well, you can't exactly, but you can say "I'm not going to connect to the internet again for another 10 days", and BOINC will then rush through the tasks which have a shorter deadline than that. It'll also assume that you're not going to take your computer on holiday with you, and try to stock up on work it can do without contacting the internet for that length of time. |
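Richard's bus-queue analogy maps onto a simple rule: run tasks first-in-first-out, but let any task that would otherwise miss its deadline jump to the front, earliest deadline first. A rough sketch of that rule (illustrative Python only, not actual BOINC client code; the Task fields and numbers are made up):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    est_hours: float       # estimated crunch time
    deadline_hours: float  # hours from now until the deadline

def schedule(tasks):
    """Return task names in run order: FIFO, except that any task at
    risk of missing its deadline jumps the queue, earliest deadline first."""
    elapsed = 0.0
    remaining = list(tasks)
    order = []
    while remaining:
        # A task is "at risk" if crunching everything queued ahead of it
        # FIFO-style would push its finish time past its deadline.
        at_risk = []
        cursor = elapsed
        for t in remaining:
            cursor += t.est_hours
            if cursor > t.deadline_hours:
                at_risk.append(t)
        if at_risk:
            nxt = min(at_risk, key=lambda t: t.deadline_hours)
        else:
            nxt = remaining[0]  # plain FIFO: first downloaded, first crunched
        remaining.remove(nxt)
        order.append(nxt)
        elapsed += nxt.est_hours
    return [t.name for t in order]
```

With a one-day cache and seven-day deadlines nothing is ever "at risk", so the order stays pure FIFO, matching Richard's description; a tight-deadline shorty in a long queue triggers the bus-jumping behaviour Bernie is seeing.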
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I must be a bit slow here, but how exactly does Boinc decide in which order to do work units? It does them in the order in which they were downloaded. If there is a chance of missing a deadline (due to an early deadline, or due to your connection settings, the amount of time Seti gets to run while the computer is on, or the number of hours the computer is actually on) then it will do those Work Units first, then go back to processing them in the order in which they were downloaded. Grant Darwin NT |
[B^S] madmac Send message Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I realise I can tell Boinc I am "going away for a few days", but I actively participate - I have 9 machines in various places and I keep tabs on what they are all doing. However, suppose my Quad belonged to "Joe Public", who just happened to think S@H was a good idea but took no active part. If today at midday UTC they shut down their PC till the 8th of September, 54 work units would default and have to be re-issued. Still doesn't seem the most efficient way to use resources. Most of my pending credit is exactly this: units that have not been returned by deadline and have had to be re-issued. I am not normally a "numbers hound", however I just noticed I am nearly at the 500,000 milestone and you start to notice things. Bernie PS. Oh dear: |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
Oh, and I forgot another thing. Since validated work units and their results hang around for I think 24 hours before deletion, a run of short duration workunits that are processed immediately will still take up more space due to the sheer number of the work units that can get done in 24 hours. So even if everyone was running with virtually no queues, the servers will still clog up. Not quite correct. While there's a 24-hour delay so users can see the outcome of their results, this delay is 24 hours after the wu/result files have been deleted from disk. The way through the system is basically: 2nd result reported -> Transitioner -> Validator -> Assimilator -> file_deleter -> wait 24 hours before the wu/result info is purged from the database. If there's no backlog, it normally takes only a couple of seconds from when the 2nd result is reported till the wu-file and result-files have been deleted. Well, as long as it passes validation, that is. ;) "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
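The pipeline Ingleside describes can be written down as a small sketch (illustrative Python, not real BOINC server code; the stage names are from the post above, and the 24-hour figure is the post-deletion delay he describes):

```python
import datetime

# Server-side stages a completed result moves through, per the post
# above. If there's no backlog, everything up to file deletion takes
# only seconds; the database row then lingers for roughly a day.
STAGES = [
    "reported",      # 2nd result reported by a client
    "transitioner",  # marks the workunit ready for validation
    "validator",     # compares the results, grants credit
    "assimilator",   # stores the science result
    "file_deleter",  # removes wu/result files from disk
    "db_purge",      # row purged ~24h AFTER file deletion
]

def purge_time(deleted_at: datetime.datetime) -> datetime.datetime:
    """Rows stay visible so users can see outcomes: they are purged
    about 24 hours after the files were deleted, not after validation."""
    return deleted_at + datetime.timedelta(hours=24)
```

The key correction to Keith's picture is that the 24-hour clock starts at file deletion, which itself happens within seconds of the second result being reported when there is no backlog.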
[B^S] madmac Send message Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Database/file status:
Results ready to send: 0 (40m)
Current result creation rate: 14.23/sec (0m)
Results out in the field: 3,452,221 (40m)
Results received in last hour: 56,568 (0m)
Result turnaround time (last hour average): 54.59 hours (0m)
Results returned and awaiting validation: 2,675,848 (40m)
Workunits waiting for validation: 30 (40m)
Workunits waiting for assimilation: 331 (40m)
Workunit files waiting for deletion: 32 (40m)
Result files waiting for deletion: 88 (40m)
Workunits waiting for db purging: 592,515 (40m)
Results waiting for db purging: 1,250,952 (40m)
Transitioner backlog (hours): 0 (0m)
> 'ave plenty to crunch 'till somebody kicks the server again . . . BOINC Wiki . . . Science Status Page . . . |
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
|
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
I swear I'm cursed. I'm on dial-up so I have to manually contact the server to return results. And more times than I can count the servers are down or go wonky just as I try to upload my results. "Life is just nature's way of keeping meat fresh." - The Doctor |
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
|
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.