Transitioner backlog?
Author | Message |
---|---|
B-Roy Send message Joined: 4 May 03 Posts: 220 Credit: 260,955 RAC: 1 |
Ready to send: 77,215. In progress: 1,765,954. Waiting for validation: 0. Transitioner backlog: 22 hours. What is the last line all about? I haven't seen it in the past. |
Angus Send message Joined: 26 May 99 Posts: 459 Credit: 91,013 RAC: 0 |
Ready to send 77,215 It's a new metric they put up when the servers came back on-line after the OS update. (What was really updated - a server OS or a storage system OS?) It was backlogged 10 or 11 hours when the servers came up, and it's been steadily growing ever since. Which is an interesting thing... If the connection to the servers is so iffy right now, why can't the transitioner recover? It would seem to have less load if hosts can't connect to upload and download. |
Timcom99 Send message Joined: 30 Sep 04 Posts: 105 Credit: 8,927,290 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. |
Divide Overflow Send message Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
That could be, but I doubt it. My guess is that these numbers are way, way off from what's going on in reality. (Heck, it's shown a solid green light for the scheduler when we all know that it's been up and down like crazy for the last few days.) When the scheduler situation settles down the status page might start returning some useful information. Don't hold your breath waiting though! ;) |
Astro Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
Psssst, hey buddy.......wanna buy a WU? |
ksnash Send message Joined: 28 Nov 99 Posts: 402 Credit: 528,725 RAC: 0 |
That could be, but I doubt it. My guess is that these numbers are way, way off from what's going on in reality. (Heck, it's shown a solid green light for the scheduler when we all know that it's been up and down like crazy for the last few days.) When the scheduler situation settles down the status page might start returning some useful information. Don't hold your breath waiting though! ;) They can't figure out how long it takes to complete a work unit. How are they supposed to figure out if a computer is working? |
Archon Send message Joined: 31 Aug 01 Posts: 90 Credit: 400,599 RAC: 0 |
Psssst, hey buddy.......wanna buy a WU? Yeah, let's start selling WUs on eBay, lol, that will raise some cash for those Berkeley guys :) Cheers Gav Nothing is 'fool-proof', someone will always invent a better fool! |
Speedy67 & Friends Send message Joined: 14 Jul 99 Posts: 335 Credit: 1,178,138 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. The transitioner has to take the final step on the workunits before they get sent out. Because it's backlogged 26 hours at the moment, there are lots of WUs already split but waiting to be transitioned. Greetings, Speedy67 |
Divide Overflow Send message Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
If my understanding is correct, the transitioner is also checking on when results are returned and when a quorum is achieved. Does this mean that the transitioner backlog figure is the rough estimate of how long it will take a quorum of results to be "seen" by the validator? It would appear that there are a lot of situations where the transitioner comes into play: incoming, outgoing and WU end-of-life. It's tough to figure out what this backlog figure applies to. All of them? I guess this bugger is also the culprit for those six-month to year-old results that are lingering in the system. It seems that the backlog is underestimated by quite a bit for some people! ;) |
Dorsai Send message Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. At a guess, this, from the description of what the transitioner does: "Handles state transitions of workunits and results. Basically, the transitioners keep track of the many, many results in progress and make sure they properly move down the pipeline. It is always asking the questions: Is this workunit ready to send out? (snip)...." may explain where the WUs are going: they are going into limbo, waiting for the transitioner to decide they are ready to be sent? Foamy is "Lord and Master". (Oh, + some Classic WUs too.) |
Silver Streak Send message Joined: 14 May 03 Posts: 4 Credit: 826,866 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. They may not be making it all the way to the machines to crunch them though. I just got 40 W/U's but they are all "ghosts". None of them made it to my 'puter. |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. Are you running 4.45? BOINC WIKI |
Silver Streak Send message Joined: 14 May 03 Posts: 4 Credit: 826,866 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. No, I went back to 4.27 |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Someone has to be Downloading Tons of Work Units too. The 3 Splitters have been working Full Tilt and the Work Units are going out as soon as they get Split. That would be the cause of the ghost WUs. This is believed to have been fixed in 4.45. BOINC WIKI |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
The transitioners have many jobs: 1) Generate 4 results for a newly split WU. 2) Generate more results if one or more are reported as client errors, fail validation due to missing files or "no consensus yet", or pass their deadline while the WU is not yet validated. 3) Set the "need_validate" flag when at least 3 "success" results have been reported, and again when more "success" results are reported for a WU that is already validated or already in the validator queue. 4) Trigger the assimilator if the WU errored out. 5) Trigger file_deleter when all results are "done" and the WU has been assimilated. With the transitioners 31 hours backlogged, WUs split 31 hours ago are only now being made ready to send out to users. Likewise, it will take over 31 hours from the time a WU gets 3 "success" results until validation is attempted. The validator only looks at WUs with the "need_validate" flag set. Since the transitioners are 31 hours backlogged, the validators currently have no trouble keeping up, and the validator queue is therefore very short even though many WUs have enough "success" results. Oh, and "Waiting for validation" is a count of how many WUs or results have the "need_validate" flag set; it's not a count of how many "pending" results there are. ;) For a successfully validated WU, or one that got 6 "success" results but still fails due to "no consensus yet", it's the validator that triggers the assimilator. Not sure, but one possibility is that the status page displays the number of hours since the last WU with zero "results" generated was split. This means that if all servers are shut off for 24 hours, the queue will also increase by 24 hours. Since the transitioner queue is increasing, the splitters are currently generating WUs faster than the transitioners can keep up, and until there are 500k results ready to send out the queue will continue to increase. |
Divide Overflow Send message Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
Thanks for the insight into this, Ingleside! |
Darth Dogbytes™ Send message Joined: 30 Jul 03 Posts: 7512 Credit: 2,021,148 RAC: 0 |
Now it would be great if we could just connect to the server! Account frozen... |
Celtic Wolf Send message Joined: 3 Apr 99 Posts: 3278 Credit: 595,676 RAC: 0 |
Now it would be great if we could just connect to the server! You sure ask a lot!!! I'd rather speak my mind because it hurts too much to bite my tongue. American Spirit BBQ Proudly Serving those that courageously defend freedom. |
MikeSW17 Send message Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
Anyone know how the transitioner delay affects deadlines? In other words, does a WU get recorded as returned when it's transmitted and updated, or only when the transitioner gets round to it? I hope it's the first: with the backlog at 31 hours and growing by about 1 hour for every 2 real hours, it would not be long before work gets dumped. A second issue comes to mind: is there enough storage for this backlog? |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
Anyone know how the transitioner delay affects deadlines? The report time is recorded in the database. But the only reason for not getting any credit for too-late results is that the "canonical result" has been deleted from the upload directory by file_deleter. If the WU has not already been deleted and there are any results reported when the transitioner catches up, the "need_validate" flag is set regardless of whether they arrived before or after the deadline. Also, file_deleter isn't triggered before validation has been attempted for all results. As for space on the upload/download disks, I don't know, but if they plan to handle 1M results/day without adding more disk capacity it shouldn't be a problem... Anyway, they've now moved 2 of the transitioners off klaatu, the web server, and over to kosh. This seems to have sped things up a little, and they're now only 29 hours backlogged. Then again, the speed-up could also be due to the scheduling server being down at the time, so no work was reported and the validators sharing servers with the transitioners weren't using any CPU power... |
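
For anyone trying to follow the five transitioner jobs Ingleside lists, here is a rough Python sketch of one transitioner pass over a single WU. It is only an illustration built from the description in this thread, not the actual BOINC server code. The names `Workunit`, `Outcome`, `transition`, `TARGET_RESULTS` and `MIN_QUORUM` are made up for the example, and some cases are left out on purpose (results that fail validation or hit "no consensus yet", and re-flagging a WU that is already validated).

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"
    CLIENT_ERROR = "client_error"
    PAST_DEADLINE = "past_deadline"


# Numbers taken from the thread; the constant names are hypothetical.
TARGET_RESULTS = 4  # results generated for a newly split WU
MIN_QUORUM = 3      # "success" results needed before validation is requested


@dataclass
class Workunit:
    results: list = field(default_factory=list)  # list of Outcome values
    need_validate: bool = False
    validated: bool = False
    errored_out: bool = False
    assimilated: bool = False


def transition(wu):
    """Apply one simplified transitioner pass and return the actions taken."""
    # Job 1: a newly split WU has no results yet, so generate the first batch.
    if not wu.results:
        wu.results = [Outcome.IN_PROGRESS] * TARGET_RESULTS
        return [f"generate {TARGET_RESULTS} results"]

    actions = []
    usable = sum(r in (Outcome.IN_PROGRESS, Outcome.SUCCESS) for r in wu.results)
    successes = wu.results.count(Outcome.SUCCESS)

    # Job 2: replace results lost to client errors or missed deadlines,
    # but only while the WU is not yet validated.
    if usable < TARGET_RESULTS and not wu.validated:
        missing = TARGET_RESULTS - usable
        wu.results += [Outcome.IN_PROGRESS] * missing
        actions.append(f"generate {missing} replacement result(s)")

    # Job 3: flag the WU for the validator once enough successes are in.
    if successes >= MIN_QUORUM and not wu.need_validate:
        wu.need_validate = True
        actions.append("set need_validate")

    # Job 4: an errored-out WU goes straight to the assimilator.
    if wu.errored_out and not wu.assimilated:
        actions.append("trigger assimilator")

    # Job 5: clean up files once nothing is outstanding and the WU is assimilated.
    if wu.assimilated and Outcome.IN_PROGRESS not in wu.results:
        actions.append("trigger file_deleter")

    return actions
```

Calling `transition(Workunit())` covers job 1. Feeding it a WU with three "success" results sets `need_validate`, which is the point where the validator would first see the WU. With a 31-hour backlog, each of these passes simply happens about 31 hours after the event that made it necessary.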