Ghosts are just KILLING me...
Author | Message |
---|---|
Nemesis Send message Joined: 14 Mar 07 Posts: 129 Credit: 31,295,655 RAC: 0 |
Ghost WU's are just killing me these days. I have done a count on my 3 crunchers and the numbers are just depressing. All 3 of my rigs have been out of work for at least 3 days. On my least capable rig I have 20 WU's waiting to upload, but according to the stats page I have 100 WU's "In Progress". My slower quad has 199 WU's waiting to upload, but again the stats show 310 WU's "In Progress". The worst one of all is my main cruncher: it has 121 WU's waiting to upload but in excess of 800 WU's (I gave up counting) listed as "In Progress". The net result of all this is that when the project finally comes up, WU's are uploaded and reported, all the "Timed out - no response" WU's get tabulated, and my download quota for new work gets hammered. Repeat the cycle for a few weeks and I might as well be doing the WU's by hand with a slide rule (remember those?). I'm lucky to get enough new work to last a day... So, am I unhappy? Yup. Will I stick around? Probably for a few weeks, at least until the new servers are installed. After that I'll just wait and see... This is the ONLY project I have run or have any interest in running. What's a guy supposed to do... |
Zebra3 Send message Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
I have been out of work for a few days on my rigs as well so have been running another project until we get back up and going again. The nice thing is when we finally do get to report all of our Wu's it will be time to detach from the project and reattach to get rid of those ghosts we have acquired over the past months. http://www.novascotia.com |
S@NL - Eesger - www.knoop.nl Send message Joined: 7 Oct 01 Posts: 385 Credit: 50,200,038 RAC: 0 |
Hmm, my main rig "only" has 35 WU's to report and about 150 in progress. That detaching and re-attaching part: is that really a good idea? If so, I'll be all for it. |
Norwich Gadfly Send message Joined: 29 Dec 08 Posts: 100 Credit: 488,414 RAC: 0 |
Nemesis wrote: "Ghost WU's are just killing me these days." Well, Hallowe'en is nearly upon us... |
Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0 |
Zebra3 wrote: "I have been out of work for a few days on my rigs as well so have been running another project until we get back up and going again. The nice thing is when we finally do get to report all of our Wu's it will be time to detach from the project and reattach to get rid of those ghosts we have acquired over the past months." Detach/Reattach definitely works, but I've found that one has to repeat the process pretty much weekly when new work is available. I'm hopeful that with the new servers and cleaning up of things, ghosts will be less of a problem in the future. The idea posted elsewhere of allowing uploading and reporting for a few days now, then simply cancelling out all unreported work and starting fresh on the new equipment, seems like a good way to go, IMHO. |
Zebra3 Send message Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
S@NL - Eesger wrote: "Hmmm, my main rig "only" has 35 Wu's to report and about 150 in progress.. that detaching and re-attaching part.. is that really a good idea? If so, I'll be all for it.." YES...but only after you have reported and cleared your cache, or you will get NO credit for the work that has been done. http://www.novascotia.com |
Nemesis Send message Joined: 14 Mar 07 Posts: 129 Credit: 31,295,655 RAC: 0 |
Quick question regarding doing a "detach". Do I need to do it for each cruncher I have or just once for the project? Since the "tasks" stuff has been disabled I can't tell by just looking... |
Zebra3 Send message Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
You will have to do it on each rig. http://www.novascotia.com |
S@NL - Eesger - www.knoop.nl Send message Joined: 7 Oct 01 Posts: 385 Credit: 50,200,038 RAC: 0 |
Zebra3 wrote: "YES...but only after you have reported and cleared your cache or you will get NO credit for the work that has been done." OK, I did this, and it somehow corrupted my CPDN work. The connection is unclear; resetting CPDN didn't do the trick, but re-installing BOINC did. The only bummer now is that BOINC is trying to download some PNG images for S@H (I have installed the optimized clients). Ah well, when it all starts to really work again, it'll start crunching again. |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Of the 2 mil + results that are still at large, how many do you suppose are ghosts?? I bet a large percentage. Dave |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
There are 1.8M results in the field. Why doesn't SETI just re-issue those results, rather than waiting for them to time out? At this point, they must all be ghosts, I would think. Right? I guess SETI did something like this a few days ago; so just do it again, completely and finally. But I see that the return rate is 10K/h. This is curious because it means that in about a week (1.8M / 10K per hour = 180 hours) all the wu's would be returned. But this rate is steadily falling, so the time to clear will extend far beyond 1 week. Yet, who has stuff to return? Surely we are all empty after all this chaos. Yet the turnaround times are rising and are now up to 100h. What does that mean? Are these the wu's that were issued to very slow machines with large queues a few days ago? Or are ghosts included in this indicator? Just curious, pondering Scarecrow's graphs. |
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
|
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Because they aren't all ghosts. I have plenty running on my computers and I would be irked if they decided to pull the plug on them. They appear to have figured out which are ghosts, since we are getting a crapload of multiple reissues. This can only mean that either those WU's timed out or they cancelled them so that they can reissue them. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Just started this one..11/7/2010 9:40:55 AM SETI@home Starting 31my10ab.4517.385551.13.10.33_2. I just reported one from the same series and have ten more to do. They are all the same except for the last set of numbers and they all end in _2. Think I might be clearing out someone's ghosts? :-) PROUD MEMBER OF Team Starfire World BOINC |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
PhonAcq wrote: "There are 1.8M results in the field. Why doesn't seti just re-issue those results, rather than waiting for them to time out? At this point, they must all be ghosts, I would think. Right?" Wrong, many are recent re-issues of ghosts that timed out. Even with recent issues, the S@H Staff have no way to tell which are ghosts and which are being actively crunched. "But, I see that the return rate is 10K/h. This is curious because it means that in about a week all the wu's would be returned. But this rate is steadily falling, so the time to clear will extend far beyond 1 week. Yet, who has stuff to return? Surely we are all empty after all this chaos." Wrong again. Because the ghosts have to time out to be identified and re-issued, there will be batches of re-issues every few days. And there are LOTS of crunchers who are running set-and-forget on machines that are much older and slower than yours. They still have work, and it is steadily trickling in. "Yet, the turnaround times are rising and are now up to 100h. What does that mean? Are these the wu's that were issued to very slow machines with large queues a few days ago?" Some are, some are not. And an older, slower machine would not necessarily have a large queue, because it is slower and not running S@H 24/7 the way some newer, dedicated crunchers do. "Or are ghosts included in this indicator." I would think they are not. How can tasks which time out when NOT returned by deadline have a turnaround time? Donald Infernal Optimist / Submariner, retired |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
PhonAcq wrote: "Yet, who has stuff to return? Surely we are all empty after all this chaos." Update: I've had 4 Tasks on my two G4s since the download servers came back up. All _2 or higher, so they are either timed-out ghosts or otherwise abandoned. And as I complete one, I get another. Work is available. Maybe not enough to fill the queue of a state-of-the-art mega-cruncher, but it's there if you want it. As always around here, patience is not just a virtue, it is a requirement. Donald Infernal Optimist / Submariner, retired |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Last Thursday, prior to the project enabling delivery of work from the "ready to send" queue, there were about 1150 results being reported per hour and the turnaround time was slightly over 600 hours. Consider the range of hosts which may take 25 days to finish a WU: (1) very slow, running 24/7; (2) fairly slow, with a 10 day cache and 15 day crunch times; (3) modest, but running only a few hours a day; (4) quick, but attached to multiple projects, many of which have shorter deadlines than S@H. We can expect there will always be hosts getting tasks done just within deadline for a variety of reasons. Given that VLAR and most midrange work have deadlines 1000 hours or more after the "sent" time, that 600 hour turnaround is not at all unreasonable after an extended period of no work delivery. I do agree that a large fraction of the work which the servers believe to be "in progress" is probably ghosted. Some of the ghosts created since last Thursday may even have deadlines in January 2011. It's not simple for the servers to judge what's ghosted or not, though perhaps a script which checked how long a task has supposedly been on a host against the host's average turnaround might give a usable guesstimate. The most reliable approach is to turn on the full resend_lost_results function, at least in brief bursts, if jocelyn will stand the load. Maybe 5 minutes out of each half hour, as a seat-of-the-pants guess. Joe |
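
Joe's suggested guesstimate (compare how long a task has supposedly been on a host against that host's average turnaround) could be sketched roughly as below. This is only an illustration, not the project's actual script: the `Task` record, its field names, the `likely_ghosts` helper, and the `factor` threshold of 3 are all assumptions made up for the example, and the sample numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record; field names are illustrative, not a real server schema.
    name: str
    hours_on_host: float        # hours since the server marked the task "sent"
    host_avg_turnaround: float  # the host's average turnaround, in hours


def likely_ghosts(tasks, factor=3.0):
    """Return names of tasks outstanding for more than `factor` times
    their host's average turnaround (an arbitrary, assumed threshold)."""
    return [
        t.name
        for t in tasks
        if t.hours_on_host > factor * t.host_avg_turnaround
    ]


# Invented sample data: a fast host with a fresh task, the same fast host
# with a task that has been out far too long, and a slow 25-day host.
tasks = [
    Task("wu_a", hours_on_host=40.0, host_avg_turnaround=30.0),
    Task("wu_b", hours_on_host=400.0, host_avg_turnaround=30.0),
    Task("wu_c", hours_on_host=700.0, host_avg_turnaround=600.0),
]

print(likely_ghosts(tasks))
```

With these made-up numbers only `wu_b` is flagged: the slow host's 700-hour task is still within three times its usual 600-hour turnaround, which is the point of judging each task against its own host rather than a single global cutoff. It remains a guess, since a host's habits can change, so anything flagged this way would still want confirming before being re-issued.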