Ghosts are just KILLING me...

Nemesis

Joined: 14 Mar 07
Posts: 129
Credit: 31,295,655
RAC: 0
Canada
Message 1044399 - Posted: 28 Oct 2010, 14:45:07 UTC
Last modified: 28 Oct 2010, 14:46:22 UTC

Ghost WU's are just killing me these days. I have done a count on my 3 crunchers and the depressing numbers are just that...depressing..

All 3 of my rigs have been out of work for at least 3 days.

On my least capable rig I have 20 WU's waiting to upload but according to the stats page I have 100 WU's "In Progress".

My slower quad has 199 WU's waiting to upload but again the stats show 310 WU's "In Progress".

The worst one of all is my main cruncher, it has 121 WU's waiting to upload but has in excess of 800! WU's ( I gave up counting ) listed as "In Progress" status.
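For what it's worth, the gap between the two counts gives a rough ghost estimate per rig. A minimal sketch, assuming every task the server lists as "In Progress" that is not physically on the host is a ghost (the rigs are out of work, so the only tasks on hand are the ones waiting to upload). The rig labels are just for illustration, and 800 is only a lower bound for the main cruncher:

```python
# (tasks on the host waiting to upload, tasks the server lists as "In Progress")
rigs = {
    "least capable rig": (20, 100),
    "slower quad": (199, 310),
    "main cruncher": (121, 800),  # "in excess of 800", so this is a lower bound
}


def estimate_ghosts(on_host, in_progress):
    """Tasks the server believes are out but the host does not actually hold."""
    return max(in_progress - on_host, 0)


ghosts = {name: estimate_ghosts(*counts) for name, counts in rigs.items()}

for name, count in ghosts.items():
    print(f"{name}: about {count} ghosts")
print("total:", sum(ghosts.values()))
```

On these numbers that is at least 870 ghosts across the three rigs, most of them on the main cruncher.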

The net result of all this is that when the project finally comes up, WU's are uploaded and reported and all the "Timed out - no response" WU's get tabulated and my download quota for new work gets hammered. Repeat the cycle for a few weeks and I might as well be doing the WU's by hand with a slide rule ( remember those? ) I'm lucky to get enough new work to last a day...

So, am I unhappy? Yup. Will I stick around? Probably for a few weeks, at least until the new servers are installed. After that I'll just wait and see...

This is the ONLY project I have run or have any interest in running...what's a guy supposed to do...
ID: 1044399 · Report as offensive
Zebra3
Joined: 22 Oct 01
Posts: 186
Credit: 13,658,148
RAC: 0
Canada
Message 1044400 - Posted: 28 Oct 2010, 14:51:50 UTC

I have been out of work for a few days on my rigs as well so have been running another project until we get back up and going again. The nice thing is when we finally do get to report all of our Wu's it will be time to detach from the project and reattach to get rid of those ghosts we have acquired over the past months.
http://www.novascotia.com
ID: 1044400 · Report as offensive
S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 385
Credit: 50,200,038
RAC: 0
Netherlands
Message 1044434 - Posted: 28 Oct 2010, 17:23:52 UTC - in response to Message 1044400.  

Hmm, my main rig "only" has 35 wu's to report and about 150 in progress.. that detaching and re-attaching part.. is that really a good idea? If so, I'll be all for it..
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS
ID: 1044434 · Report as offensive
Norwich Gadfly
Joined: 29 Dec 08
Posts: 100
Credit: 488,414
RAC: 0
United Kingdom
Message 1044439 - Posted: 28 Oct 2010, 17:32:32 UTC - in response to Message 1044399.  

Ghost WU's are just killing me these days.


Well, Hallowe'en is nearly upon us...
ID: 1044439 · Report as offensive
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1044452 - Posted: 28 Oct 2010, 18:03:27 UTC - in response to Message 1044400.  

I have been out of work for a few days on my rigs as well so have been running another project until we get back up and going again. The nice thing is when we finally do get to report all of our Wu's it will be time to detach from the project and reattach to get rid of those ghosts we have acquired over the past months.


Detach/Reattach definitely works, but I've found that one has to repeat the process pretty much weekly when new work is available. I'm hopeful that with the new servers and cleaning up of things that ghosts will be less of a problem in the future. The idea posted elsewhere of allowing uploading and reporting for a few days now, then simply cancelling out all unreported work and starting fresh on the new equipment seems like a good way to go, IMHO.
ID: 1044452 · Report as offensive
Zebra3
Joined: 22 Oct 01
Posts: 186
Credit: 13,658,148
RAC: 0
Canada
Message 1044459 - Posted: 28 Oct 2010, 18:14:54 UTC - in response to Message 1044434.  

YES...but only after you have reported and cleared your cache or you will get NO credit for the work that has been done.



Hmmm, my main rig "only" has 35 Wu's to report and about 150 in progress.. that detaching and re-attaching part.. is that really a good idea? If so, I'll be all for it..


http://www.novascotia.com
ID: 1044459 · Report as offensive
Nemesis

Joined: 14 Mar 07
Posts: 129
Credit: 31,295,655
RAC: 0
Canada
Message 1044841 - Posted: 29 Oct 2010, 20:24:24 UTC

Quick question regarding doing a "detach". Do I need to do it for each cruncher I have or just once for the project? Since the "tasks" stuff has been disabled I can't tell by just looking...
ID: 1044841 · Report as offensive
Zebra3
Joined: 22 Oct 01
Posts: 186
Credit: 13,658,148
RAC: 0
Canada
Message 1044844 - Posted: 29 Oct 2010, 20:31:55 UTC - in response to Message 1044841.  

You will have to do it on each rig.
http://www.novascotia.com
ID: 1044844 · Report as offensive
S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 385
Credit: 50,200,038
RAC: 0
Netherlands
Message 1045522 - Posted: 1 Nov 2010, 23:11:15 UTC - in response to Message 1044459.  
Last modified: 1 Nov 2010, 23:12:10 UTC

OK, I did this, and it somehow corrupted my CPDN work. The relation is unclear, but resetting CPDN did not fix it; re-installing BOINC did. The only bummer now is that BOINC is trying to download some png images for S@H (I have installed the optimized clients).

Ah well.. when it all starts to really work again, it'll start crunching again..

YES...but only after you have reported and cleared your cache or you will get NO credit for the work that has been done.
Hmmm, my main rig "only" has 35 Wu's to report and about 150 in progress.. that detaching and re-attaching part.. is that really a good idea? If so, I'll be all for it..


The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS
ID: 1045522 · Report as offensive
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1045525 - Posted: 1 Nov 2010, 23:54:30 UTC

Of the 2 mil + results that are still at large, how many do you suppose are ghosts??

I bet a large percentage.


Dave

ID: 1045525 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 1046808 - Posted: 7 Nov 2010, 14:07:12 UTC - in response to Message 1045525.  

There are 1.8M results in the field. Why doesn't seti just re-issue those results, rather than waiting for them to time out? At this point, they must all be ghosts, I would think. Right?

I guess Seti did something like this a few days ago; so just do it again, completely and finally.

But, I see that the return rate is 10K/h. This is curious because it means that in about a week all the wu's would be returned. But this rate is steadily falling, so the time to clear will extend far beyond 1 week. Yet, who has stuff to return? Surely we are all empty after all this chaos.
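The "about a week" figure is just the backlog divided by the rate. A rough sketch, using the 1.8M and 10K/h figures quoted above; the falling-rate case assumes a purely hypothetical decline of 0.5% per hour, which is an illustration and not a measured value:

```python
results_in_field = 1_800_000  # results "in the field"
return_rate = 10_000          # results returned per hour

# Constant rate: backlog divided by rate.
hours_constant = results_in_field / return_rate
print(hours_constant, "hours =", hours_constant / 24, "days")


def hours_to_clear(backlog, rate, hourly_decline):
    """Hours until the backlog is cleared if the rate shrinks every hour.

    Returns None if the rate fades below one result per hour before
    the backlog is gone.
    """
    hours = 0
    while backlog > 0:
        if rate < 1:
            return None
        backlog -= rate
        rate *= 1 - hourly_decline
        hours += 1
    return hours


print(hours_to_clear(results_in_field, return_rate, 0.0))    # constant rate
print(hours_to_clear(results_in_field, return_rate, 0.005))  # assumed 0.5%/hour decline
```

At a constant rate the backlog clears in 180 hours, or 7.5 days. With any steady decline it takes longer, and if the decline is steep enough the backlog never clears from returns alone.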

Yet, the turnaround times are rising and are now up to 100h. What does that mean? Are these the wu's that were issued to very slow machines with large queues a few days ago? Or are ghosts included in this indicator?

Just curious, pondering Scarecrow's graphs.
ID: 1046808 · Report as offensive
J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 632
Credit: 172,116,532
RAC: 572
United States
Message 1046823 - Posted: 7 Nov 2010, 14:33:26 UTC - in response to Message 1046808.  

I can say everything I got has been returned and reported. I can't tell if I have any ghosts for my 3 machines.

ID: 1046823 · Report as offensive
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1046826 - Posted: 7 Nov 2010, 14:36:09 UTC - in response to Message 1046808.  

Because they aren't all ghosts. I have plenty running on my computers and I would be irked if they decided to pull the plug on them. They appear to have figured out which are ghosts, since we are getting a crapload of multiple reissues. This can only mean that either those WU's timed out or they cancelled them so that they can reissue them.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1046826 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1046828 - Posted: 7 Nov 2010, 14:50:43 UTC - in response to Message 1046826.  

Just started this one..11/7/2010 9:40:55 AM SETI@home Starting 31my10ab.4517.385551.13.10.33_2. I just reported one from the same series and have ten more to do. They are all the same except for the last set of numbers and they all end in _2. Think I might be clearing out someone's ghosts? :-)



PROUD MEMBER OF Team Starfire World BOINC
ID: 1046828 · Report as offensive
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1046830 - Posted: 7 Nov 2010, 14:54:28 UTC - in response to Message 1046808.  
Last modified: 7 Nov 2010, 14:58:36 UTC

There are 1.8M results in the field. Why doesn't seti just re-issue those results, rather than waiting for them to time out? At this point, they must all be ghosts, I would think. Right?

Wrong, many are recent re-issues of ghosts that timed out. Even with recent issues, the S@H Staff have no way to tell which are ghosts and which are being actively crunched.

But, I see that the return rate is 10K/h. This is curious because it means that in about a week all the wu's would be returned. But this rate is steadily falling, so the time to clear will extend far beyond 1 week. Yet, who has stuff to return? Surely we are all empty after all this chaos.

Wrong again. Because the ghosts have to time out to be identified and re-issued, there will be batches of re-issues every few days. And there are LOTS of crunchers who are running set-and-forget on machines that are much older and slower than yours. They still have work, and it is steadily trickling in.

Yet, the turnaround times are rising and are now up to 100h. What does that mean? Are these the wu's that were issued to very slow machines with large queues a few days ago?

Some are, some are not. And an older, slower machine would not necessarily have a large queue, because it is slower and not running S@H 24/7 the way some newer, dedicated crunchers do.

Or are ghosts included in this indicator.

I would think they are not. How can tasks which time-out when NOT returned by deadline have a turn-around time?
Donald
Infernal Optimist / Submariner, retired
ID: 1046830 · Report as offensive
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1046867 - Posted: 7 Nov 2010, 17:02:00 UTC - in response to Message 1046808.  

Yet, who has stuff to return? Surely we are all empty after all this chaos.


Update: I've had 4 Tasks on my two G4s since the download servers came back up. All _2 or higher, so they are either timed-out ghosts or otherwise abandoned. And as I complete one, I get another.

Work is available. Maybe not enough to fill the queue of a state-of-the-art mega-cruncher, but it's there if you want it.

As always around here, patience is not just a virtue, it is a requirement.

Donald
Infernal Optimist / Submariner, retired
ID: 1046867 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1046882 - Posted: 7 Nov 2010, 17:36:22 UTC

Last Thursday, prior to the project enabling delivery of work from the "ready to send", there were about 1150 results being reported per hour and the turnaround time was slightly over 600 hours.

Consider the range of hosts which may take 25 days to finish a WU:

1. Very slow, running 24/7.
2. Fairly slow, has 10 day cache and 15 day crunch times.
3. Modest, but running only a few hours a day.
4. Quick, but attached to multiple projects many of which have shorter deadlines than S@H.

We can expect there will always be hosts getting tasks done just within deadline for a variety of reasons. Given that VLAR and most midrange work has deadlines 1000 hours or more after the "sent" time, that 600 hour turnaround is not at all unreasonable after an extended period of no work delivery.
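As a quick check on those numbers: 25 days is 600 hours, which sits well inside a 1000-hour deadline. A small sketch, taking host type 2 from the list as the example and assuming its cache wait and crunch time simply add up:

```python
HOURS_PER_DAY = 24
DEADLINE_HOURS = 1000  # "1000 hours or more" for VLAR and most midrange work


def turnaround_hours(cache_days, crunch_days):
    """Hours from 'sent' to 'reported' if a task waits in the cache, then is crunched."""
    return (cache_days + crunch_days) * HOURS_PER_DAY


hours = turnaround_hours(cache_days=10, crunch_days=15)
print(hours)                            # 600
print(hours <= DEADLINE_HOURS)          # True
print(DEADLINE_HOURS / HOURS_PER_DAY)   # deadline in days, a bit under 42
```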

I do agree that a large fraction of the work which the servers believe to be "in progress" is probably ghosted. Some of the ghosts created since last Thursday may even have deadlines in January 2011. It's not simple for the servers to judge what's ghosted or not, though perhaps a script which checked how long a task has supposedly been on a host against the host's average turnaround might give a usable guesstimate. The most reliable approach is to turn on the full resend_lost_results function at least in brief bursts if jocelyn will stand the load. Maybe 5 minutes out of each half hour, my seat of the pants guess.
Joe
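The script idea above could look something like the following. This is only a sketch of the heuristic, not anything the project actually runs: the `Task` fields, the task names, and the cutoff `factor` of 2 are all hypothetical, and a real version would have to read these values from the project database.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_since_sent: float     # how long the server has listed it as "in progress"
    host_avg_turnaround: float  # that host's average turnaround, in hours


def probably_ghost(task, factor=2.0):
    """Guess that a task is ghosted if it has been out far longer than the
    host's usual turnaround. The factor is an arbitrary assumption."""
    return task.hours_since_sent > factor * task.host_avg_turnaround


tasks = [
    Task("task_a", hours_since_sent=300, host_avg_turnaround=100),
    Task("task_b", hours_since_sent=150, host_avg_turnaround=100),
    Task("task_c", hours_since_sent=700, host_avg_turnaround=600),
]

suspects = [t.name for t in tasks if probably_ghost(t)]
print(suspects)  # ['task_a']
```

Comparing against each host's own average, rather than a fixed cutoff, keeps a slow host with a 600-hour turnaround from being flagged at 700 hours. It is still only a guesstimate; resend_lost_results remains the reliable check.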
ID: 1046882 · Report as offensive



 