Lost "Ghost" task recovery protocol
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Are you by chance running a client made from the latest code branch with all the new and improved work_fetch code? When I attempted to recover the 27 ghosts I see I have, the recovery protocol did not work. I backleveled to a client made from an earlier branch without all the new changes and I just recovered my first 20 lost tasks as resends. Now just waiting for NNT to make more room for the last 7 tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Well, 7.15.0 came from Juan and the GPUUG team. Nowhere else. But he has provided links to the base 7.15.0 builds from two different eras: one from before all the new code for my bug fix went into master, and one from after all the new code went in. Both would be identified as 7.15.0. One way to tell them apart is that the older one is a dynamically linked application/x-sharedlib executable and the newer one is a statically linked application/x-executable executable. The icons for the two clients are also different. The icon for the shared-lib executable is a letter icon. The one for the static executable is the standard Linux diamond-with-gear icon that all normal Linux executables have. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"7.15.0" is the current development identifier in the GitHub master branch. EVERY non-release version built by anybody (and I've built many different versions myself, for different tests) since about October 2018 will self-identify as v7.15.0, yet they will all be different. You won't be able to deduce anything about the specific capabilities or attributes of any particular v7.15.0 without guidance from the individual developer who built it. |
j mercer Joined: 3 Jun 99 Posts: 2422 Credit: 12,323,733 RAC: 1 |
Sorry if off topic and clueless. Not trying to hijack. What happened to Ghost Detector v1.05? This program rocked. https://setiweb.ssl.berkeley.edu/forum_thread.php?id=61519 ... |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
"What happened to Ghost Detector v1.05? This program Rocked." Interesting... never heard of it before. There's a caveat in there that it "scrapes" info from the science database, so it may cause one's IP to be blacklisted. However, it also seems only to detect ghosts, not to assist in recovering them. I find the number I have by comparing how many local work units the machine has (just range-selecting the work unit files in /projects/setiathome.berkeley.edu will do that) with the "In progress" count in "All tasks for computer". The difference between those two numbers is the number of ghosts: simply the difference between how many work units the scheduler records a computer as having and how many it actually has. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I think BoincTasks gives you a count of all tasks cached to compare with the server's version. I use an older but very similar aggregator called 'BoincView' which shows column totals in the page footer: that's enough for me. But I very rarely have any ghosts to exorcise these days. I'd actually like to see a column footer bar in the native BOINC Manager that could help like that, but I'm not going to ask for it until the team is more robust in programming terms. |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
I've been able to recover hundreds so far today following the process, with two suggested addenda: 1) It is essential to wait until the "Suspending network activity - user request" message appears before exiting the BOINC manager. 2) The process to watch in System Monitor > Processes is simply "boinc". When it has disappeared, it's safe to restart the client/manager. |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
OK... some excellent news for a change: I contacted Dr. Korpela about the low resend limit of 20 tasks per request. It was set this low due to issues they were having with CGI (possibly due to using FastCGI on the BOINC servers); however, these issues have mostly been resolved. So... it's been increased to 40 resends per request. Half the time and effort required for ghostbusting now! |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"OK... some excellent news for a change: I contacted Dr. Korpela about the low resend limit of 20 tasks per request. It was set this low due to issues they were having with CGI (possibly due to using FastCGI on the BOINC servers); however, these issues have mostly been resolved. So... it's been increased to 40 resends per request. Half the time and effort required for ghostbusting now!" Wow, great news. I didn't think that would ever get changed to something more useful. |
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
One minor correction. It would be: | SETI@home | Sending scheduler request: To report completed tasks. rather than: | SETI@home | Sending scheduler request: To fetch work. in the event log, as NNT was already set. Do appreciate the concise process listing, as it prompted me to tackle and resolve a ton of ghosts I hadn't realized I had ... Too bad BOINC and/or SETI can't solve it so this doesn't happen, perhaps via a cc_config.xml diagnostic flag or the like. Wouldn't take more than comparing notes in a scheduler session as to # of tasks in progress between both ends and triggering a resend if there's a mismatch. No excuse for it being possible for this to occur. |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
...It used to be an automatic process to get back ghosts, but in the end the load on the servers with this function turned on would bring them to a screaming halt, so it was disabled. ;-) Cheers. |
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
"...It used to be an automatic process to get back ghosts, but in the end the load on the servers with this function turned on would bring them to a screaming halt, so it was disabled. ;-)" If it was that much of a load, the check function must not have been implemented very efficiently. Sounds like a rewrite was in order, rather than nuking the process. I guess the question is: do the ghost tasks put more load on the infrastructure by sitting there for a month or more than resolving them does? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Changed title per Mr. Kevvy's request. |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, will do. Thanks for the update. |
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
@Keith, not sure if you saw this: "One minor correction. It would be:" |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"@Keith, not sure if you saw this:" Thanks. No one else noticed that. Corrected. |
tazzduke Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings all. Excellent work, Keith, and it's all spelled out so a layman could do it, lol. One of my PCs lost its internet connection for some reason. I was out with family, came home, and found I had 100 ghosts on my account. I read the instructions (twice), then followed the steps and successfully recovered my ghosts. The only reason I can think of is that the PC lost its internet connection at the same time it was communicating with SETI; who knows. Regards |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
I hope that this doesn't force a rewrite again (sorry, Keith), but after many retries I have concluded that it is impossible for me to force resends in the mornings here (Eastern timezone); instead I always get new work. I didn't notice until now that the time of day was the issue; I'm confident enough in triggering resends at other times that I don't think I'm the problem. It could be that the scheduler is very fast under a low load, or that there is some restriction, but I am following the same process with the same timings that has worked a hundred or more times in the evenings, and I can't get it to resend once. ¯\_(ツ)_/¯ May be something to keep in mind if anyone else has issues. Edit: I did also make sure to run the cache down to more than 80 WU below full. |
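
The counting rule Mr. Kevvy describes earlier in the thread (ghosts = the server's "In progress" count minus the work units actually on disk) and the effect of the per-request resend limit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ghost_count` and `resend_requests_needed` are hypothetical, the counts are typed in by hand rather than read from BOINC or the project website, and it assumes every successful scheduler request returns up to the full resend limit.

```python
import math


def ghost_count(server_in_progress: int, local_work_units: int) -> int:
    """Ghosts = tasks the scheduler thinks the host has, minus tasks on disk.

    Both counts are supplied by the user (hypothetical helper, not a BOINC API).
    A negative difference is clamped to zero.
    """
    return max(server_in_progress - local_work_units, 0)


def resend_requests_needed(ghosts: int, resend_limit: int = 40) -> int:
    """Minimum number of scheduler requests needed to recover all ghosts,

    assuming each request resends up to `resend_limit` tasks
    (20 before the change mentioned in the thread, 40 after).
    """
    if resend_limit <= 0:
        raise ValueError("resend_limit must be positive")
    return math.ceil(ghosts / resend_limit)


if __name__ == "__main__":
    # Example figures: 127 tasks "In progress" on the server, 100 on disk.
    ghosts = ghost_count(server_in_progress=127, local_work_units=100)
    print(ghosts, resend_requests_needed(ghosts, 20), resend_requests_needed(ghosts, 40))
```

With the 27 ghosts from the first post and the old limit of 20, this gives two requests (20, then 7), which matches what Keith reported; at the new limit of 40, a single request would cover them.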