Orphan WUs and result
| Author | Message |
|---|---|
Carolina Calling Joined: 21 May 99 Posts: 9 Credit: 2,148,935 RAC: 0
|
My Intel Linux system locked up due to thermal overload and I restarted it. It seems there are lost files in the journalling filesystem where BOINC is located. I now find an orphaned work unit and result as well as a solo WU. BOINC doesn't show any of them. Is there any way to "reconnect" them such that they will be reported and processed respectively? Should I just delete them as lost causes? |
|
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
Hmmm.... From looking at your host summaries I'm going to guess this is the current Host ID (HID) you're running with on that machine? If so, then there isn't any really easy way to recover the 2 orphaned tasks showing on the other 2 HID's for that machine. <edit> I took a look at the 2 apparent orphans you're showing, and while it is possible in theory to recover from this, it's probably not worth the time, effort, and risk doing it for just ~70 credits. ;-) Alinator |
|
PaperDragon Joined: 27 Aug 99 Posts: 170 Credit: 8,903,782 RAC: 4
|
Checking your computer list, it looks like you may have the same computer listed multiple times. It is possible that after the reboot BOINC had a file error which caused your computer to be assigned a new number. You can try a computer merge and see if that solves your problem. If that does not work, you are likely out of luck. SL |
|
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
Checking your computer list, it looks like you may have the same computer listed multiple times. It is possible that after the reboot BOINC had a file error which caused your computer to be assigned a new number. I was thinking about that, but won't the back end invalidate the older HID's tasks as 'Client Detached' in that case? Alinator |
|
PaperDragon Joined: 27 Aug 99 Posts: 170 Credit: 8,903,782 RAC: 4
|
I was considering that, too. But looking at the results from the two older listed machines, they both have one result which is still waiting. So I figured it is worth an attempt at merging, since the results have not yet been marked non-returnable. SL |
|
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
I was considering that, too. But looking at the results from the two older listed machines, they both have one result which is still waiting. So I figured it is worth an attempt at merging, since the results have not yet been marked non-returnable. Agreed.... Nothing ventured, nothing gained, and it would be a virtually zero risk proposition. Plus the upside would be if they don't get resent, they at least would get invalidated and reissued without having to wait for the deadline to expire. Alinator |
Carolina Calling Joined: 21 May 99 Posts: 9 Credit: 2,148,935 RAC: 0
|
I was considering that, too. But looking at the results from the two older listed machines, they both have one result which is still waiting. So I figured it is worth an attempt at merging, since the results have not yet been marked non-returnable. OK, I'll bite. How does one do a "merge"? I looked in "my computers" and there's nothing obvious. I take it I can't go back to the old ID? Also, will this solve the issue that these work units are in projects/seti... but do not show up in BOINC? |
eaglescouter Joined: 28 Dec 02 Posts: 162 Credit: 42,012,553 RAC: 0
|
I was considering that, too. But looking at the results from the two older listed machines, they both have one result which is still waiting. So I figured it is worth an attempt at merging, since the results have not yet been marked non-returnable. In the "my computers" list, select one of the computers to be merged using the link in the left column. On the next page, at the bottom, choose "merge this computer". On the page after that, select all of the computers that should be merged with your first selection. It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :) |
Carolina Calling Joined: 21 May 99 Posts: 9 Credit: 2,148,935 RAC: 0
|
Found merge by name. Done. However, the totals are now totally bogus... but what the hey. I've been doing this since practically day one because I think this is a "good idea". The merge actually solved a different problem than what I asked about (and one I hadn't realized I'd had). So, how does one fix the WU/Result and solo WU sitting by their lonesome in the SETI projects directory? I would imagine there's a file missing that's supposed to point to them to let BOINC know they're there (or what?). I get the distinct feeling fixing this will involve generating files with signatures using keys I don't have ... and I'm SOL. Thanks!! We fixed one problem at least! :-<) |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0
|
The next step to try would be to have BOINC issue a "Reset Seti", then check to see if the WU's are reissued. If they are not, then do the "BOINC Detach" and then reattach. That would tell the Seti servers that the workunits are lost so that they get reissued. Reset should get all the WU's inline. If the machine does other projects you need to reset those also. Please consider a Donation to the Seti Project. |
Carolina Calling Joined: 21 May 99 Posts: 9 Credit: 2,148,935 RAC: 0
|
The next step to try would [be to] have BOINC Issue a "Reset Seti" then check to see if the WU's are reissued. If they are not then the "BOINC Detach" the[n] reattach. Th[at] would tell the Seti Servers that the workunits are lost and reissue. Oh, boy. While a reset was, in theory, a good idea, it turns out that the HTTP server boinc2 has a REALLY hard time delivering the files. Typically, it takes over five minutes to START a transfer and then resets connections well before completion. The GPL license file has yet even to start transfer and the 5.27 executable aborts before five percent has transferred. I really have to wonder how anyone ever attaches to SETI@home in the first place. This has been going on since yesterday. The thought of detaching and reattaching seriously gives me pause ... |
|
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
Yes, at times like this when the project is saturated, resets and detach/attach cycles are not very high on my list of troubleshooting techniques. At this point there are a few things you can do to work around the problem: 1.) If you have a backup copy of the stock science app (5.27) you could shut down BOINC, copy it into the SAH project folder, and then change its status from 0 (zero) to 1 in the file info section of the client_state file. After you restart BOINC just abort the pending DL for it (if necessary). 2.) Install one of the optimized apps from the Coop. When you restart after installing it, you can abort the DL as before. 3.) Just ride it out. Eventually this overload logjam will end and the project will be able to support project initializations again (although your guess is as good as mine as to when that will be). In any event, I wouldn't worry about the 2 orphans you have. There are plenty of people who have DL'ed a full 10 day cache worth of long deadline tasks and then just blown them off without a second thought, so I'm not going to pillory you over a couple which got that way through no fault of yours directly. ;-) Alinator <edit> I just took a look at your account summary again, and I don't see the two HID's which had the orphans, so can we assume that Pappa's suggestions at least got the project to realize they were orphans and do something about it? |
|
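Workaround 1 in the post above (flipping the app's status from 0 to 1 in the file info section of the client_state file) can be sketched roughly as follows. This is only an illustration under assumptions: the element names `<file_info>`, `<name>` and `<status>`, the sample file name `setiathome_5.27_example`, and the helper `mark_file_present` are all made up for the sketch and are not confirmed anywhere in this thread. Check your own client_state file, shut BOINC down first, and keep a backup copy before editing the real thing.

```python
import xml.etree.ElementTree as ET


def mark_file_present(state_xml: str, file_name: str) -> str:
    """Return a copy of state_xml in which the file_info entry named
    file_name has its status set to 1 (hypothetical layout, see above)."""
    root = ET.fromstring(state_xml)
    matched = False
    for info in root.iter("file_info"):
        name = (info.findtext("name") or "").strip()
        if name == file_name:
            status = info.find("status")
            if status is None:
                # No status element yet: add one rather than fail.
                status = ET.SubElement(info, "status")
            status.text = "1"
            matched = True
    if not matched:
        raise KeyError(f"no file_info entry named {file_name!r}")
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for a client_state file; the names are assumptions.
SAMPLE = """<client_state>
  <file_info>
    <name>setiathome_5.27_example</name>
    <status>0</status>
  </file_info>
  <file_info>
    <name>some_other_file</name>
    <status>1</status>
  </file_info>
</client_state>"""

if __name__ == "__main__":
    print(mark_file_present(SAMPLE, "setiathome_5.27_example"))
```

The function works on a string instead of the file on disk, so it can be tried on a copy without touching the live client_state file.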
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
so I'm not going to pillory you over a couple which got that way through no fault of yours directly. ;-) Gee, what fun are you then? I had already invited the crowd!!!!! |
|
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
LOL... I'm beta testing my New Year's resolution to try and be a kinder, gentler Alinator! :-D Alinator |
Carolina Calling Joined: 21 May 99 Posts: 9 Credit: 2,148,935 RAC: 0
|
... I just took a look at your account summary again, and I don't see the two HID's which had the orphans, so can we assume that Pappa's suggestions at least got the project to realize they were orphans and do something about it? Actually, the old WUs DID get included in the reload. One has successfully reloaded and the other is (at this moment) 50.16% reloaded. The application is hanging BUT I'm curl'ing it separately and will stuff it into the SETI project directory when I get all of it. I'm getting between 64 and 150 KB per retry so I restart with an offset. I get to do that between 13 and 30 times and I'll have it ... (Joy and rapture unforeseen... WSG) |
|
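The restart-with-an-offset approach described in the post above can be sketched as a small loop: each attempt asks for the file starting at the number of bytes already received, and the pieces are appended until the whole thing has arrived. This is a rough illustration, not the poster's actual commands. The names `download_with_resume`, `range_header` and `make_flaky_source` are hypothetical, the flaky server is simulated in memory, and a real transfer would also depend on the server honoring HTTP `Range` requests, which is an assumption here.

```python
from typing import Callable, Dict, Tuple


def range_header(offset: int) -> Dict[str, str]:
    """HTTP request header asking for everything from byte offset onward."""
    return {"Range": f"bytes={offset}-"}


def download_with_resume(
    fetch: Callable[[int], bytes],
    total_size: int,
    max_attempts: int = 100,
) -> Tuple[bytes, int]:
    """Call fetch(offset) repeatedly, resuming where the last attempt
    stopped, until total_size bytes have been collected.
    Returns the assembled data and the number of attempts used."""
    data = bytearray()
    attempts = 0
    while len(data) < total_size:
        if attempts >= max_attempts:
            raise RuntimeError("gave up before the file was complete")
        attempts += 1
        chunk = fetch(len(data))  # may return only part of what remains
        data.extend(chunk[: total_size - len(data)])
    return bytes(data), attempts


def make_flaky_source(payload: bytes, bytes_per_attempt: int) -> Callable[[int], bytes]:
    """Simulated server that drops the connection after bytes_per_attempt."""
    def fetch(offset: int) -> bytes:
        return payload[offset: offset + bytes_per_attempt]
    return fetch


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    data, attempts = download_with_resume(make_flaky_source(payload, 300), len(payload))
    print(len(data), "bytes in", attempts, "attempts")
```

With a real server, `fetch` would send the header from `range_header(offset)` with each request. The attempt count is simply the file size divided by the bytes delivered per attempt, rounded up, which is why a file of a couple of megabytes arriving in pieces of 64 to 150 KB takes a few dozen restarts at worst.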
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
Well, that's progress anyway and somewhat reassuring to see that the task recovery methods will work even when the project is under duress. As it turns out I was bringing one of my hosts which had a complete HDD failure back online this week and was stuck at the app DL just like yours was. I chose to go the opti route (which I run normally anyway), since I don't have a lot of tolerance for piecemeal DL'ing a 2 meg + file manually (that should be getting a priority considering they just made their latest recruitment mailing recently). ;-) Alinator |
©2026 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.