Ghost WU issue (and some talk about deadlines)
Author | Message |
---|---|
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
I just did a little test. Hit the update button on my quad rig a dozen times or so. The first attempt resulted in an HTTP internal server error. Refreshed the results page, and voila! Another WU shown that I did not get. Tried the button a few more times: could not connect to server. Then one more button push, and another HTTP error. Refreshed the results page and there it was: one more WU the server thinks I have that I do not. Tested and confirmed. From my log:
Results page for my Intel host now shows: result 534644251 (WU 129353870), sent 17 May 2007 18:15:22 UTC, deadline 11 Jun 2007 16:48:02 UTC, In Progress / Unknown / New; result 534643482 (WU 129353633), sent 17 May 2007 18:13:59 UTC, deadline 11 Jun 2007 16:46:39 UTC, In Progress / Unknown / New. These are not waiting for me to work on. Don't know what the time differential is either... What I'd be interested in knowing is whether you can get any new work without unloading and reloading the manager. IOW, once the condition happens, does it keep happening until the manager is unloaded and reloaded, or is it simply a server-side issue? |
Rene Joined: 22 Mar 04 Posts: 53 Credit: 323,591 RAC: 0 |
Here's a small part of the messages from the manager (after re-enabling network usage):
17-5-2007 9:20:33|SETI@home|Requesting 86400 seconds of new work
And a bit later on...
17-5-2007 9:37:42|SETI@home|Sending scheduler request: To fetch work
Note: the time notation is DD-MM-YYYY... ;-) |
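The lines Rene quotes follow a `date time|project|message` layout with a day-first date. A small parser for lines of exactly that shape might look like the sketch below. The function name `parse_message` and the pattern for "Requesting N seconds" are assumptions based only on the two lines shown, not on a documented log format.

```python
import re
from datetime import datetime

# Assumed line shape (from the two quoted lines): "D-M-YYYY H:MM:SS|Project|message text",
# in local time, day first.
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of new work")

def parse_message(line):
    """Split one message-log line into (timestamp, project, text, requested_seconds)."""
    stamp, project, text = line.split("|", 2)
    # %d, %m and %H also accept values without a leading zero, e.g. "17-5-2007 9:20:33".
    when = datetime.strptime(stamp.strip(), "%d-%m-%Y %H:%M:%S")
    match = REQUEST_RE.search(text)
    seconds = int(match.group(1)) if match else None  # None for non-request lines
    return when, project, text, seconds

if __name__ == "__main__":
    line = "17-5-2007 9:20:33|SETI@home|Requesting 86400 seconds of new work"
    when, project, _, seconds = parse_message(line)
    print(when.isoformat(), project, seconds)
```

Reading the date day-first matters: "17-5-2007" only parses because `%d` comes before `%m`, which is exactly the point of Rene's note.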
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
"I just did a little test. Hit the update button on my quad rig a dozen times or so. The first attempt resulted in an HTTP internal server error. Refreshed the results page, and voila! Another WU shown that I did not get. Tried the button a few more times: could not connect to server. Then one more button push, and another HTTP error. Refreshed the results page and there it was: one more WU the server thinks I have that I do not." Also confirmed. 'HTTP internal server error' coincides with a ghost WU; 'couldn't connect to server' doesn't. But for host 2901600 I now have: Last time contacted server: 16 May 2007 21:33:44 UTC, alongside result 534648073 (WU 129355132), sent 17 May 2007 18:29:16 UTC, deadline 11 Jun 2007 17:02:10 UTC, In Progress / Unknown / New. So it's not just failing to send the WU (or rather, the instruction telling the client to download the WU); it's also failing to update its own table to acknowledge that the host has contacted the server. The scheduler request was for 38040 seconds of work, which would have been multiple WUs, yet I only got one ghost: if I cut the cache value right down so it only asks for 1 WU, will that help, I wonder? |
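Richard's observation fits a simple hypothesis. An "HTTP internal server error" means the scheduler had already recorded a result against the host and then failed before replying, and before updating the host's last-contact time. "Couldn't connect to server" means the request never reached the scheduler at all. The toy model below only illustrates that hypothesis. It is not BOINC's scheduler code, and `ToyScheduler`, `handle` and `ghosts` are invented names.

```python
from dataclasses import dataclass, field

@dataclass
class ToyScheduler:
    """Hypothetical model: results are committed to the database before the reply is sent."""
    next_id: int = 1
    assigned: dict = field(default_factory=dict)      # host_id -> result ids the server recorded
    last_contact: dict = field(default_factory=dict)  # host_id -> tick of last completed request

    def handle(self, host_id, n_results, outcome, tick):
        if outcome == "connect_failed":
            # The request never reached the scheduler, so nothing is recorded.
            return None, "couldn't connect to server"
        new_ids = list(range(self.next_id, self.next_id + n_results))
        self.next_id += n_results
        self.assigned.setdefault(host_id, []).extend(new_ids)  # committed immediately
        if outcome == "server_error":
            # Failure after the commit: no reply to the client and no last-contact update.
            return None, "HTTP internal server error"
        self.last_contact[host_id] = tick
        return new_ids, "ok"

def ghosts(scheduler, host_id, received):
    """Results the server thinks the host has, minus those the client actually got."""
    return [r for r in scheduler.assigned.get(host_id, []) if r not in received]

if __name__ == "__main__":
    # Mimics Brian's test: two server errors and two connect failures.
    s = ToyScheduler()
    received = []
    for tick, outcome in enumerate(["server_error", "connect_failed", "connect_failed", "server_error"]):
        ids, status = s.handle(host_id=1, n_results=1, outcome=outcome, tick=tick)
        if ids:
            received.extend(ids)
    print(ghosts(s, 1, received), s.last_contact)
```

Under this model, each server error leaves exactly one ghost and the host's last-contact time never moves, which matches what Brian and Richard report. It does not explain why a multi-WU request left only one ghost.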
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Editing subject again: self-deprecating commentary removed from the subject title... I really promise this is the last time I'll change the title... LOL |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Posted a link to this thread over in the blog section in this post |
Rene Joined: 22 Mar 04 Posts: 53 Credit: 323,591 RAC: 0 |
Also got one now on my Athlon running XP: "Scheduler request failed: HTTP internal server error". Suspended network usage on that one now. ;-) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
"The scheduler request was for 38040 seconds of work, which would have been multiple WUs, yet I only got one ghost: if I cut the cache value right down so it only asks for 1 WU, will that help, I wonder?" Well, after much fiddling with the magic jumping host venue (and several more ghost WUs), I got the scheduler request down to 1 WU (760 seconds), and almost immediately got a successful scheduler RPC (which reset my host venue again...). Still "no work from project", though. Arrrrrgggghhhhh! While I was typing that, another request for 760 seconds got the internal server error, and another ghost. Supposition disproved: back to the drawing board. |
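Here is a quick back-of-the-envelope check on Richard's numbers. It assumes each WU was estimated at the 760 seconds his one-WU request implies. It also assumes the scheduler fills a request by adding WUs until their estimated run time covers it. That is a simplification: the real work-fetch rules take more inputs, and `wus_for_request` is a hypothetical helper.

```python
import math

def wus_for_request(requested_seconds, est_seconds_per_wu):
    """How many WUs it takes for their estimated run time to cover the request."""
    if requested_seconds <= 0:
        return 0
    return math.ceil(requested_seconds / est_seconds_per_wu)

print(wus_for_request(38040, 760))
print(wus_for_request(760, 760))
```

On those assumptions, the earlier 38040-second request was worth roughly 50 WUs (51 after rounding up), yet it left only one ghost.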
Conrad Human Joined: 17 Nov 00 Posts: 67 Credit: 2,009,224 RAC: 0 |
What would have been nice is if I could mark an unfinished unit to be resent to me by the scheduler. Oh please, Uncle Bruno, I need some work! |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Humble suggestion before I leave for school: since a fair number of us are seeing these ghost units come up, it might make sense not to try to get new work for a while. The more often this happens, the harder the clutter is to clean up, and it may be what's causing the other "no work from project" messages. We might want to give the team some time to figure out what is going on. They can likely go back and pull session logs for those of us who have reported specific events. Just a thought, and remember, it is the thought that counts... :) |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Gavin Shaw Joined: 8 Aug 00 Posts: 1116 Credit: 1,304,337 RAC: 0 |
"I also had several HTTP errors in a row and now have several ghost units that correspond to when I got those errors." Same here. I asked my system to update, got an HTTP internal server error, and now have workunits listed on my results page that are not on my computers. To the best of my knowledge, I have received only 6 or 8 units in total since the new server went online, and those were several days ago. I have had nothing since. It took several days just to report the results from those units and others I had before the server died. I have spent the week working on Rosetta, since I cannot get anything here. The sooner everything is back to normal here, the sooner I can get going again. And those who are getting actual workunits to work on, consider yourselves lucky. There are some of us who have had nothing for days. Never surrender and never give up. In the darkest hour there is always hope. |
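To check whether each ghost corresponds to a server error, line up two sets of times. The manager's message log uses local time, while the results page shows "sent" times in UTC. The sketch below assumes you know your own UTC offset, and it counts a match when the times are within a couple of minutes. Both the offset and the tolerance are assumptions to adjust, and `match_ghosts` is a hypothetical helper. The example timestamps in the check are invented for illustration, not taken from anyone's log.

```python
from datetime import datetime, timedelta

def match_ghosts(error_times_local, ghost_sent_utc, utc_offset_hours, tolerance_minutes=2):
    """Pair each ghost's UTC 'sent' time with the closest server-error time, or None."""
    offset = timedelta(hours=utc_offset_hours)
    errors_utc = [t - offset for t in error_times_local]  # local time = UTC + offset
    tol = timedelta(minutes=tolerance_minutes)
    matches = {}
    for sent in ghost_sent_utc:
        closest = min(errors_utc, key=lambda e: abs(e - sent), default=None)
        if closest is not None and abs(closest - sent) <= tol:
            matches[sent] = closest
        else:
            matches[sent] = None  # no server error close enough in time
    return matches

if __name__ == "__main__":
    # Invented example: an error logged at 20:29:10 local time on a UTC+2 machine.
    errors = [datetime(2007, 5, 17, 20, 29, 10)]
    sent = [datetime(2007, 5, 17, 18, 29, 16)]
    print(match_ghosts(errors, sent, utc_offset_hours=2))
```

A ghost with no nearby error, or an error with no ghost, would weaken the "one internal server error, one ghost" pattern reported in this thread.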
AstroNerdBoy Joined: 3 Jun 99 Posts: 1 Credit: 19,448,583 RAC: 0 |
"And those who are getting actual workunits to work on, consider yourselves lucky. There are some of us who have had nothing for days." In another forum, there was a suggestion to uninstall BOINC, delete the BOINC directory, and reinstall fresh. Since it had been two days since my computer last communicated successfully with the SETI server(s), I decided to try it myself. I didn't delete the old directory, but I did rename it. Sure enough, although I still see communication problems, both CPUs on my machine are now processing new data. Now, having read through this thread, I'm guessing the three WUs in the "Tasks" section were ghosts, since in the past all completed WUs sat in the "Transfers" section until they transferred successfully. Would this be a correct statement? |
ecpa Joined: 3 Apr 99 Posts: 35 Credit: 9,588,416 RAC: 0 |
"I also had several HTTP errors in a row and now have several ghost units that correspond to when I got those errors." Got 50 or more ghosts across my 7 computers. I hope this will be resolved soon. |
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
"And those who are getting actual workunits to work on, consider yourselves lucky. There are some of us who have had nothing for days." You are lucky. You have real work. Ghosts are results that show up on the results page of your Seti account but not in the Tasks section. |
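Henk's definition amounts to a set difference: the server's list minus what the client holds. The minimal sketch below assumes both lists use the same identifier for each result. In practice the web page shows numeric result IDs while the manager's Tasks tab shows result names, so some mapping between the two is needed. `find_ghosts` is a hypothetical helper.

```python
def find_ghosts(server_in_progress, client_tasks):
    """Results the server lists as In Progress for this host that the client does not hold."""
    held = set(client_tasks)
    return sorted(set(server_in_progress) - held)

if __name__ == "__main__":
    # Brian's Intel host: two results shown as In Progress, none actually on the client.
    print(find_ghosts([534644251, 534643482], []))
```

Only the server-minus-client direction counts as ghosts here. A task the client holds but the server no longer lists would be a different problem.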
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
It is possible that another problem will come into play pretty soon. If ghost units have a short deadline, they will time out after a couple of days and be resent to other hosts, possibly again as ghosts. If this happens too often, the WU will get the "too many results" flag without ever being crunched. Maybe the Seti staff need to think about shutting down downloads until the problem is solved. |
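Henk's concern can be made concrete with a small simulation. The sketch rests on several simplifying assumptions. Copies of a WU are issued one at a time, and each copy independently becomes a ghost with probability `p_ghost`. A ghost is replaced by a new copy once its deadline passes. The WU is abandoned once it has generated `max_total` copies without `needed` of them being crunched. BOINC workunits do carry a total-results limit of this kind, but the values `needed=2` and `max_total=5` are guesses, not SETI@home's actual settings.

```python
import math
import random

def workunit_fails(p_ghost, needed, max_total, rng):
    """True if the WU hits the total-results limit before `needed` copies are crunched."""
    crunched = issued = 0
    while crunched < needed:
        if issued >= max_total:
            return True          # limit reached: flagged without enough crunched copies
        issued += 1
        if rng.random() >= p_ghost:
            crunched += 1        # this copy reached a host that actually has it
        # otherwise it is a ghost: it times out at its deadline and another copy is issued
    return False

def simulated_failure_rate(p_ghost, needed=2, max_total=5, trials=20_000, seed=1):
    rng = random.Random(seed)
    failures = sum(workunit_fails(p_ghost, needed, max_total, rng) for _ in range(trials))
    return failures / trials

def exact_failure_rate(p_ghost, needed=2, max_total=5):
    """P(fewer than `needed` of `max_total` copies are real) under the same assumptions."""
    q = 1.0 - p_ghost
    return sum(math.comb(max_total, k) * q**k * p_ghost**(max_total - k) for k in range(needed))

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(exact_failure_rate(p), 4), simulated_failure_rate(p))
```

The failure rate climbs steeply as the ghost rate rises, and every ghost in the chain also costs a full deadline period before it is replaced.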
GreggyBee Joined: 9 Mar 01 Posts: 203 Credit: 1,600,521 RAC: 0 |
Just spotted the thread and checked my results page: 25 ghost units. Unfortunately, I rebooted this morning and lost the message log, so I'll keep tabs on what happens if (when) things settle down. In the meantime, I've set it to 'no new tasks'. Besides, Beta managed to dump 12 Astropulse units on me before Bruno fell over: the most-processed has taken 45 3/4 hours to crunch 23.8%!!! So I've got enough to keep me going for weeks. PLUS, Proteins@ could do with a few more active crunchers, and they're quick WUs. Patience, my friends; positive thoughts; and a gentle 'thump' for thumper 8:P [Edit: miscounted the Astropulse numbers]
Kirsten Joined: 7 Jul 00 Posts: 190 Credit: 566,047 RAC: 0 |
Let me ask a very stupid question: has the internal server error, which produces ghost units, anything to do with the fact that I am using KWSN's optimized applications? I have got nothing but ghost units on my two hosts for the last couple of days. I saw that another user uninstalled BOINC, manually deleted his BOINC folder, and reinstalled BOINC. All the hosts he did this to are now receiving work; his "untouched" hosts are still getting ghosts and/or no new work. (This is not a solution for me, as I am running other BOINC projects instead of SETI for the time being. At least, I think it is bad BOINC behaviour.) The above-mentioned solution does start from scratch, though, and that made me think of my optimized applications and the app_info.xml file. Kind regards, Kirsten |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
"Let me ask a very stupid question: has the internal server error, which produces ghost units, anything to do with the fact that I am using KWSN's optimized applications?" YES!!! I was just about to post the same thing. The following has worked for me on three systems: two running late-version 5.8 BOINC and one running 5.3.12.tx. All were service installs running the appropriate Chicken 2.2B, and all had run completely dry. Recipe:
1. Rename app_info.xml so it won't be recognised.
2. Restart BOINC (service).
3. Update SETI. It may not get through the first time, but keep trying.
4. Restore app_info.xml to its original name.
5. Wait until all transfers have finished.
6. Restart BOINC (service).
Outcome: a decent-sized cache (if I haven't nabbed them all already, LOL), still running optimised, time to open a beer. |
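Richard's recipe can be scripted. The sketch below performs only the file renames itself. Restarting BOINC, requesting an update, and waiting for transfers are passed in as callables, because the right commands depend on the BOINC version and on whether it runs as a service. The function name, the `.disabled` suffix, and the project directory you pass in are all assumptions for illustration.

```python
from pathlib import Path
from typing import Callable

def refetch_without_app_info(project_dir: Path,
                             restart_boinc: Callable[[], None],
                             request_update: Callable[[], bool],
                             wait_for_transfers: Callable[[], None],
                             max_update_attempts: int = 10) -> bool:
    """Hypothetical helper following the recipe; returns True if an update got through.

    request_update should return True once a scheduler request succeeds.
    """
    app_info = project_dir / "app_info.xml"
    hidden = project_dir / "app_info.xml.disabled"   # any name BOINC won't recognise
    app_info.rename(hidden)                          # 1. hide app_info.xml
    try:
        restart_boinc()                              # 2. restart BOINC
        updated = False
        for _ in range(max_update_attempts):         # 3. update SETI, retrying on failure
            if request_update():
                updated = True
                break
    finally:
        hidden.rename(app_info)                      # 4. always restore the original name
    wait_for_transfers()                             # 5. let all transfers finish
    restart_boinc()                                  # 6. restart so the optimized app is used
    return updated
```

The `finally` clause puts app_info.xml back even if a step fails, so the host does not silently stay on the stock application.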
Keith T. Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
"Let me ask a very stupid question: has the internal server error, which produces ghost units, anything to do with the fact that I am using KWSN's optimized applications?" My first thought on this question was NO: the optimized science app should not affect BOINC when fetching work. But then I remembered reading somewhere that the app_info file does take the server a small amount of time to parse. So maybe it is possible! I still have 1 SETI WU that is partly crunched, and 1 Beta WU that will be finished in about 20 minutes. If I don't get any new WUs when the last SETI WU is finished, I may try removing the optimized app temporarily; it's worth a try! P.S. You can switch back to the optimized app part-way through a WU, though it does leave confusing information in the "stderr" file; this result is an example where I switched back to the optimized app. Sir Arthur C Clarke 1917-2008
GreggyBee Joined: 9 Mar 01 Posts: 203 Credit: 1,600,521 RAC: 0 |
Hey Richard H, thanx, I loved the recipe:
1. Rename app_info.xml so it won't be recognised.
2. Restart BOINC (service).
3. Update SETI. It may not get through the first time, but keep trying.
4. Restore app_info.xml to its original name.
5. Wait until all transfers have finished.
6. Restart BOINC (service).
It's working for me too: this should be posted in the Tech News thread ASAP.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.