Hanging workunit and odd credit claims problem solved.
Author | Message |
---|---|
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Thanks to some clues from Josef Segur and Tetsuji Rai, I've located the cause of the "overflow results that hang" problem. It appears that it wasn't an application issue at all, but a problem with the splitter: it set thresholds so low that the triplet finder attempted to do an infinite amount of work. This is also the cause of some long-running workunits asking for too little credit. The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem. You can identify workunits made by the new splitter because they will have values in the @SETIEric@qoto.org (Mastodon) |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Thanks Eric... Any estimate on how many of the affected units are still out there? I'm still hesitant about retrieving work again... Brian |
Digger Send message Joined: 4 Dec 99 Posts: 614 Credit: 21,053 RAC: 0 |
Eric that's great news... and thanks to Tetsuji and Josef for the assist! |
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Some are easy to find (ones that had more than five results sent because of errors). I'll try to post a "the coast is really clear" message when I'm sure I've gotten rid of some of the problem units. @SETIEric@qoto.org (Mastodon) |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
My rough guess is about 1% of old WUs will have the problem, or maybe 4000 of the approximately 400000 which are ready to send. With the download rate near 6 WUs per second (17 Mbps), those 400000 will be sent in the next 18 hours or so. If the script can keep those with the problem from being resent, the problem will be fully gone within a day or so. Joe |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
OK... Thanks to both of you for the update... |
Byron Leigh Hatch @ team Carl Sagan Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
I would just like to add my thanks, along with Digger and Brian Silvers, to Josef W. Segur and Tetsuji Maverick Rai for helping Eric out. Thanks very much, Eric, for posting this information. byron |
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Those are pretty good guesses. The total number of bad workunits created was 45056. About 7000 were still around. I think I've gotten them all cancelled at this point. Eric @SETIEric@qoto.org (Mastodon) |
Byron Leigh Hatch @ team Carl Sagan Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Hi Eric, OT: just a quick question. I was crunching an AstroPulse app in July of 2003, when I and 700 other people were beta testing BOINC version 0.1.xx. Will you be using that AstroPulse app to start your new build of the new AstroPulse app? Byron |
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
We've got a new graduate student (Josh von Korff) working on Astropulse. He found a number of bugs in the existing app, some of them serious in terms of the science. He's working on a new build. Most of the code is based upon the old version you were beta testing. I can't give you a good ETA for the new version. Josh is at Arecibo right now helping Jeff and Dan with the data recorder and he needs my help for some problems he's having. Eric @SETIEric@qoto.org (Mastodon) |
Aurora Borealis Send message Joined: 14 Jan 01 Posts: 3075 Credit: 5,631,463 RAC: 0 |
Eric could you take a look at the computer result from post in Q&A Boinc V7.2.42 Win7 i5 3.33G 4GB, GTX470 |
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Thanks, that means something's going wrong with at least one of the splitters... I'll check it out. Eric @SETIEric@qoto.org (Mastodon) |
Byron Leigh Hatch @ team Carl Sagan Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Thanks Eric ... that's great news! Even greater news is the work Jeff, Dan and Josh von Korff are doing down in Arecibo with the new "Multi-Beam Receiver Promises New Vistas for SETI Research". Woo hoo, the great scientific adventure of SETI continues ... Thanks Eric. Best wishes, Byron, Vancouver, Canada |
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
"Thanks, that means something's going wrong with at least one of the splitters..." I take that back. The workunits look fine on this end. It probably means there's something between BOINC and the server that's messing things up. Personal firewall??? @SETIEric@qoto.org (Mastodon) |
Byron Leigh Hatch @ team Carl Sagan Send message Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Hi Rom, sorry, OT :( Just wanted to let everyone know that Eric did reply to my post, 8 posts down. The following was copied from here: http://setiathome.berkeley.edu/forum_thread.php?id=31554#327182 "Thanks Eric ... that's great news! Even greater news is the work Jeff, Dan and Josh von Korff are doing down in Arecibo with the new 'Multi-Beam Receiver Promises New Vistas for SETI Research'. Woo hoo, the great scientific adventure of SETI continues ... Thanks Eric. Best wishes, Byron, Vancouver, Canada" |
paul milton Send message Joined: 24 Feb 03 Posts: 56 Credit: 73,265 RAC: 0 |
"Thanks, that means something's going wrong with at least one of the splitters..." Don't know, but I just got that same error (I believe): 6/5/2006 5:27:16 AM|SETI@home|Unrecoverable error for result 27mr99aa.2066.31536.409648.3.192_1 ( - exit code -6 (0xfffffffa)) which I just downloaded: 6/5/2006 5:27:13 AM|SETI@home|Finished download of file 27mr99aa.2066.31536.409648.3.192 SETI@home error -6 Bad workunit header !swi.data_type || !found || !swi.nsamples File: ..\\seti_header.cpp Line: 219 First time I've seen that one.. Edit: uhhhh, scratch that, it appears to have just been "unsent".. man, that was fast. "Pain dosen't hurt, when it's all you have ever felt" |
Tetsuji Maverick Rai Send message Joined: 25 Apr 99 Posts: 518 Credit: 90,863 RAC: 0 |
"Thanks, that means something's going wrong with at least one of the splitters..." Abort it!! I checked its sibling (27mr99aa.2066.31536.409648.3.19) and found it has no header!! Thanks for notifying. -Tetsuji EDIT: there are many other similar WUs, but I think this is a temporary issue; I guess these were created while Eric was working on the previous problem. I emailed Eric about this. Luckiest in the world. WMD = Weapon of Mass Distraction. |
Eric Korpela Send message Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
"Thanks, that means something's going wrong with at least one of the splitters..." I'll check it out again. Eric @SETIEric@qoto.org (Mastodon) |
Tetsuji Maverick Rai Send message Joined: 25 Apr 99 Posts: 518 Credit: 90,863 RAC: 0 |
Hi Eric, it's 4:30am there. Are you OK? Sounds like a workaholic... or is it because you are at Kitt Peak? When I sent email on this issue, I added "emergency" in the title because I guessed it would be some hours later when you would wake up in the morning, but in reality you responded within 30 minutes, at 3:30am PDT! Take care.... -Tetsuji EDIT: okay, I bet 1000 credits this issue is because of the debugging code for the previous problem, so we won't have this problem later on! No? Either way I like an anomaly if it's harmless :) (this is harmless in a sense, I think) Luckiest in the world. WMD = Weapon of Mass Distraction. |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
"Thanks, that means something's going wrong with at least one of the splitters..." Eric, I've got a slew of units that just don't seem to be processing on all three of my machines. One was showing that it had been running for over 50 hours and still had 60-some hours to go. I stopped it and chose another one, but it seems to be in that same loop. I did this on all three computers and still nothing looks normal. When I ask for the graphic, it doesn't show anything happening. What's up? Allen |
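
Joe's back-of-the-envelope estimate earlier in the thread (about 1% of the roughly 400000 ready-to-send workunits affected, sent out at close to 6 WUs per second over a 17 Mbps link) can be checked with a few lines of Python. This is only a rough sketch: the inputs are the guesses quoted in the thread, the function names are made up for illustration, and the per-workunit size is derived under the assumption that 17 Mbps means 17,000,000 bits per second.

```python
# Rough check of the figures quoted in the thread.
# All inputs are the posters' estimates, not measured values.
READY_TO_SEND = 400_000   # workunits waiting to be sent (Joe's approximation)
BAD_FRACTION = 0.01       # "about 1%" guess for affected old workunits
SEND_RATE_PER_SEC = 6     # "near 6 WUs per second"
LINK_MBPS = 17            # quoted download bandwidth


def expected_bad(ready: int, fraction: float) -> int:
    """Number of affected workunits implied by the guessed fraction."""
    return round(ready * fraction)


def hours_to_drain(ready: int, rate_per_sec: float) -> float:
    """Hours needed to send every ready workunit at a constant rate."""
    return ready / rate_per_sec / 3600


def implied_wu_kilobytes(mbps: float, rate_per_sec: float) -> float:
    """Average workunit size in kB implied by bandwidth and send rate.

    Assumption: 1 Mbps = 1,000,000 bits per second, 1 kB = 1000 bytes.
    """
    return mbps * 1_000_000 / 8 / rate_per_sec / 1000


if __name__ == "__main__":
    print("affected workunits:", expected_bad(READY_TO_SEND, BAD_FRACTION))
    print("hours to send all: %.1f" % hours_to_drain(READY_TO_SEND, SEND_RATE_PER_SEC))
    print("implied kB per WU: %.0f" % implied_wu_kilobytes(LINK_MBPS, SEND_RATE_PER_SEC))
```

With these inputs the numbers agree with the post: about 4000 affected workunits and a little over 18 hours to send everything. Eric's later count (about 7000 bad workunits still around) is somewhat higher than the 1% guess, so the fraction is the part of the estimate most worth adjusting.
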
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.