Hanging workunit and odd credit claims problem solved.

Message boards : Number crunching : Hanging workunit and odd credit claims problem solved.

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 326708 - Posted: 5 Jun 2006, 1:18:00 UTC


Thanks to some clues from Josef Segur and Tetsuji Rai, I've located the cause of the "overflow results that hang" problem. It appears that it wasn't an application issue at all, but a problem with the splitter that resulted in thresholds that were set too low in such a way that the triplet finder attempted to do an infinite amount of work.
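To illustrate why a threshold that is too low can make a triplet search explode, here is a minimal sketch. It is not the SETI@home triplet finder; it assumes a simplified search that examines every pair of above-threshold samples, and the function name and the synthetic noise data are hypothetical. Under that assumption the pair count grows roughly quadratically as the threshold drops, which gives a feel for how a bad threshold could turn into an effectively unbounded amount of work.

```python
import random


def count_pair_checks(power, threshold):
    """Count the sample pairs a naive triplet search would examine.

    Assumption (not from the thread): the search looks at every pair of
    samples above the threshold and would then test for a third sample
    between them. Only the pair count is computed here.
    """
    hits = [i for i, p in enumerate(power) if p > threshold]
    n = len(hits)
    return n * (n - 1) // 2


random.seed(0)
# Synthetic, exponentially distributed "power" values standing in for noise.
power = [random.expovariate(1.0) for _ in range(10_000)]

for threshold in (8.0, 4.0, 1.0, -1.0):
    print(threshold, count_pair_checks(power, threshold))
```

With the threshold below every sample, all 10,000 samples qualify and the search has to consider every possible pair.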

This is also the cause of some long running workunits asking for too little credit.

The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem.

You can identify workunits made by the new splitter because they will have values in the header field between -12288 and -13312. If you have problems with the new workunits, let me know.
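The range check described above could be sketched as follows. The post does not name the header field, so the function takes the value directly; the constant and function names are hypothetical, and treating both bounds as inclusive is an assumption.

```python
# Bounds quoted in the post; inclusive bounds are an assumption.
NEW_SPLITTER_MIN = -13312
NEW_SPLITTER_MAX = -12288


def looks_like_new_splitter(header_value: int) -> bool:
    """Return True if the header value falls in the range given for new splitters."""
    return NEW_SPLITTER_MIN <= header_value <= NEW_SPLITTER_MAX


print(looks_like_new_splitter(-12800))  # inside the quoted range
print(looks_like_new_splitter(-20))     # outside the quoted range
```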


@SETIEric@qoto.org (Mastodon)

ID: 326708 · Report as offensive
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 326714 - Posted: 5 Jun 2006, 1:23:27 UTC - in response to Message 326708.  


The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem.


Thanks Eric... Any estimate on how many of the affected units are still out there? I'm still hesitant about retrieving work again...

Brian

ID: 326714 · Report as offensive
Profile Digger
Volunteer tester

Joined: 4 Dec 99
Posts: 614
Credit: 21,053
RAC: 0
United States
Message 326723 - Posted: 5 Jun 2006, 1:30:29 UTC
Last modified: 5 Jun 2006, 1:45:32 UTC


Eric that's great news... and thanks to Tetsuji and Josef for the assist!

ID: 326723 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 326729 - Posted: 5 Jun 2006, 1:35:57 UTC - in response to Message 326714.  


The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem.


Thanks Eric... Any estimate on how many of the affected units are still out there? I'm still hesitant about retrieving work again...



Some are easy to find (the ones that had more than five results sent because of errors).

I'll try to post a "the coast is really clear" message when I'm sure I've gotten rid of some of the problem units.

@SETIEric@qoto.org (Mastodon)

ID: 326729 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 326755 - Posted: 5 Jun 2006, 2:10:55 UTC - in response to Message 326714.  


The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem.


Thanks Eric... Any estimate on how many of the affected units are still out there? I'm still hesitant about retrieving work again...

Brian

My rough guess is about 1% of old WUs will have the problem, or maybe 4000 of the approximately 400000 which are ready to send. With the download rate near 6 WUs per second (17 Mbps), those 400000 will be sent in the next 18 hours or so. If the script can keep those with the problem from being resent, the problem will be fully gone within a day or so.
                                         Joe
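Joe's back-of-the-envelope numbers can be reproduced with a few lines of arithmetic. The inputs are the figures from his post; the variable names are made up for illustration, and reading "Mbps" as 10^6 bits per second is an assumption.

```python
ready_to_send = 400_000        # workunits waiting to be sent
bad_percent = 1                # rough guess: 1% of old WUs are affected
download_rate = 6              # workunits per second
bandwidth_bits = 17_000_000    # 17 Mbps, assuming 1 Mbps = 10**6 bits/s

bad_estimate = ready_to_send * bad_percent // 100
hours_to_drain = ready_to_send / download_rate / 3600
bytes_per_wu = bandwidth_bits / 8 / download_rate

print(bad_estimate)              # affected workunits
print(round(hours_to_drain, 1))  # hours to send the whole backlog
print(round(bytes_per_wu))       # implied size of one workunit in bytes
```

This gives 4000 affected workunits and about 18.5 hours to drain the backlog, in line with "the next 18 hours or so". The two rates together imply roughly 354 kB per workunit.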
ID: 326755 · Report as offensive
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 326761 - Posted: 5 Jun 2006, 2:22:50 UTC - in response to Message 326755.  


The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem.


Thanks Eric... Any estimate on how many of the affected units are still out there? I'm still hesitant about retrieving work again...

Brian

My rough guess is about 1% of old WUs will have the problem, or maybe 4000 of the approximately 400000 which are ready to send. With the download rate near 6 WUs per second (17 Mbps), those 400000 will be sent in the next 18 hours or so. If the script can keep those with the problem from being resent, the problem will be fully gone within a day or so.
                                         Joe


OK... Thanks to both of you for the update...
ID: 326761 · Report as offensive
Profile Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 326891 - Posted: 5 Jun 2006, 4:11:38 UTC - in response to Message 326708.  





The problem has been fixed and I've started the new splitters. Once these results get cleared the problem should go away. I'm working on a script to fix the problem


Eric Korpela
Project scientist



I would just like to add my thanks, along with Digger and Brian Silvers,

to Josef W. Segur and Tetsuji Maverick Rai for helping Eric out.

Thanks very much, Eric, for posting this information.

byron


ID: 326891 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 327059 - Posted: 5 Jun 2006, 6:08:50 UTC - in response to Message 326755.  


My rough guess is about 1% of old WUs will have the problem, or maybe 4000 of the approximately 400000 which are ready to send. With the download rate near 6 WUs per second (17 Mbps), those 400000 will be sent in the next 18 hours or so. If the script can keep those with the problem from being resent, the problem will be fully gone within a day or so.
                                         Joe


Those are pretty good guesses. The total number of bad workunits created was 45056. About 7000 were still around. I think I've gotten them all cancelled at this point.

Eric
@SETIEric@qoto.org (Mastodon)

ID: 327059 · Report as offensive
Profile Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 327146 - Posted: 5 Jun 2006, 7:14:09 UTC - in response to Message 327059.  
Last modified: 5 Jun 2006, 7:28:33 UTC



Hi Eric,

OT

Just a quick question ...

I was crunching an AstroPulse app in July of 2003, when I and 700 other people were beta testing BOINC version 0.1.xx.

Will you be using that AstroPulse app to start your new build of the new AstroPulse app?

Byron


ID: 327146 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 327182 - Posted: 5 Jun 2006, 7:41:01 UTC - in response to Message 327146.  


I was crunching an AstroPulse app in July of 2003, when I and 700 other people were beta testing BOINC version 0.1.xx.

Will you be using that AstroPulse app to start your new build of the new AstroPulse app?


We've got a new graduate student (Josh von Korff) working on Astropulse. He found a number of bugs in the existing app, some of them serious in terms of the science. He's working on a new build. Most of the code is based upon the old version you were beta testing.

I can't give you a good ETA for the new version. Josh is at Arecibo right now helping Jeff and Dan with the data recorder and he needs my help for some problems he's having.

Eric
@SETIEric@qoto.org (Mastodon)

ID: 327182 · Report as offensive
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 327190 - Posted: 5 Jun 2006, 7:51:44 UTC

Eric could you take a look at the computer result from post in Q&A



Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 327190 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 327192 - Posted: 5 Jun 2006, 7:58:09 UTC - in response to Message 327190.  

Thanks, that means something's going wrong with at least one of the splitters...
I'll check it out.

Eric

Eric could you take a look at the computer result from post in Q&A



@SETIEric@qoto.org (Mastodon)

ID: 327192 · Report as offensive
Profile Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 327198 - Posted: 5 Jun 2006, 8:08:16 UTC - in response to Message 327182.  
Last modified: 5 Jun 2006, 8:18:12 UTC





I was crunching an AstroPulse app in July of 2003, when I and 700 other people were beta testing BOINC version 0.1.xx.

Will you be using that AstroPulse app to start your new build of the new AstroPulse app?

Byron



We've got a new graduate student (Josh von Korff) working on Astropulse. He found a number of bugs in the existing app, some of them serious in terms of the science. He's working on a new build. Most of the code is based upon the old version you were beta testing.

I can't give you a good ETA for the new version. Josh is at Arecibo right now helping Jeff and Dan with the data recorder and he needs my help for some problems he's having.

Eric




Thanks Eric ...

That's great news!

Even greater news is the work Jeff, Dan and Josh von Korff are doing down in Arecibo

with the new .......


Multi-Beam Receiver Promises New Vistas for SETI Research

woo hoo

The great scientific adventure ..... of SETI .... continues ...........

thanks Eric
Best Wishes
Byron
Vancouver
Canada



ID: 327198 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 327201 - Posted: 5 Jun 2006, 8:18:00 UTC - in response to Message 327192.  

Thanks, that means something's going wrong with at least one of the splitters...
I'll check it out.


I take that back. The workunits look fine on this end. It probably means there's something between BOINC and the server that's messing things up. Personal firewall???


@SETIEric@qoto.org (Mastodon)

ID: 327201 · Report as offensive
Profile Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 327208 - Posted: 5 Jun 2006, 8:36:34 UTC - in response to Message 327180.  
Last modified: 5 Jun 2006, 9:10:12 UTC





Hi Rom ,

sorry OT ___ :(

Just wanted to let everyone know that Eric did reply to my post, 8 posts down.

the following was copied from here:

http://setiathome.berkeley.edu/forum_thread.php?id=31554#327182




Hi Eric,

OT

Just a quick question ...

I was crunching an AstroPulse app in July of 2003, when I and 700 other people were beta testing BOINC version 0.1.xx.

Will you be using that AstroPulse app to start your new build of the new AstroPulse app?

Byron








We've got a new graduate student (Josh von Korff) working on Astropulse. He found a number of bugs in the existing app, some of them serious in terms of the science. He's working on a new build. Most of the code is based upon the old version you were beta testing.

I can't give you a good ETA for the new version. Josh is at Arecibo right now helping Jeff and Dan with the data recorder and he needs my help for some problems he's having.

Eric




Thanks Eric ...

That's great news!

Even greater news is the work Jeff, Dan and Josh von Korff are doing down in Arecibo

with the new .......


Multi-Beam Receiver Promises New Vistas for SETI Research

woo hoo

The great scientific adventure ..... of SETI .... continues ...........

thanks Eric
Best Wishes
Byron
Vancouver
Canada



Hi Rom ,

sorry OT ___ :(

Just wanted to let everyone know that Eric did reply to my post

the above was copied from here:

http://setiathome.berkeley.edu/forum_thread.php?id=31554#327182


ID: 327208 · Report as offensive
paul milton
Joined: 24 Feb 03
Posts: 56
Credit: 73,265
RAC: 0
United States
Message 327232 - Posted: 5 Jun 2006, 9:32:04 UTC - in response to Message 327201.  
Last modified: 5 Jun 2006, 9:40:55 UTC

Thanks, that means something's going wrong with at least one of the splitters...
I'll check it out.


I take that back. The workunits look fine on this end. It probably means there's something between BOINC and the server that's messing things up. Personal firewall???



dont know, but i just got that same error (i believe)

6/5/2006 5:27:16 AM|SETI@home|Unrecoverable error for result 27mr99aa.2066.31536.409648.3.192_1 ( - exit code -6 (0xfffffffa))

which i just downloaded
6/5/2006 5:27:13 AM|SETI@home|Finished download of file 27mr99aa.2066.31536.409648.3.192


SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\\seti_header.cpp
Line: 219



first time ive seen that one..

edit: uhhhh, scratch that, it appears to have just been "unsent".. man that was fast.
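For anyone who wants to pull the details out of a message like the one quoted above, here is a small sketch. The regular expression is written against this one log line only and is an assumption about the general message format; `parse_error` is a hypothetical helper, not part of BOINC. It also checks that the hexadecimal value in the log is the 32-bit two's-complement form of the exit code, and strips the trailing `_1` from the result name to get the workunit name, which matches the downloaded file name in the log.

```python
import re

LINE = (
    "6/5/2006 5:27:16 AM|SETI@home|Unrecoverable error for result "
    "27mr99aa.2066.31536.409648.3.192_1 ( - exit code -6 (0xfffffffa))"
)

# Pattern derived from the single log line above (an assumption, not a spec).
PATTERN = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def parse_error(line):
    """Split a 'timestamp|project|message' line and extract the error details."""
    timestamp, project, message = line.split("|", 2)
    match = PATTERN.search(message)
    if match is None:
        return None
    code = int(match.group("code"))
    result = match.group("result")
    return {
        "timestamp": timestamp,
        "project": project,
        "result": result,
        # Result names here are the workunit name plus a "_N" suffix.
        "workunit": result.rsplit("_", 1)[0],
        "exit_code": code,
        # The logged hex value should be the code as an unsigned 32-bit integer.
        "hex_matches": (code & 0xFFFFFFFF) == int(match.group("hex"), 16),
    }


info = parse_error(LINE)
print(info)
```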

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Pain dosen't hurt, when it's all you have ever felt"
ID: 327232 · Report as offensive
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 327245 - Posted: 5 Jun 2006, 9:55:59 UTC - in response to Message 327232.  
Last modified: 5 Jun 2006, 10:55:34 UTC

Thanks, that means something's going wrong with at least one of the splitters...
I'll check it out.


I take that back. The workunits look fine on this end. It probably means there's something between BOINC and the server that's messing things up. Personal firewall???



dont know, but i just got that same error (i believe)

6/5/2006 5:27:16 AM|SETI@home|Unrecoverable error for result 27mr99aa.2066.31536.409648.3.192_1 ( - exit code -6 (0xfffffffa))

which i just downloaded
6/5/2006 5:27:13 AM|SETI@home|Finished download of file 27mr99aa.2066.31536.409648.3.192


SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\\seti_header.cpp
Line: 219



first time ive seen that one..

edit: uhhhh, scratch that, it appears to have just been "unsent".. man that was fast.


abort it!! I checked its sibling (27mr99aa.2066.31536.409648.3.19) and found it has no header!!

Thanks for notifying.

-Tetsuji

EDIT: there are many other similar wu's, but I think this is a temporary issue; I guess these were created while Eric was working on the previous problem. I emailed Eric on this.
Luckiest in the world. WMD = Weapon of Mass Distraction.
Click this table.
ID: 327245 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 327268 - Posted: 5 Jun 2006, 10:37:54 UTC - in response to Message 327245.  

Thanks, that means something's going wrong with at least one of the splitters...
I'll check it out.


I take that back. The workunits look fine on this end. It probably means there's something between BOINC and the server that's messing things up. Personal firewall???



dont know, but i just got that same error (i believe)




I'll check it out again.

Eric
@SETIEric@qoto.org (Mastodon)

ID: 327268 · Report as offensive
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 327319 - Posted: 5 Jun 2006, 11:42:14 UTC
Last modified: 5 Jun 2006, 11:52:37 UTC

Hi Eric, it's 4:30am there. Are you OK? Sounds like a workaholic... or is it because you are at Kitt Peak? When I sent the email on this issue, I added "emergency" to the title because I guessed it would be some hours before you woke up in the morning, but in reality you responded within 30 minutes, at 3:30am PDT!

Take care....

-Tetsuji

EDIT: okay, I bet 1000 credits this issue is because of the debugging code for the previous problem, so we won't have this problem later on! No? Either way I like an anomaly if it's harmless :) (this is harmless in a sense, I think)

Luckiest in the world. WMD = Weapon of Mass Distraction.
Click this table.
ID: 327319 · Report as offensive
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 327909 - Posted: 5 Jun 2006, 21:51:09 UTC - in response to Message 327268.  

Thanks, that means something's going wrong with at least one of the splitters...
I'll check it out.


I take that back. The workunits look fine on this end. It probably means there's something between BOINC and the server that's messing things up. Personal firewall???



dont know, but i just got that same error (i believe)




I'll check it out again.

Eric


Eric,

I've got a slew of units that just don't seem to be processing on all three of my machines. One was showing that it had been running for over 50 hours and still had 60 some hours to go. I stopped it and chose another one, but it seems to be in that same loop. I did this on all three computers and still nothing looks normal. When I ask for the graphic, it doesn't show anything happening.

What's up?

Allen


ID: 327909 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.