Hanging workunit and odd credit claims problem solved.

Message boards : Number crunching : Hanging workunit and odd credit claims problem solved.
Wayne Miller
Joined: 15 Mar 03
Posts: 15
Credit: 881,758
RAC: 9
United States
Message 327925 - Posted: 5 Jun 2006, 22:03:23 UTC

I also have a work unit hanging like this. I don't see any activity except for the clock.
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 327955 - Posted: 5 Jun 2006, 22:34:48 UTC - in response to Message 327909.  


I've got a slew of units that just don't seem to be processing on all three of my machines. One was showing that it had been running for over 50 hours and still had 60 some hours to go. I stopped it and chose another one, but it seems to be in that same loop. I did this on all three computers and still nothing looks normal. When I ask for the graphic, it doesn't show anything happening.


Can you give me the name of this workunit, or better yet a link to its page on the server?

Thanks,

Eric
@SETIEric@qoto.org (Mastodon)

Wayne Miller
Joined: 15 Mar 03
Posts: 15
Credit: 881,758
RAC: 9
United States
Message 327967 - Posted: 5 Jun 2006, 22:47:46 UTC

Here it is: 27mr99aa.29951.15681.579834.3.238_4
Thanks for your help.
Profile Digger
Volunteer tester

Joined: 4 Dec 99
Posts: 614
Credit: 21,053
RAC: 0
United States
Message 327971 - Posted: 5 Jun 2006, 22:49:36 UTC


Quick question:

Have some workunits simply been wiped out entirely in correcting this problem? I crunched, uploaded, and reported a workunit yesterday that completed successfully in 6.5 hours, but it now does not appear in my account information at all. It appears to have simply vanished off the face of the earth. It is in my BoincView log as downloaded and crunched, but it does not appear in my results list. I'm quite confused, so if someone could enlighten me I would be much obliged :)

Thanks,

Dig

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 327974 - Posted: 5 Jun 2006, 22:51:17 UTC
Last modified: 5 Jun 2006, 22:51:52 UTC

Eric, the link to the result page for Wayne Miller's WU is 27mr99aa.29951.15681.579834.3.238_4
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 327983 - Posted: 5 Jun 2006, 23:00:51 UTC - in response to Message 327955.  




I take it this means that the issue(s) with the splitters have not been corrected?
Ingleside
Volunteer developer

Send message
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 327994 - Posted: 5 Jun 2006, 23:10:14 UTC - in response to Message 327983.  



The WU in question here, wuid-8030467, was split in May, so it doesn't say anything about whether the splitter fix is working or not...
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 328016 - Posted: 5 Jun 2006, 23:26:56 UTC - in response to Message 327994.  



Yep, that's one of the bad ones. Note that it has been cancelled. Unfortunately, not until it had been sent out four times.
@SETIEric@qoto.org (Mastodon)

Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 328035 - Posted: 5 Jun 2006, 23:43:59 UTC - in response to Message 327955.  




Hi Eric,

I'm not sure exactly what you want, but here is what appears in the Work area of Boinc Manager.

27mr99aa.29951.15681.579834.3.67_0

However, there are several more that, when started, also seemed to be going backwards and not producing any graphics. Some of them start with the same numbers as the unit above, but others differ. What should I do with these units? I don't seem to be able to delete them from the BOINC Manager program. Do I need to find their home on my drive and kill them off there, or will SETI somehow rid me of them?

Thanks, Allen
Profile [AF>france>pas-de-calais]symaski62
Volunteer tester

Send message
Joined: 12 Aug 05
Posts: 258
Credit: 100,548
RAC: 0
France
Message 328045 - Posted: 5 Jun 2006, 23:55:59 UTC - in response to Message 327974.  



http://setiathome.berkeley.edu/workunit.php?wuid=80304676

errors => Cancelled

:)
SETI@Home Informational message -9 result_overflow
I have a general disability of 80% and make a great effort for the community and to express myself; thank you for being understanding.
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 328056 - Posted: 6 Jun 2006, 0:06:00 UTC - in response to Message 328016.  
Last modified: 6 Jun 2006, 0:07:39 UTC




Eric,

Here are a couple more....

27mr99aa.29951.15681.579834.3.82_3
27mr99aa.29951.15681.579834.3.88_0

I'm not sure how many more I have. Without trying them, is there anything one can do to get rid of them?

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 328082 - Posted: 6 Jun 2006, 0:16:29 UTC - in response to Message 328056.  




If the beginnings of their names appear in this list then you should probably delete or abort them.

Eric

@SETIEric@qoto.org (Mastodon)

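Eric's list itself is not reproduced in this thread. As a rough illustration of the check he describes, the Python sketch below compares task names against a list of bad name prefixes. The function names and the sample prefix are hypothetical, and it assumes task names follow the pattern seen in this thread: a workunit name followed by an `_N` replica suffix.

```python
import re
from typing import Iterable, List

# Hypothetical helper: the project's real list of bad workunits is not shown
# in this thread, so any prefixes passed in are assumptions.
REPLICA_SUFFIX = re.compile(r"_\d+$")  # e.g. the "_4" in "...3.238_4"


def workunit_name(task_name: str) -> str:
    """Drop the trailing _N replica number to get the workunit name."""
    return REPLICA_SUFFIX.sub("", task_name)


def matches_prefix(task_name: str, prefix: str) -> bool:
    """True if the workunit name begins with `prefix`, compared field by field.

    Comparing whole dot-separated fields avoids false hits such as the
    prefix "...3.2" matching the task "...3.238_4".
    """
    name_fields = workunit_name(task_name).split(".")
    prefix_fields = prefix.split(".")
    return name_fields[: len(prefix_fields)] == prefix_fields


def tasks_to_abort(task_names: Iterable[str], bad_prefixes: Iterable[str]) -> List[str]:
    """Return the tasks whose names begin with any listed bad prefix."""
    prefixes = list(bad_prefixes)
    return [t for t in task_names if any(matches_prefix(t, p) for p in prefixes)]


if __name__ == "__main__":
    tasks = [
        "27mr99aa.29951.15681.579834.3.82_3",
        "27mr99aa.29951.15681.579834.3.88_0",
        "28fe99aa.18483.22561.934666.3.112",
    ]
    bad = ["27mr99aa.29951.15681.579834.3"]  # assumed list entry, for illustration
    print(tasks_to_abort(tasks, bad))
```

Matching on whole fields instead of a raw `startswith` keeps a short prefix from accidentally flagging an unrelated workunit whose number merely begins with the same digits. Any matches would still need to be aborted by hand in BOINC Manager.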
Wayne Miller
Joined: 15 Mar 03
Posts: 15
Credit: 881,758
RAC: 9
United States
Message 328146 - Posted: 6 Jun 2006, 0:45:37 UTC

Thank you, Eric...
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 328176 - Posted: 6 Jun 2006, 1:04:18 UTC - in response to Message 328082.  




Thanks for the list, Eric.
Matt Thul

Joined: 25 May 99
Posts: 9
Credit: 9,788,837
RAC: 0
United States
Message 329748 - Posted: 7 Jun 2006, 15:23:15 UTC

I have encountered a few that continue to use CPU time without updating the state.sah file.

28fe99aa.18483.22561.934666.3 does not appear to be on the list of known problems.


Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 329756 - Posted: 7 Jun 2006, 15:44:21 UTC

The following unit hung at 8+ hours of work time and had to be restarted; at that point it claimed credit for only 4 hours of CPU time: 20mr99aa.22885.24002.379824.3.94_2
Matt Thul

Joined: 25 May 99
Posts: 9
Credit: 9,788,837
RAC: 0
United States
Message 329824 - Posted: 7 Jun 2006, 17:06:24 UTC

OK, here's a list of units (not on Eric's 'bad' list) that I found busily using CPU, but updating neither the percentage complete nor the state.sah file for more than 24 hours.

Note these were scattered across a half-dozen servers, and other work units (on those running more than one unit) appeared to be proceeding normally.

28fe99aa.18483.22561.934666.3.112
03mr99ab.9678.28129.1047142.3.0
20mr99aa.15250.20145.984656.3.71
30au99ab.18587.29250.573572.3.172
24mr99ab.175.2848.697140.3.2
22mr99aa.23950.25105.54842.3.101
23mr99aa.23594.22768.565904.3.154
14ja99aa.26346.5648.947164.3.60

Is there anything else I should look at when I find more of them?
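Part of the symptom described above, a state.sah file that has not been rewritten for more than 24 hours, can be checked from the file's modification time. The Python sketch below is only an illustration of that one check; the function names are hypothetical, and the location of state.sah (for example inside a BOINC slot directory) is an assumption that varies by installation.

```python
import os
import time
from typing import Iterable, List, Optional

# The 24-hour threshold mentioned in the post above.
STALL_SECONDS = 24 * 60 * 60


def seconds_since_update(path: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def looks_stalled(path: str, threshold: float = STALL_SECONDS,
                  now: Optional[float] = None) -> bool:
    """True if the file has not been rewritten within `threshold` seconds."""
    return seconds_since_update(path, now) > threshold


def stalled_files(paths: Iterable[str], threshold: float = STALL_SECONDS,
                  now: Optional[float] = None) -> List[str]:
    """Return the paths that exist and look stalled; missing paths are skipped."""
    return [p for p in paths
            if os.path.exists(p) and looks_stalled(p, threshold, now)]
```

For example, `stalled_files(["slots/0/state.sah", "slots/1/state.sah"])` would be run from the BOINC data directory, with those relative paths being hypothetical. A stale file is only a hint: as in the post, CPU usage and the percentage complete still need to be checked before concluding that a unit is hung.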
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 330311 - Posted: 7 Jun 2006, 23:48:56 UTC - in response to Message 329824.  

I'll check out whether they should be in the list or not. It could be that I missed a few...

Eric




@SETIEric@qoto.org (Mastodon)

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 330684 - Posted: 8 Jun 2006, 3:45:45 UTC - in response to Message 330311.  

If you've still got any of these, could you send me the workunit file and the contents of stderr? (email to my last name at ssl.berkeley.edu) Which core client version are you using?


@SETIEric@qoto.org (Mastodon)

EricVonDaniken

Joined: 17 Apr 04
Posts: 177
Credit: 67,881
RAC: 0
United States
Message 337023 - Posted: 14 Jun 2006, 21:22:43 UTC

T2600 CPU
WinXP SP2 w/ latest patches.

BOINC client 5.4.9 for windows_intelx86
setiathome_5.15_windows_intelx86.exe

WU 22mr99ab.5386.10081.736064.3.0_0
At the time of this post: 8 hours of CPU time, 23% progress, 9 hours to completion.

My average is ~02:20:00 per WU
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.