Can't find good ans. 4 LOSS of WUs..?

Message boards : Number crunching : Can't find good ans. 4 LOSS of WUs..?
Profile [Cx]
Avatar

Send message
Joined: 25 Jul 05
Posts: 141
Credit: 25,742
RAC: 0
United Kingdom
Message 210190 - Posted: 11 Dec 2005, 11:31:36 UTC

I've tried to INFORM myself as much as possible, cuz we all know how clueless it is to duplicate questions already posted and replied to...

I've read the Tech News, and get the general idea as to WHY there's been a delay with u/l / d/l of WUs...

I've read through a few threads -- some of them quite long -- in search of Clues, hoping I wouldn't have to post...

Now that I am, I'd appreciate constructive and informative answers, please. If you're aching to flame because you think I'm "asking the same question as thousands of others", please don't waste your time.
(Nothing you say will move me to give you the time of day to reply to your flame.) ;)
Thanks.

OK then...

SITUATION:
Obviously being similar to many at present -- completed WUs aren't going through / being u/l'd.
Message screen is reporting "Temporarily failed upload of [WU] / Backing off [min] [sec] on upload.."
and "Scheduler request .. failed / No schedulers responded / Deferring communication with project.."
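
For readers wondering what the "Backing off" lines mean: a client that fails an upload usually waits longer after each failure before trying again. Below is a minimal Python sketch of that general idea (randomized exponential backoff). It is not BOINC's actual code; the function names, the 60-second base, the 4-hour cap and the exact message wording are all assumptions made for illustration.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random.random):
    """Seconds to wait after `failures` consecutive failed uploads.

    `base` and `cap` are illustrative values, not BOINC's real settings.
    """
    if failures < 1:
        return 0.0
    # Double the ceiling with every failure, but never exceed the cap.
    ceiling = min(cap, base * 2 ** (failures - 1))
    # Pick a point in [ceiling/2, ceiling] so many clients do not retry in lockstep.
    return ceiling * (0.5 + 0.5 * rng())


def format_backoff(seconds):
    """Render a delay in the style of the client message (wording is assumed)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"Backing off {minutes} minutes and {secs} seconds on upload"


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, format_backoff(backoff_delay(attempt)))
```

The point of the sketch is only that a backing-off result is still queued and will be retried; nothing is discarded while the delay grows.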

((Sidebar: How do you get your Messages to also list a "Reason"? Someone posted their messages showing a "Reason" line; I don't get any "Reason"s at all!))

PROBLEM:
IN STARK CONTRAST to others I've read about, I DO NOT have a "screen-full" (no matter what size! ;) ) of WUs waiting to u/l, AND YET I have **no movement** in my Credit count for WUs completed.
.. and it's been A WEEK since my last credit, and I've "completed" several WUs..!

QUESTION:
Since the implication of NOT having a list of WUs waiting to Upload (u/l) is that they **should have** u/l'd, then where's my Credit?

ALTERNATIVELY,
Is there any chance that a "failed" u/l eventually removes itself (?!?) from the Transfers Q?

ALSO:
If you have a COMPLETED WU, and it has FAILED to u/l, and *then* you select it and click "Suspend", WHAT is _supposed_ to happen to/with the WU?
(Asking this because.. I tried to "suspend" a WU whose [WORK tab] Status was "Uploading", and I've since Shutdown & rebooted my PC to find the WU is **GONE**!!)

I have NOT "Reset" the Project.
I have NOT "Detach"ed
I HAVE clicked "No New Work" (for the time being)

2ND Q:
Could there be "completed" WUs sitting on my PC that BOINC has somehow "lost track" of, that may still be salvaged and u/l'd?

Okay.. that's my lot. Hope someone, or several someones, with many more clues than myself, is/are able to Clue Me In. Thanks!

-( moi )-
ID: 210190 · Report as offensive
Profile cjsoftuk
Volunteer tester

Send message
Joined: 3 Sep 04
Posts: 248
Credit: 183,721
RAC: 0
United Kingdom
Message 210196 - Posted: 11 Dec 2005, 11:42:54 UTC

Question 1: Why haven't I got credit?
Answer: The Validators are off at the moment. No credits are being produced.

Question 2: Is there a chance that a failed u/l removes itself?
Answer: NO. BOINC will never destroy work that is completed.

Question 3: If you click suspend on an U/ling WU, what happens?
Answer: As far as I know, nothing should happen. However, it seems that those who are a little inquisitive have found a new bug. Well done on finding this bug. Reporting to BOINC Bug Database.

Question 4: Can BOINC possibly have lost track of WUs?
Answer: Yes, it is possible. If BOINC was killed by Windows while it was writing the client_state.xml file, it could lose track of work. You would not be able to recover those WUs without very careful editing of client_state.xml.
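
If anyone does attempt that kind of hand editing, two precautions are cheap: check whether the file still parses as XML (a write that was cut off midway usually will not), and take a copy before changing anything. A minimal Python sketch follows. The helper names are hypothetical, and it makes no assumption about what is inside client_state.xml beyond it being an XML file.

```python
import shutil
import time
import xml.etree.ElementTree as ET
from pathlib import Path


def is_well_formed(path):
    """Return True if the file parses as XML, False if it is truncated or unreadable."""
    try:
        ET.parse(str(path))
        return True
    except (ET.ParseError, OSError):
        return False


def backup_before_editing(path):
    """Copy the file next to itself with a timestamp suffix; return the copy's path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also preserves the modification time
    return dst
```

Stop the BOINC client before running anything like this, so the file is not being rewritten underneath you.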
ID: 210196 · Report as offensive
J D K
Volunteer tester
Avatar

Send message
Joined: 26 May 04
Posts: 1295
Credit: 311,371
RAC: 0
United States
Message 210266 - Posted: 11 Dec 2005, 13:33:25 UTC

Your WUs


Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
(missing) | 40504969 | 10 Dec 2005 3:50:06 UTC | 24 Dec 2005 3:50:06 UTC | In Progress | Unknown | New | --- | --- | ---
167513580 | 40075906 | 7 Dec 2005 3:08:59 UTC | 8 Dec 2005 3:47:24 UTC | Over | Client error | Downloading | 0.00 | 0.00 | ---
166721572 | 39885694 | 6 Dec 2005 3:27:22 UTC | 20 Dec 2005 3:27:22 UTC | In Progress | Unknown | New | --- | --- | ---
166681972 | 39875953 | 6 Dec 2005 2:16:00 UTC | 8 Dec 2005 3:47:24 UTC | Over | Success | Done | 19,957.33 | 43.23 | pending
165726240 | 39641989 | 5 Dec 2005 0:34:55 UTC | 8 Dec 2005 3:47:24 UTC | Over | Success | Done | 15,864.43 | 34.37 | pending
165719608 | 39640364 | 5 Dec 2005 0:24:46 UTC | 5 Dec 2005 2:32:14 UTC | Over | Success | Done | 6,678.22 | 13.89 | pending
165522804 | 39592262 | 4 Dec 2005 19:32:37 UTC | 5 Dec 2005 10:56:54 UTC | Over | Success | Done | 463.64 | 0.96 | pending
165366054 | 39554076 | 4 Dec 2005 15:45:30 UTC | 5 Dec 2005 10:56:54 UTC | Over | Validate error | Done | 15,751.76 | --- | ---
164844749 | 39426971 | 4 Dec 2005 1:53:03 UTC | 4 Dec 2005 19:42:46 UTC | Over | Success | Done | 13,756.47 | 28.60 | 30.14
163938425 | 39206217 | 3 Dec 2005 1:14:52 UTC | 4 Dec 2005 20:39:17 UTC | Over | Validate error | Done | 17,286.52 | --- | ---
163188796 | 39024369 | 2 Dec 2005 5:21:44 UTC | 4 Dec 2005 1:42:53 UTC | Over | Success | Done | 15,971.16 | 33.21 | 32.52
162968063 | 38972087 | 1 Dec 2005 23:07:47 UTC | 3 Dec 2005 1:14:52 UTC | Over | Client error | Computing | 14,458.08 | 30.06 | ---
162799766 | 38931183 | 1 Dec 2005 18:40:03 UTC | 1 Dec 2005 23:07:47 UTC | Over | Client error | Downloading | 0.00 | 0.00 | ---
162140666 | 38771431 | 1 Dec 2005 1:04:41 UTC | 2 Dec 2005 5:11:36 UTC | Over | Success | Done | 16,480.97 | 36.76 | 22.10
160584490 | 38392070 | 29 Nov 2005 2:14:16 UTC | 1 Dec 2005 0:51:45 UTC | Over | Success | Done | 17,873.32 | 39.87 | 31.99
157778815 | 37709437 | 25 Nov 2005 10:58:30 UTC | 26 Nov 2005 19:46:43 UTC | Over | Success | Done | 16,227.16 | 35.66 | 20.95

Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
166627461 | 39862590 | 6 Dec 2005 0:37:10 UTC | 20 Dec 2005 0:37:10 UTC | In Progress | Unknown | New | --- | --- | ---
158942330 | 37990611 | 26 Nov 2005 23:23:36 UTC | 10 Dec 2005 23:23:36 UTC | Over | No reply | New | 0.00 | --- | ---
157396634 | 37616554 | 24 Nov 2005 22:28:13 UTC | 3 Dec 2005 17:29:51 UTC | Over | Success | Done | 15,542.05 | 23.04 | 23.04
155245066 | 37089134 | 21 Nov 2005 22:27:42 UTC | 24 Nov 2005 22:38:22 UTC | Over | Success | Done | 18,822.59 | 28.79 | 28.79
And the beat goes on
Sonny and Cher

BOINC Wiki

ID: 210266 · Report as offensive
Aurora Borealis
Volunteer tester
Avatar

Send message
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 210337 - Posted: 11 Dec 2005, 15:05:54 UTC
Last modified: 11 Dec 2005, 15:08:22 UTC

The simple answer is that your results have probably uploaded properly onto Beckley's server. To keep things humming a little better, they have probably stopped processing the data for the moment and are just caching it. To me it is logical that they don't want to add to the stress on the database servers until the backlog catches up. I know that my results have not been updated yet, and the credits won't be issued until they have been validated. Credits will probably take up to a week after the current crisis is over, since the backlog will then need to be processed.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 210337 · Report as offensive
Bill Barto

Send message
Joined: 28 Jun 99
Posts: 864
Credit: 58,712,313
RAC: 91
United States
Message 210364 - Posted: 11 Dec 2005, 15:29:12 UTC

dasilvac,

On this workunit:

http://setiathome.berkeley.edu/result.php?resultid=167513580

you clicked on the "abort" button under the "transfer" tab. This action deletes the result from the transfer queue and prevents you from getting credit for the result. If you have done this to any other result you have also lost them.
ID: 210364 · Report as offensive
Profile [Cx]
Avatar

Send message
Joined: 25 Jul 05
Posts: 141
Credit: 25,742
RAC: 0
United Kingdom
Message 210382 - Posted: 11 Dec 2005, 15:54:14 UTC - in response to Message 210364.  

On this workunit:

http://setiathome.berkeley.edu/result.php?resultid=167513580

you clicked on the "abort" button under the "transfer" tab. This action deletes the result from the transfer queue and prevents you from getting credit for the result. If you have done this to any other result you have also lost them.



Good Gracious..!

So, whereas a more sensible assumption that:
Aborting a *TRANSFER* would keep a completed Work Unit in the "Work" list as "Ready to Upload",..

INSTEAD, BOINC / SETI decides that aborting any TRANSFER should completely invalidate the whole Work Unit?!?!?

Does this suggest that once a WU is tagged for TRANSFER, it's "beyond" just being "completed", and hence it means that touching ANYthing in the TRANSFER screen is practically dangerous?

As far as I'm concerned, that's a waste of cycles. *grin*
What if I have a legit reason to either shutdown my PC or disconnect my internet or otherwise feel I should stop data moving in/out of my PC?

Someone PLEASE tell me why there is an ABORT button on the TRANSFER screen, if its sole purpose is to waste computing cycles by killing a completed Work Unit?
OR.. perhaps.. is this going to be amended in a future version?
(Or is it in 5.x, since I need to upgrade anyway from my 4.45?)

Thanks..

.Cx.


THANKS:
CJSoft - helped me clarify a few things.
Jim K - appreciated, though I knew where to look up my work units. ;)
Aurora B - I got that impression from the Tech Notes, but yeah, thanks. :]
Bill B - I hadn't bothered to look that deeply before, so thanks for that..! :)
ID: 210382 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 210393 - Posted: 11 Dec 2005, 16:05:17 UTC

How do you get your Messages to also list a "Reason"? Someone posted their messages showing a "Reason" line; I don't get any "Reason"s at all!

The messages tab should contain the same data as is posted to the log files. I have seen some rare cases where this is not so, but the most authoritative source of actual messages is the output log files. If you are interested, make a copy and read them.

Most messages can be found in the Wiki, but as the BOINC Client Software has evolved, so have the messages. We do try to indicate versions where possible.

In the older system the logs contained the error number as -106 for example, in about 4.72 this was changed to "System I/O". The error codes are listed in the Wiki for those that are not printed as text.
ID: 210393 · Report as offensive
Aurora Borealis
Volunteer tester
Avatar

Send message
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 210419 - Posted: 11 Dec 2005, 16:26:18 UTC

As always Paul is right. If you hit abort button, it deletes the result and you have lost the credits. It should only be used as a last resort.
BTW I've personally never lost more than a few minutes on a WU just from manually shutting down the Boinc software.

I sympathize with your loss. But it's just a painful example that you should not try to micromanage the Boinc software. It is designed to provide information, not as a tool for 'control freaks'. I apologize if this term does not apply to you, but this board seems to be filled with people that it does fit.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 210419 · Report as offensive
Bill Barto

Send message
Joined: 28 Jun 99
Posts: 864
Credit: 58,712,313
RAC: 91
United States
Message 210459 - Posted: 11 Dec 2005, 17:01:51 UTC - in response to Message 210419.  

As always Paul is right. If you hit abort button, it deletes the result and you have lost the credits. It should only be used as a last resort.
BTW I've personally never lost more than a few minutes on a WU just from manually shutting down the Boinc software.

I sympathize with your loss. But it's just a painful example that you should not try to micromanage the Boinc software. It is designed to provide information, not as a tool for 'control freaks'. I apologize if this term does not apply to you, but this board seems to be filled with people that it does fit.


Paul?
ID: 210459 · Report as offensive
Aurora Borealis
Volunteer tester
Avatar

Send message
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 210469 - Posted: 11 Dec 2005, 17:14:15 UTC - in response to Message 210459.  

As always Paul is right. If you hit abort button, it deletes the result and you have lost the credits. It should only be used as a last resort.
BTW I've personally never lost more than a few minutes on a WU just from manually shutting down the Boinc software.

I sympathize with your loss. But it's just a painful example that you should not try to micromanage the Boinc software. It is designed to provide information, not as a tool for 'control freaks'. I apologize if this term does not apply to you, but this board seems to be filled with people that it does fit.


Paul?

Sorry, still waking up, I was agreeing with your post.....

BUT, you have to admit PAUL is always right.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 210469 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 210484 - Posted: 11 Dec 2005, 17:22:52 UTC - in response to Message 210469.  

Sorry, still waking up

Same thing about that Beckley's server, which is probably a hidden commercial for Becks beer, right? ;)
ID: 210484 · Report as offensive
Bill Barto

Send message
Joined: 28 Jun 99
Posts: 864
Credit: 58,712,313
RAC: 91
United States
Message 210507 - Posted: 11 Dec 2005, 17:42:36 UTC - in response to Message 210469.  

As always Paul is right. If you hit abort button, it deletes the result and you have lost the credits. It should only be used as a last resort.
BTW I've personally never lost more than a few minutes on a WU just from manually shutting down the Boinc software.

I sympathize with your loss. But it's just a painful example that you should not try to micromanage the Boinc software. It is designed to provide information, not as a tool for 'control freaks'. I apologize if this term does not apply to you, but this board seems to be filled with people that it does fit.


Paul?

Sorry, still waking up, I was agreeing with your post.....

BUT, you have to admit PAUL is always right.


Can't argue with that.
ID: 210507 · Report as offensive
Profile [Cx]
Avatar

Send message
Joined: 25 Jul 05
Posts: 141
Credit: 25,742
RAC: 0
United Kingdom
Message 210514 - Posted: 11 Dec 2005, 17:53:37 UTC - in response to Message 210419.  

As always Paul is right. If you hit abort button, it deletes the result and you have lost the credits. It should only be used as a last resort.
BTW I've personally never lost more than a few minutes on a WU just from manually shutting down the Boinc software.

I sympathize with your loss. But it's just a painful example that you should not try to micromanage the Boinc software. It is designed to provide information, not as a tool for 'control freaks'. I apologize if this term does not apply to you, but this board seems to be filled with people that it does fit.



Thanks again, folks.

Paul, I've belatedly realized that the "Reason" message that doesn't show for me is most likely because I'm still running 4.45! So... sorry about that.


.. and yes, I know I'm guilty for the occasional micro-management, but you'll note I only did that _once_ and that was before I bothered to read Tech Notes to figure out why things weren't transferring correctly.

Though not a "control freak", I do tend to "fiddle" and poke around when something isn't going the way I'd assume it "normally" would... Oh well, what's a few credits / work hours? At least, for The Science, it's been covered by at least another 3 'puters... so, no major loss there, right? :)

Say... I'd be happy to close this thread now... Seems appropriate, no?

Thx again,

.Cx.

ID: 210514 · Report as offensive
Jack Gulley

Send message
Joined: 4 Mar 03
Posts: 423
Credit: 526,566
RAC: 0
United States
Message 210526 - Posted: 11 Dec 2005, 18:15:41 UTC - in response to Message 210419.  

If you hit abort button, it deletes the result and you have lost the credits.

Very poor user interface implementation. Obviously by someone who has never done serious user interface programming before. The button says "Abort Transfer" and implies ending the current transfer attempt. It does not say "Delete Result", which it should say based on this description of what it does. That panel needs two options to replace the mislabeled one: a "Suspend Transfer" button and a "Delete Result" button that does what it says it does. Plus, any "delete" of results or any file must have a second window that asks the user to confirm that they want to "remove that result" and never send it back for credit.

'control freak'

Someone who attempts to exert absolute control over someone else so that they become dependent on them and become incapable of independent decisions or breaking free from that control.

Hum.. now just who is being a control freak here. (BOINC?)
ID: 210526 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 210545 - Posted: 11 Dec 2005, 18:25:14 UTC - in response to Message 210526.  
Last modified: 11 Dec 2005, 18:26:45 UTC

{edited. I should read}
ID: 210545 · Report as offensive
Aurora Borealis
Volunteer tester
Avatar

Send message
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 210563 - Posted: 11 Dec 2005, 18:51:34 UTC - in response to Message 210526.  
Last modified: 11 Dec 2005, 18:55:15 UTC

If you hit abort button, it deletes the result and you have lost the credits.

Very poor user interface implementation. Obviously by someone who has never done serious user interface programming before. The button says "Abort Transfer" and implies ending the current transfer attempt. It does not say "Delete Result", which it should say based on this description of what it does. That panel needs two options to replace the mislabeled one: a "Suspend Transfer" button and a "Delete Result" button that does what it says it does. Plus, any "delete" of results or any file must have a second window that asks the user to confirm that they want to "remove that result" and never send it back for credit.

'control freak'

Someone who attempts to exert absolute control over someone else so that they become dependent on them and become incapable of independent decisions or breaking free from that control.

Hum.. now just who is being a control freak here. (BOINC?)


I agree this is definitely a bad choice of wording. I also think you're right about adding a button and clarifying the meaning. We must provide for the control freaks.

Being a charter member of 'Control Freaks Anonymous', I can see it in others, even when it's in its early stages. I was merely raising a red flag.


Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 210563 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 210567 - Posted: 11 Dec 2005, 18:58:52 UTC - in response to Message 210382.  

So, whereas a more sensible assumption that:
Aborting a *TRANSFER* would keep a completed Work Unit in the "Work" list as "Ready to Upload",..

INSTEAD, BOINC / SETI decides that aborting any TRANSFER should completely invalidate the whole Work Unit?!?!?

The tooltip when you hover the Abort button says:
"Click 'Abort transfer' to delete the file from the transfer queue. This will prevent you from being granted credit for this result."
That's fairly clear, though perhaps a confirmation dialog after clicking the button would more likely be read. (Much as I hate those "Are you sure?" dialogs, there ARE cases where they are justified.)
Does this suggest that once a WU is tagged for TRANSFER, it's "beyond" just being "completed", and hence it means that touching ANYthing in the TRANSFER screen is practically dangerous?

I'd define it the other way; a result is not "completed" until it has been uploaded AND reported to the scheduler.
Someone PLEASE tell me why there is an ABORT button on the TRANSFER screen, if its sole purpose is to waste computing cycles by killing a completed Work Unit?

Here's one example: In the Test Project there have been several times when a block of setiathome_enhanced WUs had errors so they have been cancelled. That puts them in a state where uploaded results won't be validated, so an Abort makes sense.

The present situation suggests that addition of a Suspend/Resume button for Transfers would be a good idea. I suspect if some skilled coder submitted changes to the BOINC code to add that, it would be accepted. It's not just a trivial addition to BOINC Manager; BOINC itself has only the Retry and Abort actions for file transfers.
                                               Joe
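
To make the distinction in this post concrete, here is a minimal Python sketch of the result lifecycle as described in this thread: a result only counts once it has been uploaded AND reported, Retry keeps the work, and Abort throws it away. This is a toy model, not BOINC's code. The class, state and method names are hypothetical, and the confirmation callback is the "Are you sure?" step suggested above, not something the current client is known to do.

```python
from enum import Enum


class ResultState(Enum):
    COMPUTED = "computed"   # crunching finished, output file waiting in the transfer queue
    UPLOADED = "uploaded"   # output file reached the upload server
    REPORTED = "reported"   # scheduler has been told; the result can now be validated
    ABORTED = "aborted"     # transfer aborted by the user; the result is gone


class Result:
    def __init__(self, name):
        self.name = name
        self.state = ResultState.COMPUTED
        self.failed_uploads = 0

    def retry_upload(self, server_reachable):
        """Retry: the work is kept whether or not the attempt succeeds."""
        if self.state is ResultState.COMPUTED:
            if server_reachable:
                self.state = ResultState.UPLOADED
            else:
                self.failed_uploads += 1
        return self.state

    def report(self):
        """Reporting only makes sense once the upload has gone through."""
        if self.state is ResultState.UPLOADED:
            self.state = ResultState.REPORTED
        return self.state

    def abort_transfer(self, confirm):
        """Abort: drops the result for good, but only if `confirm()` returns True."""
        if self.state is ResultState.COMPUTED and confirm():
            self.state = ResultState.ABORTED
        return self.state

    def can_be_validated(self):
        return self.state is ResultState.REPORTED
```

For example, `Result("wu_1").abort_transfer(lambda: True)` ends in `ResultState.ABORTED`, and no later `retry_upload` call brings it back, which matches what happened to the aborted result earlier in the thread.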
ID: 210567 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Send message
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 211367 - Posted: 12 Dec 2005, 12:07:32 UTC

Paul is not always right ...

I distinctly remember one mistake in January 1976. :)

Seriously guys, I don't have a lock on the truth. Wish I did ...
ID: 211367 · Report as offensive
Profile [Cx]
Avatar

Send message
Joined: 25 Jul 05
Posts: 141
Credit: 25,742
RAC: 0
United Kingdom
Message 212361 - Posted: 13 Dec 2005, 2:48:27 UTC

THANKS, EVERYONE...

.. for all your responses, comments, and input!

.. and thanks for not flaming the nearly-n00b!

. o O ( Ruh-roh... guess what you've just invited..! )

Thanks especially to the "regulars" for their extreme patience with, and tolerance of, The Clueless Wandering Sheep Who Ask (too many?) Questions.

Your kind assistance is greatly appreciated.

Regards,

.Cx.
ID: 212361 · Report as offensive
TPR_Mojo
Volunteer tester

Send message
Joined: 18 Apr 00
Posts: 323
Credit: 7,001,052
RAC: 0
United Kingdom
Message 212370 - Posted: 13 Dec 2005, 2:51:29 UTC - in response to Message 211367.  

Paul is not always right ...

I distinctly remember one mistake in January 1976. :)



tut-tut Mr Buck must try harder ;)

ID: 212370 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.