WTH just happened?

Message boards : Number crunching : WTH just happened?

Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1619616 - Posted: 28 Dec 2014, 4:01:20 UTC

restarted the machine and then this

12/27/2014 10:51:37 PM | SETI@home | Didn't resend lost task 06my12ai.21138.8349.438086664203.12.62_0 (expired)

how can they expire in 3 minutes ?

bummer
I came down with a bad case of i don't give a crap
ID: 1619616 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1619644 - Posted: 28 Dec 2014, 5:40:26 UTC

Reminds me of ghost work units we used to get a lot of back in the old days.

Old James
ID: 1619644 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1619667 - Posted: 28 Dec 2014, 6:49:25 UTC - in response to Message 1619616.  
Last modified: 28 Dec 2014, 6:50:25 UTC

If the resend lost work feature cannot resend a task, it kills it by marking it as past deadline (so another replication will be created and go to another host ASAP). That's the expired part. Figuring out why those 117 tasks were lost and were not resent is more difficult; you haven't given enough details.

Joe
ID: 1619667 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1619729 - Posted: 28 Dec 2014, 10:52:53 UTC - in response to Message 1619667.  
Last modified: 28 Dec 2014, 10:53:07 UTC

The tasks weren't resent because none of the app_versions have any 'Number of tasks completed', so there is no APR yet.
Until they have their 11 validations, tasks tend to get expired rather than resent.

Why they got lost in the first place is the real question.

Claggy
ID: 1619729 · Report as offensive
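
For anyone trying to follow the logic in the replies above, here is a rough Python sketch of the decision as Joe and Claggy describe it. It is not BOINC's scheduler code. The names `AppVersion`, `tasks_validated`, and `handle_lost_task` are made up for illustration, and the hard cut-off at 11 validations is a simplification of Claggy's "tend to get expired".

```python
from dataclasses import dataclass

# Threshold quoted in the thread ("their 11 validations"); treating it as a
# hard cut-off is an assumption made for this sketch.
MIN_VALIDATIONS_FOR_APR = 11


@dataclass
class AppVersion:
    """Hypothetical stand-in for one app_version row on the host's details page."""
    name: str
    tasks_validated: int

    def has_apr(self) -> bool:
        # No APR is shown until enough tasks have validated.
        return self.tasks_validated >= MIN_VALIDATIONS_FOR_APR


def handle_lost_task(app_version: AppVersion) -> str:
    """Return "resend" or "expire" for a task the server thinks the host lost.

    "expire" models marking the task as past deadline so that another
    replication is created and sent to a different host.
    """
    if app_version.has_apr():
        return "resend"
    return "expire"
```

With a freshly attached host (zero validated tasks for every app version) every lost task comes out as "expire", which matches the "Didn't resend lost task ... (expired)" message in the first post.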
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1619791 - Posted: 28 Dec 2014, 15:10:53 UTC

Well i have taken that machine offline until i can figure out what is causing the issue with it

sry about the busted WU's
I came down with a bad case of i don't give a crap
ID: 1619791 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1619795 - Posted: 28 Dec 2014, 15:32:31 UTC - in response to Message 1619791.  
Last modified: 28 Dec 2014, 15:32:51 UTC

What do the logs say? Did it have some failed scheduler contacts at the time the server attempted to send those WUs?

Claggy
ID: 1619795 · Report as offensive
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1619797 - Posted: 28 Dec 2014, 15:39:46 UTC
Last modified: 28 Dec 2014, 15:40:58 UTC

Well it seems nothing was written in the logs so i assume there is something wrong with the boinc install

but looking at my windows machine right now it seems it cannot download any WU's either, seems the download servers need their pipes roto rootered

judging by the fact that all my cuda WU's are processed and uploaded and the regular CPU WU's are getting low, it couldn't make a connection for quite a while

there is a gazillion WU's waiting for download
I came down with a bad case of i don't give a crap
ID: 1619797 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1619813 - Posted: 28 Dec 2014, 16:42:05 UTC - in response to Message 1619797.  

There's unlikely to be anything wrong with the boinc install.

Please post the Event Log from about 28 Dec 2014, 2:30:00 UTC to 5:00:00 UTC.
If it doesn't show in the Event Log now, you should be able to find that time period in the stdoutdae.txt or stdoutdae.old files in the /var/lib/boinc-client directory (assuming that is the Data directory).

Claggy
ID: 1619813 · Report as offensive
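
If picking that window out of a long log by hand is a pain, a small Python script can do it. This is only a sketch under some assumptions: the Event Log timestamp format is taken from the line quoted in the first post, the `stdoutdae.txt` format (`27-Dec-2014 22:51:37 [project] ...`) is assumed and should be checked against the actual file, the log times are assumed to be local time, and the UTC offset of the machine has to be supplied by hand. The function names are made up for this example.

```python
from datetime import datetime, timedelta

# Format of the Event Log line quoted in the thread:
#   12/27/2014 10:51:37 PM | SETI@home | ...
EVENT_LOG_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# ASSUMED format of stdoutdae.txt lines (verify against the real file):
#   27-Dec-2014 22:51:37 [SETI@home] ...
STDOUTDAE_FORMAT = "%d-%b-%Y %H:%M:%S"


def parse_local_timestamp(line):
    """Return the line's timestamp as a naive local datetime, or None."""
    candidates = [
        (line.split("|", 1)[0].strip(), EVENT_LOG_FORMAT),
        (line[:20].strip(), STDOUTDAE_FORMAT),
    ]
    for text, fmt in candidates:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def lines_in_utc_window(lines, start_utc, end_utc, utc_offset_hours):
    """Keep lines whose timestamp, converted to UTC, lies in [start_utc, end_utc].

    utc_offset_hours is the machine's local offset from UTC (for example -5);
    it is a guess the user must supply, the log itself does not state it.
    """
    offset = timedelta(hours=utc_offset_hours)
    selected = []
    for line in lines:
        local_time = parse_local_timestamp(line)
        if local_time is None:
            continue  # continuation lines or unrecognised formats are skipped
        if start_utc <= local_time - offset <= end_utc:
            selected.append(line.rstrip("\n"))
    return selected


def read_log(path):
    """Read a log file into a list of lines, tolerating odd bytes."""
    with open(path, errors="replace") as handle:
        return handle.readlines()
```

Usage would look something like `lines_in_utc_window(read_log("/var/lib/boinc-client/stdoutdae.txt"), datetime(2014, 12, 28, 2, 30), datetime(2014, 12, 28, 5, 0), utc_offset_hours=-5)`, with the offset changed to whatever the host's timezone actually is.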
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1619842 - Posted: 28 Dec 2014, 18:58:54 UTC

already nuked the machine from orbit (only way to make sure...) and reinstalled

will be going up in about an hour then i will see what happens
I came down with a bad case of i don't give a crap
ID: 1619842 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.