Panic Mode On (106) Server Problems?

Message boards : Number crunching : Panic Mode On (106) Server Problems?
Stephen "Heretic"
Message 1867186 - Posted: 14 May 2017, 0:56:45 UTC

. . @ anyone who can explain this.

. . I managed to ghost 100 WUs on my "Mi-Burrito" rig (running Linux/CUDA80) but I have managed to get the timing right to get the "alternative" method of recovering ghosted tasks to work. I have now used it 5 times, with 4 of them successfully resulting in the resend of 20 "lost" tasks each time. But I now have 108 ghosted tasks. The first time I ran the recovery method successfully the ghost numbers dropped by the 20 I expected. But each time since, the number of ghosts has increased by odd amounts. The first time it did I thought it to be simply a glitch. But the definition of scientific proof is that it must be repeatable. On the first run the number of ghosts decreased from 100 to 80, but on the second run it increased to 91, then to 99 on the 3rd and 108 on the 4th. I believe that is sufficient to indicate something is not right. I will not be attempting to recover them any further or I will have the biggest collection of ghosted units I have ever had.

Stephen

?????

Jeff Buck
Message 1867210 - Posted: 14 May 2017, 3:24:04 UTC - in response to Message 1867186.

Chances are you didn't have "No new tasks" set before the initial scheduler request that you interrupted. In that case, the scheduler would be attempting to send you new tasks but couldn't, because the response was blocked. Hence, new ghosts get created. So, always be sure that "No new tasks" is set before each request that you're going to interrupt.
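
As a rough sanity check on this explanation, the counts reported in the original post can be worked backwards. The sketch below assumes that each of the four successful runs really did recover 20 tasks, so any shortfall from the expected drop is attributed to newly created ghosts. The function and variable names are illustrative only.

```python
# Ghost counts reported in the original post: the starting total, then the
# total after each of the four successful recovery runs.
reported_counts = [100, 80, 91, 99, 108]

# Each successful run resent 20 "lost" tasks, per the post.
RESENT_PER_RUN = 20


def implied_new_ghosts(counts, resent_per_run):
    """Return, per run, how many new ghosts must have appeared.

    Assumes every run really did recover `resent_per_run` tasks, so
    new = observed_after - (before - resent_per_run).
    """
    return [
        after - (before - resent_per_run)
        for before, after in zip(counts, counts[1:])
    ]


new_ghosts = implied_new_ghosts(reported_counts, RESENT_PER_RUN)
for run, created in enumerate(new_ghosts, start=1):
    print(f"run {run}: {created} new ghosts implied")
```

Under that assumption the first run created no new ghosts, while each of the later three created roughly 30, which would fit a work request going unanswered on those attempts.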

Stephen "Heretic"
Message 1867249 - Posted: 14 May 2017, 9:54:25 UTC - in response to Message 1867210.

. . Thanks for that reply ... I will now try again and make doubly sure that I have set NNT before the attempt. If I can recover them I will be delighted. I hate having ghosted tasks hanging over my head (not to mention the inconvenience to the wingmen involved).

Stephen

:)

TBar
Message 1867381 - Posted: 15 May 2017, 1:34:01 UTC
Last modified: 15 May 2017, 1:35:27 UTC

You know, I still have dozens of Forgotten Pending Tasks from that fiasco back at the end of March. It's been so long that I can't remember the details, only that everyone ended up with stranded tasks on those dates, mostly tasks that were pending before April XX. The exact cause is probably somewhere in the last thread (105); it's been a while. Mine are still there, just like everyone's: http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=1040&state=2
It would be nice if those tasks were dealt with one day soon.

Wiggo
Message 1867384 - Posted: 15 May 2017, 1:47:54 UTC
Last modified: 15 May 2017, 1:48:33 UTC

Every time the servers have gone M.I.A. lately, batches of pendings have gone into pending limbo, though they do eventually clear on their original report deadlines (6-8 weeks later).

Cheers.

Keith Myers
Message 1867388 - Posted: 15 May 2017, 2:00:27 UTC

I'd forgotten about those. You're right ... somewhere around the 1100 offset there are a couple of days' worth of pendings from both parties that haven't been cleared. Looks like most of them have deadlines in the last week of May. We will have to wait till then and hope the servers clear the deadwood.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Stephen "Heretic"
Message 1867399 - Posted: 15 May 2017, 4:58:58 UTC - in response to Message 1867381.

. . Actually, because they are all so old, if you run that ghost recovery procedure just once it should trigger them all being abandoned and then hopefully within a few days they will be gone completely.

Stephen

:)

Stephen "Heretic"
Message 1867400 - Posted: 15 May 2017, 5:02:22 UTC - in response to Message 1867210.


. . That did the trick. It seems that in the kerfuffle of trying to juggle multiple windows in Linux I must have forgotten or somehow slipped up on setting NNT on those attempts that caused more ghosts than they recovered. They are now clearing and the balance is going down :)

. . Thanks again.

Stephen

:)

TBar
Message 1867402 - Posted: 15 May 2017, 5:31:31 UTC - in response to Message 1867399.

Actually, those are NOT Ghosts. Look at them again: they are Completed tasks waiting on the Validator. The Validator stopped working, and when it was fixed it ignored all the Pending Completed tasks from before the fix. Everyone who had Pending tasks at that time was affected: Hundreds of Thousands of Stranded Completed tasks. We'll see if they are finally validated in a week or so when the Report deadlines start expiring. I'm not convinced they will be validated without outside intervention. We'll see soon.

rob smith
Message 1867408 - Posted: 15 May 2017, 6:57:15 UTC

I assume you are talking about work units like this one:
https://setiathome.berkeley.edu/workunit.php?wuid=2491833482

Both initial parties completed their work in early April, and both results have status = "Completed, waiting for validation". Either someone will take pity on such Work Units and give the Validators the appropriate prod, or they will time out and be re-sent to others.
If the former route is taken then it's just a short delay in the progress of the project; if the latter route is taken then it becomes rather frustrating for the users, in that we have paid power bills to process data that is then discarded and re-run.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Stephen "Heretic"
Message 1867419 - Posted: 15 May 2017, 11:21:30 UTC - in response to Message 1867402.
Last modified: 15 May 2017, 11:48:12 UTC

. . Oops! Sorry about that chief!

. . I wonder how many of my pendings fall into that basket?

Stephen

[edit] I've had a peek and apparently several dozen, but still fewer than those pending because of delinquent wingmen. Hopefully when their deadlines arrive they will be handled like those with delinquent wingmen, with a single new copy issued to corroborate the results. But it is a bit concerning.

??

Ghia
Message 1867456 - Posted: 15 May 2017, 17:02:22 UTC

Anyone else seeing an increase in inconclusive validations on units from the last 4-5 days?
Humans may rule the world...but bacteria run it...

Keith Myers
Message 1867465 - Posted: 15 May 2017, 17:39:47 UTC - in response to Message 1867456.

I'm not seeing that on my 3 machines. My Inconclusives seem to always hover around 20-25 and never seem to deviate much above or below that count.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Mark Stevenson
Message 1867467 - Posted: 15 May 2017, 17:44:09 UTC - in response to Message 1867456.
Last modified: 15 May 2017, 17:45:09 UTC

No more than usual. I always have some inconclusives, but they end up validating; doubt there's much to worry about, just give them a few weeks and they should be OK. If the inconclusives go to "Invalid" or "Error" then I start worrying something's up with my computers, but a reboot sorts that 9 times out of 10 :-)
Life is what you make of it :-)

When i'm good i'm very good , but when i'm bad i'm shi#eloads better ;-) In't I " buttercups " p.m.s.l at authoritie !!;-)

Ghia
Message 1867473 - Posted: 15 May 2017, 18:19:16 UTC - in response to Message 1867467.

Tnx :-) I'm not worried about it, as in all cases at least one wingman has returned the same result. There are not many of them...I usually have about 10 that come and go, it is just that they've doubled in the last couple of days.
Humans may rule the world...but bacteria run it...

HAL9000
Message 1867528 - Posted: 15 May 2017, 23:14:53 UTC - in response to Message 1867473.
Last modified: 15 May 2017, 23:15:40 UTC

Sometimes I'll see a spike in inconclusive tasks, but it always turns out to be a random host trashing tasks.
You might have been paired with a bad wingmate for several tasks.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu

Grant (SSSF)
Message 1867590 - Posted: 16 May 2017, 5:46:53 UTC - in response to Message 1867528.
Last modified: 16 May 2017, 5:47:17 UTC

I've had that a few times in the past: something like 10-20 WUs were allocated to me and another machine, the other machine trashed every one, and my inconclusives went through the roof for a day or 2.

At least my Pendings and Valids have stopped climbing & leveled off at their higher than usual levels.
Grant
Darwin NT

Zalster
Message 1867604 - Posted: 17 May 2017, 1:48:41 UTC

and we are back..

Grant (SSSF)
Message 1867643 - Posted: 17 May 2017, 5:12:26 UTC - in response to Message 1867604.

'Till the next random weirdness.
Grant
Darwin NT

Filipe
Message 1867679 - Posted: 17 May 2017, 10:12:56 UTC

The 2018 Arecibo tapes never seem to come to an end..