Panic Mode On (114) Server Problems?

Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1979648 - Posted: 10 Feb 2019, 4:09:31 UTC

Wow. I can see there has been a ton of "fun" happening here while I was on vacation. I managed to download some WUs so I hope things will run well for a while.
ID: 1979648 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979649 - Posted: 10 Feb 2019, 4:41:29 UTC - in response to Message 1979633.  

[Edit] I must have a magnet. On the host that was in backoff for 35 minutes, when I reported 100 tasks and got 100 in return. 30 of them were unrecoverable download errors. This is getting tiresome.

I'm still getting a few, but it is just a few.
And now it's more spread between my systems, where initially one system was getting half of its downloads as failures and the other would only get the occasional one.
Grant
Darwin NT
ID: 1979649 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1979650 - Posted: 10 Feb 2019, 4:43:55 UTC - in response to Message 1979633.  



My large quantity of failed tasks is just because of my high turnaround rate. I ask for more work than most so am likely to get more of the missing tasks.


Gave up on mine:-)

Downloaded a lot of Einstein WUs, set NNT on Einstein to stop it flooding me and adjusted the resource share so that I am running 2 GPUs on Einstein for the next couple of days.

I am still getting some of the missing tasks but not enough to cause the larger back-offs that are causing problems.

Bed time now.
Kevin


ID: 1979650 · Report as offensive
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1979651 - Posted: 10 Feb 2019, 4:48:19 UTC

not getting any GPU MB's only CPU MB's

Ed F
ID: 1979651 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979653 - Posted: 10 Feb 2019, 5:05:14 UTC - in response to Message 1979651.  

not getting any GPU MB's only CPU MB's

Ed F

It looks like one of your systems has 200 GPU Ghost WUs then.

Actually, it looks like several of your systems have lots of Ghost WUs...
Grant
Darwin NT
ID: 1979653 · Report as offensive
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1979659 - Posted: 10 Feb 2019, 5:23:06 UTC - in response to Message 1979653.  
Last modified: 10 Feb 2019, 5:25:27 UTC

how do I see ghosts?? what do they look like??
ID: 1979659 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1979661 - Posted: 10 Feb 2019, 5:27:26 UTC - in response to Message 1979659.  

If you have more tasks in your "work in progress" than the project allotment of 100 CPU tasks per CPU and 100 tasks per GPU, then you have 'ghosts'. So you should only have 200 tasks for a host with a CPU and one GPU, for example.
You can use the 'ghost recovery protocol' to get rid of them by reclaiming 'lost tasks'.
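
A rough sketch of that limit arithmetic in Python. The function names are made up for illustration, and the limits are simply the figures quoted above (100 for the CPU plus 100 per GPU), not anything read from the project's servers:

```python
# Limits as quoted in the post above; treat them as assumptions.
CPU_TASK_LIMIT = 100   # tasks allowed for the CPU
TASKS_PER_GPU = 100    # tasks allowed for each GPU


def host_task_limit(gpu_count: int) -> int:
    """Most tasks a host should legitimately have in progress."""
    return CPU_TASK_LIMIT + TASKS_PER_GPU * gpu_count


def estimated_ghosts(in_progress_on_website: int, gpu_count: int) -> int:
    """Tasks listed as in progress beyond the limit are probably ghosts."""
    return max(0, in_progress_on_website - host_task_limit(gpu_count))


if __name__ == "__main__":
    # A host with a CPU and one GPU that shows 400 tasks in progress.
    print(host_task_limit(1))
    print(estimated_ghosts(400, 1))
```

Anything above zero from `estimated_ghosts` suggests the host is worth running through the recovery steps.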

Ghost Recovery Protocol
. . As follows;

. . Set project to No New Tasks

. . Disable network access and wait for a group of completed tasks to accumulate (enough to give you time to run through this procedure, the faster your upload speed the more you will need)

. . Open windows to file transfer, event log and preferences

. . Re-enable the network access and monitor the uploads in the file transfer and event log windows. When the last file has uploaded and the acknowledgement has appeared in event log, but BEFORE the work has been reported disable the network access again. This timing is critical as is the first step of setting NNT. I have the option to disable network access set in advance so I only need to click OK.

. . Shut down BOINC and wait a short period.

. . Restart BOINC, set manager to allow New Tasks. All the completed tasks should show under the tasks tab as ready to report. Re-enable the network and watch. You should get 20 resent tasks (they will show in event log as a list of resends).

. . For large numbers of ghosts this will have to be repeated until all are recovered.

. . If you have no tasks to upload then I don't know how you can trigger the resends.
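
For anyone who prefers the command line, the steps above map roughly onto `boinccmd` calls. This is only a dry-run sketch: it prints the commands and runs nothing. The project URL is an assumption (check the one your client is attached to), `ghost_recovery_commands` is a made-up helper name, and the "after the upload but BEFORE the report" timing still has to be judged by eye from the transfer list.

```python
import shlex

# Assumed project URL; use whatever URL your client is attached with.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def ghost_recovery_commands(project_url: str = PROJECT_URL):
    """Return (description, command) pairs mirroring the manual steps."""
    return [
        ("Set project to No New Tasks",
         ["boinccmd", "--project", project_url, "nomorework"]),
        ("Disable network access and let completed tasks accumulate",
         ["boinccmd", "--set_network_mode", "never"]),
        ("Re-enable network access",
         ["boinccmd", "--set_network_mode", "auto"]),
        ("Watch the uploads until the last file has gone",
         ["boinccmd", "--get_file_transfers"]),
        ("Disable network again BEFORE the work is reported",
         ["boinccmd", "--set_network_mode", "never"]),
        ("Shut down BOINC and wait a short period",
         ["boinccmd", "--quit"]),
        # Restarting the client is OS-specific, so it is left as a manual step.
        ("After restarting BOINC, allow new tasks",
         ["boinccmd", "--project", project_url, "allowmorework"]),
        ("Re-enable the network and watch for resends",
         ["boinccmd", "--set_network_mode", "auto"]),
    ]


if __name__ == "__main__":
    for number, (description, command) in enumerate(ghost_recovery_commands(), 1):
        print(f"{number}. {description}")
        print("   " + shlex.join(command))
```

Printing rather than executing is deliberate: the critical window is short, so it is safer to paste each command by hand while watching the event log.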
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1979661 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1979665 - Posted: 10 Feb 2019, 5:49:23 UTC

I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received.
So I tracked a task and it appears they recovered the drive, or came up with a work around.
ID: 1979665 · Report as offensive
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1979669 - Posted: 10 Feb 2019, 6:04:54 UTC - in response to Message 1979661.  

Ghosts are not the problem here ... I've been getting this:

Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Sending scheduler request: To fetch work.
Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Reporting 2 completed tasks
Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Requesting new tasks for NVIDIA GPU
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | No tasks sent
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | No tasks are available for SETI@home v8
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | This computer has finished a daily quota of 35 tasks
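
If you want to pull the important bits out of a log like this without reading every line, something along these lines might do. It assumes the `time | project | message` layout shown above, and the function names are invented for the example:

```python
import re

SAMPLE_LOG = """\
Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Requesting new tasks for NVIDIA GPU
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | No tasks are available for SETI@home v8
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | This computer has finished a daily quota of 35 tasks
"""


def parse_line(line: str):
    """Split 'time | project | message' into a 3-tuple, or None if malformed."""
    parts = [part.strip() for part in line.split("|", 2)]
    return tuple(parts) if len(parts) == 3 else None


def summarize(log_text: str) -> dict:
    """Collect the new-task count, daily quota and refused resource types."""
    summary = {"new_tasks": None, "daily_quota": None, "refused": []}
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if match := re.search(r"got (\d+) new tasks", message):
            summary["new_tasks"] = int(match.group(1))
        if match := re.search(r"finished a daily quota of (\d+) tasks", message):
            summary["daily_quota"] = int(match.group(1))
        if match := re.search(r"Tasks for (.+) are available, but your preferences", message):
            summary["refused"].append(match.group(1))
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

A `daily_quota` value in the summary is the giveaway that the host is being held back by the quota rather than by an empty feeder.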
ID: 1979669 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979670 - Posted: 10 Feb 2019, 6:05:10 UTC - in response to Message 1979665.  

I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received.
So I tracked a task and it appears they recovered the drive, or came up with a work around.

That's good news.
It'll bring an early end to the Invalids & Errors.

Hopefully they can re-issue those that have already errored out.
Grant
Darwin NT
ID: 1979670 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979671 - Posted: 10 Feb 2019, 6:08:43 UTC - in response to Message 1979669.  

Ghosts are not the problem here ... I've been getting this:
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | This computer has finished a daily quota of 35 tasks

Ghosts will reduce your Daily quota as they time out, but the biggest issue at the moment is the loss of a whole bunch of WUs that are resulting in download errors.
However as you return work and it is validated, your Daily quota will increase significantly.
Grant
Darwin NT
ID: 1979671 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979672 - Posted: 10 Feb 2019, 6:12:32 UTC

Here we go- Web site & forums slow, Tasks lists not displaying, "Project has no tasks available".
That time of day again (hopefully).
Grant
Darwin NT
ID: 1979672 · Report as offensive
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1979673 - Posted: 10 Feb 2019, 6:14:15 UTC - in response to Message 1979661.  

Ghosts are not the problem here ... I've been getting this:

Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Sending scheduler request: To fetch work.
Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Reporting 2 completed tasks
Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Requesting new tasks for NVIDIA GPU
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | No tasks sent
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | No tasks are available for SETI@home v8
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | This computer has finished a daily quota of 35 tasks


on this computer
and the same thing on another: when asking for CPU and NVIDIA it will download the same number of WUs as CPU WUs uploaded, but not replace the NVIDIA WUs

(This computer is epfubuntu, the other computer is epfubuntu-r)

Ed F

P.S. now project has no tasks available ... looks like a long night ...
ID: 1979673 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1979681 - Posted: 10 Feb 2019, 8:29:18 UTC - in response to Message 1979560.  


My WAG (Wild Arse Guess) is that most of the outright Errors should be mostly done in around 48 hours.
However we can look forward to getting at least a few Invalids for several months to come as those that are Pending get re-issued to another host when the original wingmate doesn't return the result for the WU they were able to download before this issue occurred (which will of course result in another batch of Errors due to the WU not being available for download for each of the WUs this occurs on).
Something to keep in mind for a while.

Looks like your WAG is correct, Got 4 more yesterday, but none during the last 11 hours.
Also a couple of Invalids so far...again, a correct prediction :)
Humans may rule the world...but bacteria run it...
ID: 1979681 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979683 - Posted: 10 Feb 2019, 8:44:43 UTC - in response to Message 1979681.  

Looks like your WAG is correct, Got 4 more yesterday, but none during the last 11 hours.
Also a couple of Invalids so far...again, a correct prediction :)

At least back then.

Eric has posted in the News thread that they got the problem system back up & online, but it will take a while for everything to re-sync. So apart from a few more errors while things re-sync, this should be the worst of the messiness done with, and other than the odd one here & there things should be pretty much over (for that particular issue).
Grant
Darwin NT
ID: 1979683 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979686 - Posted: 10 Feb 2019, 9:29:04 UTC - in response to Message 1979665.  

I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received.
So I tracked a task and it appears they recovered the drive, or came up with a work around.


. . Isn't that one feature of RAID? Being able to rebuild the lost data when one drive fails?

Stephen

ID: 1979686 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979687 - Posted: 10 Feb 2019, 9:32:57 UTC - in response to Message 1979669.  
Last modified: 10 Feb 2019, 9:34:33 UTC

Ghosts are not the problem here ... I've been getting this:

Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | No tasks sent
Sun 10 Feb 2019 01:00:55 AM EST | SETI@home | This computer has finished a daily quota of 35 tasks


. . You get that message when a large number of tasks have failed, such as errored out, been aborted etc.

. . If you still have tasks to process I would recommend you try the process Keith outlined to you. If it shows as no ghosts then just wait for the quota to reach its target and you will get new work as normal; it should take less than a day if there are no further problems. If you do get resends, repeat the process at intervals until there are no more ghosts.

Stephen

ID: 1979687 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979692 - Posted: 10 Feb 2019, 9:53:23 UTC - in response to Message 1979686.  

I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received.
So I tracked a task and it appears they recovered the drive, or came up with a work around.


. . Isn't that one feature of RAID? Being able to rebuild the lost data when one drive fails?

One or two (Raid 5 or 6). But when the entire storage server dies you're up the creek without a paddle if it stays dead.
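
For the curious, the single-drive case (RAID 5) comes down to XOR parity. A toy sketch, with made-up block contents and a hypothetical `xor_blocks` helper; real arrays stripe and rotate parity across drives, and RAID 6 adds a second, differently computed parity block that isn't shown here:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three "drives" holding data, plus one parity block computed from them.
    data = [b"WU-0", b"WU-1", b"WU-2"]
    parity = xor_blocks(data)

    # Drive 1 dies: XOR the survivors with the parity to rebuild its block.
    survivors = [data[0], data[2], parity]
    rebuilt = xor_blocks(survivors)
    print(rebuilt == data[1])
```

That only helps while the rest of the array is still readable, which is exactly why losing the whole storage server is a different kind of problem.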
Grant
Darwin NT
ID: 1979692 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1979768 - Posted: 10 Feb 2019, 21:34:02 UTC

Back to Project has NO Tasks again...
Hosts down by hundreds of tasks...
Another day, another Panic.
ID: 1979768 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1979774 - Posted: 10 Feb 2019, 21:50:56 UTC
Last modified: 10 Feb 2019, 21:58:14 UTC

They started to reduce the backlog of WU and result deletions again when they fixed Georgem. Anytime the deleters crank up you can't get any work. Down to less than a dozen GPU tasks now on one host.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1979774 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.