The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)
Profile Cactus Bob
Joined: 19 May 99
Posts: 209
Credit: 10,924,287
RAC: 29
Canada
Message 2048579 - Posted: 8 May 2020, 17:26:10 UTC

State: All (122) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (122) · Invalid (0) · Error (0)

A bunch of WUs were sent on Apr 22 and completed on Apr 23, so Valids are still at 123.

This is a bit like watching ice melt: there is always that teeny bit that persists.
Sometimes I wonder, what happened to all the people I gave directions to?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SETI@home classic workunits 4,321
SETI@home classic CPU time 22,169 hours
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 2048583 - Posted: 8 May 2020, 19:14:19 UTC

Well, there it is.

State: All (700) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (700) · Invalid (0) · Error (0)

All tasks returned and validated. End of an era, and I had such big plans for the WOW event; luckily I had put off spending any money just long enough.
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2048610 - Posted: 9 May 2020, 0:19:31 UTC - in response to Message 2048567.  

State: All (0) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (0) · Invalid (0) · Error (0)


. . That rather says it all :(

Stephen

:(
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2048612 - Posted: 9 May 2020, 1:02:52 UTC - in response to Message 2048610.  
Last modified: 9 May 2020, 1:03:25 UTC

9/5/20 11:00am AEST {9/5/20 1:00 am UTC}

State: All (2172) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (2155) · Invalid (10) · Error (7)

N.B. the 10 Invalids are 'cannot validate' or 'validation error' tasks with hordes of wingmen. The 7 Errors are all validated on other hosts.

Stephen

:(
Profile doublechaz
Joined: 17 Nov 00
Posts: 90
Credit: 76,455,865
RAC: 735
United States
Message 2048633 - Posted: 9 May 2020, 4:36:23 UTC

Just returned my last unit. All zeros in my task list now.
AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2048710 - Posted: 9 May 2020, 21:55:12 UTC - in response to Message 2048612.  

9/5/20 11:00am AEST {9/5/20 1:00 am UTC}

State: All (2172) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (2155) · Invalid (10) · Error (7)

N.B. the 10 Invalids are 'cannot validate' or 'validation error' tasks with hordes of wingmen. The 7 Errors are all validated on other hosts.

Stephen

:(

It's all out of your hands; you've done your part.
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 2048738 - Posted: 10 May 2020, 9:10:01 UTC


State: All (172) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (171) · Invalid (1) · Error (0)


A drop of about 45 tasks overnight ^^
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2048826 - Posted: 11 May 2020, 9:33:36 UTC

Greetings,

My stats are stagnant. They haven't moved for at least 2 weeks now:
State: All (331) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (329) · Invalid (2) · Error (0)

I've been watching the server stats gradually going down day by day, but mine remain the same. :|

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2048829 - Posted: 11 May 2020, 11:04:05 UTC - in response to Message 2048826.  
Last modified: 11 May 2020, 11:18:55 UTC

The "Result turnaround time (last hour average)" has been greater than 1000 hours on a few occasions when I've looked recently. That's roughly 6 weeks!
It's currently ~827h.
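For reference, a quick sketch of the hours-to-weeks conversion (a week is 24 × 7 = 168 hours): 1000 hours comes out just under 6 weeks, and ~827 h a little under 5 weeks.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in a week

def hours_to_weeks(hours: float) -> float:
    """Convert a turnaround time in hours to weeks."""
    return hours / HOURS_PER_WEEK

for h in (1000, 827):
    print(f"{h} h = {hours_to_weeks(h):.2f} weeks")
# 1000 h = 5.95 weeks
# 827 h = 4.92 weeks
```

The 6-week mark itself is 1008 hours.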

I don't think that there is a Munin graph for that parameter, or for "Results waiting for db purging".

I'm guessing that probably <1% of the "Results out in the field" are still needed.

Eric posted a couple of weeks ago on the SETI Facebook group that there were about 17k results needed. At that time it represented about 1.5% of the results in the field.

[Edit] This is the most recent comment that I have seen from Eric.
There are currently 17234 workunits that don't have a validated result. Once they are finished nothing more should go out. During normal operations, that would have been about 10 minutes worth. But a lot of people quit in March without returning their results. We've sent everything out again in an attempt to speed things up. I'd guess a few days more before most of those results are back and validated.
WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 2048854 - Posted: 11 May 2020, 17:16:45 UTC - in response to Message 2048829.  
Last modified: 11 May 2020, 17:18:47 UTC

[Edit] This is the most recent comment that I have seen from Eric.
There are currently 17234 workunits that don't have a validated result. Once they are finished nothing more should go out. During normal operations, that would have been about 10 minutes worth. But a lot of people quit in March without returning their results. We've sent everything out again in an attempt to speed things up. I'd guess a few days more before most of those results are back and validated.


Just a minute ago Eric wrote on FB:
As far as the "waiting for validation" I think there's a bug in that code. When I do the query manually, I get a much smaller number.

We have 795 workunits waiting for a valid result.
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2048856 - Posted: 11 May 2020, 17:26:15 UTC - in response to Message 2048854.  

795 Workunits waiting for a valid Result out of 653,838 Results out in the field

That's approximately 0.12% of the outstanding Tasks that are needed!
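The percentage can be checked the same way. This sketch treats each of the 795 work units as needing one result out of the 653,838 in the field, which is a simplification:

```python
def percent_needed(needed: int, in_field: int) -> float:
    """Share of outstanding results that are still actually needed, in percent."""
    return needed / in_field * 100

print(f"{percent_needed(795, 653_838):.2f}%")  # 0.12%
```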
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2048866 - Posted: 11 May 2020, 20:50:45 UTC

It's about time one or two folks with big caches of "work in progress" took a look at their tasks and abandoned the thousands for which the canonical result has already been declared. Sitting on tasks for which the canonical result has already been declared does nothing for the science. I know there are a fair number of folks who have just stopped processing or, like me, have data stuck on failed machines; there is little that can be done about them, but at least get rid of the excess where possible.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 2048879 - Posted: 11 May 2020, 22:18:37 UTC - in response to Message 2048866.  

Or, how about a server-side abort?
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2048880 - Posted: 11 May 2020, 22:27:56 UTC - in response to Message 2048879.  

OR, how about a server side abort?

Abort everything from the server side and resend the remaining 795 (or slightly fewer now) in small batches (say, 5 WUs per host max) with a very short deadline (say, 3 days), and it will all be done very fast.
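A rough sketch of that batching, assuming each batch goes to a different host. `plan_resend` is a hypothetical helper for illustration only, not part of the BOINC server; it just groups IDs and stamps a deadline. With 5 WUs per host, 795 work units need at least 159 hosts.

```python
from datetime import datetime, timedelta, timezone

def plan_resend(wu_ids, max_per_host=5, deadline_days=3, now=None):
    """Split work units into per-host batches that share one short deadline.

    Hypothetical planning helper: it only groups IDs, it does not talk to BOINC.
    """
    now = now or datetime.now(timezone.utc)
    deadline = now + timedelta(days=deadline_days)
    return [
        (wu_ids[i:i + max_per_host], deadline)
        for i in range(0, len(wu_ids), max_per_host)
    ]

plan = plan_resend(list(range(795)))
print(len(plan))  # 159
```

Capping the batch size is the point of the suggestion: no single slow or departed host can sit on more than a handful of the last WUs.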
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 2048881 - Posted: 11 May 2020, 22:35:08 UTC - in response to Message 2048880.  

Yep, and all those who left would not receive any. Being a slower CPU-only cruncher, I set NNT (No New Tasks) with the last few I had and crunched them all by 10th April. Let the fast hosts crunch; much quicker return.
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2048884 - Posted: 11 May 2020, 22:46:39 UTC

Aborting tasks now could cause a "Too many errors" state.

If Eric wants the last few results quickly, he could probably make sure that all the Ghosts are re-sent if needed, and then Cancel Unstarted Tasks.

That would probably be more effective.
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2048885 - Posted: 11 May 2020, 22:48:58 UTC - in response to Message 2048884.  

Aborting tasks now could cause a "Too many errors" state.

If Eric wants the last few results quickly, he could probably make sure that all the Ghosts are re-sent if needed, and then Cancel Unstarted Tasks.

That would probably be more effective.

But that requires a lot more babysitting. I don't believe Eric has time to cherry-pick each WU.
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2048896 - Posted: 12 May 2020, 0:14:39 UTC - in response to Message 2048885.  

I think it would be 2 or 3 scripts, one of which has been done before.

I'm not an expert on the BOINC server code, but I think it is basically switching an option on or off, then updating.

The load on the servers now is probably low enough to turn on the options that were previously off due to excessive load.
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2048934 - Posted: 12 May 2020, 12:33:29 UTC - in response to Message 2048884.  

Aborting tasks that have already been validated by others and have a canonical result will not cause work units to die from too many errors. Once the canonical result has been declared, any subsequent result returned is nothing but fluff.
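That rule can be written as a simple filter. This is a sketch only: the `Task` record and the `canonical_by_wu` lookup (work unit ID to canonical result ID, with 0 meaning none yet) are hypothetical stand-ins, not the real BOINC data model.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    workunit_id: int

def safe_to_abort(tasks, canonical_by_wu):
    """Return the tasks whose work unit already has a canonical result.

    canonical_by_wu maps work unit ID -> canonical result ID (0 = none yet).
    Work units missing from the map are treated as still needing results.
    """
    return [t for t in tasks if canonical_by_wu.get(t.workunit_id, 0) != 0]

tasks = [Task("wu1_2", 1), Task("wu2_3", 2), Task("wu3_1", 3)]
print([t.name for t in safe_to_abort(tasks, {1: 9001, 2: 0})])  # ['wu1_2']
```

Defaulting unknown work units to "still needed" errs on the side of keeping a task rather than aborting one the science might still need.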
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Profile doublechaz
Joined: 17 Nov 00
Posts: 90
Credit: 76,455,865
RAC: 735
United States
Message 2048946 - Posted: 12 May 2020, 14:10:39 UTC

I have recovered many tasks from failed machines.

1) Create or select a healthy machine.
2) Make sure it is signed up to the project.
3) Edit client_state.xml and change its hostid to that of the failed machine.
4) Change its rpc count to that of the failed machine.
5) Start the client.
6) Carry out the ghost recovery procedure.

You do have to make sure the base architectures match. You can't move ARM units to an Intel this way, but you can move Windows on Intel to Linux on AMD.
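Steps 3 and 4 amount to editing two values in client_state.xml. Here is a minimal Python sketch. It assumes, without confirmation in this thread, that each `<project>` entry holds `<master_url>`, `<hostid>` and `<rpc_seqno>` elements, with the last being the "rpc count". The URL and IDs are placeholders. Stop the client and back up the file before trying anything like this.

```python
import xml.etree.ElementTree as ET

def _set_text(parent, tag, value):
    """Set the text of a required child element, failing loudly if it is absent."""
    el = parent.find(tag)
    if el is None:
        raise ValueError(f"<{tag}> not found in <{parent.tag}>")
    el.text = str(value)

def retarget_project(xml_text, master_url, hostid, rpc_seqno):
    """Return client_state XML with one project's hostid and rpc_seqno replaced.

    Assumes a <project>/<master_url>/<hostid>/<rpc_seqno> layout (see lead-in).
    """
    root = ET.fromstring(xml_text)
    for project in root.findall("project"):
        if project.findtext("master_url", "").strip() == master_url:
            _set_text(project, "hostid", hostid)
            _set_text(project, "rpc_seqno", rpc_seqno)
            return ET.tostring(root, encoding="unicode")
    raise ValueError(f"project {master_url!r} not found")

# Placeholder data, not a real client_state.xml.
SAMPLE = """<client_state>
<project>
    <master_url>https://example.org/project/</master_url>
    <hostid>111</hostid>
    <rpc_seqno>5</rpc_seqno>
</project>
</client_state>"""

print(retarget_project(SAMPLE, "https://example.org/project/", 222, 42))
```

ElementTree drops comments and the XML declaration when it re-serialises, so compare the output against your backup before writing it back.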

FWIW.
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.