Panic Mode On (114) Server Problems?

Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979447 - Posted: 8 Feb 2019, 23:01:51 UTC - in response to Message 1979367.  
Last modified: 8 Feb 2019, 23:02:12 UTC

Since then I've thought of a workround, tested it, and got two empty components of my central heating back into action. Yay!
But I'm not going to share it now, because:

1) Jeff is already on the case
2) I got a significant number of errors on the downloads
3) The scheduler is now down for maintenance.


. . FWIW ... I, not having used your workaround, have also received a large number of download errors since they resumed this am. I suspect that is a timeout issue for many of them since they have been in abeyance for a long time.

Stephen

:)
ID: 1979447 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1979448 - Posted: 8 Feb 2019, 23:04:09 UTC - in response to Message 1979447.  

. . FWIW ... I, not having used your workaround, have also received a large number of download errors since they resumed this am. I suspect that is a timeout issue for many of them since they have been in abeyance for a long time.

Nope, it's because of the storage failure as posted in the news thread. The WUs are no longer there to download, hence the download error.
Grant
Darwin NT
ID: 1979448 · Report as offensive
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1979451 - Posted: 8 Feb 2019, 23:10:11 UTC

Unfortunately it seems we are going to have to go through all 10 retries to download each of those bad WUs.
All my errors are resends ...
ID: 1979451 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979452 - Posted: 8 Feb 2019, 23:10:36 UTC - in response to Message 1979392.  

Well, if anybody would like to contribute a data center budget to the Seti project, I for sure would like to hear about it.
Upgrades could happen very quickly.
Meow.

. . I'll let you know when I win the lottery :)

Stephen

. . PS some would consider your farm virtually a data centre of itself ... :)

:)
ID: 1979452 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979453 - Posted: 8 Feb 2019, 23:19:00 UTC - in response to Message 1979399.  

I guess SETI has become an addiction for some..
I wonder if there is a 12 step program to cure that?
(buy Porsches instead) :-)

Keep it up. If you must.
You will notice, if you look a bit, that I am running much less hardware than I used to.
Simply due to the cost.
I still will reach a billion credits in the not too distant future.
But my RAC is not what it used to be.
And you would still lead others to believe that productive users contribute to problems with the servers?
I don't think many are listening to your arguments anymore.
If it still pleases you to continue, have at it.
Most of us know your logic falls very short of the truth.
As you were, carry on.
Meow.


. . Firstly ... I smiled, it is indeed an addiction but only to certain personalities of which I am one. :)

. . However my SETI budget could not buy even a 10 yo Porsche and run it too :)

. . Now to Kittyman ... nice restraint, proud of you dude :)

Stephen

:)
ID: 1979453 · Report as offensive
Sirius B (Project Donor)
Volunteer tester
Joined: 26 Dec 00
Posts: 24881
Credit: 3,081,182
RAC: 7
Ireland
Message 1979456 - Posted: 8 Feb 2019, 23:25:04 UTC - in response to Message 1979452.  

Well, if anybody would like to contribute a data center budget to the Seti project, I for sure would like to hear about it.
Upgrades could happen very quickly.
Meow.

. . I'll let you know when I win the lottery :)
I often do, but a Data Center win? I wish.
ID: 1979456 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979457 - Posted: 8 Feb 2019, 23:25:23 UTC - in response to Message 1979411.  

Hey Mark - only 180 days to go ;-)

I'll send you a coffee voucher when you get there


. . Maybe we should all fly over to the US and throw a party :)

. . On second thoughts the cost of the ticket would buy a 2080ti and I could dedicate it to his efforts :)

Stephen

:)
ID: 1979457 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979460 - Posted: 8 Feb 2019, 23:27:58 UTC - in response to Message 1979413.  

I'm a 2013 Chevy Volt kind of guy.
Meow.


ok since we are comparing sizes ... 2008 Toyota Aurion here :) Nice car actually :)

Stephen

:)
ID: 1979460 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979461 - Posted: 8 Feb 2019, 23:33:10 UTC - in response to Message 1979433.  
Last modified: 8 Feb 2019, 23:38:40 UTC

Lots of "Permanent HTTP errors" and "unrecoverable errors" for queued download tasks. Those are the lost 15% of tasks.

The vast majority of mine were AP WUs.
:-(


. . Did we lose the actual Tasks or just the WU downloads? I believe the latter is the case so they only need to be re-copied and resent.

. . I am hoping.

. . Thanks Grant for the clarification ..... :( :( :(

Stephen

??
ID: 1979461 · Report as offensive
woohoo
Volunteer tester

Joined: 30 Oct 13
Posts: 972
Credit: 165,671,404
RAC: 5
United States
Message 1979462 - Posted: 8 Feb 2019, 23:34:01 UTC

I have a Porsche but not a Porsche RAC
ID: 1979462 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979463 - Posted: 8 Feb 2019, 23:36:00 UTC - in response to Message 1979448.  

. . FWIW ... I, not having used your workaround, have also received a large number of download errors since they resumed this am. I suspect that is a timeout issue for many of them since they have been in abeyance for a long time.

Nope, it's because of the storage failure as posted in the news thread. The WUs are no longer there to download, hence the download error.


. . To quote Hardy Har Har,

. . Oh dear, oh dear, oh dear! :(

Stephen

:)
ID: 1979463 · Report as offensive
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1979464 - Posted: 8 Feb 2019, 23:39:20 UTC - in response to Message 1979461.  

. . Did we lose the actual Tasks or just the WU downloads? I believe the latter is the case so they only need to be re-copied and resent.

. . I am hoping.


No, just the WU downloads. All my download error tasks have already gone out to two new wingmen.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1979464 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1979468 - Posted: 8 Feb 2019, 23:49:08 UTC - in response to Message 1979464.  
Last modified: 8 Feb 2019, 23:52:43 UTC

All my download error tasks have already gone out to two new wingmen.

Many of the re-sends I've picked up have also failed as the WUs still aren't there to be downloaded.
I think it's going to take them a while to sort this mess out.

Edit-
Here's an example.
blc21_2bit_guppi_58406_31708_HIP20440_0117.31291.0.21.44.25.vlar
Grant
Darwin NT
ID: 1979468 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1979471 - Posted: 8 Feb 2019, 23:57:58 UTC
Last modified: 8 Feb 2019, 23:59:42 UTC

I'm thinking this will affect the faster systems more than the slower ones.

Slow systems will have a full cache, so it could be days before they get to any WUs that have been resent. The faster systems will be getting all the resends, and since so little new work is being generated at the moment, most of the work going out will be these resends, until they reach their error limit, are declared duds, and are no longer resent.
As fast as Validated work reduces the Error count, the resends with no WU to actually download are just going to keep cranking it back up.
My error count continues to increase each time I check my tasks.


Given that these WUs are now erroring out as soon as the Manager attempts to download them, it should be mostly over in just a few hours.
Grant
Darwin NT
ID: 1979471 · Report as offensive
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1979472 - Posted: 8 Feb 2019, 23:58:33 UTC - in response to Message 1979468.  

Yes, a good dozen or so tasks in each new download are unrecoverable. So that forces the host into backoff timeouts. Until the lost tasks are cleared from the database, you will have to stay on top of your hosts to keep them reporting normally with manual updates.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1979472 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1979474 - Posted: 9 Feb 2019, 0:04:57 UTC - in response to Message 1979468.  

All my download error tasks have already gone out to two new wingmen.

Many of the re-sends I've picked up have also failed as the WUs still aren't there to be downloaded.
I think it's going to take them a while to sort this mess out.

Edit-
Here's an example.
blc21_2bit_guppi_58406_31708_HIP20440_0117.31291.0.21.44.25.vlar


. . OK, despite the errors there are 2 systems reporting a successful download and the task is "in progress". The second of these was the final download attempt, which says to me that the task is still in the system on that one ...

. . But from the posted news we have indeed lost some tasks completely.

Stephen

??
ID: 1979474 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1979480 - Posted: 9 Feb 2019, 0:42:56 UTC - in response to Message 1979472.  
Last modified: 9 Feb 2019, 0:44:41 UTC

Yes, a good dozen or so tasks in each new download are unrecoverable. So that forces the host into backoff timeouts. Until the lost tasks are cleared from the database, you will have to stay on top of your hosts to keep them reporting normally with manual updates.

If you manually abort the failed downloads and update, the host goes back to asking for new work. Some of the new tasks are good and some fail to download again.
ID: 1979480 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1979484 - Posted: 9 Feb 2019, 0:50:15 UTC - in response to Message 1979474.  

. . OK, despite the errors there are 2 systems reporting a successful download and the task is "in progress".

In progress doesn't mean there's been a successful download, it just means it's been allocated to the host and there hasn't been a result or an error returned yet. It could be the download errored out, but if there were previous download errors, this one hasn't been reported yet due to the backoffs (one of the systems hasn't contacted the Scheduler in over 6 hours). Or they could be ghosts. Or it could be a successful download (but unlikely given all the other download failures).
Time will tell.

. . But from the posted news we have indeed lost some tasks completely.

Or they could run a query on the Database & any results that resulted in a WU erroring out completely on the 8th, 9th or 10th of Feb UTC could be re-processed.
The WUs can still be processed, but as I said in a previous post, it's going to take a while for them to clean up the mess.
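
As a rough illustration of the kind of query described above, here is a minimal Python sketch using an in-memory SQLite table. The table layout (`workunit`, `errored_out`, `error_time`) and the sample names are made up for the example and are not the project's real schema; only the 8 to 10 Feb UTC window comes from the post.

```python
import sqlite3
from datetime import datetime, timezone


def make_demo_db():
    """Build an in-memory table with a hypothetical layout (not the real project schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workunit (name TEXT, errored_out INTEGER, error_time REAL)"
    )
    rows = [
        # Errored out completely inside the window: should be selected.
        ("wu_a", 1, datetime(2019, 2, 8, 23, 0, tzinfo=timezone.utc).timestamp()),
        # Inside the window but not errored out: should be skipped.
        ("wu_b", 0, datetime(2019, 2, 9, 1, 0, tzinfo=timezone.utc).timestamp()),
        # Errored out, but after the window: should be skipped.
        ("wu_c", 1, datetime(2019, 2, 12, 0, 0, tzinfo=timezone.utc).timestamp()),
    ]
    conn.executemany("INSERT INTO workunit VALUES (?, ?, ?)", rows)
    return conn


def errored_in_window(conn, start, end):
    """Return names of workunits that errored out with start <= error_time < end."""
    cursor = conn.execute(
        "SELECT name FROM workunit "
        "WHERE errored_out = 1 AND error_time >= ? AND error_time < ? "
        "ORDER BY name",
        (start.timestamp(), end.timestamp()),
    )
    return [row[0] for row in cursor]


# 8th, 9th and 10th of Feb 2019 UTC, expressed as a half-open interval.
START = datetime(2019, 2, 8, tzinfo=timezone.utc)
END = datetime(2019, 2, 11, tzinfo=timezone.utc)

if __name__ == "__main__":
    print(errored_in_window(make_demo_db(), START, END))
```

The half-open interval ending at midnight on the 11th avoids having to guess the last timestamp on the 10th.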
Grant
Darwin NT
ID: 1979484 · Report as offensive
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1979485 - Posted: 9 Feb 2019, 0:51:06 UTC - in response to Message 1979480.  

There aren't any failed download tasks to abort. All you see is the message in the Event Log of giving up on the task.

Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error

But you then get forced into a 24 hour backoff. You have to wait out another five minutes before you can contact the schedulers again, and hope you don't get more lost tasks.
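
For anyone wanting to tally these from a saved copy of the Event Log, here is a minimal Python sketch that pulls out the file names from "Giving up on download" lines. The line layout is inferred only from the four sample lines quoted above, so other client versions or locales may format things differently; the function name is just illustrative.

```python
import re

# Matches "... | <project> | Giving up on download of <file>: <reason>".
# The layout is an assumption based on the sample lines in this thread.
GIVE_UP = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*"
    r"Giving up on download of (?P<name>\S+): (?P<reason>.+)$"
)


def abandoned_downloads(log_lines):
    """Return (file name, reason) pairs for each 'Giving up on download' line."""
    found = []
    for line in log_lines:
        match = GIVE_UP.search(line.strip())
        if match:
            found.append((match.group("name"), match.group("reason")))
    return found


SAMPLE = [
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error",
]

if __name__ == "__main__":
    for name, reason in abandoned_downloads(SAMPLE):
        print(name, "->", reason)
```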
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1979485 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1979487 - Posted: 9 Feb 2019, 0:53:23 UTC - in response to Message 1979437.  

Looks like there's still a download issue- takes a minute or so for downloads to actually start downloading, occasionally pausing while downloading, and some barely cracking the 1kB/s mark by the time they finish.
But at least if you get allocated some work, it will eventually download.

Whatever the download issue was, it seems to have sorted itself out now. When I can get work, downloads are back around 200+kB/s again.
Grant
Darwin NT
ID: 1979487 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.