The Server Issues / Outages Thread - Panic Mode On! (119)

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2038791 - Posted: 18 Mar 2020, 22:55:01 UTC - in response to Message 2038789.  
Last modified: 18 Mar 2020, 22:56:06 UTC

we may indeed go out with a bang!

Where is the kaboom? It's supposed to be a kaboom!

New WUs are flowing without restrictions, the DB WU count is >20MM and there are no problems at all.

Did they finally find the fix for the DB size problem? Just now when the curtains are ready to close?

I am not sure I fully agree with you, as the replica database is 2116 seconds behind. On a bright note, though, it is good that work is flowing, and I have even had a couple of BLC 35 _3 tasks pass through my system, so resends are getting through. Hopefully we will start to see some bloat disappear soon. This will happen for sure when tasks are not being sent out anymore.
ID: 2038791
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2038806 - Posted: 18 Mar 2020, 23:56:23 UTC - in response to Message 2038730.  

According to the SSP, all unstarted tapes have been removed. The beginning of the end?


. . I had noticed that the eternally non-starting tapes (2 x Blc22, 2 x Blc34 plus 1 x Blc62) had been removed and thought ... Yay! But I had not noticed that ALL unstarted tapes had gone ...

. . I am guessing you are right, this is the beginning of the shut down ...

Stephen

(mixed feelings, mostly sad)
ID: 2038806
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2038807 - Posted: 18 Mar 2020, 23:58:01 UTC - in response to Message 2038782.  

I'll accept 16mr20af as automation, but I think 16se11ab must have been manually chosen.


. . +1

Stephen

. .
ID: 2038807
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2038808 - Posted: 19 Mar 2020, 0:01:11 UTC - in response to Message 2038807.  

Clearing out the shelves just like toilet paper during a pandemic! :)
ID: 2038808
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2038809 - Posted: 19 Mar 2020, 0:01:55 UTC - in response to Message 2038789.  

we may indeed go out with a bang!

Where is the kaboom? It's supposed to be a kaboom!
New WUs are flowing without restrictions, the DB WU count is >20MM and there are no problems at all.
Did they finally find the fix for the DB size problem? Just now when the curtains are ready to close?


. . Of course, that would be according to Murphy's Law. Just like I have been waiting for a couple of years for the faster NBN connection to come and it arrived on Tuesday with just 2 weeks of SETI left ... mummble fritz whagaadang ....

Stephen

<throwing arms up in the air>
ID: 2038809
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2038810 - Posted: 19 Mar 2020, 0:04:39 UTC - in response to Message 2038791.  

I am not sure I fully agree with you, as the replica database is 2116 seconds behind. On a bright note, though, it is good that work is flowing, and I have even had a couple of BLC 35 _3 tasks pass through my system, so resends are getting through. Hopefully we will start to see some bloat disappear soon. This will happen for sure when tasks are not being sent out anymore.


. . it would happen faster if they would chop all task deadlines to the bone and force resends to machines that would actually complete them in the remaining time.

Stephen

ID: 2038810
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2038845 - Posted: 19 Mar 2020, 2:55:55 UTC - in response to Message 2038810.  

Yes, I agree it certainly would have been quicker, Stephen, if deadlines were cut. It will be interesting to see what happens, and how long the 1573 remaining channels of BLC work take to process. I have no intel, but I am wondering whether they are doing this as a test run to see how long it takes to complete X number of channels; then they will add X number to last us through until the end of the month. Simply speculation, nothing more, nothing less.
ID: 2038845
Profile doublechaz

Joined: 17 Nov 00
Posts: 90
Credit: 76,455,865
RAC: 735
United States
Message 2038854 - Posted: 19 Mar 2020, 5:28:07 UTC

Something just happened...

Uploading is working ATM, but reporting is stuck.
ID: 2038854
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13747
Credit: 208,696,464
RAC: 304
Australia
Message 2038856 - Posted: 19 Mar 2020, 5:49:06 UTC
Last modified: 19 Mar 2020, 5:58:33 UTC

Well, the Web site is back, but the forums I would describe as barely functional- over 5 minutes just to get to this point of being able to make a post (fingers crossed it'll go through and not vanish into the ether).
And the Scheduler is still MIA.
Grant
Darwin NT
ID: 2038856
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2038857 - Posted: 19 Mar 2020, 6:00:18 UTC - in response to Message 2038856.  

Well, the Web site is back, but the forums I would describe as barely functional- over 5 minutes just to get to this point of being able to make a post (fingers crossed it'll go through and not vanish into the ether).
And the Scheduler is still MIA.


+1
it is messed up and not working... in technical terms :-)
ID: 2038857
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13747
Credit: 208,696,464
RAC: 304
Australia
Message 2038859 - Posted: 19 Mar 2020, 6:04:38 UTC - in response to Message 2038857.  

it is messed up and not working... in technical terms :-)
Yep. It's borked.
Well and truly borked.
Grant
Darwin NT
ID: 2038859
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22225
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2038864 - Posted: 19 Mar 2020, 7:36:01 UTC

It looks as if the issues started about 4am (GMT), and have just got worse since then.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2038864
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2038865 - Posted: 19 Mar 2020, 8:11:18 UTC - in response to Message 2038864.  

It looks as if the issues started about 4am (GMT), and have just got worse since then.
My last successful scheduler request happened at 05:05:43 UTC and the last one that got new work at 04:58:37 UTC.
ID: 2038865
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22225
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2038867 - Posted: 19 Mar 2020, 8:35:54 UTC

There were a couple of very short failures around 04:00, but then the general failure kicked in about 90 minutes later - which ties in with your observations.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2038867
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2038869 - Posted: 19 Mar 2020, 8:56:47 UTC

SSP displays negative result creation rate for AstroPulse!

Current result creation rate **	0/sec	-0.0500/sec	0.2214/sec	3m
ID: 2038869
Profile Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 2038871 - Posted: 19 Mar 2020, 9:09:32 UTC - in response to Message 2038869.  

I noticed that too.

Well, they can't have my AP's back 'til I'm done with them. ;^)
Member of the 20 Year Club



ID: 2038871
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2038872 - Posted: 19 Mar 2020, 9:09:45 UTC

Feels like we're going through the typical 'slow recovery from cold boot', that we've become familiar with at this time of day for the last three months.

No response from server (can't connect)
Bad response from server (internal error, http service unavailable)
Slow response from server - report only, small is better
Partial response from server - a few dribs and drabs of new work (I got nine tasks just now, on one machine only)

And so it progresses. Fingers crossed, we might be up and running in another couple of hours.
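
For anyone who wants to track these stages from their own event log, here is a rough sketch, not anything the project provides. The function name recovery_stage and the substring matches are assumptions for illustration; only "Couldn't connect to server" and "got N new tasks" are taken from log lines quoted in this thread, and the "internal" / "service unavailable" matches are a guess at how the bad-response messages read. A single message cannot distinguish a partial response from a full one, so both are reported as "new work".

```python
import re

def recovery_stage(message):
    """Map one scheduler log message to a recovery stage (hypothetical helper)."""
    text = message.lower()
    if "couldn't connect" in text:
        return "no response"
    # Assumed wording for the 'bad response' stage; adjust to your own log.
    if "internal" in text or "service unavailable" in text:
        return "bad response"
    match = re.search(r"got (\d+) new tasks", text)
    if match:
        # Zero tasks means the request only reported finished work.
        return "report only" if int(match.group(1)) == 0 else "new work"
    return "unknown"

if __name__ == "__main__":
    for msg in (
        "Scheduler request failed: Couldn't connect to server",
        "Scheduler request completed: got 0 new tasks",
        "Scheduler request completed: got 61 new tasks",
    ):
        print(recovery_stage(msg), "<-", msg)
```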
ID: 2038872
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19087
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2038873 - Posted: 19 Mar 2020, 9:11:49 UTC
Last modified: 19 Mar 2020, 9:13:05 UTC

My last good contact was,
19/03/2020 04:59:26 | SETI@home | Requesting new tasks for NVIDIA GPU
19/03/2020 04:59:29 | SETI@home | Scheduler request completed: got 0 new tasks
19/03/2020 04:59:29 | SETI@home | Project has no tasks available
19/03/2020 05:02:02 | SETI@home | Computation for task 06ap10ab.9042.9478.14.41.80_0 finished
19/03/2020 05:02:02 | SETI@home | Starting task 16mr20ae.21582.2112.3.30.61_0
19/03/2020 05:02:04 | SETI@home | Started upload of 06ap10ab.9042.9478.14.41.80_0_r319936499_0
19/03/2020 05:02:08 | SETI@home | Finished upload of 06ap10ab.9042.9478.14.41.80_0_r319936499_0
19/03/2020 05:04:34 | SETI@home | Sending scheduler request: To fetch work.
19/03/2020 05:04:34 | SETI@home | Reporting 1 completed tasks
19/03/2020 05:04:34 | SETI@home | Requesting new tasks for NVIDIA GPU
19/03/2020 05:04:56 | SETI@home | Scheduler request failed: Couldn't connect to server
19/03/2020 05:04:58 |  | Project communication failed: attempting access to reference site


But I have just got this, partial progress
19/03/2020 09:07:16 | SETI@home | Fetching scheduler list
19/03/2020 09:07:18 | SETI@home | Master file download succeeded
19/03/2020 09:07:23 | SETI@home | Sending scheduler request: To report completed tasks.
19/03/2020 09:07:23 | SETI@home | Reporting 62 completed tasks
19/03/2020 09:07:23 | SETI@home | Requesting new tasks for NVIDIA GPU
19/03/2020 09:07:27 | SETI@home | Scheduler request completed: got 0 new tasks
19/03/2020 09:07:27 | SETI@home | Project has no tasks available


Times are GMT.
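
A minimal sketch of how the length of such a gap could be pulled out of an event log, assuming the "DD/MM/YYYY HH:MM:SS | project | message" layout shown above (other locales may format the timestamp differently). The helper names parse_line and scheduler_gaps are made up for this example, and the sample LOG holds just three of the lines quoted above.

```python
from datetime import datetime

# Three lines copied from the log excerpts above.
LOG = """\
19/03/2020 04:59:29 | SETI@home | Scheduler request completed: got 0 new tasks
19/03/2020 05:04:56 | SETI@home | Scheduler request failed: Couldn't connect to server
19/03/2020 09:07:27 | SETI@home | Scheduler request completed: got 0 new tasks
"""

def parse_line(line):
    """Split one 'timestamp | project | message' line; return None if malformed."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    try:
        stamp = datetime.strptime(parts[0], "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None
    return stamp, parts[1], parts[2]

def scheduler_gaps(text):
    """Return (last_ok, next_ok) pairs that bracket at least one failed request."""
    gaps = []
    last_ok = None
    failed_since_ok = False
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _project, message = parsed
        if message.startswith("Scheduler request completed"):
            if last_ok is not None and failed_since_ok:
                gaps.append((last_ok, stamp))
            last_ok = stamp
            failed_since_ok = False
        elif message.startswith("Scheduler request failed"):
            failed_since_ok = True
    return gaps

if __name__ == "__main__":
    for start, end in scheduler_gaps(LOG):
        print(f"No successful contact from {start} to {end} ({end - start})")
```

On the three sample lines this reports one gap of 4:07:58, from 04:59:29 to 09:07:27.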
ID: 2038873
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2038877 - Posted: 19 Mar 2020, 9:38:38 UTC

It's starting to wake up and smell the coffee.

19/03/2020 09:18:47 | SETI@home | Sending scheduler request: To report completed tasks.
19/03/2020 09:18:47 | SETI@home | Reporting 128 completed tasks
19/03/2020 09:18:47 | SETI@home | Requesting new tasks for NVIDIA GPU
19/03/2020 09:18:47 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
19/03/2020 09:18:47 | SETI@home | [sched_op] NVIDIA GPU work request: 74310.46 seconds; 0.00 devices
19/03/2020 09:18:56 | SETI@home | Scheduler request completed: got 61 new tasks
ID: 2038877
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13747
Credit: 208,696,464
RAC: 304
Australia
Message 2038880 - Posted: 19 Mar 2020, 9:42:03 UTC
Last modified: 19 Mar 2020, 10:03:33 UTC

Scheduler is awake and responding, but so far the response is "Project has no tasks available".
At least the forums are responsive again. No waiting 5 min for the thread to load, then another 5 min for the post to go through...


Edit- and of course posting about it gets results. One of my systems managed to pick up some work. But of course we're now in the "can't actually download anything" stage of the recovery. Everything's gone into backoff mode.
A few retries later and most of them have managed to eventually download.


Edit- other system has picked up work, but no amount of retrying will get it to download (nor suspending and re-enabling network access).

Edit-
Seriously- the Scheduler & Download servers have to be tied in with the forum software somehow. Almost 15 min of retries, no joy. Post about it, work starts downloading 30 sec later.
Grant
Darwin NT
ID: 2038880


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.