The Server Issues / Outages Thread - Panic Mode On! (118)

Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024849 - Posted: 25 Dec 2019, 3:10:35 UTC - in response to Message 2024843.  

my Anonymous platform host just got 14 new tasks :)

Still either timing out on the request and backing off or 0 tasks received.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024849 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 2024851 - Posted: 25 Dec 2019, 3:21:27 UTC - in response to Message 2024849.  

my Anonymous platform host just got 14 new tasks :)

Still either timing out on the request and backing off or 0 tasks received.

Ditto
ID: 2024851 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024854 - Posted: 25 Dec 2019, 3:36:25 UTC
Last modified: 25 Dec 2019, 3:39:56 UTC

Server status is back to green, and making regular Scheduler contact here, but "Project has no tasks available" is the only response so far.
But at least the responses are coming within 2-3 seconds.

Given the effective length of this outage, and the fact the system has struggled to recover after much shorter outages, it could take a couple of days for things to fully recover.


Edit-
Spoke too soon. Just had a 40 second wait for a Scheduler response. Not a good sign.
Grant
Darwin NT
ID: 2024854 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024857 - Posted: 25 Dec 2019, 3:55:44 UTC
Last modified: 25 Dec 2019, 3:57:52 UTC

Now back to how things were with the previous Scheduler version- extended Scheduler response times, with occasional errors instead of a valid response.


It would be rather disappointing if they've gone to all the effort to revert the Scheduler, and it doesn't fix the problem (as the bug that stops Anonymous platforms from getting work could have been in this Scheduler version as well, it's only the recent issue resulting in extended Scheduler responses that has made it more apparent).

Any Stock hosts getting work? Is Resend lost tasks still on?
Grant
Darwin NT
ID: 2024857 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024858 - Posted: 25 Dec 2019, 4:03:28 UTC - in response to Message 2024857.  

Now back to how things were with the previous Scheduler version- extended Scheduler response times, with occasional errors instead of a valid response.


It would be rather disappointing if they've gone to all the effort to revert the Scheduler, and it doesn't fix the problem (as the bug that stops Anonymous platforms from getting work could have been in this Scheduler version as well, it's only the recent issue resulting in extended Scheduler responses that has made it more apparent).

Any Stock hosts getting work? Is Resend lost tasks still on?


the bug seems fixed since I have received new work on an Anonymous Platform host.

https://setiathome.berkeley.edu/results.php?hostid=8796013

check the timestamps of the most recent tasks.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024858 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Avatar

Send message
Joined: 30 Nov 03
Posts: 65871
Credit: 55,293,173
RAC: 49
United States
Message 2024859 - Posted: 25 Dec 2019, 4:10:09 UTC - in response to Message 2024851.  

my Anonymous platform host just got 14 new tasks :)

Still either timing out on the request and backing off or 0 tasks received.

Ditto

More ditto here.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 2024859 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024860 - Posted: 25 Dec 2019, 4:13:14 UTC - in response to Message 2024858.  
Last modified: 25 Dec 2019, 4:22:53 UTC

the bug seems fixed since I have received new work on an Anonymous Platform host.
Well that's good.
Now they just need to fix the underlying issue- WTF is causing the slow Scheduler responses resulting in "Project has no tasks available" messages, or a Scheduler error?
25/12/2019 13:11:35 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:13:18 | SETI@home | Scheduler request failed: HTTP service unavailable
25/12/2019 13:14:24 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:15:18 | SETI@home | Project has no tasks available
25/12/2019 13:24:25 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:24:47 | SETI@home | Scheduler request failed: Couldn't connect to server
25/12/2019 13:26:03 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:27:39 | SETI@home | Scheduler request failed: HTTP service unavailable
25/12/2019 13:30:35 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:31:30 | SETI@home | Project has no tasks available


Every so often there is a 2-3 second response of "Project has no tasks available." These are probably valid responses, with the Feeder being out of work at the time. As for the others, the Feeder probably has work, but by the time the Scheduler gets around to responding the work has gone, and then it's a "Project has no tasks available" message, or the Scheduler just errors out before even getting that far.
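
For anyone who wants to put numbers on those delays, here is a minimal sketch that pairs each "Requesting new tasks" line in the log excerpt above with the reply that follows it and prints the gap in seconds. The function name `response_delays` and the pairing rule (the next line after a request counts as its reply) are assumptions made for illustration, not anything the BOINC client provides.

```python
from datetime import datetime

# Event log excerpt copied from the post above (local time, day/month/year).
LOG = """\
25/12/2019 13:11:35 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:13:18 | SETI@home | Scheduler request failed: HTTP service unavailable
25/12/2019 13:14:24 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:15:18 | SETI@home | Project has no tasks available
25/12/2019 13:24:25 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:24:47 | SETI@home | Scheduler request failed: Couldn't connect to server
25/12/2019 13:26:03 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:27:39 | SETI@home | Scheduler request failed: HTTP service unavailable
25/12/2019 13:30:35 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
25/12/2019 13:31:30 | SETI@home | Project has no tasks available
"""


def response_delays(log_text):
    """Return (seconds, reply) pairs, one per request line followed by a reply.

    Assumption: the line right after a request is the Scheduler's reply to it.
    """
    delays = []
    request_time = None
    for line in log_text.strip().splitlines():
        # Each line has the form: timestamp | project | message
        stamp, _project, message = (part.strip() for part in line.split("|", 2))
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        if message.startswith("Requesting new tasks"):
            request_time = when
        elif request_time is not None:
            seconds = int((when - request_time).total_seconds())
            delays.append((seconds, message))
            request_time = None
    return delays


for seconds, reply in response_delays(LOG):
    print(f"{seconds:4d} s  {reply}")
```

On this excerpt the gaps range from 22 to 103 seconds, nowhere near the 2-3 second replies seen when the Scheduler is keeping up.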


I think most of us will be waiting till after Christmas before we see much if any work.
Time for Eric & Co to take a break & come at it fresh after a rest.
Grant
Darwin NT
ID: 2024860 · Report as offensive
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2024862 - Posted: 25 Dec 2019, 4:50:38 UTC - in response to Message 2024858.  

Any Stock hosts getting work? Is Resend lost tasks still on?
I have tried to get work for both anonymous and stock for an hour or two now without success. Reporting completed jobs works as long as I don't try to report more than 50 at a time. Reporting 50 takes about a minute, and trying more results in a timeout. Requesting work without reporting gives 'Project has no tasks available' repeatedly and consistently, and this response comes in a couple of seconds.
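
The "no more than 50 at a time" rule amounts to plain batching. A minimal sketch, assuming a hypothetical list of completed task names; the helper `batches` and the task names are made up for illustration and are not part of the BOINC client:

```python
def batches(items, size=50):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical backlog of 120 completed tasks waiting to be reported.
completed = [f"task_{n}" for n in range(120)]

# Each batch would go out in its own scheduler request.
print([len(batch) for batch in batches(completed)])
```

With 120 tasks this gives batches of 50, 50 and 20, so at roughly a minute per 50 the backlog clears in three requests instead of timing out in one.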

Server status page says the 'Ready to send' buffer is so full that the splitters are being throttled to almost zero. So it looks like very few people are getting anything.

Resend was never on at least for me. My hosts have accumulated hundreds of ghosts during this disaster and those ghosts have stayed as ghosts. I even tried the manual ghost recovery ritual without success! I guess some people have observed resends because the manual 'ritual' is using an interrupted scheduler request, so these timeouts may trigger it spontaneously. So the slow server response is causing the resends, not the other way around.
ID: 2024862 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024864 - Posted: 25 Dec 2019, 4:54:49 UTC - in response to Message 2024862.  

I guess some people have observed resends because the manual 'ritual' is using an interrupted scheduler request, so these timeouts may trigger it spontaneously. So the slow server response is causing the resends, not the other way around.
Ah, interesting observation. A very plausible possibility.
Although if it were on that would be better, as turning it off would then fix the Scheduler delays. If it's not on, then something else is at fault.
Grant
Darwin NT
ID: 2024864 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024865 - Posted: 25 Dec 2019, 4:57:32 UTC - in response to Message 2024863.  

Reinstalled Lunatics, and now getting tasks.
Somehow I managed to pick up 4 on one of my systems. Should keep it amused for a little while.
Still, at least it means work is possible (as is winning the lottery, just not very likely).
Grant
Darwin NT
ID: 2024865 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024868 - Posted: 25 Dec 2019, 5:13:28 UTC - in response to Message 2024867.  
Last modified: 25 Dec 2019, 5:13:46 UTC

No joy! Back to "Project has no tasks available"
Yep, but the response times are back to 3-4 sec & no Scheduler errors. Though it would be nice if we could get quick response times and work.
Grant
Darwin NT
ID: 2024868 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Avatar

Send message
Joined: 30 Nov 03
Posts: 65871
Credit: 55,293,173
RAC: 49
United States
Message 2024869 - Posted: 25 Dec 2019, 5:36:07 UTC

No joy here on an update, I've had better luck elsewhere today.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 2024869 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 2024875 - Posted: 25 Dec 2019, 6:05:39 UTC

30 SoGs on the Win box, nothing on either Linux box.
ID: 2024875 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13771
Credit: 208,696,464
RAC: 304
Australia
Message 2024884 - Posted: 25 Dec 2019, 7:25:52 UTC - in response to Message 2024875.  

30 SoGs on the Win box, nothing on either Linux box.
Dribs & drabs on both. Getting some work, maybe every 40-60min or so?
Grant
Darwin NT
ID: 2024884 · Report as offensive
Profile NorthCup

Send message
Joined: 6 Jun 99
Posts: 108
Credit: 50,093,984
RAC: 5
Germany
Message 2024886 - Posted: 25 Dec 2019, 8:03:56 UTC - in response to Message 2024884.  

+1 - it feels good - thanks to the seti-staff
ID: 2024886 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 2024888 - Posted: 25 Dec 2019, 8:14:51 UTC - in response to Message 2024886.  

+1 - it feels good - thanks to the seti-staff

+1
ID: 2024888 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Avatar

Send message
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 2024889 - Posted: 25 Dec 2019, 8:23:32 UTC
Last modified: 25 Dec 2019, 8:24:01 UTC

Both stock Windows machines getting work slowly, downloads stalling.

Will wait a day or two before starting up the Linux machine
ID: 2024889 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024892 - Posted: 25 Dec 2019, 8:32:35 UTC - in response to Message 2024814.  
Last modified: 25 Dec 2019, 8:33:22 UTC

I don't know where you're getting the idea that returning to stock was a waste of time. I did it reasonably soon after the problem was identified and after a few attempts to get tasks and get back to crunching, all has gone as one would expect. There was a drop off in performance as the system had to go through the questionable process of deciding which apps are the best. Once that was decided as being the "SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG)" for my GPU, I admit I aborted the remaining CUDA tasks and it has run the SoG app without breaks ever since. My cache is full and performance is only down by ~15% as judged by RAC. And some of that might be because the computer has only grabbed one AP task in this period. This is for Windows; obviously special sauce Linux performance is a different matter.


. . When I returned the Windows rig to stock I received work AOK, but when I set two Linux boxes to "stock" that stopped and I was not getting any work on any machine, all just got 'no tasks available'.

. . But strangely (perhaps purely coincidence) when I later turned off the 2 Linux machines the Windows box started getting work again. Go figure. And, now, after days of having my rigs shut down and an outage that was supposed to restore everything, I am again getting "no tasks" on the one rig still running, the Windows box.

. . I am starting to wonder if it's time to find something else to do ... :(

Stephen

:(
ID: 2024892 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024893 - Posted: 25 Dec 2019, 8:35:31 UTC

Starting to get dribs and drabs of work. But all downloads are stalling out and backing off 5 hours.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024893 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024894 - Posted: 25 Dec 2019, 8:39:30 UTC - in response to Message 2024858.  

the bug seems fixed since I have received new work on an Anonymous Platform host.
https://setiathome.berkeley.edu/results.php?hostid=8796013
check the timestamps of the most recent tasks.


. . What a difference a few hours makes. I just checked that link and see a machine with 0 (zero) tasks in progress, so I am saying "NOT FIXED"

Stephen

:(
ID: 2024894 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.