The Server Issues / Outages Thread - Panic Mode On! (117)

JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2024216 - Posted: 21 Dec 2019, 21:57:12 UTC - in response to Message 2024215.  

Look at the 'Application' column in BOINC Manager, advanced view. If it starts with the word 'Local:', BOINC is using Anonymous Platform. If the word 'Local:' is missing, it's running stock.

Yep, not local. I can probably keep it running as is until Anonymous Platform is fixed, but the tasks missing from task page is concerning. It's possible the WU's are getting thrown out anyway. It's winter and I need the heat so it's probably OK. :-)

The task page is not up to date because the server is some 40,000 secs behind, so what you see is the status as of 40,000 secs ago.

And it's just getting worse; yesterday it was some 17,000 secs behind.
ID: 2024216 · Report as offensive
wujj123456

Joined: 5 Sep 04
Posts: 40
Credit: 20,877,975
RAC: 219
China
Message 2024221 - Posted: 21 Dec 2019, 22:13:48 UTC - in response to Message 2024216.  

The task page is not up to date because the server is some 40,000 secs behind, so what you see is the status as of 40,000 secs ago.

And it's just getting worse; yesterday it was some 17,000 secs behind.

The task page is updated from the Replica, which as of now is 43961 seconds behind (more than 12 hours).
No need to worry about that: your downloaded tasks do exist, you just can't see them on the task pages.

Edit: JohnDK beat me to it.

Ah, thanks. It makes sense that these non-critical stats come from querying the replica. It just means I need to check on hosts directly for now, which I am kinda already doing anyway...
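
For anyone who wants the lag figures quoted above in hours, here is a minimal Python sketch. The helper name `format_lag` is made up for illustration; the three inputs are just the seconds-behind values mentioned in this thread.

```python
def format_lag(seconds: int) -> str:
    """Turn a replica lag in seconds into an hours/minutes/seconds string."""
    hours, rem = divmod(seconds, 3600)   # whole hours, leftover seconds
    minutes, secs = divmod(rem, 60)      # whole minutes, leftover seconds
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Lag values quoted in the thread: yesterday, earlier today, and "as of now".
for lag in (17000, 40000, 43961):
    print(lag, "->", format_lag(lag))
```

So the 43961 figure works out to a bit over 12 hours, which matches the "more than 12 hours" remark, and yesterday's 17,000 secs was under 5 hours.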
ID: 2024221 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024224 - Posted: 21 Dec 2019, 22:29:48 UTC - in response to Message 2024205.  

Anonymous Platform here.
Scheduler requests take 40sec up to a timeout & result in
22/12/2019 04:43:47 | SETI@home | Scheduler request failed: Failure when receiving data from the peer
22/12/2019 05:35:29 | SETI@home | Scheduler request failed: Couldn't connect to server
22/12/2019 06:25:27 | SETI@home | Scheduler request failed: HTTP service unavailable
22/12/2019 06:32:28 | SETI@home | Scheduler request failed: HTTP internal server error

Once or twice I have got a valid response (taking almost 2 min, usual response time 3 sec) which results in a "Project has no tasks available" response.


Edit- both systems have all work completed & reported, they're just trying for new work.

Interestingly, my Windows system is still getting mostly Scheduler errors (30sec to 3min wait for a response), my Linux system is getting "Project has no tasks available" responses (30-50sec response time).
Grant
Darwin NT
ID: 2024224 · Report as offensive
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2024225 - Posted: 21 Dec 2019, 22:34:11 UTC

Greetings,

Ok, now I'm running stock again. I got cuda60 and sah WUs. This, on my main. My other Linux PC still has almost 16 hours of crunching to do before I work with it.

Instead of archiving my anonymous SETI directory, I just renamed it before restarting BOINC and resetting the project. Works for me. :)
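
The rename step could be scripted roughly like this. It is only a sketch under assumptions: the function name `set_aside`, the `.anonymous` suffix, and the example path in the comment are hypothetical, and BOINC has to be shut down before anything like this is run.

```python
from pathlib import Path


def set_aside(project_dir: Path, suffix: str = ".anonymous") -> Path:
    """Rename project_dir to a sibling with suffix appended and return the new path.

    Refuses to overwrite an existing directory, so an earlier saved copy is not lost.
    """
    target = project_dir.with_name(project_dir.name + suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another suffix")
    project_dir.rename(target)
    return target


# Hypothetical usage (the path is an assumption, adjust to your own data directory):
# set_aside(Path("/var/lib/boinc-client/projects/setiathome.berkeley.edu"))
```

Renaming back later is the same call in reverse, once Anonymous Platform is working again.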

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2024225 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2024226 - Posted: 21 Dec 2019, 22:35:22 UTC

I was just sent two more groups of "Lost" tasks, I think that was all of them. So it appears 'Resend Lost Tasks' is turned on by default just as it was on BETA. Hopefully it will help cut down on Database bloat.
I also found you can download tasks while running Stock with 'Suspend GPU' set, that helps when the Server insists on sending you Apps that crash 5 seconds after they start. Once you have a few hundred tasks you can 'reschedule' the crashing tasks to an App that doesn't crash....such as CUDA90.
ID: 2024226 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2024228 - Posted: 21 Dec 2019, 22:41:43 UTC
Last modified: 21 Dec 2019, 22:42:06 UTC

One observation, when we ask for new work it returns: Sat 21 Dec 2019 05:39:10 PM EST | SETI@home | Scheduler request failed: HTTP internal server error

Maybe somebody forgot to update the address of the server in the task that starts when an anonymous host asks for new work...
ID: 2024228 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2024229 - Posted: 21 Dec 2019, 22:42:44 UTC - in response to Message 2024226.  

So it appears 'Resend Lost Tasks' is turned on by default just as it was on BETA.
That probably accounts for the database sluggishness, all by itself - it was implicated in the November 2013 database event. That CAN'T have been deliberate, surely? Have you told Eric yet?

I haven't had any reply from Eric yet, but I won't pester him - but I will pass on what we know by the beginning of tomorrow's Berkeley daylight.
ID: 2024229 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2024231 - Posted: 21 Dec 2019, 22:44:55 UTC - in response to Message 2024228.  

One observation, when we ask for new work it returns: Sat 21 Dec 2019 05:39:10 PM EST | SETI@home | Scheduler request failed: HTTP internal server error

Maybe somebody forgot to update the address of the server in the task that starts when an anonymous host asks for new work...


This is, I think, further evidence that something went very wrong in the disconnection noted in News and we're actually connecting to the Beta scheduler.
ID: 2024231 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2024233 - Posted: 21 Dec 2019, 22:47:24 UTC - in response to Message 2024228.  

One observation, when we ask for new work it returns: Sat 21 Dec 2019 05:39:10 PM EST | SETI@home | Scheduler request failed: HTTP internal server error

Maybe somebody forgot to update the address of the server in the task that starts when an anonymous host asks for new work...
It's been doing that at intervals all day - among all the other grotty responses it's capable of sending out. I don't think it's anything to do with a bad address - it just uses up all the available time or memory and then crashes. Possibly because of all the extra work and database querying it's trying to do for a 'lost tasks' check.
ID: 2024233 · Report as offensive
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 2024234 - Posted: 21 Dec 2019, 22:49:06 UTC - in response to Message 2024223.  
Last modified: 21 Dec 2019, 22:50:35 UTC

It would seem to me that the current problems might be related to Eric's "BOINC Notice" he posted yesterday:

_____________________________________________________________________________________________________________________________
SETI@home: Some server issues today...
It's the Friday before a holiday week and the servers know it.

The file system containing the beta project uploads directory is having problems, so beta is down until further notice.

This problem may be affecting the rate at which the main project can handle results, so the validation and assimilation queues are getting large, which may affect the rate of work generation.
12/20/2019 17:10:04
_____________________________________________________________________________________________________________________________
ID: 2024234 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2024236 - Posted: 21 Dec 2019, 23:14:47 UTC - in response to Message 2024234.  

I'm sure they're related, but I'm not exactly sure how, yet.

I thought they might have upgraded the server software in a misguided attempt to rectify the database problems. Now I'm not so sure. Recent posts have suggested that they might simply have got the wires crossed (well, not quite as simple as that ...) and set the Beta server up to process the Main project - but with the Beta settings still in place. Too soon to tell, until we get some feedback from inside the project. Our scheduler here is still running on Synergy, though - what was Beta's scheduler running on?
ID: 2024236 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2024237 - Posted: 21 Dec 2019, 23:14:56 UTC - in response to Message 2024229.  

Have you told Eric yet? I haven't had any reply from Eric yet, but I won't pester him - but I will pass on what we know by the beginning of tomorrow's Berkeley daylight.
I told him about it a couple months ago, back when I was running Apps on BETA. I dunno, maybe remind him...
ID: 2024237 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2024239 - Posted: 21 Dec 2019, 23:18:25 UTC - in response to Message 2024237.  

Have you told Eric yet? I haven't had any reply from Eric yet, but I won't pester him - but I will pass on what we know by the beginning of tomorrow's Berkeley daylight.
I told him about it a couple months ago, back when I was running Apps on BETA. I dunno, maybe remind him...
I was asking about the new observation about Resend Lost Tasks being active on Main - that's nearer two hours ago than two months ago!

Don't worry about it - Mr. Kevvy has passed the new information on. And I'm going to bed.
ID: 2024239 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024244 - Posted: 21 Dec 2019, 23:57:39 UTC
Last modified: 22 Dec 2019, 0:20:55 UTC

It certainly doesn't like Anonymous Platform.

On my Windows system I exited BOINC, backed up my Seti project folder, removed app_info.xml and restarted BOINC. First couple of Scheduler requests failed, then got work.
22/12/2019 09:09:52 | SETI@home | Scheduler request failed: Failure when receiving data from the peer
22/12/2019 09:14:37 | SETI@home | Scheduler request failed: Couldn't connect to server
22/12/2019 09:15:58 | SETI@home | Scheduler request completed: got 68 new tasks
22/12/2019 09:21:25 | SETI@home | Scheduler request failed: Couldn't connect to server
22/12/2019 09:23:26 | SETI@home | Scheduler request completed: got 20 new tasks

With anonymous platform, any successful Scheduler request results in "Project has no tasks available", and there were very, very, very few successful requests.


Edit- and even running stock, successful Scheduler requests are in the minority.
So far 9 failures, 4 successful (3 successful and getting work in a sequence of 4 requests), and one "Project has no tasks available" response.

It is very, very broken.
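
If you'd rather not count by hand, something like the Python sketch below can tally the Scheduler lines from a copied chunk of the Event Log. It is only a sketch: the regular expression assumes the `DD/MM/YYYY HH:MM:SS | project | message` layout of the lines pasted above (other clients print the timestamp differently and would need a different pattern), and the names `LOG_LINE` and `tally` are made up for this example.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines pasted in this post.
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) \| "
    r"(?P<project>[^|]+?) \| Scheduler request "
    r"(?P<status>failed|completed): (?P<detail>.*)$"
)


def tally(lines):
    """Count scheduler outcomes and add up the tasks received."""
    outcomes = Counter()
    tasks = 0
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # not a scheduler result line, skip it
        if m["status"] == "failed":
            outcomes["failed: " + m["detail"]] += 1
        else:
            outcomes["completed"] += 1
            got = re.search(r"got (\d+) new tasks", m["detail"])
            if got:
                tasks += int(got.group(1))
    return outcomes, tasks


sample = """\
22/12/2019 09:09:52 | SETI@home | Scheduler request failed: Failure when receiving data from the peer
22/12/2019 09:14:37 | SETI@home | Scheduler request failed: Couldn't connect to server
22/12/2019 09:15:58 | SETI@home | Scheduler request completed: got 68 new tasks
22/12/2019 09:21:25 | SETI@home | Scheduler request failed: Couldn't connect to server
22/12/2019 09:23:26 | SETI@home | Scheduler request completed: got 20 new tasks
""".splitlines()

outcomes, tasks = tally(sample)
for name, count in outcomes.most_common():
    print(count, name)
print("tasks received:", tasks)
```

On the five lines pasted above this gives 3 failures, 2 completed requests and 88 tasks received.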
Grant
Darwin NT
ID: 2024244 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 2024252 - Posted: 22 Dec 2019, 0:54:17 UTC - in response to Message 2024244.  

I'm looking into the problem. Grrrrr.....
@SETIEric@qoto.org (Mastodon)

ID: 2024252 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2024253 - Posted: 22 Dec 2019, 1:00:21 UTC - in response to Message 2024252.  

I'm looking into the problem. Grrrrr.....


ID: 2024253 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11360
Credit: 29,581,041
RAC: 66
United States
Message 2024255 - Posted: 22 Dec 2019, 1:27:47 UTC - in response to Message 2024252.  

We are doing all we can do to help solve the problem.
https://boinc.berkeley.edu/dev/forum_thread.php?id=8105&postid=94433
https://boinc.berkeley.edu/dev/forum_thread.php?id=8105&postid=94439
Howling at the moon and setting our hair on fire often helps.
ID: 2024255 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024259 - Posted: 22 Dec 2019, 1:33:19 UTC

Well, running stock isn't helping that much.
So far I've picked up a dozen CUDA50 WUs, I did get 100 SoG WUs, but the downloads errored out for some reason. And I've picked up around 300 CUDA42 WUs, which take 30min to process (instead of the 9min or less with SoG).

I'll give it a while longer & see if I can get some SoG work that downloads OK, otherwise I might as well just set it for No New Tasks and wait till it's fixed.
Grant
Darwin NT
ID: 2024259 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024260 - Posted: 22 Dec 2019, 1:34:27 UTC - in response to Message 2024255.  

We are doing all we can do to help solve the problem.
https://boinc.berkeley.edu/dev/forum_thread.php?id=8105&postid=94433
https://boinc.berkeley.edu/dev/forum_thread.php?id=8105&postid=94439
Howling at the moon and setting our hair on fire often helps.


I'm out of hair, but I can still drink.
ID: 2024260 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11360
Credit: 29,581,041
RAC: 66
United States
Message 2024262 - Posted: 22 Dec 2019, 1:38:51 UTC - in response to Message 2024260.  

Juan maybe is waiting for you to arrive.
ID: 2024262 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.