Panic Mode On (109) Server Problems?

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1909611 - Posted: 30 Dec 2017, 17:11:53 UTC

Seems to be particularly Bad at the moment. All three Linux machines are being refused work with a message about 'Project has no tasks available', even though RTS is hovering around 600k. I suspect that message is a Red Herring and only appears because the request includes AP work. If you ask for just MB work, as with my Mac, the Server can't come up with a reason for Not sending work and merely says No Tasks Sent. Fortunately, the Mac is still being sent work, which suggests prejudice, but alas, even Windows machines are denied work on occasion. Strange how it always seems to start with the Linux machines, though. One machine is right at 100 tasks in the red; will it recover, or will the Server run it out of work?
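For anyone chasing these refusals, the exact text the scheduler sent back is kept in the client's last reply file. A minimal sketch that prints those messages, assuming a stock Linux BOINC data directory (the path and file name vary with the install, and if the reply file isn't strictly well-formed XML on your client version, a plain-text search for "<message" does the same job):

# Minimal sketch: print the <message> entries from the last SETI@home scheduler reply.
# Assumes the default Debian/Ubuntu BOINC data directory -- adjust BOINC_DIR to suit.
import xml.etree.ElementTree as ET
from pathlib import Path

BOINC_DIR = Path("/var/lib/boinc-client")                      # assumption
reply_file = BOINC_DIR / "sched_reply_setiathome.berkeley.edu.xml"

root = ET.parse(reply_file).getroot()
for msg in root.iter("message"):
    # Each <message> carries the text the client logs, e.g. "Project has no tasks available".
    print(msg.get("priority", "?"), "-", (msg.text or "").strip())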
ID: 1909611
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1909702 - Posted: 30 Dec 2017, 23:46:29 UTC

The Server has decided to continue to Strangle 2 of my Linux machines. The third machine recovered to a full cache for a while, and then went back to being sent fewer tasks than were being reported. The other two are getting very low on tasks, and it appears the server would be content to run them out of GPU work. Both machines have 3 GPUs and should have around 330 tasks in a 1-day cache (see the rough arithmetic below):
State: All (2474) · In progress (97) · Validation pending (1092)
State: All (1898) · In progress (98) · Validation pending (785)
Strange how the Server sends them about the same number of tasks instead of the number the Host is Requesting. What are the chances that, after hours of this, one is at 97 and the other at 98?
I suppose I'm going to have to intervene or else watch the Server Starve my machines to death. Now down to 94 & 95.
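Back-of-envelope check on that figure (the per-task time below is an assumption picked only to show the arithmetic; real run times vary with the GPU and the angle range):

# Rough arithmetic for a 1-day, 3-GPU cache -- illustrative numbers only.
n_gpus = 3
minutes_per_task = 13     # assumed average GPU task time, chosen for illustration
cache_days = 1

tasks_needed = n_gpus * cache_days * 24 * 60 / minutes_per_task
print(round(tasks_needed))   # ~332, consistent with the "around 330" above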
ID: 1909702
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1909704 - Posted: 30 Dec 2017, 23:54:31 UTC - in response to Message 1909702.  
Last modified: 31 Dec 2017, 0:21:45 UTC

The Server has decided to continue to Strangle 2 of my Linux machines.

It's not just Linux systems.
I've had to triple update my Win10 i7 several times this morning to keep the work coming.

Edit- even my Vista C2D has required a bump or 2.

Edit- and it's not surprising. Very little Arecibo work about, and no AP. When there is Arecibo & AP work available, you can get MB work without any extra effort.
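The "triple update" mentioned above is nothing more than issuing several manual scheduler requests in a row. A rough sketch of automating that with boinccmd (the project URL, attempt count and delay are assumptions, and boinccmd may need the GUI RPC password to talk to the client):

# Rough sketch: ask the client to contact the scheduler a few times in a row.
import subprocess
import time

PROJECT_URL = "http://setiathome.berkeley.edu/"   # assumed master URL
ATTEMPTS = 3                                      # the "triple" in triple update
DELAY_S = 20                                      # pause between nudges; arbitrary

for _ in range(ATTEMPTS):
    # 'boinccmd --project URL update' requests an immediate scheduler contact.
    subprocess.run(["boinccmd", "--project", PROJECT_URL, "update"], check=False)
    time.sleep(DELAY_S)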
Grant
Darwin NT
ID: 1909704
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1909716 - Posted: 31 Dec 2017, 1:07:10 UTC - in response to Message 1909704.  
Last modified: 31 Dec 2017, 1:07:22 UTC

Edit- even my Vista C2D has required a bump or 2.

Make that several bumps.
Struggling to get work on either system.
Grant
Darwin NT
ID: 1909716
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1909719 - Posted: 31 Dec 2017, 1:10:08 UTC

Only the Linux machine is massively down. It is down about 200 tasks and has been getting about 11 tasks per request on average for the past couple of hours. Nowhere near enough to replace the tasks retired with each report. Triple Update isn't doing much for it.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1909719
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1909720 - Posted: 31 Dec 2017, 1:13:44 UTC
Last modified: 31 Dec 2017, 1:41:31 UTC

My host gets a lot of WUs, but all are Arecibo VLARs; maybe that is why your host did not get any new GPU work.
ID: 1909720
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1909723 - Posted: 31 Dec 2017, 1:22:27 UTC - in response to Message 1909720.  

The two computers that got tasks in any quantity (36) received Arecibo shorties and BLC. Each machine needs a couple of hundred. Could an Arecibo VLAR storm be the cause again?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1909723
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1909727 - Posted: 31 Dec 2017, 1:42:34 UTC - in response to Message 1909723.  

Nah, it's not a VLAR Storm. I have 3 computers that are down on CPU tasks too. Nothing is coming out of the pipe ... or at least, not much.
ID: 1909727
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1909730 - Posted: 31 Dec 2017, 1:56:40 UTC - in response to Message 1909727.  
Last modified: 31 Dec 2017, 1:58:25 UTC

Yes, I have to agree. The pattern is that nothing is coming out of the servers. All machines are down on CPU tasks too, so no VLAR handicap there. The fastest CPU machines, the Ryzens, can't get enough CPU work either. Lots of 'no tasks to send' messages.

[Edit] The Haveland graphs support that assessment. The number of tasks in progress has been dropping over the last couple of hours.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1909730
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1909731 - Posted: 31 Dec 2017, 1:57:25 UTC
Last modified: 31 Dec 2017, 1:57:49 UTC

In Progress seems to be dropping. I suppose that means the Server is sending fewer tasks. I thought a VLAR Storm was alleged to be impossible with BLC tasks running on GPUs.

A couple of machines just got topped off; another is still down by over 200 tasks.
ID: 1909731
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1909733 - Posted: 31 Dec 2017, 2:04:54 UTC - in response to Message 1909731.  

I thought a VLAR Storm was alleged to be impossible with BLC tasks running on GPUs.

We had one for several hours last week (or was it the week before?).
But this isn't that. It's just the ongoing problem with the Scheduler: little or no AP or Arecibo work means not even GBT work will be allocated when it's requested.
When Arecibo and/or AP work becomes available again, we'll be able to get GBT WUs again.
It's been this way for 12 months.
Grant
Darwin NT
ID: 1909733
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1909737 - Posted: 31 Dec 2017, 2:25:28 UTC - in response to Message 1909733.  

My read of the way the GBT splitters act is that they run fine until the server cache reaches 600k, then they shut off and don't restart (or just dribble a little) and the Arecibo splitters take over.

When the server does its Expired/Timed-out task check, the GBT splitters restart. I always see a blast of resends on at least 1 computer just before the GBT tasks reappear. They run for approx 1/2 an hour and shut down again.

So until the server does its check, it is possible to get stuck in a VLAR storm with no GBT being split.
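A toy sketch of that on/off pattern; it only models the reading described above, it is not the actual splitter code, and every number in it is an assumption:

# Toy model of the GBT splitter behaviour described in the post -- NOT server code.
RTS_CAP = 600_000        # shut-off threshold taken from the post
GBT_BURST_MIN = 30       # "approx 1/2 an hour" of splitting after each restart

def gbt_splitting(rts_count, minutes_since_expired_check):
    """True if, in this toy model, the GBT splitters would be producing work."""
    if rts_count >= RTS_CAP:
        return False                                      # cache full: splitters off
    return minutes_since_expired_check <= GBT_BURST_MIN   # only a burst after the check

# Example: RTS well below the cap, but 45 min after the last check -> still idle here.
print(gbt_splitting(550_000, 45))   # False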
ID: 1909737
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1909742 - Posted: 31 Dec 2017, 3:20:49 UTC

I'm down to my last 2 GPU tasks on the Linux cruncher. Guess it's going to do Einstein for the rest of the evening.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1909742
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1909748 - Posted: 31 Dec 2017, 4:07:07 UTC - in response to Message 1909742.  

I'm down to my last 2 GPU tasks on the Linux cruncher. Guess it's going to do Einstein for the rest of the evening.

Even my C2D is struggling to get work today.
Usually it doesn't have a problem, even when the i7 does.
Grant
Darwin NT
ID: 1909748
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1909753 - Posted: 31 Dec 2017, 4:46:08 UTC

This seems eerily like the "Database slowness" (as Eric called it) back on 10 Nov.

We're having some as yet unexplained slowness with our BOINC database. There don't seem to be any hardware issues. Temperatures are running normal and all the drives seem good. Yet for some reason the query that fills the "ready to send" queue is running about 10 times slower than it normally does.

Until I get it fixed, it means that on average we're sending out 3 workunits a second rather than 30+.
That was a holiday weekend, too.
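For scale, the two rates in that quote work out as follows (just the multiplication, nothing more):

# Back-of-envelope arithmetic on the quoted feeder rates.
normal_rate = 30       # workunits per second ("30+")
degraded_rate = 3      # workunits per second during the slowdown
seconds_per_day = 24 * 60 * 60

print("normal:  ", normal_rate * seconds_per_day, "WUs/day")    # 2,592,000
print("degraded:", degraded_rate * seconds_per_day, "WUs/day")  #   259,200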
ID: 1909753
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1909755 - Posted: 31 Dec 2017, 5:09:02 UTC

Can't connect to server for the last 10 minutes.
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | Sending scheduler request: To fetch work.
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | Reporting 2 completed tasks
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Info:    Trying 208.68.240.126...
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Info:  Connected to setiboinc.ssl.berkeley.edu (208.68.240.126) port 80 (#16)
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: POST /sah_cgi/cgi HTTP/1.1
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Host: setiboinc.ssl.berkeley.edu
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (x86_64-pc-linux-gnu 7.8.3)
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Accept: */*
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Accept-Language: en_US
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Content-Length: 36896
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server: Expect: 100-continue
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Sent header to server:
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | 
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
Sat 30 Dec 2017 09:05:59 PM PST | SETI@home | [http] [ID#1] Info:  We are completely uploaded and fine
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: HTTP/1.1 500 Internal Server Error
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: Date: Sun, 31 Dec 2017 05:05:59 GMT
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: Server: Apache/2.2.15 (Scientific Linux)
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: Content-Length: 647
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: Connection: close
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: Content-Type: text/html; charset=iso-8859-1
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server:
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <html><head>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <title>500 Internal Server Error</title>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: </head><body>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <h1>Internal Server Error</h1>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <p>The server encountered an internal error or
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: misconfiguration and was unable to complete
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: your request.</p>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <p>Please contact the server administrator,
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server:  boincadm@ssl.berkeley.edu and inform them of the time the error occurred,
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: and anything you might have done that may have
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: caused the error.</p>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <p>More information about this error may be available
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: in the server error log.</p>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <hr>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: <address>Apache/2.2.15 (Scientific Linux) Server at setiboinc.ssl.berkeley.edu Port 80</address>
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Received header from server: </body></html>
Sat 30 Dec 2017 09:06:40 PM PST |  | [http_xfer] [ID#1] HTTP: wrote 647 bytes
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | [http] [ID#1] Info:  Closing connection 16
Sat 30 Dec 2017 09:06:40 PM PST | SETI@home | Scheduler request failed: HTTP internal server error
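
The client backs off and retries this on its own; for anyone probing the endpoint by hand, a minimal retry-with-backoff sketch (the URL is the one from the log, the timings are arbitrary, and it sends a plain GET just to see whether the CGI answers at all -- a real scheduler request is a POSTed XML body, as the log shows):

# Minimal sketch: poll the scheduler CGI with exponential backoff.
import time
import urllib.error
import urllib.request

SCHED_URL = "http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi"   # from the log above

delay = 60                                   # start with a one-minute wait
for attempt in range(1, 6):
    try:
        with urllib.request.urlopen(SCHED_URL, timeout=30) as resp:
            print("attempt", attempt, "-> HTTP", resp.status)
            break
    except urllib.error.HTTPError as e:      # 500 Internal Server Error lands here
        print("attempt", attempt, "-> HTTP", e.code)
    except urllib.error.URLError as e:       # connection refused, timeouts, etc.
        print("attempt", attempt, "->", e.reason)
    time.sleep(delay)
    delay = min(delay * 2, 900)              # double the wait, cap at 15 minutes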

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1909755
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1909767 - Posted: 31 Dec 2017, 6:44:56 UTC - in response to Message 1909753.  

This seems eerily like the "Database slowness" (as Eric called it) back on 10 Nov.

We're having some as yet unexplained slowness with our BOINC database. There don't seem to be any hardware issues. Temperatures are running normal and all the drives seem good. Yet for some reason the query that fills the "ready to send" queue is running about 10 times slower than it normally does.

Until I get it fixed, it means that on average we're sending out 3 workunits a second rather than 30+.
That was a holiday weekend, too.


. . I guess the servers have a calendar with the holidays marked on it? A shame they are union servers :)

Stephen

:)
ID: 1909767
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1909768 - Posted: 31 Dec 2017, 6:48:15 UTC - in response to Message 1909755.  
Last modified: 31 Dec 2017, 6:48:33 UTC

Can't connect to server for the last 10 minutes.

There are almost always issues with the web site & scheduler around this time of day, lasting anywhere from 10 min to 45 min.
Grant
Darwin NT
ID: 1909768
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1909996 - Posted: 1 Jan 2018, 9:38:20 UTC

Hi, everyone... wishing you a Happy New S@H Year!

Just a small question: I have the usual 5-minute back-off after a manual update request. Is there supposed to be an automatic scheduler request when that time runs out? I seem to remember that used to be the case, but now the timer just runs out and nothing happens. It can take over 30 minutes before a new scheduler request is sent to the server.
This doesn't have any negative consequences, of course... I just find it weird. I haven't made any changes to BOINC or S@H settings for many months... the only thing out of the ordinary is that my system was down for a week before Christmas when my monitor died.

...Ghia...
Humans may rule the world...but bacteria run it...
ID: 1909996
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1909999 - Posted: 1 Jan 2018, 9:51:01 UTC - in response to Message 1909996.  

Hi, everyone... wishing you a Happy New S@H Year!

Just a small question: I have the usual 5-minute back-off after a manual update request. Is there supposed to be an automatic scheduler request when that time runs out?

It depends.
If your cache is full, it won't make another request until the next WU has been completed and uploaded.
But if your cache isn't full, then even if another WU hasn't been completed it will ask for work again after the 5 min 3 sec delay (although that also depends on your 'Store up to an additional X days' setting). The larger that value, the longer it will wait (i.e. the more WUs that will have to be returned) before requesting more work.
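That behaviour boils down to a simple decision, sketched below. This is a simplification of what is described above, not the real BOINC work-fetch code, and the names and the 0.1 factor are invented for illustration:

# Simplified model of the work-fetch decision described above -- not BOINC source.
BACKOFF_S = 5 * 60 + 3    # the 5 min 3 sec delay between scheduler requests

def should_request_work(seconds_since_last_rpc, cache_shortfall_days, extra_days_setting):
    """True if this toy model would issue another scheduler request.

    cache_shortfall_days: how far the buffer is below its target, in days of work.
    extra_days_setting:   the 'store up to an additional X days' preference; the
                          larger it is, the bigger the shortfall tolerated before asking.
    """
    if seconds_since_last_rpc < BACKOFF_S:
        return False                   # still inside the back-off window
    if cache_shortfall_days <= 0:
        return False                   # cache full: wait until a WU is returned
    return cache_shortfall_days > 0.1 * extra_days_setting   # 0.1 is arbitrary

# Example: 6 minutes since the last request, half a day short, 0.5 extra days set.
print(should_request_work(360, 0.5, 0.5))   # True in this toy model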
Grant
Darwin NT
ID: 1909999