Panic Mode On (109) Server Problems?

Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1911245 - Posted: 6 Jan 2018, 20:35:43 UTC - in response to Message 1911236.  

IIRC, the previous problem was downloads that started, but paused part way through and wouldn't restart. Is that happening again?

They download some WUs, then some stop. This is an example; after you hit the retry button they restart:

Sat 06 Jan 2018 02:41:54 PM EST | SETI@home | Started download of blc05_2bit_guppi_57976_06552_HIP74702_0024.909.818.22.45.128.vlar
Sat 06 Jan 2018 02:41:56 PM EST | | Internet access OK - project servers may be temporarily down.
Sat 06 Jan 2018 02:41:56 PM EST | SETI@home | Temporarily failed download of blc05_2bit_guppi_57976_07262_HIP74926_0026.863.818.22.45.164.vlar: transient HTTP error
Sat 06 Jan 2018 02:41:56 PM EST | SETI@home | Backing off 00:05:58 on download of blc05_2bit_guppi_57976_07262_HIP74926_0026.863.818.22.45.164.vlar
Sat 06 Jan 2018 02:41:56 PM EST | SETI@home | Temporarily failed download of blc05_2bit_guppi_57976_06552_HIP74702_0024.909.818.22.45.128.vlar: transient HTTP error
Sat 06 Jan 2018 02:41:56 PM EST | SETI@home | Backing off 00:06:33 on download of blc05_2bit_guppi_57976_06552_HIP74702_0024.909.818.22.45.128.vlar
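For anyone who wants to see how long the client is waiting before each automatic retry, here is a minimal Python sketch that pulls the back-off intervals out of log lines like the ones above. The function name `backoff_seconds` is made up for illustration, and the regular expression only assumes the "Backing off HH:MM:SS on download of <file>" wording shown in this log.

```python
import re

# Matches the "Backing off HH:MM:SS on download of <file>" wording seen above.
BACKOFF = re.compile(r"Backing off (\d+):(\d+):(\d+) on download of (\S+)")


def backoff_seconds(log_text):
    """Return {filename: back-off in seconds} for each back-off line found."""
    result = {}
    for line in log_text.splitlines():
        match = BACKOFF.search(line)
        if match:
            hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
            result[match.group(4)] = hours * 3600 + minutes * 60 + seconds
    return result


SAMPLE = """\
Sat 06 Jan 2018 02:41:56 PM EST | SETI@home | Backing off 00:05:58 on download of blc05_2bit_guppi_57976_07262_HIP74926_0026.863.818.22.45.164.vlar
Sat 06 Jan 2018 02:41:56 PM EST | SETI@home | Backing off 00:06:33 on download of blc05_2bit_guppi_57976_06552_HIP74702_0024.909.818.22.45.128.vlar
"""

for name, secs in backoff_seconds(SAMPLE).items():
    print(secs, name)
```

Both files in this sample are told to wait roughly six minutes before the next attempt.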

+1 It's not just Juan


+1. It's not just them, either. I've had to hit retry with annoying regularity in the last 24 hours, which got things moving again, but even that is now having no effect. <simmering nicely>
Don't take life too seriously, as you'll never come out of it alive!
ID: 1911245 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911246 - Posted: 6 Jan 2018, 20:35:55 UTC
Last modified: 6 Jan 2018, 20:39:57 UTC

OK, I got one for you.

06/01/2018 20:34:07 | SETI@home | Started download of blc05_2bit_guppi_57976_07596_HIP74209_0027.12052.818.21.44.106.vlar
06/01/2018 20:34:09 | SETI@home | [http] [ID#1362] Info:    Trying 208.68.240.127...
06/01/2018 20:34:09 | SETI@home | [http] [ID#1362] Info:  Empty reply from server
06/01/2018 20:34:09 | SETI@home | [http] HTTP error: Server returned nothing (no headers, no data)
That's a transient error telling me that Vader is on strike.

And this tells me that georgem is working and carrying the load. Note it's the same file.

06/01/2018 20:38:13 | SETI@home | Started download of blc05_2bit_guppi_57976_07596_HIP74209_0027.12052.818.21.44.106.vlar
06/01/2018 20:38:14 | SETI@home | [http] [ID#1367] Info:    Trying 208.68.240.119...
06/01/2018 20:38:14 | SETI@home | [http] [ID#1367] Received header from server: HTTP/1.1 200 OK
06/01/2018 20:38:17 | SETI@home | Finished download of blc05_2bit_guppi_57976_07596_HIP74209_0027.12052.818.21.44.106.vlar
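If you want to check your own log the same way, the following is a rough Python sketch that pairs each "Trying <IP>" line with the outcome logged under the same connection ID and counts successes and failures per server. It assumes the `[http] [ID#n]` line layout shown above and treats only "Empty reply from server" and "HTTP/1.1 200 OK" as outcomes; the function name `tally_by_server` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log lines above: "[http] [ID#n] <message>".
HTTP_LINE = re.compile(r"\[http\] \[ID#(\d+)\] (.*)$")
TRYING = re.compile(r"Trying (\d+\.\d+\.\d+\.\d+)")


def tally_by_server(log_text):
    """Return {ip: {"ok": n, "failed": n}} for the connection attempts in the log."""
    ip_for_id = {}
    tally = defaultdict(lambda: {"ok": 0, "failed": 0})
    for line in log_text.splitlines():
        match = HTTP_LINE.search(line)
        if not match:
            continue
        conn_id, message = match.group(1), match.group(2)
        trying = TRYING.search(message)
        if trying:
            # Remember which address this connection ID is talking to.
            ip_for_id[conn_id] = trying.group(1)
        elif conn_id in ip_for_id:
            if "Empty reply from server" in message:
                tally[ip_for_id[conn_id]]["failed"] += 1
            elif "HTTP/1.1 200 OK" in message:
                tally[ip_for_id[conn_id]]["ok"] += 1
    return {ip: dict(counts) for ip, counts in tally.items()}


SAMPLE = """\
06/01/2018 20:34:09 | SETI@home | [http] [ID#1362] Info:    Trying 208.68.240.127...
06/01/2018 20:34:09 | SETI@home | [http] [ID#1362] Info:  Empty reply from server
06/01/2018 20:34:09 | SETI@home | [http] HTTP error: Server returned nothing (no headers, no data)
06/01/2018 20:38:14 | SETI@home | [http] [ID#1367] Info:    Trying 208.68.240.119...
06/01/2018 20:38:14 | SETI@home | [http] [ID#1367] Received header from server: HTTP/1.1 200 OK
"""

for ip, counts in tally_by_server(SAMPLE).items():
    print(ip, counts)
```

On the two excerpts above this reports one failure for 208.68.240.127 and one success for 208.68.240.119.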
ID: 1911246 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911254 - Posted: 6 Jan 2018, 20:47:28 UTC

I am seeing the same thing as Richard. I accidentally uncommented Vader and commented out Georgem on one machine and started getting no replies from the server to download requests. I quickly fixed it by exposing the correct download server again, and all is good. Vader has been out to lunch since the recovery yesterday.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1911254 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1911261 - Posted: 6 Jan 2018, 21:04:05 UTC - in response to Message 1911245.  

+1. It's not just them, either. I've had to hit retry with annoying regularity in the last 24 hours, which got things moving again, but even that is now having no effect. <simmering nicely>

Since uncommenting 208.68.240.119 in my Hosts file I haven't had a single download issue.
Prior to that, downloads would time out almost as soon as they started.

The Scheduler however is still randomly refusing to allocate work, but it looks like someone has been to work this morning (or at least logged in) as the Average-turnaround and Received-last-hour numbers are updating again.
Grant
Darwin NT
ID: 1911261 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1911299 - Posted: 6 Jan 2018, 22:37:47 UTC - in response to Message 1911261.  

The Scheduler however is still randomly refusing to allocate work, but it looks like someone has been to work this morning (or at least logged in) as the Average-turnaround and Received-last-hour numbers are updating again.

Well, Average-turnaround and Received-last-hour numbers did update for a while there. They're back on leave again.
Grant
Darwin NT
ID: 1911299 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911310 - Posted: 6 Jan 2018, 23:15:49 UTC

My Windows 10 machine is able to get work every single time while on a VPN connection.

My Linux machines have stuck downloads but retrying seems to get them unstuck.
~Chris

ID: 1911310 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911313 - Posted: 6 Jan 2018, 23:20:34 UTC - in response to Message 1911310.  

My Linux machines have stuck downloads but retrying seems to get them unstuck.
Are you looking to see which of the two download servers that machine is trying to contact, at the time when the sticking happens?
ID: 1911313 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1911319 - Posted: 6 Jan 2018, 23:35:20 UTC - in response to Message 1911313.  

I've been having problems for the last 2 days with stuck downloads on different machines. I have to manually go into each machine and retry downloads. I get the impression that too many work units try to download at the same time and some get blocked, so they get moved to retry in X minutes. Unfortunately, they get stuck there and never complete the retry. Doing it manually corrects the issue. My 2 cents.
ID: 1911319 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911325 - Posted: 6 Jan 2018, 23:56:26 UTC - in response to Message 1911313.  

My Linux machines have stuck downloads but retrying seems to get them unstuck.
Are you looking to see which of the two download servers that machine is trying to contact, at the time when the sticking happens?


Just turned on a bit more logging with my Windows 10 machine. A download just got stuck from this server:

1/6/2018 5:53:55 PM | SETI@home | [http] [ID#321] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#319)
~Chris

ID: 1911325 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1911337 - Posted: 7 Jan 2018, 0:12:22 UTC - in response to Message 1911319.  

I've been having problems for the last 2 days with stuck downloads on different machines. I have to manually go into each machine and retry downloads. I get the impression that too many work units try to download at the same time and some get blocked, so they get moved to retry in X minutes. Unfortunately, they get stuck there and never complete the retry. Doing it manually corrects the issue. My 2 cents.
+ 1 Same experience here.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1911337 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1911342 - Posted: 7 Jan 2018, 0:24:39 UTC - in response to Message 1911242.  

I'm sorry to report no problems at this end at all.

Cheers.


. . I think at this stage that is taken as read :)

Stephen

:)
ID: 1911342 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1911344 - Posted: 7 Jan 2018, 0:27:10 UTC - in response to Message 1911337.  

I've been having problems for the last 2 days with stuck downloads on different machines. I have to manually go into each machine and retry downloads. I get the impression that too many work units try to download at the same time and some get blocked, so they get moved to retry in X minutes. Unfortunately, they get stuck there and never complete the retry. Doing it manually corrects the issue. My 2 cents.
+ 1 Same experience here.

Since uncommenting 208.68.240.119 in my Hosts file I haven't had a single download issue.
Prior to that, downloads would time out almost as soon as they started.
Grant
Darwin NT
ID: 1911344 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1911383 - Posted: 7 Jan 2018, 1:40:32 UTC - in response to Message 1911344.  
Last modified: 7 Jan 2018, 1:42:12 UTC


Since uncommenting 208.68.240.119 in my Hosts file I haven't had a single download issue.
Prior to that, downloads would time out almost as soon as they started.


. . Exactly the same here. I think Richard is on the right track when he says Vader has decided to take a holiday.

Stephen

:)
ID: 1911383 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1911400 - Posted: 7 Jan 2018, 2:27:19 UTC - in response to Message 1911299.  

The Scheduler however is still randomly refusing to allocate work, but it looks like someone has been to work this morning (or at least logged in) as the Average-turnaround and Received-last-hour numbers are updating again.

Well, Average-turnaround and Received-last-hour numbers did update for a while there. They're back on leave again.

And they're back from leave again. We'll see how long they last this time.
Grant
Darwin NT
ID: 1911400 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911413 - Posted: 7 Jan 2018, 3:29:02 UTC

Added 208.68.240.119 to my hosts file 3 hours ago, haven't had a problem downloading since. I just commented it out to see what happens overnight. 123 work units in my cache at the moment. We'll see what happens over the next 9 or so hours.
~Chris

ID: 1911413 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13886
Credit: 208,696,464
RAC: 304
Australia
Message 1911415 - Posted: 7 Jan 2018, 3:41:41 UTC - in response to Message 1911400.  

The Scheduler however is still randomly refusing to allocate work, but it looks like someone has been to work this morning (or at least logged in) as the Average-turnaround and Received-last-hour numbers are updating again.

Well, Average-turnaround and Received-last-hour numbers did update for a while there. They're back on leave again.

And they're back from leave again. We'll see how long they last this time.

About as long as last time.
Looks like they start to update again, then stop almost straight away.
Grant
Darwin NT
ID: 1911415 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911449 - Posted: 7 Jan 2018, 9:14:58 UTC - in response to Message 1911337.  

I've been having problems for the last 2 days with stuck downloads on different machines. I have to manually go into each machine and retry downloads. I get the impression that too many work units try to download at the same time and some get blocked, so they get moved to retry in X minutes. Unfortunately, they get stuck there and never complete the retry. Doing it manually corrects the issue. My 2 cents.
+ 1 Same experience here.
Have either of you two actually read the recent posts in this thread? We know what the problem is, we know how to work round it. You can do that too.
ID: 1911449 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911484 - Posted: 7 Jan 2018, 14:16:00 UTC - in response to Message 1911413.  

Added 208.68.240.119 to my hosts file 3 hours ago, haven't had a problem downloading since. I just commented it out to see what happens overnight. 123 work units in my cache at the moment. We'll see what happens over the next 9 or so hours.


1 work unit got stuck overnight and my Windows 10 machine was down to 59 work units. For whatever reason, this machine can't let go of Vader when it gets a stuck work unit. My Linux machines hit Vader too, but within an hour or 2 the auto retry works for them. If I had time today I would look to see if a connection from the Windows 10 machine can be closed sooner (I thought I saw something about a 300 second timeout for connections) or whether some other tweak could be made to the auto retry.
~Chris

ID: 1911484 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911499 - Posted: 7 Jan 2018, 15:38:47 UTC

@Chris, as Richard said, there is no need to do any further post-mortem on the problem. We know what the issue is with the download servers. Simple solution is to just comment out Vader server at 208.68.240.127 in the Hosts file and wait for staff to get it working next week.
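For anyone who has not edited a Hosts file before, here is a minimal Python sketch of the swap described above: comment out every entry for the bad address and uncomment every entry for the good one. It works on a string only and does not touch the real file. The hostname `boinc2.ssl.berkeley.edu` is taken from Chris's log line earlier in the thread; the exact Hosts entries people are using were not posted here, so the sample entries and the function name `prefer_server` are assumptions for illustration.

```python
def prefer_server(hosts_text, good_ip, bad_ip):
    """Comment out Hosts entries for bad_ip and uncomment entries for good_ip."""
    out = []
    for line in hosts_text.splitlines():
        # Strip any leading comment marker so commented entries are recognised too.
        body = line.lstrip().lstrip("#").lstrip()
        fields = body.split()
        ip = fields[0] if fields else ""
        if ip == bad_ip:
            out.append("# " + body)
        elif ip == good_ip:
            out.append(body)
        else:
            out.append(line)  # leave unrelated lines and ordinary comments alone
    return "\n".join(out)


# Hypothetical starting point: Vader exposed, Georgem commented out.
HOSTS = """\
127.0.0.1 localhost
208.68.240.127 boinc2.ssl.berkeley.edu
# 208.68.240.119 boinc2.ssl.berkeley.edu"""

print(prefer_server(HOSTS, good_ip="208.68.240.119", bad_ip="208.68.240.127"))
```

Swapping the two arguments puts things back the way they were once Vader is working again.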
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1911499 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911524 - Posted: 7 Jan 2018, 17:00:01 UTC - in response to Message 1911499.  

@Chris, as Richard said, there is no need to do any further post-mortem on the problem. We know what the issue is with the download servers. Simple solution is to just comment out Vader server at 208.68.240.127 in the Hosts file and wait for staff to get it working next week.


Sorry, that wasn't really a post about the server-side problem, but more of a learning opportunity for me on how the client side works, not just with BOINC or SETI, but with networking, specifically DNS, load balancing, and proxy servers. I had all of those at home just for fun to see how they work, but it's been a while.
~Chris

ID: 1911524 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.