Panic Mode On (109) Server Problems?

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1911383 - Posted: 7 Jan 2018, 1:40:32 UTC - in response to Message 1911344.  
Last modified: 7 Jan 2018, 1:42:12 UTC


Since uncommenting 208.68.240.119 in my Hosts file I haven't had a single download issue.
Prior to that, downloads would time out almost as soon as they started.


. . Exactly the same here. I think Richard is on the right track when he says Vader has decided to take a holiday.

Stephen

:)
ID: 1911383
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1911400 - Posted: 7 Jan 2018, 2:27:19 UTC - in response to Message 1911299.  

The Scheduler however is still randomly refusing to allocate work, but it looks like someone has been to work this morning (or at least logged in) as the Average-turnaround and Received-last-hour numbers are updating again.

Well, Average-turnaround and Received-last-hour numbers did update for a while there. They're back on leave again.

And they're back from leave again. We'll see how long they last this time.
Grant
Darwin NT
ID: 1911400
Chris904395093209d Project Donor
Volunteer tester
Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911413 - Posted: 7 Jan 2018, 3:29:02 UTC

Added 208.68.240.119 to my hosts file 3 hours ago, haven't had a problem downloading since. I just commented it out to see what happens overnight. 123 work units in my cache at the moment. We'll see what happens over the next 9 or so hours.
~Chris

ID: 1911413
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1911415 - Posted: 7 Jan 2018, 3:41:41 UTC - in response to Message 1911400.  

The Scheduler however is still randomly refusing to allocate work, but it looks like someone has been to work this morning (or at least logged in) as the Average-turnaround and Received-last-hour numbers are updating again.

Well, Average-turnaround and Received-last-hour numbers did update for a while there. They're back on leave again.

And they're back from leave again. We'll see how long they last this time.

About as long as last time.
Looks like they start to update again, then stop almost straight away.
Grant
Darwin NT
ID: 1911415
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911449 - Posted: 7 Jan 2018, 9:14:58 UTC - in response to Message 1911337.  

I've been having problems for the last 2 days with stuck downloads on different machines. I have to manually go into each machine and retry downloads. I get the impression that too many work units try to download at the same time and some get blocked, so they get moved to retry in X minutes. Unfortunately, they get stuck there and never complete the retry. Retrying manually corrects the issue. My 2 cents.
+ 1 Same experience here.
Have either of you two actually read the recent posts in this thread? We know what the problem is, we know how to work round it. You can do that too.
ID: 1911449
Chris904395093209d Project Donor
Volunteer tester
Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911484 - Posted: 7 Jan 2018, 14:16:00 UTC - in response to Message 1911413.  

Added 208.68.240.119 to my hosts file 3 hours ago, haven't had a problem downloading since. I just commented it out to see what happens overnight. 123 work units in my cache at the moment. We'll see what happens over the next 9 or so hours.


1 work unit got stuck overnight and my Windows 10 machine was down to 59 work units. For whatever reason, this machine can't let go of Vader when it gets a stuck work unit. My Linux machines hit Vader too, but within an hour or 2 the auto retry works for them. If I had time today I would look to see if a connection from the Windows 10 machine can be closed sooner (I thought I saw something about a 300 second timeout for connections) or whether some other tweak could be made to the auto retry.
~Chris

ID: 1911484
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911499 - Posted: 7 Jan 2018, 15:38:47 UTC

@Chris, as Richard said, there is no need to do any further post-mortem on the problem. We know what the issue is with the download servers. Simple solution is to just comment out Vader server at 208.68.240.127 in the Hosts file and wait for staff to get it working next week.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1911499
Chris904395093209d Project Donor
Volunteer tester
Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1911524 - Posted: 7 Jan 2018, 17:00:01 UTC - in response to Message 1911499.  

@Chris, as Richard said, there is no need to do any further post-mortem on the problem. We know what the issue is with the download servers. Simple solution is to just comment out Vader server at 208.68.240.127 in the Hosts file and wait for staff to get it working next week.


Sorry, that wasn't really a post about the server side problem. It was more of a learning opportunity for me on how the client side works - not just with BOINC or SETI, but with networking, specifically with DNS, load balancing, and proxy servers. I had all of those at home just for fun to see how they work, but it's been a while.
~Chris

ID: 1911524
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1911531 - Posted: 7 Jan 2018, 17:42:03 UTC

Again, exposing my ignorance, I am not a code writer or analyst. I can however follow simple directions when I know where to look and what to change.

Where exactly is the 'hosts' file located, and how do I 'comment out' 208.68.240.127 ?

Some of us are not computer scientists but know enough to 'get around' and really want to contribute to the project. But when workarounds and fixes are posted, the explanation sometimes assumes a level of basic knowledge that some of us just don't have... IMHO.

Thanks for any assistance.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1911531
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1911534 - Posted: 7 Jan 2018, 17:47:24 UTC - in response to Message 1911531.  
Last modified: 7 Jan 2018, 17:50:45 UTC

I think most of us are using the info Richard provided in the summer about bypassing the DNS servers and going direct.
http://setiathome.berkeley.edu/forum_thread.php?id=81638&postid=1875152

EDIT: And this. http://setiathome.berkeley.edu/forum_thread.php?id=81638&postid=1875158
ID: 1911534
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911535 - Posted: 7 Jan 2018, 17:49:32 UTC - in response to Message 1911524.  
Last modified: 7 Jan 2018, 17:58:41 UTC

@Chris, as Richard said, there is no need to do any further post-mortem on the problem. We know what the issue is with the download servers. Simple solution is to just comment out Vader server at 208.68.240.127 in the Hosts file and wait for staff to get it working next week.
Sorry, that wasn't really a post about the server side problem. It was more of a learning opportunity for me on how the client side works - not just with BOINC or SETI, but with networking, specifically with DNS, load balancing, and proxy servers. I had all of those at home just for fun to see how they work, but it's been a while.
It's generic, and very basic, TCP/IP networking. Add the hosts file to your list.

The unusual thing is the way SETI uses round-robin DNS. The best way to see this is from the command prompt.

Type 'ipconfig /flushdns' to clear the DNS lookups that Windows has cached - it makes life easier.

Then trigger or retry a SETI download.

Now type 'ipconfig /displaydns'. You should, among the chatter, see

Windows IP Configuration

    boinc2.ssl.berkeley.edu
    ----------------------------------------
    Record Name . . . . . : boinc2.ssl.berkeley.edu
    Record Type . . . . . : 1
    Time To Live  . . . . : 120
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 208.68.240.127


    Record Name . . . . . : boinc2.ssl.berkeley.edu
    Record Type . . . . . : 1
    Time To Live  . . . . : 120
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 208.68.240.119
Two download IP addresses. BOINC (or any other Windows program) will try the first one. In this case, Vader - and the downloads failed.

But also note the very short TTL (Time To Live - seconds). Windows effectively never caches that IP address (although BOINC does). A new download will fetch a new DNS response, and may get the IPs the other way round - and succeed.

Edit - Yup. Went back to BOINC after posting that, and clicked 'retry now' for just one file. All 40 downloaded at the first attempt.

07/01/2018 17:44:22 | SETI@home | Backing off 00:03:39 on download of blc05_2bit_guppi_57976_10703_HIP74981_0036.11026.818.22.45.128.vlar
07/01/2018 17:54:07 | SETI@home | Started download of blc05_2bit_guppi_57976_11502_HIP91971_0038.11545.818.22.45.109.vlar
07/01/2018 17:54:10 | SETI@home | Finished download of blc05_2bit_guppi_57976_11502_HIP91971_0038.11545.818.22.45.109.vlar
10 minutes would seem to be a good time to wait.
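
The round-robin behaviour described above can also be observed outside BOINC. What follows is a minimal Python sketch under stated assumptions, not anything the BOINC client itself runs: `resolve_ipv4` and `first_reachable` are hypothetical helper names, and port 80 and the 5-second timeout are guesses. It lists every IPv4 address the resolver returns for a host name, then tries each one in order instead of giving up on the first. The host name is the one from the thread and may no longer resolve.

```python
import socket


def resolve_ipv4(hostname):
    """Return the IPv4 addresses the resolver gives for hostname, in order, without duplicates."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def first_reachable(addresses, port=80, timeout=5.0):
    """Return the first address that accepts a TCP connection, or None if none does."""
    for ip in addresses:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return ip
        except OSError:
            # Refused, timed out, or unreachable: move on to the next address.
            continue
    return None


if __name__ == "__main__":
    # Host name taken from the thread; port 80 is an assumption.
    ips = resolve_ipv4("boinc2.ssl.berkeley.edu")
    print("resolver returned:", ips)
    print("first reachable:", first_reachable(ips))
```

Running it a few times a couple of minutes apart should show the address order changing if the name is still served round-robin with a short TTL.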
ID: 1911535
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911539 - Posted: 7 Jan 2018, 17:58:19 UTC

This goes back to posts made a couple of years ago (I'm guessing), when we had issues with the two download servers and the way they handle download requests in round-robin order. The fix was to list the servers in your Hosts file and not rely on DNS discovery mechanisms. The Hosts file is in the C:\Windows\System32\drivers\etc directory. Use Notepad to edit it and add these entries:

208.68.240.118   setiboincdata.ssl.berkeley.edu   # upload server Oct 2016
208.68.240.119   boinc2.ssl.berkeley.edu          # Georgem download server Oct 2016
208.68.240.126   setiboinc.ssl.berkeley.edu       # scheduler Oct 2016
#208.68.240.127  vader.ssl.berkeley.edu           # Vader download server Oct 2016


The hash sign (#) in front of the 208.68.240.127 address means the line is commented out and is not read or acted upon. Vader is the server that is causing the stalled downloads. For now, you should only use the Georgem download server.
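
To make the effect of the comment marker concrete, here is a small Python sketch of how a hosts file like the one above can be read. It is an illustration only, not how Windows implements it: `parse_hosts` is a hypothetical helper, and the rule that the first entry for a name wins is an assumption about typical resolver behaviour.

```python
def parse_hosts(text):
    """Map each host name to its IP address, ignoring comments and blank lines."""
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment, so a line that starts with '#' is empty here.
        active = line.split("#", 1)[0].strip()
        if not active:
            continue
        ip, *names = active.split()
        for name in names:
            # Assumption: the first entry for a name wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


# The entries from the post above, with the Vader line commented out.
SAMPLE = """\
208.68.240.118   setiboincdata.ssl.berkeley.edu   # upload server Oct 2016
208.68.240.119   boinc2.ssl.berkeley.edu          # Georgem download server Oct 2016
208.68.240.126   setiboinc.ssl.berkeley.edu       # scheduler Oct 2016
#208.68.240.127  vader.ssl.berkeley.edu           # Vader download server Oct 2016
"""

if __name__ == "__main__":
    for name, ip in parse_hosts(SAMPLE).items():
        print(f"{name} -> {ip}")
```

Only three names come out of the sample; the Vader line contributes nothing because it is a comment.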

You can do an ipconfig /flushdns from the command line to make sure the Hosts file is read and your old DNS cache is flushed. The downloads will start working again. You can check to see which download server is being used to get work by setting the http_debug flag option in the Event Log options menu.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1911539
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911541 - Posted: 7 Jan 2018, 18:01:54 UTC - in response to Message 1911539.  
Last modified: 7 Jan 2018, 18:05:09 UTC

Use Notepad to edit it ...
Run Notepad 'As Administrator' on Windows 7 or later - otherwise you won't be able to save edits.

You can do an ipconfig /flushdns from the command line to make sure the Hosts file is read and your old DNS cache is flushed. The downloads will start working again. You can check to see which download server is being used to get work by setting the http_debug flag option in the Event Log options menu.
Very rarely needed. The new values in the hosts file are read automatically whenever you save it, and supersede any previous values - whether from the file or cached.
ID: 1911541
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1911546 - Posted: 7 Jan 2018, 18:21:39 UTC
Last modified: 7 Jan 2018, 18:29:42 UTC

OK. I successfully inserted the entries Keith provided, flushed the DNS cache and got confirmation of the change to the hosts file from my WinPatrol app.

I only changed one machine and will monitor it for hangs.

Thanks again for the detailed help.

JE

Edit: Strangely, the machine I did not change has not encountered any hung downloads.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1911546
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1911557 - Posted: 7 Jan 2018, 18:48:57 UTC - in response to Message 1911546.  

Glad to hear your downloads are working again. Shoutout to @Richard for pointing out Hosts is a protected access file that needs Administrator privileges to edit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1911557
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1911564 - Posted: 7 Jan 2018, 19:00:35 UTC

Yes, thank you, Richard, for the reminder.
I had never used the hosts file in the past, but I am now, and it seems to have done the trick.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1911564
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1911571 - Posted: 7 Jan 2018, 19:14:04 UTC

07/01/2018 20:06:53 | SETI@home | Not requesting tasks: some download is stalled
07/01/2018 20:06:57 | SETI@home | Scheduler request completed
07/01/2018 20:07:59 | SETI@home | Started download of blc05_2bit_guppi_57976_10027_HIP74981_0034.26512.818.21.44.135.vlar
07/01/2018 20:08:01 | SETI@home | Temporarily failed download of blc05_2bit_guppi_57976_10027_HIP74981_0034.26512.818.21.44.135.vlar: transient HTTP error
07/01/2018 20:08:01 | SETI@home | Backing off 00:16:07 on download of blc05_2bit_guppi_57976_10027_HIP74981_0034.26512.818.21.44.135.vlar
07/01/2018 20:08:02 |  | Project communication failed: attempting access to reference site
07/01/2018 20:08:04 |  | Internet access OK - project servers may be temporarily down.
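
Event-log excerpts like the one above are easy to scan mechanically. The following Python sketch is a rough illustration: it assumes each line has the form `timestamp | project | message` as seen in the excerpt, and `parse_line`, `backoffs` and the `BACKOFF` pattern are hypothetical names, not part of BOINC. It pulls out which downloads were backed off and for how long.

```python
import re
from datetime import timedelta

# Matches messages such as "Backing off 00:16:07 on download of <file>".
BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")

# Three lines copied from the excerpt above.
LOG = """\
07/01/2018 20:08:01 | SETI@home | Temporarily failed download of blc05_2bit_guppi_57976_10027_HIP74981_0034.26512.818.21.44.135.vlar: transient HTTP error
07/01/2018 20:08:01 | SETI@home | Backing off 00:16:07 on download of blc05_2bit_guppi_57976_10027_HIP74981_0034.26512.818.21.44.135.vlar
07/01/2018 20:08:02 |  | Project communication failed: attempting access to reference site
"""


def parse_line(line):
    """Split an event-log line into (timestamp, project, message)."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message


def backoffs(lines):
    """Yield (file_name, delay) for every download back-off found in lines."""
    for line in lines:
        if line.count("|") < 2:
            continue  # not in the assumed three-field format
        _, _, message = parse_line(line)
        match = BACKOFF.search(message)
        if match:
            hours, minutes, seconds, name = match.groups()
            yield name, timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))


if __name__ == "__main__":
    for name, delay in backoffs(LOG.splitlines()):
        print(f"{name}: retry in {delay}")
```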

ID: 1911571
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1911572 - Posted: 7 Jan 2018, 19:17:08 UTC

I too just suffered a "bump" like that on one of my crunchers. It appears to have cleared now, but.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1911572
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911574 - Posted: 7 Jan 2018, 19:24:02 UTC - in response to Message 1911564.  

Yes, thank you, Richard, for the reminder.
I had never used the hosts file in the past, but I am now, and it seems to have done the trick.
Yes, it can help.

I recommend that you (and every other new user) comment out the active lines as soon as this particular problem is over - they would cause a problem if it's GeorgeM that falls over next time, and prevent Vader taking over.
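
Undoing the workaround later amounts to putting a `#` in front of the active lines. As a hedged sketch of that step, the Python function below works on the text of a hosts file rather than on the file itself (saving the real file needs Administrator rights, as noted earlier in the thread). `set_commented` is a hypothetical helper written for this illustration.

```python
def set_commented(text, hostnames, commented):
    """Comment out (or re-enable) the hosts-file lines that mention any of hostnames."""
    wanted = {name.lower() for name in hostnames}
    result = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # The line with any leading '#' markers removed.
        body = stripped.lstrip("#").lstrip()
        # Only the part before a trailing comment holds the IP and host names.
        fields = body.split("#", 1)[0].split()
        mentions = any(field.lower() in wanted for field in fields[1:])
        if not mentions:
            result.append(line)
        elif commented:
            result.append(line if stripped.startswith("#") else "#" + line)
        else:
            result.append(body)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = (
        "208.68.240.119 boinc2.ssl.berkeley.edu # Georgem download server\n"
        "#208.68.240.127 vader.ssl.berkeley.edu # Vader download server\n"
    )
    # Disable the Georgem override once the problem is over.
    print(set_commented(sample, ["boinc2.ssl.berkeley.edu"], commented=True))
```

With every override commented out, the client goes back to whatever DNS returns, so either download server can take over if the other fails.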

I've let Eric enjoy his weekend in peace - unless he's reading this thread - but I'm going to email him and Jeff to say that one of our servers is missing, before the lab opens for business tomorrow. I'll keep you posted.
ID: 1911574
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1911577 - Posted: 7 Jan 2018, 19:28:52 UTC - in response to Message 1911574.  

Yes, thank you, Richard, for the reminder.
I had never used the hosts file in the past, but I am now, and it seems to have done the trick.
Yes, it can help.

I recommend that you (and every other new user) comment out the active lines as soon as this particular problem is over - they would cause a problem if it's GeorgeM that falls over next time, and prevent Vader taking over.

I've let Eric enjoy his weekend in peace - unless he's reading this thread - but I'm going to email him and Jeff to say that one of our servers is missing, before the lab opens for business tomorrow. I'll keep you posted.

Will do, Richard.
I shall watch this thread for news that the problem has been fixed, and neuter the hosts file.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1911577


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.