Panic Mode On (115) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1982798 - Posted: 1 Mar 2019, 10:45:38 UTC - in response to Message 1982797.  

Seems like it, once i updated the cc_config to accept only one at a time again it seems to go through now.

Not here.
After 20min on the retry button it's time for bed.
Grant
Darwin NT
ID: 1982798 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1982799 - Posted: 1 Mar 2019, 11:12:44 UTC - in response to Message 1982798.  
Last modified: 1 Mar 2019, 11:19:07 UTC

Seems like it, once i updated the cc_config to accept only one at a time again it seems to go through now.

Not here.
After 20min on the retry button it's time for bed.

Not here either. Good night. Time to wake up here. LOL

<edit>After some time it started to work. 1 WU at a time only, but that is better than nothing.
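
For anyone wanting to try the "one at a time" setting: the posts above do not say which cc_config entry was changed, so the sketch below assumes it was the BOINC client options `max_file_xfers` and `max_file_xfers_per_project`. It also optionally turns on the `http_debug` log flag, which produces `[http]` lines in the event log. The helper name `build_cc_config` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers: int = 1, http_debug: bool = False) -> str:
    """Return cc_config.xml text that limits simultaneous file transfers.

    Assumption: the "one at a time" tweak maps to max_file_xfers and
    max_file_xfers_per_project; the thread does not name the setting.
    """
    root = ET.Element("cc_config")
    if http_debug:
        # Optional: log the client's HTTP activity as "[http]" lines.
        flags = ET.SubElement(root, "log_flags")
        ET.SubElement(flags, "http_debug").text = "1"
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(max_xfers)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(max_xfers=1, http_debug=True))
```

The output would go into cc_config.xml in the BOINC data directory, and the client has to re-read its config files (or be restarted) before it takes effect.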
ID: 1982799 · Report as offensive
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1982800 - Posted: 1 Mar 2019, 11:16:07 UTC - in response to Message 1982798.  

I needed to restart BOINC too. It seems to take some time for the IP to get accepted; once it is accepted, it seems to work.
The only struggle is when you have multiple big hosts behind the same WAN IP; that also seems to trigger the "block"..

I'm not done "thinking outside the box" on this yet, but it seems so, because the one host at home has never stopped downloading since I fixed this, whereas the two hosts at another location don't start up: two hosts behind the same WAN IP are demanding work, and that seems to stop it.

I've set network mode to never on one host now to test.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1982800 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1982801 - Posted: 1 Mar 2019, 11:22:51 UTC
Last modified: 1 Mar 2019, 11:37:26 UTC

Two at a time is working here now, but no more than that.

But with two there are a lot of retries. Back to one at a time.

<edit> After downloading >100 WUs it stopped working. Everything ends up in project backoff.
ID: 1982801 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1982804 - Posted: 1 Mar 2019, 11:50:02 UTC - in response to Message 1982800.  

I needed to restart BOINC too. It seems to take some time for the IP to get accepted; once it is accepted, it seems to work.
The only struggle is when you have multiple big hosts behind the same WAN IP; that also seems to trigger the "block"..

I'm not done "thinking outside the box" on this yet, but it seems so, because the one host at home has never stopped downloading since I fixed this, whereas the two hosts at another location don't start up: two hosts behind the same WAN IP are demanding work, and that seems to stop it.

I've set network mode to never on one host now to test.
Now, that would be interesting to explore.

The problem seems to be in getting a network connection to the server. I think the blockage is at the server operating system level, before any actual request is passed up the stack to any BOINC software.

Like you, I have multiple machines behind a single NAT router. So they all have the same external, public, IP address, but obviously different private, non-routable, LAN IP addresses behind my firewall.

In the past, I've found that I can access a different BOINC server - for the GPUGrid project - from any machine on the network. But if I've tried to access GPUGrid from a different machine 'soon afterwards' - say within a minute or so - the second connection is refused. If I wait a couple of minutes and try again, I can use the second machine for a normal connection.

GPUGrid uses Apache/2.4.6 (CentOS): SETI uses Apache/2.2.15 (Scientific Linux). I don't know if either of those has an anti-DDoS setting that rejects repeated connection attempts from the same IP address but with different reply-to details. Do we have a network analyst in the house?
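
To make the hypothesis concrete, here is a toy model of the kind of guard being speculated about. Everything in it is an assumption for illustration: the class name, the 60-second window, and the rule itself (refuse a connection when a different client behind the same public IP was accepted within the window). Nothing here is known to be an actual Apache or server OS setting.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class HypotheticalPerIPGuard:
    """Toy model only: not a real Apache or kernel feature."""

    window_s: float = 60.0  # assumed window, roughly "a minute or so"
    # public IP -> (last accepted client, time it was accepted)
    _last: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def allow(self, public_ip: str, client_id: str, now: float) -> bool:
        seen = self._last.get(public_ip)
        if seen is not None:
            last_client, last_time = seen
            # Refuse a *different* client behind the same public IP
            # if the previous client was accepted too recently.
            if last_client != client_id and now - last_time < self.window_s:
                return False
        self._last[public_ip] = (client_id, now)
        return True


if __name__ == "__main__":
    guard = HypotheticalPerIPGuard()
    nat_ip = "203.0.113.7"  # documentation-range address, stands in for a home router
    print(guard.allow(nat_ip, "machine-1", now=0.0))
    print(guard.allow(nat_ip, "machine-2", now=30.0))
    print(guard.allow(nat_ip, "machine-2", now=150.0))
```

Under this model, a second machine behind the same router is refused if it tries within the window and accepted a couple of minutes later, which is the pattern described for GPUGrid. It would not, on its own, explain failures on a host with its own dedicated IP.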
ID: 1982804 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1982805 - Posted: 1 Mar 2019, 11:56:53 UTC - in response to Message 1982804.  
Last modified: 1 Mar 2019, 12:00:25 UTC

I needed to restart BOINC too. It seems to take some time for the IP to get accepted; once it is accepted, it seems to work.
The only struggle is when you have multiple big hosts behind the same WAN IP; that also seems to trigger the "block"..

I'm not done "thinking outside the box" on this yet, but it seems so, because the one host at home has never stopped downloading since I fixed this, whereas the two hosts at another location don't start up: two hosts behind the same WAN IP are demanding work, and that seems to stop it.

I've set network mode to never on one host now to test.
Now, that would be interesting to explore.

The problem seems to be in getting a network connection to the server. I think the blockage is at the server operating system level, before any actual request is passed up the stack to any BOINC software.

Like you, I have multiple machines behind a single NAT router. So they all have the same external, public, IP address, but obviously different private, non-routable, LAN IP addresses behind my firewall.

In the past, I've found that I can access a different BOINC server - for the GPUGrid project - from any machine on the network. But if I've tried to access GPUGrid from a different machine 'soon afterwards' - say within a minute or so - the second connection is refused. If I wait a couple of minutes and try again, I can use the second machine for a normal connection.

GPUGrid uses Apache/2.4.6 (CentOS): SETI uses Apache/2.2.15 (Scientific Linux). I don't know if either of those has an anti-DDoS setting that rejects repeated connection attempts from the same IP address but with different reply-to details. Do we have a network analyst in the house?

That does not explain why my single host, with a dedicated single-IP connection, has the same problem.

Or why it worked at first and then, after downloading >100 WUs, stopped working.

If the servers consider a report of >100 WUs a DDoS attack, those of us with fast hosts are in serious trouble.
ID: 1982805 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1982807 - Posted: 1 Mar 2019, 12:00:25 UTC - in response to Message 1982804.  

I'm not sure that is it, Richard.
My #2 computer has a different external IP and it is not acting any differently from my others that are on NAT.
ID: 1982807 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1982808 - Posted: 1 Mar 2019, 12:06:17 UTC

Well, my crunchers are off to visit Einstein, and I'm off to visit my bed. G'Night, y'all. It's been real :)
ID: 1982808 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1982810 - Posted: 1 Mar 2019, 12:17:40 UTC - in response to Message 1982805.  

If the servers consider a report of >100 WUs a DDoS attack, those of us with fast hosts are in serious trouble.
Well, a report of 100 tasks in one go would only require a single connection to the server OS. And reports go to a different server, which so far as we know isn't overloaded to the same extent. I don't think we have those problems on this occasion.

I wasn't meaning to suggest that my hypothetical DDoS-preventer was the sole cause of our problems: that seems to be simple overloading of the only available server. But knowing the mechanisms that might be coming into play might help us plan our remedial actions better.

For example, it might be more effective to refill just one machine, then rest and have a cup of coffee, before attempting to refill the next. Or might it be better to fire them all off at once? My observations seemed to suggest that the latter approach was less likely to be successful.
ID: 1982810 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1982812 - Posted: 1 Mar 2019, 12:35:49 UTC - in response to Message 1982811.  

We're just having a private conversation between those few of us who are interested. As with any conversation on a public street corner or in a cafe, others are free to join in or not as they choose.
ID: 1982812 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1982813 - Posted: 1 Mar 2019, 12:37:54 UTC - in response to Message 1982746.  

Got home to a relatively cool house, due to 1 system being out of all work & the other out of CPU work. Found lots of downloads, all in excessive backoff mode. Tried "Retry Pending transfers" with no joy.
From the looks of this thread, it's nice to know i'm not the only one.


. . You certainly are not, 2 machines empty, others on the way. Several failed downloads on each machine preventing getting more and nothing can provoke the downloads to actually work.

Stephen

:(
ID: 1982813 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1982814 - Posted: 1 Mar 2019, 12:40:43 UTC - in response to Message 1982757.  

Everything is ok here still and the AP's for today is into 3 figures. :-D

Cheers.


. . I am in the process of making a Wiggo voodoo doll ...

:)

Stephen

:)
ID: 1982814 · Report as offensive
mmonnin
Volunteer tester

Joined: 8 Jun 17
Posts: 58
Credit: 10,176,849
RAC: 0
United States
Message 1982815 - Posted: 1 Mar 2019, 12:46:42 UTC

Yesterday I was able to manually retry one at a time until an AP task was stuck on each PC. It wouldn't download so I couldn't get any more. I aborted the DL on one and received 100 tasks. 10 were able to download but nothing more. Hopefully this is fixed soon.
ID: 1982815 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1982816 - Posted: 1 Mar 2019, 12:49:19 UTC - in response to Message 1982811.  

S@H boards are a simple microcosm of life on planet earth hosting 7.5 billion people. Of them, BOINC users of distributed computing are about 4.5 million, and of S@H active users 93,000. Those actively complaining maybe 1 dozen at most.

Is it any wonder that people take no notice.


Make that a baker's dozen. I've been stuck since Wed and only have approx. 11 hrs. of CPU WUs left. I started Milkyway yesterday when I used up my GPU units, just to keep my GPUs busy. If things don't come back today, I guess I'll just have to let Milkyway start crunching CPU units.


I don't buy computers, I build them!!
ID: 1982816 · Report as offensive
Sesson

Joined: 29 Feb 16
Posts: 43
Credit: 1,353,463
RAC: 3
Message 1982818 - Posted: 1 Mar 2019, 12:53:28 UTC

At the moment "boinc2.ssl.berkeley.edu" has two IP addresses assigned, 208.68.240.119 and 208.68.240.127. Only 208.68.240.119 works for me right now. HTTP connections to 208.68.240.127 are always stuck, probably because the web server is down and the operating system doesn't know where your downloads should go.
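
A rough way to repeat this check is to look up the A records for the download host and send a small HTTP request to each address in turn. The sketch below is only that, a sketch: the function names are made up, and a plain `HEAD /` request is assumed to be a good enough stand-in for a real download. A bare TCP connect would not be enough, since a server can accept the connection and then return nothing.

```python
import http.client
import socket
from typing import Callable, Dict, Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the distinct IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def http_probe(ip: str, hostname: str, timeout_s: float = 10.0) -> str:
    """Send HEAD / to one specific address and describe what came back."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout_s)
    try:
        # Talk to the chosen IP but present the normal host name.
        conn.request("HEAD", "/", headers={"Host": hostname})
        return f"HTTP {conn.getresponse().status}"
    except (OSError, http.client.HTTPException) as exc:
        # Covers timeouts, refused connections and empty replies.
        return f"failed: {type(exc).__name__}"
    finally:
        conn.close()


def check_addresses(hostname: str, addresses: Iterable[str],
                    probe: Callable[[str, str], str] = http_probe) -> Dict[str, str]:
    """Probe every address separately so one dead server cannot hide behind another."""
    return {ip: probe(ip, hostname) for ip in addresses}
```

Calling `check_addresses("boinc2.ssl.berkeley.edu", resolve_ipv4("boinc2.ssl.berkeley.edu"))` gives one result per address, so it is easy to see whether only one of the two is answering. The `probe` argument is there so the logic can be tried without touching the network.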
ID: 1982818 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1982820 - Posted: 1 Mar 2019, 13:09:00 UTC - in response to Message 1982818.  

At the moment "boinc2.ssl.berkeley.edu" has two IP addresses assigned, 208.68.240.119 and 208.68.240.127. Only 208.68.240.119 works for me right now. HTTP connections to 208.68.240.127 are always stuck, probably because the web server is down and the operating system doesn't know where your downloads should go.
That's odd. I see:

Pinging georgem.ssl.berkeley.edu [208.68.240.119] with 32 bytes of data:
Pinging vader.ssl.berkeley.edu [208.68.240.127] with 32 bytes of data:

It's georgem which is shown as disabled on the server status page, suggesting that .119 should fail, .127 should succeed. And with Vader being the older, slower, less capable machine, that would explain why it's having difficulty coping on its own.

I haven't checked the IP addresses on my own machines today, but I'll check next time I pay a bedside visit to one of my patients.
ID: 1982820 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1982821 - Posted: 1 Mar 2019, 13:23:52 UTC
Last modified: 1 Mar 2019, 13:28:47 UTC

It's Alive! Somebody fixed the server hangover. Thanks
ID: 1982821 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1982824 - Posted: 1 Mar 2019, 13:33:16 UTC - in response to Message 1982821.  

Are you sure you didn't wake up still drunk?
ID: 1982824 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1982825 - Posted: 1 Mar 2019, 13:33:40 UTC - in response to Message 1982820.  

Well, first up I checked my DNS cache:

boinc2.ssl.berkeley.edu
----------------------------------------
Record Name . . . . . : boinc2.ssl.berkeley.edu
Record Type . . . . . : 1
Time To Live  . . . . : 889
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 208.68.240.127

Record Name . . . . . : boinc2.ssl.berkeley.edu
Record Type . . . . . : 1
Time To Live  . . . . : 889
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 208.68.240.119
which is odd - we used to have a 5-minute rotation, so TTL was never above 300 seconds. Anyway, I retried the downloads - they all hit .127, and all failed.

I then tried the old standby - flushdns - and got a new TTL (589 seconds - still too high). But trying more downloads started with failures, then

01/03/2019 13:16:30 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar
01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info:    Trying 208.68.240.127...
01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8773)
01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info:  Empty reply from server
01/03/2019 13:16:31 | SETI@home | [http] HTTP error: Server returned nothing (no headers, no data)
01/03/2019 13:16:32 | SETI@home | Temporarily failed download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar: transient HTTP error
01/03/2019 13:16:32 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar
01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info:  Hostname boinc2.ssl.berkeley.edu was found in DNS cache
01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info:    Trying 208.68.240.127...
01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
01/03/2019 13:16:36 | SETI@home | [http] [ID#5189] Info:  Connection #8774 to host boinc2.ssl.berkeley.edu left intact
01/03/2019 13:16:36 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar
01/03/2019 13:16:36 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar
01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info:  Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu
01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
01/03/2019 13:16:40 | SETI@home | Finished download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar
01/03/2019 13:16:40 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.248.vlar
01/03/2019 13:16:41 | SETI@home | [http] [ID#5191] Info:  Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu
01/03/2019 13:16:41 | SETI@home | [http] [ID#5191] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
01/03/2019 13:16:45 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.248.vlar
and so on - about a dozen downloads in all, all successful on that "Re-using existing connection" to 208.68.240.127
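
For anyone with a longer event log to wade through, a small script can tally the outcomes per server address. This is a minimal sketch that assumes the log lines look like the ones quoted above (with the `[http]` debug lines present); the function name `summarise` and the key used for re-used connections are made up for the example.

```python
import re
from collections import Counter
from typing import Iterable

CONNECTED = re.compile(r"Connected to \S+ \((\d+\.\d+\.\d+\.\d+)\) port \d+")

# Sample lines copied from the log above.
SAMPLE = """\
01/03/2019 13:16:30 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar
01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info:    Trying 208.68.240.127...
01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8773)
01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info:  Empty reply from server
01/03/2019 13:16:31 | SETI@home | [http] HTTP error: Server returned nothing (no headers, no data)
01/03/2019 13:16:32 | SETI@home | Temporarily failed download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar: transient HTTP error
01/03/2019 13:16:32 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar
01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info:  Hostname boinc2.ssl.berkeley.edu was found in DNS cache
01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info:    Trying 208.68.240.127...
01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
01/03/2019 13:16:36 | SETI@home | [http] [ID#5189] Info:  Connection #8774 to host boinc2.ssl.berkeley.edu left intact
01/03/2019 13:16:36 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar
01/03/2019 13:16:36 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar
01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info:  Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu
01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
01/03/2019 13:16:40 | SETI@home | Finished download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar
"""


def summarise(lines: Iterable[str]) -> Counter:
    """Count finished/failed downloads per server IP, plus re-used connections."""
    stats: Counter = Counter()
    current_ip = None  # address from the most recent "Connected to" line
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            current_ip = match.group(1)
        elif "Re-using existing connection" in line:
            stats["reused"] += 1
        elif "| Finished download of " in line:
            stats[(current_ip, "finished")] += 1
        elif "| Temporarily failed download of " in line:
            stats[(current_ip, "failed")] += 1
    return stats


if __name__ == "__main__":
    for key, count in summarise(SAMPLE.splitlines()).items():
        print(key, count)
```

The tally attributes each result to the address in the most recent "Connected to" line, which is fine for a client doing one transfer at a time but would muddle results if several transfers were interleaved.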
ID: 1982825 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22539
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1982827 - Posted: 1 Mar 2019, 13:37:57 UTC

I have a computer elsewhere, and that is doing "OK", but my main pile is not getting anything, so it might be that a DNS table has got screwed somewhere, and that could take a good few hours to resolve....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1982827 · Report as offensive


 