Panic Mode On (115) Server Problems?

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304
Message 1982798 - Posted: 1 Mar 2019, 10:45:38 UTC - in response to Message 1982797.
> Seems like it, once I updated the cc_config to accept only one at a time again, it seems to go through now.
Not here. After 20 min on the retry button, it's time for bed.
Grant
Darwin NT
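The thread doesn't say exactly which cc_config option was changed to get "one at a time". Assuming it is the client's `<max_file_xfers_per_project>` setting inside `<options>`, a small Python helper along these lines could set it while leaving any other options alone. The function names are hypothetical; after saving the file, `boinccmd --read_cc_config` (or a client restart) should make the client pick up the change.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def set_transfer_limit(config_xml: str, per_project: int = 1) -> str:
    """Return cc_config XML with <options><max_file_xfers_per_project> set.

    Assumption: this is the option meant by "accept only one at a time".
    Any other options already present are preserved unchanged.
    """
    # Start a fresh <cc_config> if the file is missing or empty.
    root = ET.fromstring(config_xml) if config_xml.strip() else ET.Element("cc_config")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    limit = options.find("max_file_xfers_per_project")
    if limit is None:
        limit = ET.SubElement(options, "max_file_xfers_per_project")
    limit.text = str(per_project)
    return ET.tostring(root, encoding="unicode")


def update_file(path: Path, per_project: int = 1) -> None:
    """Hypothetical helper: rewrite cc_config.xml in the BOINC data directory."""
    existing = path.read_text() if path.exists() else ""
    path.write_text(set_transfer_limit(existing, per_project))
```

Raising `per_project` to 2 reproduces juan's "2 at a time" experiment further down.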

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
Message 1982799 - Posted: 1 Mar 2019, 11:12:44 UTC - in response to Message 1982798. Last modified: 1 Mar 2019, 11:19:07 UTC
> Not here. After 20 min on the retry button, it's time for bed.
Not here either. Good night. Time to wake up here. LOL
<edit> After some time it started to work, 1 WU at a time only. But it's better than nothing.

-= Vyper =- Volunteer tester Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537
Message 1982800 - Posted: 1 Mar 2019, 11:16:07 UTC - in response to Message 1982798.
I needed to restart BOINC too. It seems to take some time for the IP to get accepted; once it's accepted, it seems to work. The only struggle is when you have multiple big hosts behind the same WAN IP, which also seems to trigger the "block". I'm not done "thinking outside the box" there yet, but it looks that way: the one host at home has never stopped downloading since I fixed this, while the two hosts at another location don't start up, apparently because two hosts on the same WAN IP are demanding work. I've set network mode to never on one of them to test.
_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
Message 1982801 - Posted: 1 Mar 2019, 11:22:51 UTC Last modified: 1 Mar 2019, 11:37:26 UTC
2 at a time is working here now, but no more than that. Even with 2 there are a lot of retries, so back to 1 at a time.
<edit> After downloading >100 WUs it stopped working. Everything ends up in Project Backoff.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1982804 - Posted: 1 Mar 2019, 11:50:02 UTC - in response to Message 1982800.
> [...] The only struggle is when you have multiple big hosts behind the same WAN IP, which also seems to trigger the "block". [...]
Now, that would be interesting to explore. The problem seems to be in getting a network connection to the server. I think the blockage is at the server operating system level, before any actual request is passed up the stack to any BOINC software.
Like you, I have multiple machines behind a single NAT router. So they all have the same external, public IP address, but obviously different private, non-routable LAN IP addresses behind my firewall. In the past, I've found that I can access a different BOINC server (for the GPUGrid project) from any machine on the network. But if I've tried to access GPUGrid from a different machine 'soon afterwards', say within a minute or so, the second connection is refused. If I wait a couple of minutes and try again, I can use the second machine for a normal connection.
GPUGrid uses Apache/2.4.6 (CentOS); SETI uses Apache/2.2.15 (Scientific Linux). I don't know if either of those has an anti-DDoS setting that rejects repeated connection attempts from the same IP address but with different reply-to details. Do we have a network analyst in the house?
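To make Richard's hypothesis concrete, here is a toy Python model of the kind of throttle he describes: a gate that refuses a connection from a public IP if a *different* client behind that same IP was accepted within the last `window` seconds. This is purely an illustration under that assumption; it is not how Apache 2.2 or 2.4 is known to behave, and `PerIpGate`, `window` and the client labels are made up.

```python
from dataclasses import dataclass, field


@dataclass
class PerIpGate:
    """Hypothetical per-public-IP throttle (illustration only).

    Refuses a connection when another client behind the same public IP
    was accepted less than `window` seconds earlier.
    """
    window: float = 60.0
    # public IP -> (time of last accepted connection, client that made it)
    last_accepted: dict[str, tuple[float, str]] = field(default_factory=dict)

    def accept(self, public_ip: str, client_id: str, now: float) -> bool:
        prev = self.last_accepted.get(public_ip)
        if prev is not None:
            prev_time, prev_client = prev
            # A different LAN machine behind the same NAT, too soon: refuse.
            if prev_client != client_id and now - prev_time < self.window:
                return False
        # Accepted: remember who connected and when. Refusals change nothing.
        self.last_accepted[public_ip] = (now, client_id)
        return True
```

For example, with `window=60`, a second NAT'd machine on `203.0.113.5` is refused at 30 s but accepted at 150 s, which mirrors the "wait a couple of minutes" pattern Richard saw with GPUGrid, while a host on its own address is never affected.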

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
Message 1982805 - Posted: 1 Mar 2019, 11:56:53 UTC - in response to Message 1982804. Last modified: 1 Mar 2019, 12:00:25 UTC
> [...] I don't know if either of those has an anti-DDoS setting that rejects repeated connection attempts from the same IP address but with different reply-to details. Do we have a network analyst in the house?
That doesn't explain why my single host, with a dedicated single-IP connection, has the same problem. Nor why it works, and then after downloading >100 WUs it doesn't work anymore. If the servers consider reporting >100 WUs a DDoS attack, we who have fast hosts are in serious trouble.

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
Message 1982807 - Posted: 1 Mar 2019, 12:00:25 UTC - in response to Message 1982804.
I'm not sure that's it, Richard. My #2 computer has a different external IP, and it isn't acting any differently from my others that are behind NAT.

Jimbocous Volunteer tester Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349
Message 1982808 - Posted: 1 Mar 2019, 12:06:17 UTC
Well, my crunchers are off to visit Einstein, and I'm off to visit my bed. G'Night, y'all. It's been real :)

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1982810 - Posted: 1 Mar 2019, 12:17:40 UTC - in response to Message 1982805.
> If the servers consider reporting >100 WUs a DDoS attack, we who have fast hosts are in serious trouble.
Well, a report of 100 tasks in one go would only require a single connection to the server OS. And reports go to a different server, which so far as we know isn't overloaded to the same extent. I don't think we have those problems on this occasion.
I wasn't meaning to suggest that my hypothetical DDoS-preventer was the sole cause of our problems: that seems to be simple overloading of the only available server. But knowing the mechanisms that might be coming into play might help us plan our remedial actions better. For example, it might be more effective to refill just one machine, then rest and have a cup of coffee, before attempting to refill the next. Or might it be better to fire them all off at once? My observations seemed to suggest that the latter approach was less likely to be successful.
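If the "one machine, then a cup of coffee" approach is worth trying across several hosts, it can be scripted around `boinccmd`, which can ask a (possibly remote) client to contact a project with `--project URL update`. This sketch assumes the SETI@home master URL is `http://setiathome.berkeley.edu/`, that remote GUI RPC is enabled on each host, and that names like `rig1` are placeholders; remote hosts will normally also need `--passwd`. With `dry_run=True` nothing is executed and the commands are only returned.

```python
import subprocess
import time

# Assumed master URL; check it against the project list in your client.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def update_command(host: str, project_url: str = PROJECT_URL) -> list[str]:
    """Build a boinccmd call asking the client on `host` to contact the project."""
    return ["boinccmd", "--host", host, "--project", project_url, "update"]


def staggered_refill(hosts: list[str], gap_seconds: float, dry_run: bool = True) -> list[list[str]]:
    """Ask each host to update in turn, pausing `gap_seconds` between hosts."""
    issued = []
    for i, host in enumerate(hosts):
        if i > 0 and not dry_run:
            time.sleep(gap_seconds)  # the 'cup of coffee' between machines
        cmd = update_command(host)
        issued.append(cmd)
        if not dry_run:
            # check=False: a refused connection shouldn't abort the loop.
            subprocess.run(cmd, check=False)
    return issued
```

Setting `gap_seconds=0` gives the "fire them all off at once" variant for comparison.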

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1982812 - Posted: 1 Mar 2019, 12:35:49 UTC - in response to Message 1982811.
We're just having a private conversation between those few of us who are interested. As with any conversation on a public street corner or in a cafe, others are free to join in or not as they choose.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
Message 1982813 - Posted: 1 Mar 2019, 12:37:54 UTC - in response to Message 1982746.
> Got home to a relatively cool house, due to 1 system being out of all work and the other out of CPU work. Found lots of downloads, all in excessive backoff mode. Tried "Retry Pending transfers" with no joy. From the looks of this thread, it's nice to know I'm not the only one.
. . . You certainly are not: 2 machines empty, others on the way. Several failed downloads on each machine are preventing it from getting more, and nothing can provoke the downloads to actually work.
Stephen
:(

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
Message 1982814 - Posted: 1 Mar 2019, 12:40:43 UTC - in response to Message 1982757.
> Everything is OK here still, and the APs for today are into 3 figures. :-D
> Cheers.
. . . I am in the process of making a Wiggo voodoo doll ... :)
Stephen
:)

mmonnin Volunteer tester Joined: 8 Jun 17 Posts: 58 Credit: 10,176,849 RAC: 0
Message 1982815 - Posted: 1 Mar 2019, 12:46:42 UTC
Yesterday I was able to manually retry one at a time until an AP task was stuck on each PC. It wouldn't download, so I couldn't get any more. I aborted the download on one and received 100 tasks. 10 were able to download, but nothing more. Hopefully this is fixed soon.

Cliff Harding Volunteer tester Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
Message 1982816 - Posted: 1 Mar 2019, 12:49:19 UTC - in response to Message 1982811.
> S@H boards are a simple microcosm of life on planet Earth, hosting 7.5 billion people. Of them, BOINC users of distributed computing are about 4.5 million, and S@H active users 93,000. Those actively complaining, maybe 1 dozen at most. Is it any wonder that people take no notice?
Make that a baker's dozen. I've been stuck since Wed and only have approx. 11 hrs of CPU WUs left. I started Milkyway yesterday when I used up my GPU units, just to keep my GPUs busy. If things don't come back today, I guess I'll just have to let Milkyway start crunching CPU units.
I don't buy computers, I build them!!

Sesson Joined: 29 Feb 16 Posts: 43 Credit: 1,353,463 RAC: 3
Message 1982818 - Posted: 1 Mar 2019, 12:53:28 UTC
At the moment "boinc2.ssl.berkeley.edu" has two IP addresses assigned: 208.68.240.119 and 208.68.240.127. Only 208.68.240.119 works for me right now. HTTP connections to 208.68.240.127 always get stuck, probably because the web server is down and the operating system doesn't know where your downloads should go.
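Sesson's check can be reproduced by resolving every address behind the round-robin name and sending a request to each one directly, with the `Host` header set so the web server still sees the original name. A minimal sketch using only the Python standard library (`socket.getaddrinfo`, `http.client`); the function names are hypothetical, and treating any reply to `HEAD /` as "working" is an assumption.

```python
import http.client
import socket


def ipv4_addresses(infos) -> list[str]:
    """Unique IPv4 addresses, in resolver order, from getaddrinfo() output."""
    seen = []
    for family, _type, _proto, _canon, sockaddr in infos:
        if family == socket.AF_INET and sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


def probe(ip: str, hostname: str, timeout: float = 10.0) -> str:
    """Send HEAD / to one specific IP, using the Host header the client would send."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers={"Host": hostname})
        return f"HTTP {conn.getresponse().status}"
    except (OSError, http.client.HTTPException) as exc:
        # Covers timeouts, refused connections and "empty reply" disconnects.
        return f"failed: {exc!r}"
    finally:
        conn.close()


def probe_all(hostname: str) -> dict[str, str]:
    infos = socket.getaddrinfo(hostname, 80, type=socket.SOCK_STREAM)
    return {ip: probe(ip, hostname) for ip in ipv4_addresses(infos)}
```

Calling `probe_all("boinc2.ssl.berkeley.edu")` would report each A record separately, so a dead address stands out instead of being hidden behind the rotation.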

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1982820 - Posted: 1 Mar 2019, 13:09:00 UTC - in response to Message 1982818.
> At the moment "boinc2.ssl.berkeley.edu" has two IP addresses assigned: 208.68.240.119 and 208.68.240.127. Only 208.68.240.119 works for me right now. [...]
That's odd. I see:
    Pinging georgem.ssl.berkeley.edu [208.68.240.119] with 32 bytes of data:
    Pinging vader.ssl.berkeley.edu [208.68.240.127] with 32 bytes of data:
It's georgem which is shown as disabled on the server status page, suggesting that .119 should fail and .127 should succeed. And with Vader being the older, slower, less capable machine, that would explain why it's having difficulty coping on its own.
I haven't checked the IP addresses on my own machines today, but I'll check next time I pay a bedside visit to one of my patients.

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
Message 1982821 - Posted: 1 Mar 2019, 13:23:52 UTC Last modified: 1 Mar 2019, 13:28:47 UTC
It's Alive! Somebody fixed the server hangover. Thanks!

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
Message 1982824 - Posted: 1 Mar 2019, 13:33:16 UTC - in response to Message 1982821.
Are you sure you didn't wake up still drunk?

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1982825 - Posted: 1 Mar 2019, 13:33:40 UTC - in response to Message 1982820.
Well, first up I checked my DNS cache:

    boinc2.ssl.berkeley.edu
    ----------------------------------------
    Record Name . . . . . : boinc2.ssl.berkeley.edu
    Record Type . . . . . : 1
    Time To Live  . . . . : 889
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 208.68.240.127

    Record Name . . . . . : boinc2.ssl.berkeley.edu
    Record Type . . . . . : 1
    Time To Live  . . . . : 889
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 208.68.240.119

which is odd: we used to have a 5-minute rotation, so TTL was never above 300 seconds.
Anyway, I retried the downloads; they all hit .127, and all failed. I then tried the old standby, flushdns, and got a new TTL (589 seconds, still too high). But trying more downloads started with failures, then:

    01/03/2019 13:16:30 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar
    01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Trying 208.68.240.127...
    01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8773)
    01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Empty reply from server
    01/03/2019 13:16:31 | SETI@home | [http] HTTP error: Server returned nothing (no headers, no data)
    01/03/2019 13:16:32 | SETI@home | Temporarily failed download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar: transient HTTP error
    01/03/2019 13:16:32 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar
    01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Hostname boinc2.ssl.berkeley.edu was found in DNS cache
    01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Trying 208.68.240.127...
    01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
    01/03/2019 13:16:36 | SETI@home | [http] [ID#5189] Info: Connection #8774 to host boinc2.ssl.berkeley.edu left intact
    01/03/2019 13:16:36 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar
    01/03/2019 13:16:36 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar
    01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info: Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu
    01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
    01/03/2019 13:16:40 | SETI@home | Finished download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar
    01/03/2019 13:16:40 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.248.vlar
    01/03/2019 13:16:41 | SETI@home | [http] [ID#5191] Info: Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu
    01/03/2019 13:16:41 | SETI@home | [http] [ID#5191] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)
    01/03/2019 13:16:45 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.248.vlar

and so on: about a dozen downloads in all, all successful on that "Re-using existing connection" to 208.68.240.127.
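Event logs like the one above (with the `[http]` debug flag on) can be tallied per server address with a short script. This sketch assumes transfers run one at a time, as with the 1-per-project setting discussed earlier, so each `[ID#n]` line can be attributed to the most recent "Started download"; with several concurrent transfers that pairing would be unreliable. The regexes target only the line format shown in this post, and `tally` is a hypothetical name.

```python
import re
from collections import Counter

STARTED = re.compile(r"Started download of (\S+)")
FINISHED = re.compile(r"Finished download of (\S+)")
FAILED = re.compile(r"Temporarily failed download of ([^:\s]+)")
# IP from "Info: Trying a.b.c.d..." or "Info: Connected to host (a.b.c.d)".
IP_RE = re.compile(r"\[ID#\d+\] Info: (?:Trying |Connected to \S+ \()(\d{1,3}(?:\.\d{1,3}){3})")


def tally(log_lines) -> Counter:
    """Count (server IP, outcome) pairs, assuming one transfer at a time."""
    current_file = None
    ip_for_file = {}
    results = Counter()
    for line in log_lines:
        if m := STARTED.search(line):
            current_file = m.group(1)
        elif m := IP_RE.search(line):
            if current_file is not None:
                ip_for_file[current_file] = m.group(1)
        elif m := FINISHED.search(line):
            results[(ip_for_file.get(m.group(1), "?"), "ok")] += 1
        elif m := FAILED.search(line):
            results[(ip_for_file.get(m.group(1), "?"), "failed")] += 1
    return results
```

Run over the excerpt above, it would attribute both the initial failure and the later successes to 208.68.240.127, which is the point Richard makes: the same address failed on fresh connections but worked on a reused one.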

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22190 Credit: 416,307,556 RAC: 380
Message 1982827 - Posted: 1 Mar 2019, 13:37:57 UTC
I have a computer elsewhere, and that is doing "OK", but my main pile are not getting anything, so it might be that a DNS table has got screwed up somewhere, and that could take a good few hours to resolve....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.