Panic Mode On (115) Server Problems?
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Seems like it, once i updated the cc_config to accept only one at a time again it seems to go through now." Not here. After 20min on the retry button it's time for bed. Grant Darwin NT |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"Seems like it, once i updated the cc_config to accept only one at a time again it seems to go through now." Not here either. Good night. Time to wake up here. LOL Edit: After some time it started to work. Only 1 WU at a time, but that is better than nothing. |
-= Vyper =- Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
I needed to restart BOINC too. It seems to take some time for the IP to get accepted; once it is accepted, it seems to work. The only struggle is when you have multiple big hosts behind the same WAN IP, which also seems to trigger the "block". I'm not done thinking outside the box on that yet, but it seems so, because the one host at home has never stopped downloading since I fixed this, while the two hosts at another location don't start up: two hosts demanding work from the same WAN IP seems to stop it. I've set network mode to never on one host now to test. _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
2 at a time is working here now, more than that is not. But with 2 there are a lot of retries. Back to 1 at a time. Edit: After downloading >100 WUs it stopped working. Everything ends up in project backoff. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"I needed to restart boinc too. Seems like it takes some time for the IP to get accepted, once it got accepted it seems to work." Now, that would be interesting to explore. The problem seems to be in getting a network connection to the server. I think the blockage is at the server operating system level, before any actual request is passed up the stack to any BOINC software. Like you, I have multiple machines behind a single NAT router. So they all have the same external, public, IP address, but obviously different private, non-routable, LAN IP addresses behind my firewall. In the past, I've found that I can access a different BOINC server - for the GPUGrid project - from any machine on the network. But if I've tried to access GPUGrid from a different machine 'soon afterwards' - say within a minute or so - the second connection is refused. If I wait a couple of minutes and try again, I can use the second machine for a normal connection. GPUGrid uses Apache/2.4.6 (CentOS): SETI uses Apache/2.2.15 (Scientific Linux). I don't know if either of those has an anti-DDoS setting that rejects repeated connection attempts from the same IP address but with different reply-to details. Do we have a network analyst in the house? |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"I needed to restart boinc too. Seems like it takes some time for the IP to get accepted, once it got accepted it seems to work." "Now, that would be interesting to explore." That doesn't explain why my single host, with a dedicated single IP connection, has the same problem, or why it worked at first and then stopped working after downloading >100 WUs. If the servers consider a report of >100 WUs a DDoS attack, those of us with fast hosts are in serious trouble. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I'm not sure if that is it Richard. My #2 computer has a different external IP and it is not acting any different than my others that are on NAT. |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
Well, my crunchers are off to visit Einstein, and I'm off to visit my bed. G'Night, y'all. It's been real :) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"If the servers considered report of >100 WU a DDos attack we, who have fast hosts, are in a serious problems." Well, a report of 100 tasks in one go would only require a single connection to the server OS. And reports go to a different server, which so far as we know isn't overloaded to the same extent. I don't think we have those problems on this occasion. I wasn't meaning to suggest that my hypothetical DDoS-preventer was the sole cause of our problems: that seems to be simple overloading of the only available server. But knowing the mechanisms that might be coming into play might help us plan our remedial actions better. For example, it might be more effective to refill just one machine, then rest and have a cup of coffee, before attempting to refill the next. Or might it be better to fire them all off at once? My observations seemed to suggest that the latter approach was less likely to be successful. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
We're just having a private conversation between those few of us who are interested. As with any conversation on a public street corner or in a cafe, others are free to join in or not as they choose. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Got home to a relatively cool house, due to 1 system being out of all work & the other out of CPU work. Found lots of downloads, all in excessive backoff mode. Tried 'Retry Pending transfers' with no joy." . . You certainly are not, 2 machines empty, others on the way. Several failed downloads on each machine preventing getting more and nothing can provoke the downloads to actually work. Stephen :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Everything is ok here still and the AP's for today is into 3 figures. :-D" . . I am in the process of making a Wiggo voodoo doll ... :) Stephen :) |
mmonnin Send message Joined: 8 Jun 17 Posts: 58 Credit: 10,176,849 RAC: 0 |
Yesterday I was able to manually retry one at a time until an AP task was stuck on each PC. It wouldn't download so I couldn't get any more. I aborted the DL on one and received 100 tasks. 10 were able to download but nothing more. Hopefully this is fixed soon. |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
"S@H boards are a simple microcosm of life on planet earth hosting 7.5 billion people. Of them, BOINC users of distributed computing are about 4.5 million, and of S@H active users 93,000. Those actively complaining maybe 1 dozen at most." Make that a baker's dozen. I've been stuck since Wed and only have approx. 11 hrs. of CPU WUs left. I started Milkyway yesterday when I used up my GPU units, just to keep my GPUs busy. If things don't come back today, I guess I'll just have to start letting Milkyway crunch CPU units. I don't buy computers, I build them!! |
Sesson Send message Joined: 29 Feb 16 Posts: 43 Credit: 1,353,463 RAC: 3 |
At the moment "boinc2.ssl.berkeley.edu" has two IP addresses assigned, 208.68.240.119 and 208.68.240.127. Only 208.68.240.119 works for me right now. HTTP connections to 208.68.240.127 are always stuck, probably because the web server is down and the operating system doesn't know where your downloads should go. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"At the moment 'boinc2.ssl.berkeley.edu' has two IP addresses assigned, 208.68.240.119 and 208.68.240.127. Only 208.68.240.119 works for me right now. HTTP connections to 208.68.240.127 are always stuck, probably because the web server is down and the operating system doesn't know where your downloads should go." That's odd. I see: Pinging georgem.ssl.berkeley.edu [208.68.240.119] with 32 bytes of data: Pinging vader.ssl.berkeley.edu [208.68.240.127] with 32 bytes of data: It's georgem which is shown as disabled on the server status page, suggesting that .119 should fail, .127 should succeed. And with Vader being the older, slower, less capable machine, that would explain why it's having difficulty coping on its own. I haven't checked the IP addresses on my own machines today, but I'll check next time I pay a bedside visit to one of my patients. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
It's Alive! Somebody fixed the server hangover. Thanks |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Are you sure you didn't wake up still drunk? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Well, first up I checked my DNS cache: boinc2.ssl.berkeley.edu ---------------------------------------- Record Name . . . . . : boinc2.ssl.berkeley.edu Record Type . . . . . : 1 Time To Live . . . . : 889 Data Length . . . . . : 4 Section . . . . . . . : Answer A (Host) Record . . . : 208.68.240.127 Record Name . . . . . : boinc2.ssl.berkeley.edu Record Type . . . . . : 1 Time To Live . . . . : 889 Data Length . . . . . : 4 Section . . . . . . . : Answer A (Host) Record . . . : 208.68.240.119 which is odd - we used to have a 5-minute rotation, so TTL was never above 300 seconds. Anyway, I retried the downloads - they all hit .127, and all failed. I then tried the old standby - flushdns - and got a new TTL (589 seconds - still too high). But trying more downloads started with failures, then 01/03/2019 13:16:30 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar 01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Trying 208.68.240.127... 01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8773) 01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Empty reply from server 01/03/2019 13:16:31 | SETI@home | [http] HTTP error: Server returned nothing (no headers, no data) 01/03/2019 13:16:32 | SETI@home | Temporarily failed download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.11.vlar: transient HTTP error 01/03/2019 13:16:32 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar 01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Hostname boinc2.ssl.berkeley.edu was found in DNS cache 01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Trying 208.68.240.127... 01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774) 01/03/2019 13:16:36 | SETI@home | [http] [ID#5189] Info: Connection #8774 to host boinc2.ssl.berkeley.edu left intact 01/03/2019 13:16:36 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.242.vlar 01/03/2019 13:16:36 | SETI@home | Started download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar 01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info: Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu 01/03/2019 13:16:37 | SETI@home | [http] [ID#5190] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774) 01/03/2019 13:16:40 | SETI@home | Finished download of blc36_2bit_guppi_58405_84300_HIP86137_0023.5835.409.22.45.17.vlar 01/03/2019 13:16:40 | SETI@home | Started download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.248.vlar 01/03/2019 13:16:41 | SETI@home | [http] [ID#5191] Info: Re-using existing connection! (#8774) with host boinc2.ssl.berkeley.edu 01/03/2019 13:16:41 | SETI@home | [http] [ID#5191] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774) 01/03/2019 13:16:45 | SETI@home | Finished download of blc36_2bit_guppi_58405_85309_GJ687_0026.5824.409.21.44.248.vlar and so on - about a dozen downloads in all, all successful on that "Re-using existing connection" to 208.68.240.127 |
rob smith Send message Joined: 7 Mar 03 Posts: 22539 Credit: 416,307,556 RAC: 380 |
I have a computer elsewhere, and that is doing "OK", but my main pile is not getting anything, so it might be that a DNS table has got screwed somewhere, and that could take a good few hours to resolve.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
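
Several posts above try to work out which of the two download addresses (208.68.240.119 or 208.68.240.127) is actually serving files. One way to check this from a saved event log is to count outcomes per address. The following is only a minimal Python sketch, assuming the `[http]` debug line format quoted in Richard Haselgrove's post and assuming transfers run one at a time; the helper name `tally_downloads` is made up for illustration and is not part of BOINC.

```python
import re
from collections import Counter

# Matches e.g. "Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80"
CONNECTED = re.compile(r"Connected to \S+ \((\d{1,3}(?:\.\d{1,3}){3})\) port \d+")


def tally_downloads(lines):
    """Count finished and failed downloads per server IP.

    Hypothetical helper, not part of BOINC. Assumes one transfer at a
    time, so each outcome line is attributed to the most recent
    "Connected to" line seen since the previous outcome. Outcomes with
    no preceding connection are counted under "unknown".
    """
    finished, failed = Counter(), Counter()
    current_ip = None
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            current_ip = match.group(1)
        elif "Temporarily failed download of" in line:
            failed[current_ip or "unknown"] += 1
            current_ip = None
        elif "Finished download of" in line:
            finished[current_ip or "unknown"] += 1
            current_ip = None
    return finished, failed


if __name__ == "__main__":
    # A few lines in the format quoted in the thread (file names shortened).
    sample = [
        "01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8773)",
        "01/03/2019 13:16:31 | SETI@home | [http] [ID#5188] Info: Empty reply from server",
        "01/03/2019 13:16:32 | SETI@home | Temporarily failed download of a.vlar: transient HTTP error",
        "01/03/2019 13:16:33 | SETI@home | [http] [ID#5189] Info: Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#8774)",
        "01/03/2019 13:16:36 | SETI@home | Finished download of b.vlar",
    ]
    finished, failed = tally_downloads(sample)
    for ip in sorted(set(finished) | set(failed)):
        print(ip, "finished:", finished[ip], "failed:", failed[ip])
```

With several transfers running in parallel the attribution by "most recent connection" would no longer be reliable; matching on the `[ID#...]` field would be needed instead, which this sketch does not attempt.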
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.