Panic Mode On (26) Server problems
Author | Message |
---|---|
ML1 Joined: 25 Nov 01 Posts: 20291 Credit: 7,508,002 RAC: 20 |
OK, just to be sure amidst the plethora of postings... Is this all a problem with certain versions of Windows, or a libcurl problem, or a Windows-libcurl problem, or a particular version of BOINC, or something else? I've not seen any DNS problems here for any downloads or pings for either of the ssl2 s@h server addresses... Am I missing something?... Good luck, Martin |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
@ Pappa: Regarding your first snippet, this is the part that has me scratching my head. You and Richard are showing zero for a TTL on hosts file entries (specifically localhost). However, here's the current cache for my XP 64 host:

    Windows IP Configuration

    1.0.0.127.in-addr.arpa
    ----------------------------------------
    Record Name . . . . . : 1.0.0.127.in-addr.arpa.
    Record Type . . . . . : 12
    Time To Live . . . . : 581675
    Data Length . . . . . : 8
    Section . . . . . . . : Answer
    PTR Record . . . . . : localhost

    localhost
    ----------------------------------------
    Record Name . . . . . : localhost
    Record Type . . . . . : 1
    Time To Live . . . . : 581675
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 127.0.0.1

Curious. So where is that coming from (which is counting down just like any other one does and isn't affected by a flush)? Alinator |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Curious. So where is that coming from (which is counting down just like any other one does and isn't affected by a flush)?
If you check your registry, I think you will find a value for the max TTL set in there. My Vista system did not have settings for max TTL - the MS defaults are 24 hours for a successful lookup and 15 mins for an unsuccessful lookup - and displayed 0 in the DNS cache. However, I have now set 5 mins (300 secs) as the value in the registry and get the following:

    boinc2.ssl.berkeley.edu
    ----------------------------------------
    Record Name . . . . . : boinc2.ssl.berke
    Record Type . . . . . : 1
    Time To Live . . . . : 300
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 208.68.240.18

    boinc2.ssl.berkeley.edu
    ----------------------------------------
    No records of type AAAA

    localhost
    ----------------------------------------
    Record Name . . . . . : localhost
    Record Type . . . . . : 1
    Time To Live . . . . : 300
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 127.0.0.1

    localhost
    ----------------------------------------
    Record Name . . . . . : localhost
    Record Type . . . . . : 28
    Time To Live . . . . : 300
    Data Length . . . . . : 16
    Section . . . . . . . : Answer
    AAAA Record . . . . . : ::1

F. |
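For anyone comparing TTLs across several machines, the listings above can be picked apart mechanically. What follows is only a rough sketch, assuming English-language output shaped like the excerpts in this thread; `parse_displaydns` and `SAMPLE` are hypothetical names, not part of Windows or BOINC.

```python
def parse_displaydns(text):
    """Collect (record name, record type, TTL) tuples from text shaped like
    the `ipconfig /displaydns` excerpts quoted in this thread."""
    records = []
    current = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # host headings and dashed rules carry no field
        label, value = line.split(" : ", 1)
        # Labels are padded with ". . ." filler; drop the dots and extra spaces.
        key = " ".join(label.replace(".", "").split())
        value = value.strip()
        if key == "Record Name":
            current = {"name": value}
        elif key == "Record Type":
            current["type"] = int(value)
        elif key == "Time To Live":
            current["ttl"] = int(value)
            records.append((current.get("name"), current.get("type"), current["ttl"]))
    return records


# Sample text copied from the localhost entries posted above.
SAMPLE = """\
localhost
----------------------------------------
Record Name . . . . . : localhost
Record Type . . . . . : 1
Time To Live . . . . : 300
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 127.0.0.1

localhost
----------------------------------------
Record Name . . . . . : localhost
Record Type . . . . . : 28
Time To Live . . . . : 300
Data Length . . . . . : 16
Section . . . . . . . : Answer
AAAA Record . . . . . : ::1
"""

records = parse_displaydns(SAMPLE)
```

Localized output (such as the Dutch listing further down the thread) uses different labels, so the key names would need adjusting there.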
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I got (with XP 32) a zero TTL for hosts entries - localhost obviously, but confirmed by adding, and then removing, boinc2.ssl.berkeley.edu to/from hosts. Wasn't XP 64 based on Server 2003 code? That could well have different DNS handling, what with IPv6, the likelihood of an internal DNS server for active directory, etc. etc. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
OK, just to be sure amidst the plethora of postings...
Well, my take on it all is:
1.) One of the SAH DL servers has a problem.
2.) SAH's round robin DNS is doing what it is supposed to.
3.) Windows DNS Client service caching is not the cause, and never was.
4.) The problem is most likely in libcurl, BOINC, or a combination of the two, and has been a problem for some time now. It just doesn't get the opportunity to raise its ugly head all that often. I mean there aren't that many long weekends in a year! :-)
Alinator |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
On my Vista host (x86) it looks different; this host has no trouble connecting.

    localhost
    ----------------------------------------
    Recordnaam . . . . . : localhost
    Recordtype . . . . . : 1
    Time-to-Live . . . . : 86400
    Gegevenslengte . . . : 4
    Sectie . . . . . . . : antwoord
    A-record (host). . . : 127.0.0.1

    localhost
    ----------------------------------------
    Recordnaam . . . . . : localhost
    Recordtype . . . . . : 28
    Time-to-Live . . . . : 86400
    Gegevenslengte . . . : 16
    Sectie . . . . . . . : antwoord
    AAAA-record . . . . : ::1

But also NO downloads. Apart from trying to download DLLs? On my x64 host. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmmm.... You don't seem to have any entries pointing to the DL servers. You might have to force a retry on the transfers, or restart BOINC and/or reboot the machine to see it appear in /displaydns. Alinator |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
I got (with XP 32) a zero TTL for hosts entries - localhost obviously, but confirmed by adding, and then removing, boinc2.ssl.berkeley.edu to/from hosts.
Richard, XP and 2003 Server are the same code base. Issues on the server side kept it from being released when XP was. 64-bit was a work in progress. At that time it was "only" Intel's Itanium, which also had issues. AMD brought in a true 64-bit processor and bus, and XP 64 and 2K3 were released; 2K3 64-bit is based on AMD's architecture. As I recall, changes to the file system and 64-bit were the key things that stopped it (somewhere I still have my Server Bits T-shirt Active Directory Lab). Backfitting IPv6 has made some changes in the stack handling. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Hmmmm.... Just restart BOINC - nothing else. I think we've established by exhaustion that this is nothing to do with Windows - so no reboot needed. After a BOINC restart, look at ipconfig /displaydns again. The download server should be listed (BOINC will try by itself, if the downloads have been waiting a long time). If the IP address ending .13 is listed first, you will need to repeat the treatment, maybe five minutes later. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
3.) Windows DNS Client service caching is not the cause, and never was. Given that I've been bit more than once by DNS caching in Windows, I think I'd have a bit of trouble with the "never was" part. But we've now got some pretty good evidence that libcurl isn't behaving as expected. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Agreed, that's where the definition for it is, but there isn't any TTL info in the hosts file. Remember that the hosts file dates from 1987, when every single computer on the internet had a complete, current list of every other computer on the internet. It doesn't have a TTL because it was (manually) replaced when the operator noticed it was a bit out of date. ... and when I learned DNS years ago, it was considered good form to have a long TTL -- and it should still be that way. The longest possible valid TTL is 42 days. I've mellowed a bit. Most of my zones have TTL set to 432,000 (5 days). |
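The TTL figures quoted in this thread are all in seconds, so a quick conversion helps when comparing them. A minimal sketch, using only Python's standard library; `describe_ttl` is a hypothetical helper name.

```python
from datetime import timedelta


def describe_ttl(seconds):
    """Render a TTL given in seconds as a human-readable duration."""
    return str(timedelta(seconds=seconds))


# Values mentioned in the thread: the 5-day zone TTL, the 24-hour
# default for successful lookups, and the 300-second registry setting.
five_days = describe_ttl(432_000)   # "5 days, 0:00:00"
one_day = describe_ttl(86_400)      # "1 day, 0:00:00"
five_minutes = describe_ttl(300)    # "0:05:00"
```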
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
You might want to present this as a libcurl bug..... Agreed. ... and now we have some solid evidence. I'm guessing that the best fix will be for BOINC to simply reset libcurl periodically, instead of waiting for the libcurl developers to fix it. |
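The "reset periodically" idea amounts to never trusting a cached lookup for longer than some maximum age. The sketch below only illustrates that idea in isolation; it is not how BOINC or libcurl is implemented, and `ExpiringCache`, its parameters, and the placeholder hostname are all hypothetical.

```python
import time


class ExpiringCache:
    """Cache lookups, but re-resolve once an entry is older than max_age_seconds."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock          # injectable so the behaviour can be tested
        self._entries = {}          # hostname -> (addresses, time stored)

    def get(self, hostname, resolve):
        now = self.clock()
        entry = self._entries.get(hostname)
        if entry is not None and now - entry[1] < self.max_age:
            return entry[0]         # still fresh: reuse the cached answer
        addresses = resolve(hostname)   # stale or missing: look it up again
        self._entries[hostname] = (addresses, now)
        return addresses


# Usage with a fake clock and a fake resolver (no network access needed).
fake_now = [0.0]
lookups = []

def fake_resolve(hostname):
    lookups.append(hostname)
    return ["192.0.2.18"]           # documentation-range placeholder address

cache = ExpiringCache(300, clock=lambda: fake_now[0])
cache.get("downloads.example", fake_resolve)
cache.get("downloads.example", fake_resolve)   # served from the cache
fake_now[0] = 301.0
cache.get("downloads.example", fake_resolve)   # older than 300 s: resolved again
```

With a 300-second limit, a stale address would be dropped after five minutes at most, rather than surviving until the client is restarted.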
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Hi, now I do get a normal response:

    Network Card(s): 1 NIC(s) Installed.
    [01]: Marvell Yukon 88E8056 PCI-E Gigabit Ethernet Controller
    Connection Name: Local Area Connection
    DHCP Enabled: Yes
    DHCP Server: 192.168.2.1
    IP address(es)
    [01]: 192.168.2.13

    C:\Documents and Settings\Administrator.THUNDER>tracert setiathome.ssl.berkeley.edu

    Tracing route to setiathome.ssl.berkeley.edu [128.32.18.150]
    over a maximum of 30 hops:

     1 <1 ms <1 ms <1 ms SX551E4C422 [192.168.2.1]
     2 22 ms 21 ms 21 ms 195.190.249.32
     3 24 ms 24 ms 24 ms iawxsrt-rt2-bb21-ge-1-1-0.wxs.nl [213.75.64.137]
     4 24 ms 24 ms 24 ms 213.75.64.166
     5 * * * Request timed out.
     6 27 ms 26 ms 27 ms asd2-rou-1021.NL.eurorings.net [134.222.231.129]
     7 122 ms 122 ms 121 ms nyk-s1-rou-1001.US.eurorings.net [134.222.226.170]
     8 116 ms 117 ms 117 ms nyk-s1-rou-1021.US.eurorings.net [134.222.231.238]
     9 131 ms 121 ms 122 ms ahbn-s1-rou-1041.US.eurorings.net [134.222.228.10]
    10 122 ms 122 ms 121 ms ahbn-s1-rou-1001.US.eurorings.net [134.222.226.53]
    11 122 ms 122 ms 122 ms eeq-exchange.tr01-asbnva01.transitrail.net [206.223.115.45]
    12 139 ms 140 ms 139 ms te4-1.tr01-chcgil01.transitrail.net [137.164.129.11]
    13 196 ms 196 ms 196 ms te4-1.tr01-sttlwa01.transitrail.net [137.164.129.2]
    14 215 ms 215 ms 215 ms te4-1--260.tr01-plalca01.transitrail.net [137.164.129.34]
    15 215 ms 215 ms 214 ms calren-2nd.tr01-plalca01.transitrail.net [137.164.131.94]
    16 203 ms 203 ms 203 ms dc-svl-core1--svl-px1-10ge-2.cenic.net [137.164.46.12]
    17 204 ms 206 ms 205 ms dc-oak-core1--svl-core1-ge-1.cenic.net [137.164.46.213]
    18 205 ms 205 ms 205 ms dc-oak-agg2--oak-core1-10ge.cenic.net [137.164.47.116]
    19 206 ms 205 ms 206 ms ucb--oak-dc2-ge.cenic.net [137.164.23.30]
    20 206 ms 206 ms 205 ms t2-3.inr-201-eva.Berkeley.EDU [128.32.0.37]
    21 206 ms 205 ms 205 ms g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
    22 * * * Request timed out.
    23 207 ms 206 ms 206 ms thinman.ssl.berkeley.edu [128.32.18.150]

    Trace complete.

Looks OK to me. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Interestingly enough, I had the IP addresses stored in my hosts file on one machine (Intel P4) for 24 hours. I removed the entry yesterday afternoon, in anticipation of the guys here fixing things, then exited and restarted BOINC. Weird though, I have not had any problems downloading anything since. My other machine (AMD 2200+) hasn't had any problems all weekend long... I never had to change its hosts file, do any flushing of the DNS cache, etc. It downloaded work without a hitch all through these problems. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
If I look at the cricket graph... (is it allowed to post this URL again?) Since the unplanned outage (damaged internet switch) and the impossible downloads, the DL/UL traffic has decreased by ~50%. So only ~50% of the members could get downloads working (with the workaround) because they looked in the forum. And the other ~50%? Are they now angry and thinking about leaving? Not good advertising. Or is it because of the PCs 'of' [NEZ]? |
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
I don't think that many people look in the forum. Many (most?) people usually shut down in the evening and start up the next morning -> no problem with cached IPs. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
The download level shown in the cricket graphs is where I would expect it to be when there are no Astropulse being split. And as most crunchers are "set and forget", they won't even know there has been a problem with downloading so don't expect a mass exodus. F. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
the DL/UL traffic decreased ~ 50 %... That's what I would expect if only 50% of the download servers are in operation ;-) Regards, Gundolf |
wulf 21 Joined: 18 Apr 09 Posts: 93 Credit: 26,337,213 RAC: 43 |
So, summing it up: you think that the http_debug log, which says that it will try both IPs, is wrong and it's really only trying the first one? |
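What "trying both IPs" would mean in practice is a loop that moves on to the next address when one fails. The following is only a sketch of that behaviour, not what BOINC or libcurl actually does; `download_with_fallback` and `fake_fetch` are hypothetical, and the addresses are documentation-range placeholders rather than the real download servers.

```python
def download_with_fallback(addresses, fetch):
    """Try each address in order; return (address, result) from the first that works."""
    errors = []
    for address in addresses:
        try:
            return address, fetch(address)
        except OSError as exc:      # ConnectionError and timeouts are OSError subclasses
            errors.append((address, exc))
    raise OSError(f"all {len(addresses)} addresses failed: {errors}")


def fake_fetch(address):
    """Stand-in for a real transfer: the address ending in .13 is 'down'."""
    if address.endswith(".13"):
        raise ConnectionError("connection refused")
    return "ok from " + address


used, result = download_with_fallback(["192.0.2.13", "192.0.2.18"], fake_fetch)
```

If a client really behaved this way, a dead first address would cost one failed attempt per transfer instead of blocking downloads entirely, which is the discrepancy the question above is pointing at.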
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
I'm not sure if you meant me, because you didn't "reply" but used "post to thread". However, I didn't say anything about any http_debug log; I only said that I expect the cricket graphs to be at 50% if only one of two servers is running. And I was replying to Sutaru's post, as you can easily see in the header line of my post. Regards, Gundolf Computers aren't everything in life. (Just kidding) |