Panic Mode On (26) Server problems

Dave
Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 951147 - Posted: 30 Nov 2009, 18:25:10 UTC

Thank you, Sutaru. Adding the two addresses to the hosts file solved it for all of my machines too.
ID: 951147 · Report as offensive
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 951150 - Posted: 30 Nov 2009, 18:27:46 UTC - in response to Message 951145.  
Last modified: 30 Nov 2009, 18:29:01 UTC

@ Richard:

LOL...

Agreed!

But when I start reading about workarounds which involve messing around with your network bindings, I start thinking the solution might be worse than the disease! :-D

Alinator
ID: 951150 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 951153 - Posted: 30 Nov 2009, 18:57:35 UTC - in response to Message 951145.  

As I was saying to Ned a couple of days ago, there are two separate branches to the problem, and hence two different 'workround' requirements.

In my opinion (and with 15 years of "IP works, or I don't eat" I think it's a solid opinion) these are just different sides of the same problem.

We can trace through the DNS issue pretty well. I can probably see more of it than most because when I say "my ISP" I mean my ISP -- I'm one of the owners.

I can follow the DNS through every server to my workstation.

It looks like BOINC (probably through a libcurl bug) caches the DNS result (which it got through a call to gethostbyname() so it doesn't know the TTL).

It would still be nice if someone could confirm that BOINC used one IP, that ping had a different IP, and net stop boinc/net start boinc made them use the same IP.
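
The comparison asked for above can be sketched in a few lines of Python. This is only an illustration, not BOINC code: the helper names and the injectable `resolver` parameter are assumptions made for the example. `socket.gethostbyname_ex` is the standard-library lookup; like gethostbyname(), it hands back addresses but no TTL, so the caller cannot know when the answer goes stale.

```python
import socket


def resolve_all(host, resolver=socket.gethostbyname_ex):
    """Return the IPv4 addresses the resolver reports for host, in order.

    gethostbyname_ex returns (canonical_name, aliases, addresses). No TTL is
    included, so the caller cannot tell when the answer expires.
    """
    _name, _aliases, addresses = resolver(host)
    return list(addresses)


def app_matches_resolver(app_ip, host, resolver=socket.gethostbyname_ex):
    """True if the IP the application used is the resolver's first answer.

    app_ip is whatever address the application logged (for BOINC, the one
    shown by http_debug); it has to be copied in by hand.
    """
    addresses = resolve_all(host, resolver)
    return bool(addresses) and addresses[0] == app_ip
```

To try it, pass the address from the http_debug log as `app_ip` and boinc2.ssl.berkeley.edu as `host`, once before and once after restarting BOINC. The real lookup needs network access.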
ID: 951153 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 951155 - Posted: 30 Nov 2009, 19:14:43 UTC - in response to Message 951141.  

Apparently, the default resolver TTL is only 300 seconds (or perhaps is honoring what it finds in the DNS record). :-(

The TTL in the zone file (the master set at SSL) is 300 (5 minutes).

Nothing should cache longer than that.

ID: 951155 · Report as offensive
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 951156 - Posted: 30 Nov 2009, 19:14:50 UTC - in response to Message 951153.  
Last modified: 30 Nov 2009, 19:18:38 UTC

I've been trying, but I ran out of stuck DL's to play with.

Of course, as I'm sure you're aware, it has been sworn to up and down that libcurl has been set to not cache locally.

OTOH, it was also said that they had straightened out NTLM authentication too, long before it actually was. Or at least I assume it was, since I haven't seen anything about that in a while. ;-)

<edit> WRT your other reply, that's what I'm seeing as the default behaviour in XP 64's DNS Client.

Alinator
ID: 951156 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 951157 - Posted: 30 Nov 2009, 19:18:18 UTC - in response to Message 950899.  
Last modified: 30 Nov 2009, 19:41:41 UTC

When you are in the Cmd window, you can also display the current entries in the DNS cache by typing ipconfig /displaydns

F.

Thanks Fred, hadn't known about that one - and very revealing it is too.

I'm using Windows XP 32-bit at service pack 3, and updated to this month's regular 'Update Tuesday' patches. I see there are some extra ones since then, but I haven't applied them yet. The following comments may not apply to other versions/patch levels of Windows.

First: the thing about the Windows resolver not honouring TTL is, on this system, an URBAN MYTH. It's FALSE. If I haven't downloaded anything in a while, boinc2.ssl.berkeley.edu is not in the cache. If I start a new download, boinc2.ssl.berkeley.edu appears, with a TTL of 300 (or just under, by the time I've found the command window and refreshed it). Subsequent refreshes show it counting down to 0, and then disappearing again. [BTW, I have no MaxCacheTtl value in the registry. The only dnscache parameter entry is ServiceDll].

Second: both IP addresses appear in the cache, clearly grouped together, and with the same TTL, in response to the download attempt. As someone else said, the significant thing seems to be the order they appear: .13 first = failure (for new downloads, or first attempt after restart for old downloads). .18 first = success, under current server conditions.

Third: mucking about with the hosts file (in my case, removing the '#' - comment symbol - I clearly put in after the last problem was resolved) puts the entry into dnscache with a TTL of 0 - I presume infinity. Reminds you whether it's active or not, without having to find the C:\WINDOWS\system32\drivers\etc address.

Fourth: retrying a stuck download does not bring boinc2.ssl.berkeley.edu into the ipconfig /displaydns list. My presumption is that no dns lookup takes place on a retry: the IP address is shown in http_debug, so it must be stored by something other than dns while BOINC is running.
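
The countdown described in the first point is what a TTL-honouring cache should do. Here is a toy Python model of that behaviour, purely as a sketch: it is not the Windows DNS Client, and the class and method names are invented for illustration. The clock is injectable so the countdown can be checked without waiting five minutes.

```python
import time


class TtlCache:
    """Toy DNS cache: entries vanish once their TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (addresses, expiry time)

    def put(self, name, addresses, ttl):
        """Store addresses (order preserved) for ttl seconds."""
        self._entries[name] = (list(addresses), self._clock() + ttl)

    def remaining_ttl(self, name):
        """Seconds left for name, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        remaining = entry[1] - self._clock()
        if remaining <= 0:
            del self._entries[name]  # expired: the entry disappears
            return None
        return remaining

    def get(self, name):
        """Cached addresses in their stored order, or None if not cached."""
        if self.remaining_ttl(name) is None:
            return None
        return self._entries[name][0]
```

An entry stored with a TTL of 300 reports a shrinking remaining TTL on each check and is gone after 300 seconds, matching what successive ipconfig /displaydns refreshes showed.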
ID: 951157 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 951159 - Posted: 30 Nov 2009, 19:18:59 UTC - in response to Message 951156.  

Of course, as I'm sure you're aware, it has been sworn to up and down that libcurl has been set to not cache locally.

As best I can tell, that's true. The $65536 question is, does the setting actually work.

It appears that by default, libcurl can keep a pool of connections that it re-uses. The best fix may be to completely shut down libcurl (so it releases everything, including the pool) every so often.

ID: 951159 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 951161 - Posted: 30 Nov 2009, 19:22:42 UTC - in response to Message 951156.  

Of course, as I'm sure you're aware, it has been sworn to up and down that libcurl has been set to not cache locally.

And DA will be asked that question, yet again, in the round-up of bug reports I'll be sending him late tonight or sometime tomorrow. I'm up to at least five by now.
ID: 951161 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 951163 - Posted: 30 Nov 2009, 19:32:00 UTC - in response to Message 951157.  
Last modified: 30 Nov 2009, 19:41:25 UTC

And.....

Fourth: retrying a stuck download does not bring boinc2.ssl.berkeley.edu into the ipconfig /displaydns list. My presumption is that no dns lookup takes place on a retry: the IP address is shown in http_debug, so it must be stored by something other than dns while BOINC is running.

Edit - I'll back-fill this one into the original list. I have a feeling I may need to refer to it again.....
ID: 951163 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 951164 - Posted: 30 Nov 2009, 19:34:54 UTC - in response to Message 951161.  

Of course, as I'm sure you're aware, it has been sworn to up and down that libcurl has been set to not cache locally.

And DA will be asked that question, yet again, in the round-up of bug reports I'll be sending him late tonight or sometime tomorrow. I'm up to at least five by now.

You might want to present this as a libcurl bug, and suggest that curl_global_cleanup() may need to be called periodically (followed by a call to curl_global_init()) to reinitialize the library.
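
To make the suspected behaviour concrete, here is a toy Python model. It is NOT libcurl and does not call it; the class and its methods are hypothetical. It only illustrates the hypothesis that something inside the client holds on to the first answer until everything is torn down and reinitialized. Whether libcurl really behaves this way is unconfirmed in this thread.

```python
class PinningClient:
    """Toy client that resolves a host once and reuses the answer until reset.

    This models the behaviour suspected in the thread; it is not libcurl.
    """

    def __init__(self, resolver):
        self._resolver = resolver  # callable: host -> IP string
        self._pinned = {}          # host -> IP remembered from the first lookup

    def address_for(self, host):
        """Return the remembered IP, resolving only on first use."""
        if host not in self._pinned:
            self._pinned[host] = self._resolver(host)
        return self._pinned[host]

    def reinitialize(self):
        """Drop everything held in memory, like a full shutdown and restart."""
        self._pinned.clear()
```

In this model a changed DNS answer or hosts-file entry is ignored until `reinitialize()` runs, which is the effect a periodic cleanup-and-init cycle (or restarting BOINC) would be expected to have if the hypothesis is right.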
ID: 951164 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 951165 - Posted: 30 Nov 2009, 19:38:13 UTC - in response to Message 951164.  

You might want to present this as a libcurl bug.....

Which takes us back to message 950541.
ID: 951165 · Report as offensive
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 951171 - Posted: 30 Nov 2009, 19:56:04 UTC - in response to Message 951157.  
Last modified: 30 Nov 2009, 19:57:24 UTC

First: the thing about the Windows resolver not honouring TTL is, on this system, an URBAN MYTH. It's FALSE. If I haven't downloaded anything in a while, boinc2.ssl.berkeley.edu is not in the cache. If I start a new download, boinc2.ssl.berkeley.edu appears, with a TTL of 300 (or just under, by the time I've found the command window and refreshed it). Subsequent refreshes show it counting down to 0, and then disappearing again.

That's true. I just tested it myself and it works as it should. That also explains why ipconfig /flushdns never helped (at least for me).


Fourth: retrying a stuck download does not bring boinc2.ssl.berkeley.edu into the ipconfig /displaydns list. My presumption is that no dns lookup takes place on a retry: the IP address is shown in http_debug, so it must be stored by something other than dns while BOINC is running.

Interestingly, the host used for the internet connection check that follows (Google) is not stored that way. After a failed retry it appears in the DNS cache.
ID: 951171 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 951173 - Posted: 30 Nov 2009, 20:12:15 UTC - in response to Message 951153.  

As I was saying to Ned a couple of days ago, there are two separate branches to the problem, and hence two different 'workround' requirements.

In my opinion (and with 15 years of "IP works, or I don't eat" I think it's a solid opinion) these are just different sides of the same problem.

We can trace through the DNS issue pretty well. I can probably see more of it than most because when I say "my ISP" I mean my ISP -- I'm one of the owners.

I can follow the DNS through every server to my workstation.

It looks like BOINC (probably through a libcurl bug) caches the DNS result (which it got through a call to gethostbyname() so it doesn't know the TTL).


Looks like it doesn't support it at all.

11/30/2009 3:08:07 PM [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
11/30/2009 3:08:07 PM [http_debug] [ID#0] info: timeout on name lookup is not supported


It would still be nice if someone could confirm that BOINC used one IP, that ping had a different IP, and net stop boinc/net start boinc made them use the same IP.


SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 951173 · Report as offensive
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 951174 - Posted: 30 Nov 2009, 20:20:25 UTC - in response to Message 951157.  
Last modified: 30 Nov 2009, 20:44:15 UTC

Thanks Fred, hadn't known about that one - and very revealing it is too.

...

First: the thing about the Windows resolver not honouring TTL is, on this system, an URBAN MYTH. It's FALSE. If I haven't downloaded anything in a while, boinc2.ssl.berkeley.edu is not in the cache. If I start a new download, boinc2.ssl.berkeley.edu appears, with a TTL of 300 (or just under, by the time I've found the command window and refreshed it). Subsequent refreshes show it counting down to 0, and then disappearing again. [BTW, I have no MaxCacheTtl value in the registry. The only dnscache parameter entry is ServiceDll].

...



Yes, I took a look into this to put this myth to rest once and for all.

The maximum TTL parameter is an absolute maximum value Windows will keep an entry cached for, not an override for the value in the host record (unless the record's value is longer than the parameter value, that is).

The DNS Client Service dates back to Win 2K, and the documentation clearly states this value is a not-to-be-exceeded limit.

There is nothing on the MS websites to indicate this has changed going forward.

IOW, MS did do it right in this regard, and always has.
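
Put as arithmetic, the rule is simply a cap: the cached lifetime is the smaller of the record's own TTL and the maximum. A minimal Python sketch follows; the function name is made up for illustration, and `None` stands in for "no MaxCacheTtl value set", which is an assumption of this example rather than documented Windows behaviour.

```python
def effective_ttl(record_ttl, max_cache_ttl=None):
    """TTL the cache would apply: the record's own TTL, capped by the maximum.

    max_cache_ttl=None models the case where no maximum is configured.
    """
    if max_cache_ttl is None:
        return record_ttl
    return min(record_ttl, max_cache_ttl)
```

With an illustrative maximum of 86400 seconds, a record TTL of 300 stays at 300, while a record TTL of 600000 would be cut down to 86400.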

<edit> There does seem to be an exception coded into it for localhost, but of course that would make sense in order to minimize 'blackhole' queries.

Alinator
ID: 951174 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 951177 - Posted: 30 Nov 2009, 20:39:53 UTC - in response to Message 951173.  
Last modified: 30 Nov 2009, 20:57:12 UTC

Hi, about server panic: I tried to re-install BOINC 6.10.18 on a Win XP64 host, with a server running!
First I noticed there were no SETI WUs, and the Messages tab showed it trying to DOWNLOAD DLLs??? I stopped BOINC, looked in Explorer and noticed NO BOINC DIR in X:\Documents and Settings\Applications\xxxxxx.xxx; files were all over the place . . .

Not a good idea. I don't know exactly what went wrong, but when I tried to repair the install, it messed up an entire 500 GB drive!
Probably the BOINC installer found an old version on a drive (on the network), which was there, all right.
And when I examined the D drive on this (troubled) host, there was a BOINC dir in D:\Program Files, but not in C:\Documents and Setting\Applications, as was set in the install options.

Most of the files have been installed a couple of times, in every 'Application' dir it found :(
All program entries from drive D are history.
This isn't the server panic mentioned above, so feel free to move this post to a more useful place, if any :)
Will read the FAQ on SERVER Install . . .
ID: 951177 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 951178 - Posted: 30 Nov 2009, 20:47:00 UTC - in response to Message 951174.  

<edit> There does seem to be an exception coded into it for localhost, but of course that would make sense in order to minimize 'blackhole' queries.

Alinator

I think it gets that one from the default hosts file, as supplied with Windows.
ID: 951178 · Report as offensive
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 951181 - Posted: 30 Nov 2009, 20:55:20 UTC

208.68.240.18 worked for me, but I had to restart BOINC before it would use the updated IP address. Linux x86_64; BOINC version 6.4.7.
ID: 951181 · Report as offensive
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 951182 - Posted: 30 Nov 2009, 20:57:54 UTC - in response to Message 951178.  
Last modified: 30 Nov 2009, 21:14:47 UTC

<edit> There does seem to be an exception coded into it for localhost, but of course that would make sense in order to minimize 'blackhole' queries.

Alinator

I think it gets that one from the default hosts file, as supplied with Windows.


Agreed, that's where the definition for it is, but there isn't any TTL info in the hosts file.

I was trying to wrap my mind around where the whopping big TTLs came from in /displaydns (ranging from hundreds of thousands of seconds on this XP 64 host, to over 20 million (!!) on my 2k Pro host).

Alinator
ID: 951182 · Report as offensive
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 951186 - Posted: 30 Nov 2009, 21:10:02 UTC - in response to Message 951163.  
Last modified: 30 Nov 2009, 21:18:53 UTC

And.....

Fourth: retrying a stuck download does not bring boinc2.ssl.berkeley.edu into the ipconfig /displaydns list. My presumption is that no dns lookup takes place on a retry: the IP address is shown in http_debug, so it must be stored by something other than dns while BOINC is running.

Edit - I'll back-fill this one into the original list. I have a feeling I may need to refer to it again.....


Richard

I have one XP machine running 6.6.23 that has stuck downloads. I created a hosts-file entry for the .18 address. On saving the file, the stack refreshes and shows both the forward .18 entry and the reverse address. But when I then go to BOINC and retry the download, it fails. When the stack is reloaded, any application using the stack is supposed to reload it too. So without a restart, BOINC has the same "stuck value."

C:\Documents and Settings\Al>ipconfig /displaydns

Windows IP Configuration

         1.0.0.127.in-addr.arpa
         ----------------------------------------
         Record Name . . . . . : 1.0.0.127.in-addr.arpa.
         Record Type . . . . . : 12
         Time To Live  . . . . : 0
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         PTR Record  . . . . . : localhost

         18.240.68.208.in-addr.arpa
         ----------------------------------------
         Record Name . . . . . : 18.240.68.208.in-addr.arpa.
         Record Type . . . . . : 12
         Time To Live  . . . . : 0
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         PTR Record  . . . . . : boinc2.ssl.berkeley.edu

         boinc2.ssl.berkeley.edu
         ----------------------------------------
         Record Name . . . . . : boinc2.ssl.berkeley.edu
         Record Type . . . . . : 1
         Time To Live  . . . . : 0
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         A (Host) Record . . . : 208.68.240.18

         localhost
         ----------------------------------------
         Record Name . . . . . : localhost
         Record Type . . . . . : 1
         Time To Live  . . . . : 0
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         A (Host) Record . . . : 127.0.0.1


Regards

Edit: Note the TTL values never expire.
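
For anyone comparing listings like the one above across several machines, a small Python sketch can turn the text into records. It assumes the output has been saved to a string and that it follows the English XP layout shown here ("Label . . . : value"); other Windows versions or languages may differ. The function name is invented for this example.

```python
import re

# Matches lines such as "Time To Live  . . . . : 0": a label, dot padding, a value.
_FIELD = re.compile(r"^\s*([A-Za-z() ]+?)\s+(?:\. )+:\s*(.*)$")


def parse_displaydns(text):
    """Parse `ipconfig /displaydns` text into a list of {field: value} dicts."""
    records = []
    for line in text.splitlines():
        match = _FIELD.match(line)
        if not match:
            continue  # headings, dashes, prompts and blank lines are skipped
        field, value = match.group(1).strip(), match.group(2).strip()
        if field == "Record Name":
            records.append({})  # each record starts with its name
        if records:
            records[-1][field] = value
    return records
```

Filtering the result for entries whose "Time To Live" is "0" then lists the records that, in the output above, correspond to hosts-file entries.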


After Releasing the Host Record in the Host file and attempting a download

C:\Documents and Settings\Al>

C:\Documents and Settings\Al>ipconfig /displaydns

Windows IP Configuration

         1.0.0.127.in-addr.arpa
         ----------------------------------------
         Record Name . . . . . : 1.0.0.127.in-addr.arpa.
         Record Type . . . . . : 12
         Time To Live  . . . . : 0
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         PTR Record  . . . . . : localhost

         www.google.com
         ----------------------------------------
         Record Name . . . . . : www.google.com
         Record Type . . . . . : 5
         Time To Live  . . . . : 205
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         CNAME Record  . . . . : www.l.google.com

         localhost
         ----------------------------------------
         Record Name . . . . . : localhost
         Record Type . . . . . : 1
         Time To Live  . . . . : 0
         Data Length . . . . . : 4
         Section . . . . . . . : Answer
         A (Host) Record . . . : 127.0.0.1

Please consider a Donation to the Seti Project.

ID: 951186 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 951188 - Posted: 30 Nov 2009, 21:32:08 UTC - in response to Message 951186.  
Last modified: 30 Nov 2009, 21:32:44 UTC

I got this list after ipconfig /displaydns:

C:\Documents and Settings\Administrator.THUNDER>ipconfig /d

Windows IP Configuration

1.0.0.127.in-addr.arpa
----------------------------------------
Record Name . . . . . : 1.0.0.127.in-addr.arpa.
Record Type . . . . . : 12
Time To Live . . . . : 598713
Data Length . . . . . : 8
Section . . . . . . . : Answer
PTR Record . . . . . : localhost


i379.photobucket.com
----------------------------------------
Record Name . . . . . : i379.photobucket.com
Record Type . . . . . : 1
Time To Live . . . . : 302
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 209.17.73.20


bwhome.logitech.com
----------------------------------------
Record Name . . . . . : bwhome.logitech.com
Record Type . . . . . : 1
Time To Live . . . . : 45255
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 63.251.254.138


allprojectstats.com
----------------------------------------
Record Name . . . . . : allprojectstats.com
Record Type . . . . . : 1
Time To Live . . . . : 1190
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 188.40.47.202


bwsnotif01.logitech.com
----------------------------------------
Record Name . . . . . : bwsnotif01.logitech.com
Record Type . . . . . : 1
Time To Live . . . . : 45256
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 63.251.254.131


www.boincstats.com
----------------------------------------
Record Name . . . . . : www.boincstats.com
Record Type . . . . . : 1
Time To Live . . . . : 79391
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 217.67.244.100


wpad
----------------------------------------
Record data for type could not be displayed.


www.burstclick.com
----------------------------------------
Record Name . . . . . : www.burstclick.com
Record Type . . . . . : 1
Time To Live . . . . : 8958
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 208.97.175.94


valid.x86-secret.com
----------------------------------------
Record Name . . . . . : valid.x86-secret.com
Record Type . . . . . : 1
Time To Live . . . . : 32959
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 213.186.56.89


valid.canardpc.com
----------------------------------------
Record Name . . . . . : valid.canardpc.com
Record Type . . . . . : 5
Time To Live . . . . : 3336
Data Length . . . . . : 8
Section . . . . . . . : Answer
CNAME Record . . . . : cpc-prod2.canardpc.com


neil.wizzy-web.co.uk
----------------------------------------
Record Name . . . . . : neil.wizzy-web.co.uk
Record Type . . . . . : 1
Time To Live . . . . : 5547
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 209.126.222.125


bwsupdate01.logitech.com
----------------------------------------
Record Name . . . . . : bwsupdate01.logitech.com
Record Type . . . . . : 1
Time To Live . . . . : 45256
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 63.251.254.132


boinc.mundayweb.com
----------------------------------------
Record Name . . . . . : boinc.mundayweb.com
Record Type . . . . . : 1
Time To Live . . . . : 8959
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 91.198.165.221


sitecheck2.opera.com
----------------------------------------
Record Name . . . . . : sitecheck2.opera.com
Record Type . . . . . : 1
Time To Live . . . . : 37
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 91.203.99.45


Record Name . . . . .


Does anybody know why the address has changed? Or isn't that correct?
ID: 951188 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.