Panic Mode On (26) Server problems
Author | Message |
---|---|
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
OK... if this goes on like that, I can soon open my own "Technical News" thread. LOL. On my dedicated cruncher the hosts file entries stopped helping. So I deleted them, and although this machine still believed that anything ending in "berkeley.edu" was on that Spanish book shop IP, the download was OK. Now, half an hour later, I finally get the right IPs for Berkeley. I also deleted the hosts file entries on my laptop. So now everything works here; hopefully it will last for a while. |
kittyman Joined: 9 Jul 00 Posts: 51469 Credit: 1,018,363,574 RAC: 1,004 |
I found that using the ReSchedule tool for changing GPU>CPU worked for my Probably because the rescheduler shuts down Boinc and then restarts it when it has completed the rescheduling. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
I had six or seven stuck in download. I just shut down the connected client and closed BOINC Manager for about ten minutes. When I started it back up they all downloaded with no problem. Maybe I just got lucky, but it worked once, maybe it will again. Haven't played with the rescheduler; I seem to have a pretty good mix of CPU/GPU tasks, but I think the kittyman is probably right on that one. PROUD MEMBER OF Team Starfire World BOINC |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
@ Ned, @Richard What I think is happening is that Windows will keep both IP addresses cached and in the same order for the default Maximum TTL, even if the record has a shorter TTL. Since the default for the Maximum TTL is 86400 seconds (1 day), your formula is right, but then it'd take a day to time out and re-randomize the lookup. ... and as you pointed out, there is a 50% chance of getting the same order on a random lookup. I suspect that this kind of issue never lasts for more than a couple of days, so on that time scale it's hard to be sure what happened. -- Ned |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I am curious why this only seems to affect my machines with 6.10.18 but none using 6.1.0? One of my grumbles over 6.1.0 is that it has a BOINC-style version number, but the modifications are based on a 5.x client. The Lunatics gang carefully maintained the original SETI application versions and added additional version/build info to make it absolutely clear that it's not from Berkeley. As I remember, 6.1.0 is based on a mid-5.x.x version, probably before they started using libcurl. Why the switch to libcurl? Because it handles a variety of proxies and situations that the earlier code did not handle well. Richard may be right in that it's a libcurl bug, but I have <http_debug>1</http_debug> in my cc_config.xml, and it seems to say that it's doing the lookup, and getting the (wrong) answer from the underlying OS. I could be wrong. Edit: 6.1.0 is definitely older than it appears. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
@ Ned, But if Fred W has applied the Windows kb/318803 registry corrections (message 950657), why is he still apparently 'locked in' to the faulty address (message 950761)? |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
@ Ned, As I recall Boinc caches what is in the stack and does not necessarily honor TTL. Thus if you flush the cache, check that .18 is responding, and attempt to download, it fails. If you have stopped Boinc and restart at that time, it succeeds, as it reads the stack. I have a case of a machine running headless 6.6.23 that, even after stopping the DNS client and restarting, flushing the cache and waiting for .18 to be the proper response in the cache, will not complete the download. This points at the Boinc core. Currently there is no tool to reveal what Boinc has cached. The machine was last restarted at 14:57 on the 28th, so the TTL of 86400 has not expired yet. Al Please consider a Donation to the Seti Project. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
@ Ned, Good question. This is from my logs: 11/29/2009 8:45:50 AM [http_debug] [ID#0] info: timeout on name lookup is not supported 11/29/2009 8:45:50 AM [http_debug] [ID#0] info: About to connect() to boinc.berkeley.edu port 80 (#0) 11/29/2009 8:45:50 AM [http_debug] [ID#0] info: Trying 128.32.18.189... 11/29/2009 8:45:51 AM [http_debug] [ID#0] info: Connected to boinc.berkeley.edu (128.32.18.189) port 80 (#0) 11/29/2009 8:45:51 AM [http_debug] [ID#0] Sent header to server: GET /project_list.php HTTP/1.1 User-Agent: BOINC client (windows_intelx86 6.10.19) Host: boinc.berkeley.edu Accept: */* Accept-Encoding: deflate, gzip Content-Type: application/x-www-form-urlencoded 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: HTTP/1.1 200 OK 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: Date: Sun, 29 Nov 2009 16:46:04 GMT 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: Server: Apache/2.2.9 (Fedora) 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: X-Powered-By: PHP/5.2.6 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: Connection: close 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: Transfer-Encoding: chunked 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: Content-Type: text/xml 11/29/2009 8:45:51 AM [http_debug] [ID#0] Received header from server: 11/29/2009 8:45:51 AM [http_debug] [ID#0] info: Expire cleared 11/29/2009 8:45:51 AM [http_debug] [ID#0] info: Closing connection #0 If (and that's a big If) I understand libcurl correctly, the IP is cached in the connection, and if (another big if) BOINC is closing the connection as it appears to be, it must be the underlying OS and not BOINC. Again, could be wrong. I thank Fred W. for trying this, and what I was hoping is, if we could get enough people to try this (so we get a few samples while the problem exists) we might learn something useful to pass on to the developers. ... instead of the dead-end messing with the HOSTS file, which is a kluge. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Pappa wrote: "As I recall Boinc caches what is in the stack and does not necessarily honor TTL. Thus if you flush the cache and check that .18 is responding and attempt to download it fails. If you have stopped Boinc and restart at that time it succeeds as it reads the stack." In theory, applications say to the OS "please open a connection to boinc2.ssl.berkeley.edu" and the stack (which belongs to the OS) returns a handle to the connection, and that's it. The application (in this case, BOINC) doesn't really care all that much about an IP, or even know for sure that TCP/IP is being used. <http_debug>1</http_debug> in cc_config.xml is very informative. In practice, we could be looking at a libcurl bug. -- Ned Another thought: if the caching is in libcurl, then "net stop boinc/net start boinc" should flush the libcurl cache (if the documentation is correct, it's just one name/ip per libcurl handle, in RAM). To figure this out, we need to do good science. There is enough of "I edited the hosts file, changed the registry, stopped and restarted BOINC, shut down the computer, cycled power to the house, restarted the computer, and voila!" that we don't know which one fixed it. |
Stick Joined: 26 Feb 00 Posts: 100 Credit: 5,283,449 RAC: 5 |
For what it is worth, I've also had some problems with stuck SETI downloads - two different times in the last few days (Win XP Pro, single CPU). In both cases, restarting BOINC (v6.10.18) fixed the problem. perryjay wrote: "I had six or seven stuck in download. I just shut down connected client, and closed boinc manager for about ten minutes. when I started it back up they all downloaded with no problem. Maybe I just got lucky but it worked once, maybe it will again." |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Because there's a flood watch and a severe weather warning on my route home, I'm not going to be able to test on my own machines until midday Monday UK time, but that should still be long enough for a couple of hours of experimentation before Matt comes in and destroys the experimental conditions by fixing vader (<grin>). One thought: my guess is that Windows kb/318803 wouldn't come into effect instantly (Windows wouldn't re-query the registry on every DNS lookup - would it?), but a DNS client service restart should force a registry read. That means that it should be possible to test the Windows caching independently of the BOINC caching - one could restart DNS while BOINC stays running, or restart BOINC while DNS is running. A computer reboot would restart both, of course - good for downloading SETI work, not so good for determining by scientific experiment where the problem lies. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Richard Haselgrove wrote: "Because there's a flood watch and a severe weather warning on my route home, I'm not going to be able to test on my own machines until midday Monday UK time, but that should still be long enough for a couple of hours of experimentation before Matt comes in and destroys the experimental conditions by fixing vader (<grin>)." One guess would be that TTL gets modified when the "A" record goes into the cache, so setting the registry key might not change records already there. Another is that Windows reports TTL but doesn't really do anything with it (which is the case, I think). A third is that it is cached in BOINC (in libcurl), but I don't think that's true. Trouble is, I don't have enough uploads to tell. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
The only thing I can say to the whole discussion.. IIRC, Crunch3r's V6.1.0 is a mod of Berkeley's V5.10.1x client. |
FiveHamlet Joined: 5 Oct 99 Posts: 783 Credit: 32,638,578 RAC: 0 |
I am not too clued up on this DNS stuff but I used ReSchedule while Boinc was running and nothing else. That restart of Boinc in that manner was enough to kickstart the downloads. I also noted that as the queue of downloads got shorter, the transfers stayed on for longer. Boinc decided that I needed about 1200 WUs, and to start with it dropped out after a minute or so. The queue got shorter and at the end transfers lasted 10 to 15 mins. Don't know if any of that is of use or not but just an observation over the last day or so. Dave |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
1mp0£173 wrote: "One guess would be that TTL gets modified when the 'A' record goes into the cache, so setting the registry key might not change records already there." That would suggest that the best procedure (for experimenting, not downloading) would be: 1) Set registry MaxCacheTtl 2) Flush cache 3) Restart service. From your log: 11/29/2009 8:45:50 AM [http_debug] [ID#0] info: About to connect() to boinc.berkeley.edu port 80 (#0) 11/29/2009 8:45:50 AM [http_debug] [ID#0] info: Trying 128.32.18.189... Unfortunately, even <http_debug> doesn't give us any clue about the logic or 'reasoning' that links those two statements: it doesn't say 'performing dns lookup: result success' or 'have value already: using previous IP'. That's what we really need. IIRC from previous events like this, a machine with stuck downloads won't be re-using [ID#0] - the connection numbers will be up in the hundreds or even thousands by now. And every connection number that didn't work will leave a trace of itself in a numbered HTTP file in the BOINC data directory. The current ones are usually locked, but I wonder if anyone could crack one open and see what it says? (This machine here is bereft, unfortunately). |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
@Ned, I'm sure it is worse than this. I set my registry for max and min times to 300 secs and I can track the switch-over by pinging. But once Boinc has failed to download (by picking up the .13 address) that download remains stuck until the whole Boinc app is restarted during a period when the ping comes up with .18. So it seems that Boinc is caching the IP (not the url). F. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
The hosts file edit has worked for me for ~24 hours now, without a problem. Downloads work well, even with long breaks between work requests. A 'dummy' question: where should I enter 'ipconfig /flushdns'? In Start/Execute.../ ipconfig /flushdns / OK? Or in Start/All programs/Accessories/prompt (German: "Eingabeaufforderung") (like DOS)? |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Richard Haselgrove wrote: "Unfortunately, even <http_debug> doesn't give us any clue about the logic or 'reasoning' that links those two statements: it doesn't say 'performing dns lookup: result success' or 'have value already: using previous IP'. That's what we really need." My sinuses are playing up (again) and have been dosed with liberal quantities of medicine from north of the border (single malt) and I am planning to hit the sack soon so, unfortunately, I have just edited my hosts file (non-trivial exercise on Vista x64!!) to keep things turning over overnight. However, I have just looked in my Boinc data directory where I find http_temp_5008, http_temp_13383 and http_temp_13384 - all empty files. Are these what you were referring to, Richard? F. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Open a cmd window (Start > Run > cmd). Type ipconfig /flushdns into the cmd window. F. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
[...] Ahh.. O.K., then I did it correctly. A small (black) window was shown for ~1 sec., nothing else. Is that all? No long HDD activity or Windows instability? |
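
Several posts in this thread read the [http_debug] output by eye to see which address the client tried. A minimal Python sketch of tallying that automatically follows. It assumes the log lines have the shape quoted in Ned's post ("About to connect() to HOST port N", then "Trying A.B.C.D..."), and that lines belonging to one transfer share the same [ID#n] tag. The function name `tally_tried_ips` and the sample lines are hypothetical, and other BOINC client versions may word these lines differently.

```python
import re
from collections import Counter, defaultdict

# Patterns are an assumption based on the libcurl info lines quoted in
# the thread; they are not taken from BOINC or libcurl documentation.
ID_RE = re.compile(r"\[ID#(\d+)\]")
CONNECT_RE = re.compile(r"About to connect\(\) to (\S+) port (\d+)")
TRYING_RE = re.compile(r"Trying (\d{1,3}(?:\.\d{1,3}){3})\.\.\.")


def tally_tried_ips(lines):
    """Count which IP address was tried for each host name.

    A "Trying <ip>..." line is attributed to the host named in the most
    recent "About to connect() to <host>" line with the same [ID#n] tag.
    "Trying" lines with no earlier connect line for their ID are ignored.
    """
    host_by_id = {}              # connection ID -> host being connected to
    tally = defaultdict(Counter)  # host -> Counter of IPs tried
    for line in lines:
        id_match = ID_RE.search(line)
        conn_id = id_match.group(1) if id_match else None

        connect = CONNECT_RE.search(line)
        if connect:
            host_by_id[conn_id] = connect.group(1)
            continue

        trying = TRYING_RE.search(line)
        if trying and conn_id in host_by_id:
            tally[host_by_id[conn_id]][trying.group(1)] += 1
    return {host: dict(counts) for host, counts in tally.items()}


if __name__ == "__main__":
    # Two lines copied from the log quoted earlier in the thread.
    sample = [
        "11/29/2009 8:45:50 AM [http_debug] [ID#0] info: "
        "About to connect() to boinc.berkeley.edu port 80 (#0)",
        "11/29/2009 8:45:50 AM [http_debug] [ID#0] info: "
        "Trying 128.32.18.189...",
    ]
    print(tally_tried_ips(sample))
```

Run over a log from a machine with stuck downloads, and compared with what ping reports for the same host name at the same time, a tally like this might show whether the client keeps trying a stale address after the OS resolver has moved on. It cannot say where the stale value is cached, only that one was used.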