Panic Mode On (18) Server problems
Author | Message |
---|---|
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
[B^S] madmac Send message Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Considering the length of the outage, and the level of inbound & outbound traffic prior to the outage, it looks like a lot of people are affected by the present download problem. Grant Darwin NT |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
There is a way around those download problems. Something for advanced users only, though. 1. Edit or make the cc_config.xml file in your BOINC Data directory, and add into it: <cc_config> <log_flags> <file_xfer_debug>1</file_xfer_debug> </log_flags> </cc_config> 2. Make BOINC use this cc_config.xml -> BOINC Manager->Advanced view->Advanced->Read config file. 3. BOINC Manager->Transfers tab->select the Seti tasks trying to download->Retry Now. 4. Messages tab, check for the communications messages of the Seti tasks. Before each time-out, you'll see something like 25-Jun-09 12:01:29 SETI@home [file_xfer_debug] URL: http://boinc2.ssl.berkeley.edu/sah/download_fanout/2f/09mr09aa.14453.173729.3.8.219 5. Copy the whole URL out and paste it into the address bar of a browser. Let the browser try to load this file. Now save the file directly to your ..\BOINC\setiathome.berkeley.edu\ directory, clicking Yes on overwriting the old one there. 6. In the Transfers tab, select the Seti task that you just downloaded, click Retry Now. In the Messages tab you'll get a message like: 25-Jun-09 12:08:30 SETI@home File 09mr09aa.14453.173729.3.8.219 exists already, skipping download And all is well in the world. Of course, the above is only useful if you have a handful of tasks to download. If it's many tens or hundreds, you'll have to wait for the DNS problem to clear. These usually clear after 24 hours, although it depends on your own computer when those 24 hours are over. |
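For anyone with more than a couple of stuck transfers, step 4 can be sketched in Python: pull the URLs out of the message log text instead of copying them one at a time. This is only a rough sketch, assuming the log lines look exactly like the one quoted above; the function names `extract_download_urls` and `local_filename` are made up for illustration and are not part of BOINC.

```python
import re

# Matches message-log lines of the form quoted in the post, e.g.
# "25-Jun-09 12:01:29 SETI@home [file_xfer_debug] URL: http://..."
URL_LINE = re.compile(r"\[file_xfer_debug\]\s+URL:\s+(\S+)")


def extract_download_urls(log_text):
    """Return the unique URLs from [file_xfer_debug] lines, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = URL_LINE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def local_filename(url):
    """Return the last path component of the URL (hypothetical helper).

    In the example from the post this is the name BOINC reports in
    "File ... exists already, skipping download".
    """
    return url.rstrip("/").rsplit("/", 1)[-1]
```

The file still has to be fetched and saved into the project directory by hand (or by a browser) as described in step 5; the sketch only saves the copy-and-paste work.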
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
OK, done that (well, a minor variation involving client_state.xml - READ ONLY! - but the same general idea), and it worked. So, how and why does it work? Surely the browser download relies on the same DNS infrastructure (supplied and managed by Windows, in my case). How come a browser resolves DNS OK, when BOINC - using the exact same address, by definition (you pasted it) - fails? The most interesting case is one of my Q6600s, which was very close to dry. It had a few tasks waiting to download - enough to have inhibited work fetch since about Tuesday. Since the Crickets were chirping, I did a full host restart: nothing. Then I did a flushdns (is the DNS cache preserved across reboots, in WinXP?), and work started flowing as fast as I've ever seen it - several requests, downloaded probably a hundred tasks in total. Then it suddenly stopped again, with 32 tasks still awaiting download - hasn't downloaded a bean for over an hour. Cricket is still happy, though lower than I would expect for this stage of a recovery. I'm beginning to suspect that one of the two download servers is borked again (we've had this before). If you hit the good one, everything is hunky-dory. If you hit the bad one, not only does that transfer fail, it somehow poisons BOINC so nothing downloads for - oooooh, ages. If it's been going on for a while, could that explain why Matt's experiment with reverting to a single download server failed a couple of days back? Maybe he switched off the good one, and left the poisonous one running without checking it..... Edit: tried it on another machine, and suspicions are growing. 208.68.240.18 looks good 208.68.240.13 looks poisonous |
kevin6912 Send message Joined: 18 Jul 99 Posts: 17 Credit: 10,539,602 RAC: 0 |
Nslookup for name boinc2.ssl.berkeley.edu returns these IP addresses 208.68.240.13 and 208.68.240.18. The web server on IP address 208.68.240.13 is causing me problems. I am not getting any response. The web server on IP address 208.68.240.18 is the only way I can get downloads. Kevin |
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I have to agree....something strange is happening. My boxes were happily downloading late last night when instantly all the downloads stopped on all the boxes. Stayed that way all night with 391 pending downloads and none went through. This morning I rebooted all machines and the downloads all resumed and finished. Boinc....Boinc....Boinc....Boinc.... |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
So, how and why does it work? Magic. (is the DNS cache preserved across reboots, in WinXP?) As far as I know, no. The DNS cache is purged when you reboot the machine. But it apparently also matters how many times a day your ISP updates the DNS cache and if they're up-to-date or have negative entries still. You can test the "Block negative entries" and DNS TTL options in this article. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Bingo. 208.68.240.13 is Vader - and that's the one which Matt made the 'sole download server' on Tuesday. He obviously reinstated 208.68.240.18 (bane) yesterday - and that's sustaining a half-pipe download service all on its own. Just had a quick play with a hosts file - very nice. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
I'm beginning to suspect that one of the two download servers is borked again (we've had this before). If you hit the good one, everything is hunky-dory. If you hit the bad one, not only does that transfer fail, it somehow poisons BOINC so nothing downloads for - oooooh, ages. I was thinking something similar. Out of all my queued downloads, only 2 have downloaded, 60+ are still trying. It would appear the load isn't spread particularly evenly. Grant Darwin NT |
rtX Send message Joined: 24 Jun 00 Posts: 13 Credit: 2,105,091 RAC: 0 |
Likewise, I got some work this way. Does this not point to a bug in the way BOINC handles these downloads? I had already flushed DNS, rebooted etc. yet BOINC seems to be still looking at an old DNS resolution that Firefox does not share. BOINC 6.6.36 seems to have taken significant steps backwards from earlier versions. It has this DNS handling issue, and it is not scheduling correctly (per other threads). I think this cannot help retain new volunteers who are less willing to 'get under the hood'. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I don't think this problem is related to BOINC v6.6.36 - I had to clean three v5.10.13 machines manually this morning, too. And I checked that server IP identification by referring back to Semiautofs (Oct 09 2008) and some PMs I exchanged with Matt around that time. The worst you can say is that BOINC has had DNS problems for absolutely ages, and should have got them sorted out by now - as should libcurl, to whom BOINC will pass the buck if you complain. |
Jean-David Beyer Send message Joined: 10 Jun 99 Posts: 60 Credit: 1,301,105 RAC: 1 |
Is that the problem? I have 5 tasks in downloading state, and they have been that way for several days. Three of them will expire tomorrow. It would still be possible to process them if I get them pretty soon. My resolver returns (in part): ;; QUESTION SECTION: ;boinc2.ssl.berkeley.edu. IN A ;; ANSWER SECTION: boinc2.ssl.berkeley.edu. 114 IN A 208.68.240.18 boinc2.ssl.berkeley.edu. 114 IN A 208.68.240.13 ;; AUTHORITY SECTION: ssl.berkeley.edu. 84383 IN NS adns1.berkeley.edu. ssl.berkeley.edu. 84383 IN NS adns2.berkeley.edu. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
If you can find a way of restricting BOINC to only attempting to contact 208.68.240.18, you should get them quickly. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Won't adding that IP address to your hosts file do that? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Won't adding that IP address to your hosts file do that? Yes, that worked for me in Windows. Haven't looked to see what OS Jean-David is using. |
Lint trap Send message Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
I had to stop/restart the DNS Client service before the hosts file was accessed on my XP Pro SP3 machine. Using sysinternals "filemon" to check file access. THANKS! to everyone for all your helpful advice! Martin |
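The hosts-file trick mentioned above can be sketched in Python as plain text handling: add a line mapping boinc2.ssl.berkeley.edu to the address that was answering. This is a minimal sketch, assuming the usual hosts format (IP, whitespace, one or more names, `#` for comments); `parse_hosts` and `pin_host` are hypothetical names, and the sketch works on the file's text only, so it never touches the real hosts file itself.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the first IP listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def pin_host(text, hostname, ip):
    """Return hosts-file text with hostname pinned to ip.

    Appends one line if the name is not listed yet. If the name is already
    pinned to a different address, raise instead of guessing which to keep.
    """
    existing = parse_hosts(text).get(hostname.lower())
    if existing == ip:
        return text
    if existing is not None:
        raise ValueError(f"{hostname} is already mapped to {existing}")
    if text and not text.endswith("\n"):
        text += "\n"
    return text + f"{ip}\t{hostname}\n"
```

Per the posts above, the resulting line would be `208.68.240.18` followed by `boinc2.ssl.berkeley.edu`, and on XP the DNS Client service may need a restart before the entry is picked up. The entry should come out again once both download servers are healthy, or the pinned address will eventually go stale.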
cliff west Send message Joined: 7 May 01 Posts: 211 Credit: 16,180,728 RAC: 15 |
I know that before, when a unit had download issues (i.e. had to try more than once to download), it would error out... I have had a lot of CUDA units do that this last week. I hope the ones waiting now don't do that. |
Leopoldo Send message Joined: 4 Aug 99 Posts: 102 Credit: 3,051,091 RAC: 0 |
THANKS! to everyone for all your helpful advice! Yes! Greatly appreciated! .13 doesn't answer to telnet@80 while .18 does _____________ WBW, Leopoldo |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
So, how and why does it work? Surely the browser download relies on the same DNS infrastructure (supplied and managed by Windows, in my case). How come a browser resolves DNS OK, when BOINC - using the exact same address, by definition (you pasted it) - fails? For performance, DNS results are frequently cached -- and there are a couple of common issues with caching. ssl.berkeley.edu has two name servers for the zone. Those advertise a five minute "time to live." Going from the authority toward the client: A query for boinc2.ssl.berkeley.edu gets to some resolver, probably the one at your ISP, and it asks one of the two name servers for the zone. It caches the response, with a TTL of five minutes. If you're on Windows, the stub-resolver on your workstation will cache the response. ... and libcurl (in BOINC) gets the answer from Windows and caches it. Part of the problem: none of these should keep any answer for more than the specified TTL. There exist resolvers that force TTL to some minimum value. My ISP resolver forces a minimum of five minutes (technically an RFC violation). Some versions of Windows appear to use their own internal TTL setting instead of following the advertised TTL. The simplest fix is to just set the maximum TTLs in the registry to something pretty short (no more than an hour), instructions here. I think libcurl just plain stores an IP, and doesn't let it go unless it is told to do so, but I haven't reviewed the code. Another common flaw: RFC-1034/RFC-1035 says that responses should be randomized, but does not say "at the server" or "at the client" and many servers do not randomize the responses. The ones from Microsoft in particular... So the simple answer is: two DNS lookups, against the same infrastructure, should take entirely different paths to the answer, and should return different responses -- and the only exception is the very simplest case (i.e. only one "A" record). 
Overly aggressive caching can make "unfortunate" results last a very long time. |
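To see which of the addresses behind a name is actually answering, the nslookup and telnet checks from earlier in the thread can be combined into a small Python sketch. It assumes nothing about BOINC or libcurl: it asks the operating system's resolver for the IPv4 addresses (so it sees whatever the local cache or hosts file says) and then tries a plain TCP connection to port 80 on each. The function names `resolve_ipv4`, `probe` and `check_hosts` are made up for illustration.

```python
import socket


def resolve_ipv4(hostname, port=80):
    """Return the distinct IPv4 addresses the OS resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def probe(ip, port=80, timeout=5.0):
    """Return True if a TCP connection to ip:port opens within timeout seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_hosts(hostname, port=80, timeout=5.0):
    """Map each resolved address to whether it accepted a connection."""
    return {ip: probe(ip, port, timeout) for ip in resolve_ipv4(hostname, port)}
```

Calling `check_hosts("boinc2.ssl.berkeley.edu")` returns a dictionary of address to True/False. A False only says the TCP connection did not open in time; it does not say why, and a True does not prove the web server behind it is serving files correctly.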