Panic Mode On (26) Server problems

Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 950680 - Posted: 29 Nov 2009, 1:44:46 UTC

I noticed something strange about this problem on my machines. All my machines with Crunch3r's 6.1.0 fixed themselves and are fine. Each of my machines with 6.10.18 needed a few restarts to finish its downloads. So the problem seems to be in the newer builds on my machines.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 950680 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 950689 - Posted: 29 Nov 2009, 2:11:32 UTC

I had turned my i7 off for 3 hours when I had problems with the downloads, and watched a movie with the grandkids. Lo and behold, when I turned it on, I had work units running. Maybe patience is what was called for. I only powered up to read the forums; I wasn't worried whether I had anything running.

Old James
ID: 950689 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 950697 - Posted: 29 Nov 2009, 2:44:55 UTC - in response to Message 950672.  

The exception is the Maximum TTL settings in the Windows Registry. That is to correct a Windows BUG.

I'm sure it's considered a feature...

It's all in how you conjugate the verb.

If it's in your code, it's a bug.

If it's in my code, it's a feature.

:-)
ID: 950697 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 950698 - Posted: 29 Nov 2009, 2:47:24 UTC - in response to Message 950646.  


Curiously, I don't see a lot of Windows 7 users commenting here... could that mean the Lazy DNS cache issue is addressed in that OS ?


Well, my i7 running Windows 7 64 bit ultimately downloaded 1400 WU's. I say ultimately, since it took nearly 4 hours and more /flushdns's than I can count.

What I saw was that after a /flush, Boinc would d/l WU's for five minutes (I put a stopwatch on it to be certain; it was always 5 minutes, give or take 10 seconds), then roll right back into the error state. I'd have to halt d/l's, stop Boinc, do another flush, and start over again.

But, it DID work.............;)
ID: 950698 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 950705 - Posted: 29 Nov 2009, 3:16:07 UTC

I have Windows 7 64 bit ultimate also. Had to do a bunch of restarts.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 950705 · Report as offensive
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 950738 - Posted: 29 Nov 2009, 6:51:16 UTC
Last modified: 29 Nov 2009, 7:26:59 UTC

I had one waiting to be downloaded since yesterday. I turned off the machine last night, switched it on this morning, and now have 6 having trouble downloading. Will wait and see what happens next. Tried ipconfig /flushdns and got "cannot flush the resolver cache"; my ISP will not allow it.
ID: 950738 · Report as offensive
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 950749 - Posted: 29 Nov 2009, 8:58:56 UTC

I found that using the ReSchedule tool for changing GPU>CPU worked for my Vista rigs. Had to do it a few times, though.

Don't know why it works.

Dave
ID: 950749 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 950757 - Posted: 29 Nov 2009, 9:54:18 UTC - in response to Message 950698.  

@ Ned,

I've been watching this thread, still without access to my main machines, and you've almost convinced me that it's all Windoze' fault - but not quite.

This story sounds like a very good description of the effects that one would expect to see with round-robin DNS and a TTL of 300 seconds:

What I saw was that after a /flush, Boinc would d/l WU's for five minutes (I put a stopwatch on it to be certain; it was always 5 minutes, give or take 10 seconds), then roll right back into the error state. I'd have to halt d/l's, stop Boinc, do another flush, and start over again.

But if that was the end of the story, wouldn't we expect to see that five minutes later, the downloads restart again of their own accord? (maybe only 50% of the time - sometimes it might be 10 minutes, sometimes 15, and so on - but it should have a good chance of restarting in due course). I don't think anybody has posted in this thread to report a spontaneous resumption of downloads after hitting the problem: if anyone has experienced one, please tell us about it.

It sounds the same as the discussion in Panic Mode On (18): there I reported a similar run of downloads followed by failure, but with no resumption after more than an hour. The odds against that are more than 4,000 to 1 if there's a true random re-resolve from DNS every 5 minutes. That's why I think there's a difference between BOINC's initial attempt to download a file (using DNS), and subsequent retries, using ?????
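
As a rough check on the "4,000 to 1" figure: if every re-resolve were an independent 50/50 pick between two download servers, then staying on the bad one through 12 consecutive 5-minute re-resolves (a little over an hour) would have probability 0.5^12. A minimal Python sketch of that arithmetic, assuming exactly two servers and a truly random pick each time (both are assumptions for illustration, not measurements):

```python
from fractions import Fraction

def odds_against_staying_stuck(re_resolves, servers=2):
    """Odds against landing on the same bad server at every re-resolve.

    Assumes each re-resolve is an independent, uniformly random pick
    among `servers` addresses (a simplification of round-robin DNS).
    """
    p_stuck = Fraction(1, servers) ** re_resolves
    return (1 - p_stuck) / p_stuck

# 12 re-resolves at 5-minute intervals covers roughly one hour.
print(odds_against_staying_stuck(12))  # 4095, i.e. 4095 to 1 against
```

If downloads really stay stuck for over an hour, the random-re-resolve model is therefore very unlikely to be the whole story, which is the point being argued.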
ID: 950757 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 950758 - Posted: 29 Nov 2009, 10:08:35 UTC - in response to Message 950738.  

Tried ipconfig /flushdns got cannot flush the resolver cache, my Isp will not allow it.

This has nothing to do with your ISP. You probably have the DNS client service disabled, or stopped. Without a DNS client, you're not caching any DNS entries.

(You'll need administrator functions for this)

Go to Start->Run
Type "services.msc" without the quotes, press Enter
Scroll to DNS Client.
Double click on DNS Client.
Under the Startup type pulldown, put it on "Automatic".
Then click the Start button.
Click Apply.
Click OK.

You will now be able to flush your DNS using the ipconfig /flushdns command.
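
The same steps can probably be scripted instead of clicked through. The sketch below is a hedged Python equivalent that shells out to the standard Windows tools `sc`, `net` and `ipconfig`; it assumes the DNS Client service has the internal name `Dnscache`, that the script runs from an administrator session, and that the Windows version allows the service to be reconfigured this way. The helper names are made up for illustration.

```python
import platform
import subprocess

def dns_client_commands():
    """Commands mirroring the manual steps: set startup type to
    Automatic, start the DNS Client service, then flush the cache."""
    return [
        ["sc", "config", "Dnscache", "start=", "auto"],
        ["net", "start", "Dnscache"],
        ["ipconfig", "/flushdns"],
    ]

def run_all(dry_run=True):
    """Print each command; execute it only on Windows when dry_run is False."""
    for cmd in dns_client_commands():
        print(" ".join(cmd))
        if not dry_run and platform.system() == "Windows":
            # check=False: "net start" returns an error code if the
            # service is already running, which is harmless here.
            subprocess.run(cmd, check=False)

if __name__ == "__main__":
    run_all(dry_run=True)  # change to False to actually run the commands
```

The dry-run default only prints the commands, so they can be reviewed before anything on the machine is changed.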
ID: 950758 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 950761 - Posted: 29 Nov 2009, 10:40:05 UTC - in response to Message 950757.  

That's why I think there's a difference between BOINC's initial attempt to download a file (using DNS), and subsequent retries, using ?????

I was trying to compose exactly the same comment but Richard got the words so much better. I have amended the registry entries to force the 5 minute switch and can watch it happening in a cmd box by pinging. But I have one download that has been stuck for over an hour ATM and, no matter how many retries I force (or wait for) it remains stuck. It would certainly appear that the IP address (rather than the url) is being cached elsewhere within Boinc.
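
For anyone who wants to log the 5-minute switch rather than watch pings in a cmd box, here is a small Python sketch that polls the operating system's resolver and records which addresses come back each time. The function name and parameters are hypothetical, and the hostname is whatever download server you want to watch; `socket.gethostbyname_ex` goes through the OS resolver, so it should see the same cached answers other programs do.

```python
import socket
import time

def watch_resolution(hostname, polls, interval_s,
                     resolver=socket.gethostbyname_ex, sleep=time.sleep):
    """Resolve `hostname` `polls` times, `interval_s` seconds apart.

    Returns one tuple of IP addresses per poll, in the order the
    resolver returned them, so a change in order or content is visible.
    """
    seen = []
    for i in range(polls):
        _name, _aliases, addresses = resolver(hostname)
        seen.append(tuple(addresses))
        if i < polls - 1:
            sleep(interval_s)
    return seen
```

Calling it with an interval of 300 seconds and a handful of polls should show the address order changing if the 5-minute TTL is really being honoured. It says nothing about what BOINC itself caches internally.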

F.
ID: 950761 · Report as offensive
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 950764 - Posted: 29 Nov 2009, 11:16:32 UTC - in response to Message 950758.  
Last modified: 29 Nov 2009, 11:25:57 UTC

Tried it; ipconfig /flushdns still reports that the DNS cache cannot be cleared. Just tried it again and it has cleared it; however, I still cannot download WUs.
ID: 950764 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 950765 - Posted: 29 Nov 2009, 11:17:01 UTC - in response to Message 950761.  

I am curious why this only seems to affect my machines with 6.10.18 but none using 6.1.0?
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 950765 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 950767 - Posted: 29 Nov 2009, 11:30:25 UTC - in response to Message 950764.  

Haven't been able to download all day.
FLUSHDNS doesn't work.
Reboot doesn't work.
Upload works fine.
Report works fine.

Dust storm has made my nose run a continuous 4 minute mile all day.
Damaged newly-painted wall trying to hang a picture.
Motor mower on the fritz.

I'm telling you, it's a conspiracy!
ID: 950767 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 950768 - Posted: 29 Nov 2009, 11:32:04 UTC

Now report doesn't work.
And I appear to have been bitten by a spider.

Definitely a conspiracy [panic on]
ID: 950768 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 950769 - Posted: 29 Nov 2009, 11:35:48 UTC - in response to Message 950758.  

Tried ipconfig /flushdns got cannot flush the resolver cache, my Isp will not allow it.

This has nothing to do with your ISP. You probably have the DNS client service disabled, or stopped. Without a DNS client, you're not caching any DNS entries.

(You'll need administrator functions for this)

Go to Start->Run
Type "services.msc" without the quotes, press Enter
Scroll to DNS Client.
Double click on DNS Client.
Under the Startup type pulldown, put it on "Automatic".
Then click the Start button.
Click Apply.
Click OK.

You will now be able to flush your DNS using the ipconfig /flushdns command.


While you are correct in what you say, this is not curing the issue, which is a problem at the S@H end.

We shall have to do the usual, and wait for the crew to hit the appropriate pipe with a hammer.
ID: 950769 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 950772 - Posted: 29 Nov 2009, 11:54:19 UTC

OK... if this goes on like that, I can soon open my own "Technical News" thread. LOL

On my dedicated cruncher the hosts file entries stopped helping. So I deleted them, and although this machine still believed that anything ending in "berkeley.edu" was on that Spanish book shop IP, the download was OK. Now, half an hour later, I finally get the right IPs for Berkeley. I also deleted the hosts file entries on my laptop. So now everything works here; hopefully it will last for a while.
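
Since leftover hosts file entries silently override DNS, it can help to list them before deciding whether to delete them. A minimal Python sketch, assuming the usual hosts file layout (an address followed by one or more names, `#` for comments); the function name, the sample hostname and the 192.0.2.x address are placeholders, not real server details:

```python
def find_host_overrides(hosts_text, domain_suffix):
    """Return (name, address) pairs from hosts-file text whose
    name ends with `domain_suffix` (case-insensitive)."""
    overrides = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            if name.lower().endswith(domain_suffix.lower()):
                overrides.append((name, address))
    return overrides

sample = "127.0.0.1 localhost\n192.0.2.10 server.berkeley.edu  # pinned by hand\n"
print(find_host_overrides(sample, "berkeley.edu"))
```

On Windows the file normally lives at `C:\Windows\System32\drivers\etc\hosts`; read its contents and pass them in as `hosts_text`.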
ID: 950772 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51540
Credit: 1,018,363,574
RAC: 1,004
United States
Message 950805 - Posted: 29 Nov 2009, 16:49:22 UTC - in response to Message 950749.  

I found that using the ReSchedule tool for changing GPU>CPU worked for my Vista rigs. Had to do it a few times, though.

Don't know why it works.

Dave

Probably because the rescheduler shuts down Boinc and then restarts it when it has completed the rescheduling.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 950805 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 950811 - Posted: 29 Nov 2009, 17:10:18 UTC - in response to Message 950805.  

I had six or seven stuck in download. I just shut down the connected client and closed BOINC Manager for about ten minutes. When I started it back up, they all downloaded with no problem. Maybe I just got lucky, but it worked once; maybe it will again.

Haven't played with the rescheduler, I seem to have a pretty good mix of CPU/GPU tasks but I think the kittyman is probably right on that one.


PROUD MEMBER OF Team Starfire World BOINC
ID: 950811 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 950814 - Posted: 29 Nov 2009, 17:51:01 UTC - in response to Message 950757.  

@ Ned,

I've been watching this thread, still without access to my main machines, and you've almost convinced me that it's all Windoze' fault - but not quite.

This story sounds like a very good description of the effects that one would expect to see with round-robin DNS and a TTL of 300 seconds:

What I saw was that after a /flush, Boinc would d/l WU's for five minutes (I put a stopwatch on it to be certain; it was always 5 minutes, give or take 10 seconds), then roll right back into the error state. I'd have to halt d/l's, stop Boinc, do another flush, and start over again.

But if that was the end of the story, wouldn't we expect to see that five minutes later, the downloads restart again of their own accord? (maybe only 50% of the time - sometimes it might be 10 minutes, sometimes 15, and so on - but it should have a good chance of restarting in due course). I don't think anybody has posted in this thread to report a spontaneous resumption of downloads after hitting the problem: if anyone has experienced one, please tell us about it.

It sounds the same as the discussion in Panic Mode On (18): there I reported a similar run of downloads followed by failure, but with no resumption after more than an hour. The odds against that are more than 4,000 to 1 if there's a true random re-resolve from DNS every 5 minutes. That's why I think there's a difference between BOINC's initial attempt to download a file (using DNS), and subsequent retries, using ?????

@Richard

What I think is happening is that Windows will keep both IP addresses cached and in the same order for the default Maximum TTL, even if the record has a shorter TTL.

Since the default for the Maximum TTL is 86400 (1 day), your formula is right, but it'd take a day to time out and re-randomize the lookup.

... and as you pointed out, there is a 50% chance of getting the same order on a random lookup.

I suspect that this kind of issue never lasts for more than a couple of days, so on that time scale it's hard to be sure what happened.

-- Ned
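
To make the two scenarios concrete, here is a small simulation sketch in Python. It counts how many fresh lookups a client would make in a given window if it held a cached answer for 300 seconds versus 86400 seconds, drawing a random address at each fresh lookup. The two addresses are documentation placeholders, and the model (one random pick per fresh lookup, nothing else cached) is an assumption for illustration, not a description of what Windows actually does.

```python
import random

GOOD, BAD = "192.0.2.1", "192.0.2.2"  # placeholder addresses, not real servers

def fresh_lookups(window_s, cache_s, seed=0):
    """Return (time, address) for every fresh lookup made within
    `window_s` seconds when each answer is cached for `cache_s` seconds."""
    rng = random.Random(seed)
    return [(t, rng.choice([GOOD, BAD])) for t in range(0, window_s, cache_s)]

hour = 3600
print(len(fresh_lookups(hour, 300)))    # 12 lookups: many chances to escape the bad address
print(len(fresh_lookups(hour, 86400)))  # 1 lookup: whatever came back first sticks all hour
```

With a one-day hold there is only the initial lookup in any hour-long window, so a client that drew the bad address first would stay stuck for the whole time, which matches what people are reporting.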
ID: 950814 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 950815 - Posted: 29 Nov 2009, 17:57:17 UTC - in response to Message 950765.  
Last modified: 29 Nov 2009, 17:57:48 UTC

I am curious why this only seems to affect my machines with 6.10.18 but none using 6.1.0?

One of my grumbles over 6.1.0 is that it has a BOINC-style version number, but the modifications are based on a 5.x client.

The Lunatics gang carefully maintained the original SETI application versions and added an additional version/build info to make it absolutely clear that it's not from Berkeley.

As I remember, 6.1.0 is based on a mid-5.x.x version, probably before they started using libcurl.

Why the switch to libcurl? Because it handles a variety of proxies and situations that the earlier code did not do well.

Richard may be right in that it's a libcurl bug, but I have <http_debug>1</http_debug> in my cc_config.xml, and it seems to say that it's doing the lookup, and getting the (wrong) answer from the underlying OS.
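
For anyone who wants the same logging, the flag goes inside the `<log_flags>` section of cc_config.xml. A minimal Python sketch that builds such a file's contents with the standard library; the function name is made up, and a real cc_config.xml may already contain other settings that should be kept rather than overwritten:

```python
import xml.etree.ElementTree as ET

def build_cc_config(http_debug=True):
    """Return cc_config.xml text with only the http_debug log flag set."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    ET.SubElement(flags, "http_debug").text = "1" if http_debug else "0"
    return ET.tostring(root, encoding="unicode")

print(build_cc_config())
# <cc_config><log_flags><http_debug>1</http_debug></log_flags></cc_config>
```

The file belongs in the BOINC data directory, and the client needs to re-read its config file (or be restarted) before the extra HTTP messages show up in the log.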

I could be wrong.

Edit: 6.1.0 is definitely older than it appears.
ID: 950815 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.