Network Drano (Mar 27 2007)

Message boards : Technical News : Network Drano (Mar 27 2007)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 537626 - Posted: 27 Mar 2007, 23:07:07 UTC

Usual database backup outage today, except we took some extra time to do a couple of things. First, we powered sidious down and back up to measure its current draw. It peaks at about 8 amps during drive spin-up. Then Bob and I did a bunch of tests, comparing table sizes and sums/averages of selected fields to confirm the replica is indeed in sync with the master BOINC database. Looks good.

Upon coming back up I eventually noticed most of the file uploads were timing out on bruno. Jeff and I battled with this for a bit. We followed several red herrings and tuned various Apache/TCP parameters, but eventually the solution was cleaning up some nested symlinks that contained a mount that fell away sometime recently. We think. Anyway, we cleaned up these links and that immediately fixed the problem. During all that kryten was working fine. It is still getting hit by a small but significant number of BOINC clients, probably due to libcurl DNS caching within the client - something we should probably fix sooner or later. By the way, this might have also been why the validator queue has been growing over the past day or so. That emptied immediately, too, though that forced a backlog in the deleter queues. I had to kick those just now to pick up the new symlink as well.
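
For readers chasing a similar problem, here is a minimal sketch of how dangling symlinks under a directory tree could be listed. The post does not say how the bad links were actually found, so the function name and the approach are assumptions for illustration only.

```python
import os

def find_broken_symlinks(root):
    """Yield (link_path, target) for symlinks under root whose target is missing."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself, while exists() follows it,
            # so a dangling link is a link that does not exist when followed.
            if os.path.islink(path) and not os.path.exists(path):
                yield path, os.readlink(path)
```

A link that points into a filesystem that is no longer mounted shows up here the same way as any other missing target. A link whose target still exists but is stale or slow to respond would not be caught by this check.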

Backing up the science database today and will make changes tomorrow. Will test the changes (re: the splitter, assimilator, and validator) on kryten before implementing on bruno (later in the week or next week).

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 537626 · Report as offensive
bicyclist
Joined: 3 Jun 01
Posts: 3
Credit: 200,695
RAC: 0
United States
Message 537631 - Posted: 27 Mar 2007, 23:35:24 UTC

sweet!
yes
ID: 537631 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 537641 - Posted: 27 Mar 2007, 23:53:54 UTC - in response to Message 537626.  
Last modified: 27 Mar 2007, 23:55:59 UTC

. . . During all that kryten was working fine. It is still getting hit by a small but significant number of BOINC clients, probably due to libcurl DNS caching within the client - something we should probably fix sooner or later. . . .
- Matt

Anything we can do to confirm we are "talking" to the correct server?
Anything we can do to correct it if not?
Tks,
Gus Obermeyer
EDIT, already tried the /flushdns trick in case that's it. /EDIT
ID: 537641 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 537642 - Posted: 27 Mar 2007, 23:57:10 UTC - in response to Message 537641.  

. . . During all that kryten was working fine. It is still getting hit by a small but significant number of BOINC clients, probably due to libcurl DNS caching within the client - something we should probably fix sooner or later. . . .
- Matt

Anything we can do to confirm we are "talking" to the correct server?
Anything we can do to correct it if not?
Tks,
Gus Obermeyer

To correct the problem:

1. Stop BOINC.
2. Flush the DNS at your router (for router appliances, this is usually a power cycle).
3. Run "ipconfig /flushdns" from a command prompt (drop the quotes).
4. Start BOINC.

I have no idea how to tell if you are talking to the correct server.
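
One partial check is to ask the operating system's resolver which addresses a hostname currently maps to and compare them with the address you expect. The sketch below assumes you already know the hostname and the expected IP. Neither bruno's nor kryten's address is given in this thread, so both are left as parameters, and the function names are hypothetical.

```python
import socket

def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses the resolver gives for hostname."""
    infos = resolver(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})

def points_at(hostname, expected_ip, resolver=socket.getaddrinfo):
    """True if expected_ip is among the addresses hostname currently resolves to."""
    return expected_ip in resolve_ipv4(hostname, resolver)
```

This only shows what the operating system resolves right now. If the BOINC client is holding its own cached address, as Matt suspects, the running client may still be using an older one until it is restarted.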


ID: 537642 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 537647 - Posted: 28 Mar 2007, 0:08:19 UTC - in response to Message 537642.  
Last modified: 28 Mar 2007, 0:09:40 UTC

Well, as I said in my edit I've already tried flushing DNS. The "correct server" is Bruno, or at least NOT Kryten. I will power cycle the router tho, thanks.
ID: 537647 · Report as offensive
Profile Michael Gmirkin

Joined: 18 Dec 03
Posts: 50
Credit: 8,956,363
RAC: 26
United States
Message 537666 - Posted: 28 Mar 2007, 0:43:14 UTC - in response to Message 537647.  

For some reason I seem to recall a note a few weeks back (when folks were having issues with a server/IP switch or something) saying first type:
"ipconfig /flushdns"

then type:
"ipconfig /registerdns"

Will it kill things if the latter isn't done? Just wondering. Sounded like the prior posts forgot the last step. Didn't know if it was crucial or not...?

~Michael
If there were no time, how old would YOU be? ~Me

ID: 537666 · Report as offensive
Profile Gary

Joined: 7 Mar 07
Posts: 1
Credit: 6,815
RAC: 0
United Kingdom
Message 537676 - Posted: 28 Mar 2007, 0:53:16 UTC - in response to Message 537666.  
Last modified: 28 Mar 2007, 0:53:35 UTC


For some reason I seem to recall a note a few weeks back (when folks were having issues with a server/IP switch or something) saying first type:
"ipconfig /flushdns"

then type:
"ipconfig /registerdns"

Will it kill things if the latter isn't done? Just wondering. Sounded like the prior posts forgot the last step. Didn't know if it was crucial or not...?

~Michael



I just tried those commands - no joy here - I still cannot d/load new WU's.

I just switched 'No New Tasks' on, I will leave it a while - maybe there is a backlog now of people trying to d/load WU's.
ID: 537676 · Report as offensive
Profile Walla
Volunteer tester
Joined: 14 May 06
Posts: 329
Credit: 177,013
RAC: 0
United States
Message 537678 - Posted: 28 Mar 2007, 1:01:20 UTC - in response to Message 537666.  
Last modified: 28 Mar 2007, 1:02:56 UTC


For some reason I seem to recall a note a few weeks back (when folks were having issues with a server/IP switch or something) saying first type:
"ipconfig /flushdns"

then type:
"ipconfig /registerdns"

Will it kill things if the latter isn't done? Just wondering. Sounded like the prior posts forgot the last step. Didn't know if it was crucial or not...?

~Michael


I don't think the latter has to be done. I didn't do it. BOINC can still access a reference site but not the SETI servers.

From Microsoft

/flushdns : Flushes and resets the contents of the DNS client resolver cache. During DNS troubleshooting, you can use this procedure to discard negative cache entries from the cache, as well as any other entries that have been added dynamically.

/displaydns : Displays the contents of the DNS client resolver cache, which includes both entries preloaded from the local Hosts file and any recently obtained resource records for name queries resolved by the computer. The DNS Client service uses this information to resolve frequently queried names quickly, before querying its configured DNS servers.

/registerdns : Initiates manual dynamic registration for the DNS names and IP addresses that are configured at a computer. You can use this parameter to troubleshoot a failed DNS name registration or resolve a dynamic update problem between a client and the DNS server without rebooting the client computer. The DNS settings in the advanced properties of the TCP/IP protocol determine which names are registered in DNS.

ID: 537678 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 537680 - Posted: 28 Mar 2007, 1:06:03 UTC - in response to Message 537666.  


For some reason I seem to recall a note a few weeks back (when folks were having issues with a server/IP switch or something) saying first type:
"ipconfig /flushdns"

then type:
"ipconfig /registerdns"

Will it kill things if the latter isn't done? Just wondering. Sounded like the prior posts forgot the last step. Didn't know if it was crucial or not...?

~Michael

I remember that as well. But unless I'm mistaken later on it was generally agreed that the /registerdns command was not really necessary; that would happen by itself.
BTW, on 6 of my 8 machines I am not able to upload or download a single unit. Zero. The cricket graph has started to flatten out so things should be moving better than that by now. I've power cycled the modem and router, have flushed dns, and finally tried rebooting. Still nothing. Since I've seldom had any trouble in the past my guess is that something is still wrong at Berkeley.
ID: 537680 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 537687 - Posted: 28 Mar 2007, 1:58:48 UTC - in response to Message 537666.  
Last modified: 28 Mar 2007, 2:04:35 UTC

I've tried all those and then some. Yeah, I unplugged the BEFSR81 and that was a waste of time. I get system connect and http errors. I did get through at least once; see below:

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.BATPC1>ipconfig /registerdns

Windows IP Configuration

Registration of the DNS resource records for all adapters of this computer has been initiated. Any errors will be reported in the Event Viewer in 15 minutes..

C:\Documents and Settings\Administrator.BATPC1>ping setiathome.berkeley.edu

Pinging setiathome.SSL.berkeley.edu [128.32.18.152] with 32 bytes of data:

Reply from 128.32.18.152: bytes=32 time=75ms TTL=241
Reply from 128.32.18.152: bytes=32 time=77ms TTL=241
Reply from 128.32.18.152: bytes=32 time=134ms TTL=241
Reply from 128.32.18.152: bytes=32 time=72ms TTL=241

Ping statistics for 128.32.18.152:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 72ms, Maximum = 134ms, Average = 89ms

C:\Documents and Settings\Administrator.BATPC1>

3/27/2007 6:55:10 PM||Access to reference site succeeded - project servers may be temporarily down.
3/27/2007 6:55:22 PM||Project communication failed: attempting access to reference site
3/27/2007 6:55:22 PM|SETI@home|[file_xfer] Temporarily failed upload of 27se04ab.11205.9842.236066.3.255_3_0: http error
3/27/2007 6:55:22 PM|SETI@home|Backing off 2 min 22 sec on upload of file 27se04ab.11205.9842.236066.3.255_3_0
3/27/2007 6:55:22 PM|SETI@home|[file_xfer] Started download of file 03oc03aa.4127.1522.473568.3.42
3/27/2007 6:55:23 PM||Access to reference site succeeded - project servers may be temporarily down.
3/27/2007 6:55:44 PM||Project communication failed: attempting access to reference site
3/27/2007 6:55:44 PM|SETI@home|[file_xfer] Temporarily failed download of 03oc03aa.4127.1522.473568.3.42: system connect
3/27/2007 6:55:44 PM|SETI@home|Backing off 2 hr 3 min 17 sec on download of file 03oc03aa.4127.1522.473568.3.42
3/27/2007 6:55:44 PM|SETI@home|[file_xfer] Started download of file 03no03aa.5034.11282.742326.3.111
3/27/2007 6:55:45 PM||Access to reference site succeeded - project servers may be temporarily down.
3/27/2007 6:56:06 PM||Project communication failed: attempting access to reference site
3/27/2007 6:56:06 PM|SETI@home|[file_xfer] Temporarily failed download of 03no03aa.5034.11282.742326.3.111: system connect
3/27/2007 6:56:06 PM|SETI@home|Backing off 36 min 6 sec on download of file 03no03aa.5034.11282.742326.3.111
3/27/2007 6:56:08 PM||Access to reference site succeeded - project servers may be temporarily down.
3/27/2007 6:56:08 PM|SETI@home|[file_xfer] Started upload of file 27se04ab.11205.9842.236066.3.59_3_0
3/27/2007 6:56:30 PM||Project communication failed: attempting access to reference site
3/27/2007 6:56:30 PM|SETI@home|[file_xfer] Temporarily failed upload of 27se04ab.11205.9842.236066.3.59_3_0: system connect
3/27/2007 6:56:30 PM|SETI@home|Backing off 14 min 30 sec on upload of file 27se04ab.11205.9842.236066.3.59_3_0
3/27/2007 6:56:32 PM||Access to reference site succeeded - project servers may be temporarily down.
3/27/2007 6:57:41 PM|SETI@home|[file_xfer] Finished upload of file 27se04ab.11205.9794.567336.3.148_0_0
3/27/2007 6:57:41 PM|SETI@home|[file_xfer] Throughput 1934 bytes/sec
3/27/2007 6:57:44 PM|SETI@home|[file_xfer] Started upload of file 27se04ab.11205.9842.236066.3.255_3_0
3/27/2007 6:57:44 PM|SETI@home|Sending scheduler request: To report completed tasks
3/27/2007 6:57:44 PM|SETI@home|Reporting 1 tasks
3/27/2007 6:57:55 PM|SETI@home|Scheduler RPC succeeded [server version 509]
3/27/2007 6:57:55 PM|SETI@home|Deferring communication 11 sec, because requested by project
3/27/2007 6:58:06 PM||Project communication failed: attempting access to reference site
3/27/2007 6:58:06 PM|SETI@home|[file_xfer] Temporarily failed upload of 27se04ab.11205.9842.236066.3.255_3_0: system connect
3/27/2007 6:58:06 PM|SETI@home|Backing off 2 min 28 sec on upload of file 27se04ab.11205.9842.236066.3.255_3_0
3/27/2007 6:58:08 PM||Access to reference site succeeded - project servers may be temporarily down.
3/27/2007 6:58:14 PM|SETI@home|[file_xfer] Started download of file 03oc03aa.4127.1522.473568.3.45
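
A log like this is easier to read once the failures are tallied. The sketch below assumes the `timestamp|project|message` layout seen in the lines above and the "Temporarily failed upload/download of FILE: ERROR" wording. The function name is hypothetical, and other client versions may format messages differently.

```python
import re
from collections import Counter

# Matches e.g. "Temporarily failed upload of <file>: http error"
FAILED = re.compile(r"Temporarily failed (upload|download) of (\S+): (.+)$")

def summarize_failures(lines):
    """Count transfer failures by (direction, error text) in pipe-delimited log lines."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        match = FAILED.search(parts[2])
        if match:
            direction, _filename, error = match.groups()
            counts[(direction, error.strip())] += 1
    return counts
```

Paste the log into a text file and pass its lines to `summarize_failures` to see whether uploads or downloads are failing, and with which error.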
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 537687 · Report as offensive
Profile Walla
Volunteer tester
Joined: 14 May 06
Posts: 329
Credit: 177,013
RAC: 0
United States
Message 537701 - Posted: 28 Mar 2007, 2:46:49 UTC

Everything seems to be flowing nicely now.
ID: 537701 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 537703 - Posted: 28 Mar 2007, 2:51:31 UTC
Last modified: 28 Mar 2007, 2:52:50 UTC

Yup, mine also. Thanks to whoever kicked the box.
ID: 537703 · Report as offensive
Profile Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 537704 - Posted: 28 Mar 2007, 2:53:03 UTC

I just got all mine to go thru too.
ID: 537704 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 537721 - Posted: 28 Mar 2007, 3:33:31 UTC

My huge backlog is shrinking, Just like Jupiter in 2010.

Nope, strike that last statement, I'm done.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 537721 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 537762 - Posted: 28 Mar 2007, 7:37:32 UTC

Matt,
Good work on clearing problems etc. But it looks like the validator problems have re-appeared.

Andy
ID: 537762 · Report as offensive

 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.