Magic Ping (Jul 31 2007)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 612623 - Posted: 31 Jul 2007, 20:34:21 UTC

Over the weekend the ready-to-delete queues filled up. After I restarted the file deleter processes this queue began to drain, which meant increased load competition on the workunit fileservers. These competed with the splitters (which write new workunits to those same disks) which ultimately meant the ready-to-send queue dropped to zero until the deleters caught up last night. No big deal.

Had the usual outage today. During it I rebooted some of the servers to clean their pipes, and also ran some more router configuration tests as suggested by central campus. After power cycling, our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection, and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 612623 · Report as offensive
Profile Bax
Volunteer tester
Joined: 16 May 99
Posts: 182
Credit: 3,919,072
RAC: 0
Canada
Message 612634 - Posted: 31 Jul 2007, 20:41:13 UTC - in response to Message 612623.  

Keep up the good work Matt!

I'm impressed with how short the Tuesday downtimes are getting. :)


Bax

ID: 612634 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 612670 - Posted: 31 Jul 2007, 21:39:38 UTC - in response to Message 612623.  

...After power cycling our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

Rather curious indeed...

Usual random guesses:

A problem of incompatibility between different manufacturers?

ARP firewalled(!) or turned off?
A cloned MAC that is conflicting somewhere?
Or just a broken port or partially connected LAN cable?
Or even a cross-over cable used instead of straight (or vice-versa)?!


Set a cron job running to ping the thing every five minutes to make sure it is up?!
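For anyone curious what the cron suggestion could look like, here is a minimal sketch. It assumes a Linux host with the iputils `ping` command; the address 192.0.2.1 is a documentation placeholder, not the real upstream router, and the function names are made up for illustration.

```python
import subprocess


def build_ping_command(host, timeout_s=2):
    # Linux iputils syntax (an assumption about the host OS):
    # -c 1 sends a single echo request, -W sets the reply timeout in seconds.
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def ping_once(host, run=subprocess.run):
    """Send one ping to `host` and return True if the command exits with 0."""
    result = run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Hypothetical usage from a script that cron runs every five minutes:
#   ping_once("192.0.2.1")
```

The `run` parameter exists only so the function can be exercised without sending real packets.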

Good luck,

Martin

ID: 612670 · Report as offensive
Profile gypsy
Joined: 25 Jan 03
Posts: 5
Credit: 148,584
RAC: 0
United States
Message 612767 - Posted: 1 Aug 2007, 1:51:58 UTC

Is this why my Seti is empty and idle and says there is no work?
ID: 612767 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 612768 - Posted: 1 Aug 2007, 2:06:10 UTC - in response to Message 612767.  

I truly wish it were that simple. Many people have worked through many things. So that you know, there are several complex issues between the BOINC core client and the server. This does not discount server issues. Many people are working on these items, and posting to the Number Crunching forum may get more immediate help.
Many have already talked through what you are describing.

Is this why my Seti is empty and idle and says there is no work?



ID: 612768 · Report as offensive
Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30649
Credit: 53,134,872
RAC: 32
United States
Message 612869 - Posted: 1 Aug 2007, 7:11:55 UTC - in response to Message 612623.  

I'm sure someone has tried the obvious step of telling your personal SETI router what the physical path to the upstream router is. Sometimes those automatic protocols fail, especially if there is some rouge router that should be operating as a switch somewhere near the path, so your router can't figure out which one is the right one to connect to. Of course, you could have a switch in the path that needs a reset, too. It might pass IP-addressed packets but not MAC-addressed packets. And of course the stupid question: how many switches are there between the routers? Someone else mentioned cloned or duplicate MAC addresses.

ID: 612869 · Report as offensive
Profile Michael Sinatra

Joined: 23 Jul 07
Posts: 11
Credit: 5,173
RAC: 0
United States
Message 616115 - Posted: 7 Aug 2007, 23:10:25 UTC - in response to Message 612670.  
Last modified: 7 Aug 2007, 23:20:38 UTC

...After power cycling our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.



Rather curious indeed...


Usual random guesses:

A problem of incompatibility between different manufacturers?


No. They're the same manufacturer.

ARP firewalled(!) or turned off?

No.

A cloned MAC that is conflicting somewhere?

No.

Or just a broken port or partially connected LAN cable?

No. Pinging wouldn't help in that case.

Or even a cross-over cable used instead of straight (or vice-versa)?!

No. Pinging DEFINITELY wouldn't fix that.

Set a cron job running to ping the thing every five minutes to make sure it is up?!


Doesn't seem necessary, but there is probably a similar solution.

michael
ID: 616115 · Report as offensive
Profile Michael Sinatra

Joined: 23 Jul 07
Posts: 11
Credit: 5,173
RAC: 0
United States
Message 616126 - Posted: 7 Aug 2007, 23:17:57 UTC - in response to Message 612869.  

I'm sure someone has tried the obvious step of telling your personal SETI router what the physical path to the upstream router is. Sometimes those automatic protocols fail, especially if there is some rouge router that should be operating as a switch somewhere near the path, so your router can't figure out which one is the right one to connect to. Of course, you could have a switch in the path that needs a reset, too. It might pass IP-addressed packets but not MAC-addressed packets. And of course the stupid question: how many switches are there between the routers? Someone else mentioned cloned or duplicate MAC addresses.


There are no automatic protocols, so the route must necessarily be defined statically--otherwise you would never get any work units from the server. The issue is that the downstream router won't use the static route until it has an ARP entry for the upstream router, but there isn't anything automatic that causes the ARP entry to be added. Hence the magic ping. If I (logged into the upstream router because that's what I manage) ping the downstream, it also brings up the tunnel correctly. So the issue is just getting the router to do an ARP query at the right time, and there are a few ways to do this...it's just a matter of finding the BEST way.
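To make the ARP dependency concrete, here is a toy model of the behavior described above. It is only a sketch under stated assumptions: the `ToyRouter` class, its method names, and the 192.0.2.x addresses are all invented for illustration, and real router software is far more involved. The one thing it captures is that the static route stays unusable until the next hop has an ARP entry, and that a ping from either side fills in both ARP caches.

```python
class ToyRouter:
    """Toy model: a static route is usable only once the next hop is in the ARP cache."""

    def __init__(self, name, ip, mac):
        self.name = name
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}      # IP address -> MAC address
        self.static_routes = {}  # destination label -> next-hop IP
        self.link_peer = None    # the router on the other end of the link

    def connect(self, other):
        self.link_peer = other
        other.link_peer = self

    def add_static_route(self, dest, next_hop_ip):
        self.static_routes[dest] = next_hop_ip

    def route_is_active(self, dest):
        # Assumption taken from the post: the route is not used until the
        # next hop has an ARP entry, and nothing creates that entry on its own.
        next_hop = self.static_routes.get(dest)
        return next_hop is not None and next_hop in self.arp_cache

    def ping(self, target_ip):
        # Sending the ping forces an ARP exchange. The target learns the
        # sender's address from the ARP request, and the sender learns the
        # target's address from the reply, so both caches end up populated.
        peer = self.link_peer
        if peer is None or peer.ip != target_ip:
            return False
        peer.arp_cache[self.ip] = self.mac
        self.arp_cache[peer.ip] = peer.mac
        return True


campus = ToyRouter("campus", "192.0.2.1", "02:00:00:00:00:01")
seti = ToyRouter("seti", "192.0.2.2", "02:00:00:00:00:02")
seti.connect(campus)
seti.add_static_route("default", "192.0.2.1")

before = seti.route_is_active("default")  # no ARP entry yet
campus.ping("192.0.2.2")                  # the "magic ping", sent from upstream
after = seti.route_is_active("default")   # ARP entry now present
```

In this model the ping works from either end, which matches the observation that pinging the downstream router from the upstream one also brings the tunnel up.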

To further answer some of your points, none of the switches in the path in question are capable of layer-3--that is a design feature. And all of the routers in question are teal or charcoal grey or a combination thereof. We don't have any rouge routers.

michael
ID: 616126 · Report as offensive
Jesse Viviano

Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 616177 - Posted: 8 Aug 2007, 0:59:49 UTC - in response to Message 616126.  

Could an automatic routing protocol like OSPF or IS-IS generate the traffic necessary to create the necessary ARP entries automatically? (I would not recommend proprietary protocols like the Cisco-only EIGRP nor protocols that have proven broken in practice like RIP for this purpose.) They create multicast traffic that should generate the needed traffic because multicasts to IP addresses reserved for routing purposes should not need an ARP lookup to generate the needed packets because they are sent to layer 2 broadcast or multicast addresses.
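The point about multicast not needing an ARP lookup can be illustrated with the standard IPv4 multicast-to-Ethernet mapping: the low 23 bits of the group address are copied into the fixed prefix 01:00:5e, so the sender can compute the destination MAC directly. The sketch below shows only that mapping; the function name is made up, and it says nothing about whether a given routing protocol would actually populate the ARP cache on these routers.

```python
import ipaddress


def multicast_mac(ip):
    """Map an IPv4 multicast address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    # Keep the low 23 bits of the IP address and place them under the
    # fixed 01:00:5e prefix reserved for IPv4 multicast.
    low23 = int(addr) & 0x7FFFFF
    mac_int = 0x01005E000000 | low23
    return ":".join(f"{(mac_int >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


# 224.0.0.5 is the OSPF "all SPF routers" group.
ospf_all_routers_mac = multicast_mac("224.0.0.5")
```

Because the MAC is derived from the group address itself, no ARP request is sent for such packets.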

If you are wondering why I consider RIP broken, RIP has this nasty habit of creating routing loops when a line or a router fails, requiring time-squandering countermeasures to try to prevent them from forming.
ID: 616177 · Report as offensive
Profile Michael Sinatra

Joined: 23 Jul 07
Posts: 11
Credit: 5,173
RAC: 0
United States
Message 616181 - Posted: 8 Aug 2007, 1:11:50 UTC - in response to Message 616177.  


Could an automatic routing protocol like OSPF or IS-IS generate the traffic necessary to create the necessary ARP entries automatically? (I would not recommend proprietary protocols like the Cisco-only EIGRP nor protocols that have proven broken in practice like RIP for this purpose.) They create multicast traffic that should generate the needed traffic because multicasts to IP addresses reserved for routing purposes should not need an ARP lookup to generate the needed packets because they are sent to layer 2 broadcast or multicast addresses.


Yes, that might work, and in every other case where I have used GRE tunnels, there has been a routing protocol between the adjacent routers, so this was not a problem. However, we specifically do not want to do that for this design, as we want a clean demarc between the campus router and the S@h router, and OSPF and IS-IS (and any other IGP) simply won't allow that. (It is easier to filter between IGP instances, and even OSPF areas, nowadays, but it still adds a layer of complexity we don't want. Complexity==bad; elegance==good;)


If you are wondering why I consider RIP broken,


You and about 30,000,000 other people. :)

RIP has this nasty habit of creating routing loops when a line or a router fails, requiring time-squandering countermeasures to try to prevent them from forming.


Yep, the count-to-infinity problem. And there's also the no-VLSM-in-RIPv1 problem, the distance-vector-protocols-generally-suck problem, and others.
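For readers who have not run into count-to-infinity, here is a small simulation. It is a simplified sketch, assuming a three-router line A - B - C with no split horizon or other countermeasures, and updates that alternate strictly between B and A; the function name and parameters are invented for illustration. After the B-C link fails, A and B keep raising each other's metric for C until both reach RIP's "infinity" of 16.

```python
INFINITY = 16  # RIP treats a metric of 16 as unreachable


def count_to_infinity(initial_a=2, max_rounds=50):
    """Return the (A, B) metrics to C after each round of updates."""
    a_cost = initial_a  # A reaches C through B
    b_cost = INFINITY   # B has just lost its direct link to C
    history = [(a_cost, b_cost)]
    while (a_cost < INFINITY or b_cost < INFINITY) and len(history) <= max_rounds:
        # B hears A still advertising a route to C and starts routing via A.
        b_cost = min(INFINITY, a_cost + 1)
        # A's next hop for C is B, so A must accept B's new, higher metric.
        a_cost = min(INFINITY, b_cost + 1)
        history.append((a_cost, b_cost))
    return history


history = count_to_infinity()
```

Each round corresponds to a pair of periodic updates, which is why convergence is slow and why traffic for C loops between A and B in the meantime.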

michael

ID: 616181 · Report as offensive
Jesse Viviano

Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 616189 - Posted: 8 Aug 2007, 1:44:25 UTC - in response to Message 616181.  


Could BGP solve the problem? I have not used this protocol myself, but I have read that it handles arbitrary reasons well, which OSPF does not do. I do not know if BGP will generate the necessary traffic, though.
ID: 616189 · Report as offensive
Profile Michael Sinatra

Joined: 23 Jul 07
Posts: 11
Credit: 5,173
RAC: 0
United States
Message 616408 - Posted: 8 Aug 2007, 17:28:37 UTC - in response to Message 616189.  
Last modified: 8 Aug 2007, 17:28:48 UTC



Could BGP solve the problem? I have not used this protocol myself, but I have read that it handles arbitrary reasons well, which OSPF does not do. I do not know if BGP will generate the necessary traffic, though.


Configuring BGP between the upstream (campus) and downstream (S@h) routers would technically work because BGP operates over TCP, so the two routers' attempts to establish the TCP connection when they came up would handle the necessary ARP processing. However, this would be like shaving with a guillotine. It's already necessary to announce the S@h netblock via BGP and this would add a second layer of BGP just to get the tunnel established, when the simple static route *should* work just fine (and does, provided that the ARP entry gets added). Personally, I'd like to do something more in-band than that, but there is a very limited amount of time I can spend troubleshooting this problem. I did have an idea this morning and will suggest it to the S@h folks. If it works, I'll explain it more...

michael
ID: 616408 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 616708 - Posted: 9 Aug 2007, 0:46:59 UTC - in response to Message 616115.  

My only guess would be that there is link auto-negotiation turned on between those routers. Either that or just a flaky piece of hardware. Yuck!
ID: 616708 · Report as offensive
JLDun
Volunteer tester
Joined: 21 Apr 06
Posts: 573
Credit: 196,101
RAC: 0
United States
Message 616799 - Posted: 9 Aug 2007, 4:24:42 UTC - in response to Message 616126.  


To further answer some of your points, none of the switches in the path in question are capable of layer-3--that is a design feature. And all of the routers in question are teal or charcoal grey or a combination thereof. We don't have any rouge routers.

michael

Maybe a mistyping of rogue?
ID: 616799 · Report as offensive
Profile Michael Sinatra

Joined: 23 Jul 07
Posts: 11
Credit: 5,173
RAC: 0
United States
Message 618800 - Posted: 13 Aug 2007, 22:23:33 UTC - in response to Message 616708.  


My only guess would be that there is link auto-negotiation turned on between those routers. Either that or just a flaky piece of hardware. Yuck!


That still doesn't explain why ping fixes it. Link negotiation issues on gig links wouldn't produce the observed problem.

michael
ID: 618800 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 620681 - Posted: 17 Aug 2007, 2:32:52 UTC - in response to Message 618800.  



I can't prove it, because I'm not there to tinker with it. However...

Auto-negotiation can do some strange things, so it's always worth a quick check if not done already. This is especially true if it has auto-crossover switching. I much prefer a dedicated uplink port for this exact reason, but that might not be available on their hardware. The fact that they need to do the magic ping immediately after a power cycle actually gives weight to my hypothesis, as that's when link negotiation happens.
ID: 620681 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 621547 - Posted: 18 Aug 2007, 0:36:59 UTC - in response to Message 620681.  


Auto-negotiation also occurs when the cable is plugged back in. An interesting experiment would be to observe the behavior after unplugging and replugging the cable in both states: before the magic ping and after the magic ping.


ID: 621547 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.