Magic Ping (Jul 31 2007)
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0
Over the weekend the ready-to-delete queues filled up. After I restarted the file deleter processes, this queue began to drain, which meant increased load competition on the workunit fileservers. These competed with the splitters (which write new workunits to those same disks), which ultimately meant the ready-to-send queue dropped to zero until the deleters caught up last night. No big deal.

Had the usual outage today. During the outage I rebooted some of the servers to clean their pipes, and also ran some more router configuration tests as suggested by central campus. After power cycling, our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection, and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

- Matt
-- BOINC/SETI@home network/web/science/development person --
"Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
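The disk contention in the first paragraph can be illustrated with a toy model. Every number here is a hypothetical assumption, not a SETI@home measurement: the shared disks absorb 10 operations per tick, a busy deleter takes up to 8 of them, each new workunit costs the splitter one operation, and clients download 6 workunits per tick. The model also assumes the deleters get the disk first and the splitters only get what is left.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileserverModel:
    """Toy model of one workunit fileserver shared by deleters and splitters.

    All rates are hypothetical "disk operations per tick"; nothing here is
    measured from the real SETI@home servers.
    """
    disk_ops_per_tick: int = 10  # total I/O the shared disks absorb per tick
    deleter_max_ops: int = 8     # how hard a busy file deleter hits the disks
    send_per_tick: int = 6       # workunits clients pull from ready-to-send per tick

    def run(self, delete_backlog: int, ready_to_send: int, ticks: int) -> List[int]:
        """Return the ready-to-send queue length after each tick."""
        history = []
        for _ in range(ticks):
            # Deleters drain their backlog first, up to their per-tick limit.
            deleted = min(delete_backlog, self.deleter_max_ops)
            delete_backlog -= deleted
            # Splitters get the leftover disk capacity (1 op per new workunit).
            ready_to_send += self.disk_ops_per_tick - deleted
            # Clients download workunits; the queue cannot go negative.
            ready_to_send -= min(ready_to_send, self.send_per_tick)
            history.append(ready_to_send)
        return history


if __name__ == "__main__":
    print(FileserverModel().run(delete_backlog=48, ready_to_send=20, ticks=10))
```

With a 48-file delete backlog and 20 workunits ready to send, this prints `[16, 12, 8, 4, 0, 0, 4, 8, 12, 16]`: the queue sits at zero until the deleters catch up, then refills.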
Bax Joined: 16 May 99 Posts: 182 Credit: 3,919,072 RAC: 0
> Over the weekend the ready-to-delete queues filled up. After I restarted the file deleter processes, this queue began to drain, which meant increased load competition on the workunit fileservers. These competed with the splitters (which write new workunits to those same disks), which ultimately meant the ready-to-send queue dropped to zero until the deleters caught up last night. No big deal.

Keep up the good work Matt! I'm impressed with how short the Tuesday downtimes are getting. :)

Bax
Join The Assimilators Free Internet Radio! "The Assimilators" Browser Toolbar!
ML1 Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20
> ...After power cycling, our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection, and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

Rather curious indeed... Usual random guesses:
A problem of incompatibility between different manufacturers?
ARP firewalled(!) or turned off?
A cloned MAC that is conflicting somewhere?
Or just a broken port or partially connected LAN cable?
Or even a cross-over cable used instead of straight (or vice versa)?!
Set a cron job running to ping the thing every five minutes to make sure it is up?!

Good luck,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
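Martin's cron suggestion might look like the sketch below on a Unix host. It shells out to the system `ping`, assuming the Linux/BSD/macOS `-c` count flag (Windows `ping` uses `-n`), and uses 192.0.2.1, a documentation address, as a hypothetical stand-in for the upstream router. It only covers the scheduling part; whether pings from an ordinary host would wake up the router-to-router connection at all is what the rest of the thread is about.

```python
import subprocess
import time
from typing import List, Optional


def build_ping_command(host: str, count: int = 1) -> List[str]:
    """Build a ping command line (assumes the Unix "-c <count>" flag)."""
    return ["ping", "-c", str(count), host]


def ping_once(host: str) -> bool:
    """Send one echo request; True if ping exits with status 0 (a reply arrived)."""
    result = subprocess.run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def keep_alive(host: str, interval_s: float = 300.0, rounds: Optional[int] = None) -> None:
    """Ping `host` every `interval_s` seconds (300 s = every five minutes).

    rounds=None loops forever, daemon-style; a cron job would instead call
    ping_once() on a schedule and exit.
    """
    done = 0
    while rounds is None or done < rounds:
        ok = ping_once(host)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} ping {host}: {'reply' if ok else 'no reply'}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)


if __name__ == "__main__":
    keep_alive("192.0.2.1", rounds=1)  # hypothetical upstream router address
```

Run from cron with a `*/5` minute field, a one-shot script calling `ping_once()` would replace the loop.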
gypsy Joined: 25 Jan 03 Posts: 5 Credit: 148,584 RAC: 0
Is this why my Seti is empty and idle and says there is no work? |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0
> Is this why my Seti is empty and idle and says there is no work?

I truly wish it were that simple... Many people have worked through many things. So that you know, there are several complex issues between the BOINC core client and the server, which does not discount server issues. Many people are working on these items, and in some respects posting to the Number Crunching forum may get you more immediate help. Many have already discussed what you are describing...

Please consider a Donation to the Seti Project.
Gary Charpentier Joined: 25 Dec 00 Posts: 30649 Credit: 53,134,872 RAC: 32
> During the outage I rebooted some of the servers to clean their pipes, and also ran some more router configuration tests as suggested by central campus. After power cycling, our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection, and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

I'm sure someone has tried the obvious step of telling your personal SETI router what the physical path to the upstream router is. Sometimes those automatic protocols fail, especially if there is some rouge router that should be operating as a switch somewhere near the path, so your router can't figure out which one is the right one to connect to. Of course, you could also have a switch in the path that needs a reset. It might pass IP-addressed packets but not MAC-addressed packets. And of course the stupid question: how many switches are between the routers? Someone else mentioned cloned or duplicate MAC addresses.
Michael Sinatra Joined: 23 Jul 07 Posts: 11 Credit: 5,173 RAC: 0
> ...After power cycling, our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection, and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

> A problem of incompatibility between different manufacturers?

No. They're the same manufacturer.

> ARP firewalled(!) or turned off?

No.

> A cloned MAC that is conflicting somewhere?

No.

> Or just a broken port or partially connected LAN cable?

No. Pinging wouldn't help in that case.

> Or even a cross-over cable used instead of straight (or vice versa)?!

No. Pinging DEFINITELY wouldn't fix that.

> Set a cron job running to ping the thing every five minutes to make sure it is up?!

Doesn't seem necessary, but there is probably a similar solution.

michael
Michael Sinatra Joined: 23 Jul 07 Posts: 11 Credit: 5,173 RAC: 0
> I'm sure someone has tried the obvious step of telling your personal SETI router what the physical path to the upstream router is. Sometimes those automatic protocols fail, especially if there is some rouge router that should be operating as a switch somewhere near the path, so your router can't figure out which one is the right one to connect to. Of course, you could also have a switch in the path that needs a reset. It might pass IP-addressed packets but not MAC-addressed packets. And of course the stupid question: how many switches are between the routers? Someone else mentioned cloned or duplicate MAC addresses.

There are no automatic protocols, so the route must necessarily be defined statically; otherwise you would never get any work units from the server. The issue is that the downstream router won't use the static route until it has an ARP entry for the upstream router, but nothing automatic causes that ARP entry to be added. Hence the magic ping. If I (logged into the upstream router, because that's what I manage) ping the downstream router, that also brings up the tunnel correctly. So the issue is just getting the router to do an ARP query at the right time, and there are a few ways to do this; it's just a matter of finding the BEST way.

To further answer some of your points: none of the switches in the path in question are capable of layer-3 routing; that is a design feature. And all of the routers in question are teal or charcoal grey or a combination thereof. We don't have any rouge routers.

michael
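The mechanism Michael describes can be captured in a toy model. This is a deliberately simplified sketch, not how any real router is implemented: the class name, the `segment` dictionary standing in for the LAN, and the addresses (192.0.2.1 and 00:00:5e:00:53:01, taken from the IPv4 and MAC documentation ranges) are all hypothetical. It encodes the two assumptions from the post: forwarding over the static route needs an ARP entry for the next hop, and forwarding alone never triggers the ARP query. It also models the reverse direction, since a host that receives an ARP request addressed to it records the sender's mapping (RFC 826), which would be consistent with a ping from the upstream router also fixing things.

```python
from typing import Dict, Optional


class ToyRouter:
    """Toy model of the failure mode described above; NOT real router code.

    Assumptions (simplified from the post): transit traffic uses the static
    route only if the next hop already has an ARP entry, and forwarding never
    triggers an ARP query by itself. A locally originated ping, or an ARP
    request received from the neighbour, does fill the cache.
    """

    def __init__(self, next_hop_ip: str) -> None:
        self.next_hop_ip = next_hop_ip
        self.arp_cache: Dict[str, str] = {}  # IP -> MAC; empty after a power cycle

    def forward(self, packet: str) -> Optional[str]:
        """Forward a transit packet via the static route, or drop it (None)."""
        mac = self.arp_cache.get(self.next_hop_ip)
        if mac is None:
            return None  # no ARP entry and, by assumption, no query is sent
        return f"{packet} -> {mac}"

    def ping(self, target_ip: str, segment: Dict[str, str]) -> bool:
        """The "magic ping": locally originated traffic resolves the target first.

        `segment` stands in for the shared LAN, mapping each IP that would
        answer an ARP request to its MAC address.
        """
        mac = segment.get(target_ip)
        if mac is None:
            return False  # nobody answered the ARP request
        self.arp_cache[target_ip] = mac
        return True

    def receive_arp_request(self, sender_ip: str, sender_mac: str) -> None:
        """The upstream router pings us: per RFC 826, the target of an ARP
        request records the sender's IP -> MAC mapping."""
        self.arp_cache[sender_ip] = sender_mac


if __name__ == "__main__":
    segment = {"192.0.2.1": "00:00:5e:00:53:01"}  # hypothetical upstream router
    seti = ToyRouter(next_hop_ip="192.0.2.1")
    print(seti.forward("workunit"))  # dropped right after the power cycle
    seti.ping("192.0.2.1", segment)  # the magic ping
    print(seti.forward("workunit"))  # now forwarded to the upstream MAC
```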
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0
> ...Sometimes those automatic protocols fail...

Could an automatic routing protocol like OSPF or IS-IS generate the traffic needed to create the ARP entries automatically? (I would not recommend proprietary protocols like the Cisco-only EIGRP, nor protocols that have proven broken in practice like RIP, for this purpose.) These protocols send multicast traffic, and multicasts to IP addresses reserved for routing purposes should not need an ARP lookup, because they are sent to layer 2 broadcast or multicast addresses. That traffic should in turn get the needed ARP entries created.

If you are wondering why I consider RIP broken, RIP has this nasty habit of creating routing loops when a line or a router fails, requiring time-squandering countermeasures to try to prevent them from forming.
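Jesse's point that routing-protocol multicasts need no ARP lookup follows from how IPv4 multicast maps onto Ethernet: the destination MAC is computed from the group address itself (RFC 1112, section 6.4), so there is nothing to resolve. A minimal standard-library sketch; 224.0.0.5 is the AllSPFRouters group that OSPF hellos are sent to.

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (RFC 1112, section 6.4).

    The MAC is the fixed prefix 01:00:5e followed by the low-order 23 bits of
    the group address, so a sender computes it locally with no ARP query.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only 23 of the 28 group bits survive
    octets = [0x01, 0x00, 0x5E, (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    print(ipv4_multicast_mac("224.0.0.5"))  # 01:00:5e:00:00:05
```

Because only 23 bits are kept, 32 different groups share each MAC (224.128.0.5 maps to the same address as 224.0.0.5); receivers filter the rest at the IP layer.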
Michael Sinatra Joined: 23 Jul 07 Posts: 11 Credit: 5,173 RAC: 0
Yes, that might work, and in every other case where I have used GRE tunnels, there has been a routing protocol between the adjacent routers, so this was not a problem. However, we specifically do not want to do that for this design, as we want a clean demarc between the campus router and the S@h router, and OSPF and IS-IS (and any other IGP) simply won't allow that. (It is easier to filter between IGP instances, and even OSPF areas, nowadays, but it still adds a layer of complexity we don't want. Complexity==bad; elegance==good;)

> If you are wondering why I consider RIP broken,

You and about 30,000,000 other people. :)

> RIP has this nasty habit of creating routing loops when a line or a router fails, requiring time-squandering countermeasures to try to prevent them from forming.

Yep, the count-to-infinity problem. And there's also the no-VLSM-in-RIPv1 problem, the distance-vector-protocols-generally-suck problem, and others.

michael
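The count-to-infinity problem Michael mentions can be replayed with two routers. This is a simplified sketch under stated assumptions (a two-router line topology, updates that alternate strictly between A and B, and none of the usual countermeasures), not an implementation of RIP; only the rule that a metric of 16 means "unreachable" is taken from the RIP specification (RFC 2453).

```python
from typing import List, Tuple

RIP_INFINITY = 16  # RIP treats a metric of 16 as "unreachable" (RFC 2453)


def count_to_infinity(a_metric: int = 2) -> List[Tuple[str, int]]:
    """Replay the updates between two RIP-like routers after a link failure.

    Topology: A -- B -- net. B has just lost its direct link to `net`, but A
    still holds the route it learned from B (metric `a_metric`). With no split
    horizon, poisoned reverse or triggered updates, each router trusts the
    other's stale advertisement, so the metric climbs by one per exchange.
    Returns (router, new_metric) for every update until the route hits infinity.
    """
    metrics = {"A": a_metric, "B": RIP_INFINITY}
    updates: List[Tuple[str, int]] = []
    speaker, listener = "A", "B"  # assume A's periodic update reaches B first
    while True:
        # The listener either has no route (B) or already routes via the
        # speaker, so it accepts the advertised metric plus one hop.
        new_metric = min(metrics[speaker] + 1, RIP_INFINITY)
        metrics[listener] = new_metric
        updates.append((listener, new_metric))
        if new_metric >= RIP_INFINITY:
            return updates
        speaker, listener = listener, speaker


if __name__ == "__main__":
    for router, metric in count_to_infinity():
        print(f"{router} now believes net is {metric} hops away")
```

Starting from A's stale metric of 2, the routers exchange 14 updates, from `("B", 3)` to `("A", 16)`, before the route is finally declared unreachable. That slow climb is what countermeasures such as split horizon, poisoned reverse and triggered updates are meant to cut short.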
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0
Could BGP solve the problem? I have not used this protocol myself, but I have read that it handles arbitrary routing policies well, which OSPF does not. I do not know whether BGP would generate the necessary traffic, though.
Michael Sinatra Joined: 23 Jul 07 Posts: 11 Credit: 5,173 RAC: 0
Configuring BGP between the downstream (S@h) and upstream (campus) router would technically work, because BGP operates over TCP, so the two routers' attempts to establish the TCP connection when they come up would handle the necessary ARP processing. However, this would be like shaving with a guillotine. It's already necessary to announce the S@h netblock via BGP, and this would add a second layer of BGP just to get the tunnel established, when the simple static route *should* work just fine (and does, provided that the ARP entry gets added). Personally, I'd like to do something more in-band than that, but there is a very limited amount of time I can spend troubleshooting this problem. I did have an idea this morning and will suggest it to the S@h folks. If it works, I'll explain it more...

michael
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2
> ...After power cycling, our personal SETI router doesn't see the next router up the pike until we do what we call the "magic ping." Pinging this next router seems to be the only way to wake up this connection, and then all traffic floods through. Nobody is sure why this is the case, and the tests today didn't reveal anything new. An annoyance more than a crisis.

My only guess would be that link auto-negotiation is turned on between those routers. Either that, or just a flaky piece of hardware. Yuck!
JLDun Joined: 21 Apr 06 Posts: 573 Credit: 196,101 RAC: 0
Maybe a mistyping of rogue? |
Michael Sinatra Joined: 23 Jul 07 Posts: 11 Credit: 5,173 RAC: 0
That still doesn't explain why ping fixes it. Link negotiation issues on gig links wouldn't produce the observed problem.

michael
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2
I can't prove it, because I'm not there to tinker with it. However, auto-negotiation can do some strange things, so it's always worth a quick check if not done already. This is especially true if the port does automatic crossover (auto-MDIX) switching. I much prefer a dedicated uplink port for this exact reason, but that might not be available on their hardware. The fact that they need to do the magic ping immediately after a power cycle actually gives weight to my hypothesis, as that's when link negotiation happens.
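For reference, when both ends auto-negotiate, each advertises the modes it supports and both pick the highest-priority mode they share; when one end has auto-negotiation disabled, the other falls back to parallel detection, which can sense 10 or 100 Mb/s but not duplex. The simplified sketch below assumes a priority list that is only a subset of the one in IEEE 802.3 Annex 28B, and its function names are hypothetical.

```python
from typing import Optional, Set

# Highest priority first: a subset of the IEEE 802.3 Annex 28B priority list
# (100BASE-T4 and 100BASE-T2 are left out for brevity).
PRIORITY = [
    "1000BASE-T full duplex",
    "1000BASE-T half duplex",
    "100BASE-TX full duplex",
    "100BASE-TX half duplex",
    "10BASE-T full duplex",
    "10BASE-T half duplex",
]


def resolve(local: Set[str], partner: Set[str]) -> Optional[str]:
    """Both ends auto-negotiating: pick the best mode both advertise."""
    common = local & partner
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None  # no common mode: the link does not come up


def parallel_detect(partner_speed: str) -> str:
    """Partner has auto-negotiation disabled (forced speed).

    The negotiating end can still sense 10 or 100 Mb/s from the line signal,
    but not the partner's duplex, so it falls back to half duplex. A partner
    forced to full duplex then ends up in the classic duplex mismatch.
    """
    if partner_speed not in ("10BASE-T", "100BASE-TX"):
        raise ValueError("1000BASE-T requires auto-negotiation; no parallel detection")
    return f"{partner_speed} half duplex"


if __name__ == "__main__":
    print(resolve({"1000BASE-T full duplex", "100BASE-TX full duplex"},
                  {"100BASE-TX full duplex", "100BASE-TX half duplex"}))
    print(parallel_detect("100BASE-TX"))
```

Note that the half-duplex fallback only applies to 10/100 links, since 1000BASE-T requires auto-negotiation.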
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0
Auto-negotiation also occurs when the cable is plugged back in. An interesting experiment would be to compare the behavior after unplugging and replugging the cable in both states: before the magic ping and after the magic ping.

BOINC WIKI
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.