Problems...
Author | Message |
---|---|
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 | Does somebody know what/where the last 2 hops (64.71.140.42 and 208.68.243.254) before the Upload/Download servers are? 64.71.140.42 pings OK, but 208.68.243.254 shows 30% ping loss: http://setiathome.berkeley.edu/forum_thread.php?id=58845&nowrap=true#972727 - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | [quote]Does somebody know what/where the last 2 hops:[/quote] I don't know, but what follows is an educated guess informed by several years of message board reading.... 208.68.243.254 is obviously in the 208.68.240.0/22 netrange (a 1,024-address space) posted by arkayn in that same news thread. My guess is that it's "That big grey box is our router (to connect us to our own private ISP)," from the 2008 Photo Album (right-hand rack, picture 6). That probably means that 64.71.140.42 is the matching Cisco tunneling router several miles away across the Bay, in Palo Alto or wherever the Hurricane Electric head-end is. The idea is that all our traffic is encapsulated (possibly even encrypted) by Hurricane Electric, passes through all the Campus switches and routers unmolested (and untraced), and is finally decapsulated/decrypted in the server closet, a mile up the hill from the Campus datacenter. Some time ago, they found that the nominal 100 Mbit Cisco router had a 60 Mbit hard cap on the tunneling co-processor, so they persuaded someone to donate them a newer and much more powerful one. That photograph may actually be the old, wimpy router - I forget when the changeover was made. So your 'pings' were probably lost somewhere in the encryption/tunneling process. That information is unfortunately completely useless, unless you can produce comparable figures from last week showing the rate of packet loss when data was flowing normally. If that log exists, and is significantly different, then you would indeed have discovered something useful. But I doubt it. The Internet was engineered - by the military and ARPANET - to be quite literally able to withstand a nuclear war. Packet loss and automatic retransmission are absolutely normal and corrected transparently. It probably happened while you were reading this. The data I'm worried about is the packet called 'RST', which seems to get through every time - and tells my BOINC client to cancel its upload attempt. Edit/correction: Judging by the dates, that photograph is probably the new Cisco 7600-series router donated by Bill Woodcock on or about 22 January 2008. Look up the specs (and price) of that baby! |
ikarus Joined: 18 Feb 10 Posts: 1 Credit: 40,031 RAC: 0 | Could it be that someone has installed anti-P2P software somewhere along the route? Those RST packets look suspiciously like the packets that Comcast used. http://en.wikipedia.org/wiki/Hart_v._Comcast and http://en.wikipedia.org/wiki/Sandvine |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 | Ping -t 208.68.243.254 (with 32 bytes of data) gives < 10% loss. How big is "the packet called 'RST'"? "which seems to get through every time" = 100% of how many attempts? P.S. I have a LAN card which semi-burned after a lightning storm - it continued to work, but very slowly (many packets lost)! Couldn't the router have burned out the same way? - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | [quote]How big is "the packet called 'RST'"?[/quote] Not very - count them: (Direct link) [quote]"which seems to get through every time" = 100% of how many attempts?[/quote] Well, I could count the ones which reach me - but I can't give you a percentage, because I wouldn't know if Berkeley sent me one which didn't get through! |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | [quote]Could it be that someone has installed anti-P2P software somewhere along the route? Those RST packets look suspiciously like the packets that Comcast used.[/quote] .... and the screwdriver I used to put up that new shelf looks a little bit like the murder weapon in a bizarre unsolved case in Brisbane, Australia. Doesn't make me the murderer. There are uncounted millions of RST packets sent every day. RST packets have been generated routinely by devices on the internet since 1981. It's a pretty big jump from "Comcast picked a really stupid way to enforce their Terms of Service" to "All RST packets are aimed at P2P" and another big jump to "all RST packets are malicious". There are 2,500,000 active hosts, according to BOINCstats, and at this point, lots of them want to send in work. Bruno may be able to handle 2,500 uploads at once (although the throughput is likely better at the 250 simultaneous upload level). The most likely reason for the RST: Bruno is at whatever limit has been set, and it wants the BOINC client to stop sending packets so it can manage what it has as best it can. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | Richard has pointed out that SETI has some nice big Cisco routers, and I'll bet the rest of the Campus network is all-Cisco. "ping" is ICMP. It's not TCP, it's not UDP. The control message type is 8 (echo request). ... and it is trivially easy to tell a router to drop ICMP Echo packets entirely, or drop them if the CPU load is above a certain value. This is commonly done on Cisco routers, and likely possible on others. What that means is that you can't assume that dropped echo packets are a sign of trouble unless you know that the router on the end of that link is configured to never ever drop pings. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | [quote]How big is "the packet called 'RST'"?[/quote] RST is a single-bit flag which is present in every single TCP packet. There are six of these flags. "RST packets" are packets with the RST flag set to 1. The smallest TCP packet, encapsulated in an IP packet, should be 40 or 48 bytes. I think everyone here is chasing a network problem, but all of the evidence shows a fairly healthy network, and a server trying to cope with 1,000 times the number of connections it ought to be getting. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | [quote]There are 2,500,000 active hosts, according to BOINCstats, ...[/quote] Actually, the 'Active' column says 296,911 hosts: the 2.5 million is the total number of hosts that have ever been attached. So the best part of 90% of them have dropped out again. [quote]All of the evidence shows a fairly healthy network ...[/quote] Agreed. [quote]... and a server trying to cope with 1,000 times the number of connections it ought to be getting.[/quote] Disagree. That figure would show up in the packet count version of the Cricket graphs, and it just isn't there. There's an anomaly for Tuesday night/Wednesday morning, when the aircon was off and the servers had been shut down in a rather untidy way, but the current packet count is lower than usual. No, it feels that the number of connection attempts is nominal, and it's the ability of the servers to service them which has inexplicably dropped far below its normal level. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 | [quote]...No, it feels that the number of connection attempts is nominal, and it's the ability of the servers to service them which has inexplicably dropped far below its normal level.[/quote] Motion seconded. Grant Darwin NT |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | [quote]...No, it feels that the number of connection attempts is nominal, and it's the ability of the servers to service them which has inexplicably dropped far below its normal level.[/quote] This isn't a democracy. Technical problems are what they are, not what consensus decides they should be. So we can all sit back and second-guess, but it'll turn out to be whatever it was. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | The problem you and I are having is that the metrics we really want are not available, so we're looking for proxies in the metrics we have. For example, we have packets into SETI@Home through the interface graphs you noted. That packet count includes downloads, uploads, scheduler requests, and who knows what else (well, staff very likely knows). I'd like to know the number of SYN packets, and I'd like that count just for Bruno, since we seem to have a laser-focus on uploads. At least as of last night, a failed upload only took five packets (in both directions). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | [quote]At least as of last night, a failed upload only took five packets (in both directions).[/quote] The ones that fail at 'first contact' (when the client asks the server what information it already has on the file it's about to upload, in the hope that there's already a partial upload which it can re-commence at the break-point) can indeed consume very few packets (four/eight in the screendump I posted in Technical News). But others - one or two in every half-dozen, I would judge - get the file info back, and proceed to initiate the actual data transfer. And some (most?) of those then go on to upload the whole file, or a substantial portion of it - then get an RST instead of the FIN / ACK / OK we're hoping for. |
Gary Charpentier Joined: 25 Dec 00 Posts: 31005 Credit: 53,134,872 RAC: 32 | [quote]At least as of last night, a failed upload only took five packets (in both directions).[/quote] And we don't know why. It very well could be that Bruno isn't able to confirm the data was written to disk and is correctly telling our client to try later, or it could be some software or hardware glitch. We don't have the tools to tell: we can't see Bruno's logs, so we can't match entries. I hope by noon Monday staff can tell us if it will be easy to fix or not so easy. In any case there is nothing more we can do right now except wait. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 | Just to add to the confusion, a larger number than usual of the WUs that do manage to upload are failing to validate. I am getting a significant percentage from my CPU, which normally *never* returns a result that fails to validate. Temps are all OK this end, and others have noted this too. F. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | This is an interesting puzzle, and I've got some interesting tools, so.... I'm using a capture function on my router to capture sessions. It's not like Wireshark or some passive system, in that the packets are captured on the way through the router. So, I repeatedly tried to upload a task, enough to see Richard's case where the transaction goes to some percentage, and then dies. ... and what I'm seeing is the BOINC client goes through the SYN/SYN+ACK/ACK sequence, and the HTTP "POST" to do the upload is piggybacked on the ACK. Most of the time, Bruno responds with either a FIN (an orderly close of the connection) or an RST (an abrupt reset of the connection). Occasionally, the SYN/SYN+ACK goes well, but the ACK with the HTTP POST gets lost, and is retried. I saw BOINC go to over 50% "transferred" but the file is not going over the wire. It's not in the captured results. The ones that show progress may be the ones where the initial ACK gets lost. ... at least in the samples I have. |
Julie Joined: 28 Oct 09 Posts: 34060 Credit: 18,883,157 RAC: 18 | |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66331 Credit: 55,293,173 RAC: 49 | Question: What does Bruno do as part of its "duties"... It isn't the one that does AP and MB on the same server, is it, Ned? Never mind. sigh. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 | [quote]Question: What does Bruno do as part of its "duties"... It isn't the one that does AP and MB on the same server, is it, Ned?[/quote] It really depends on what you mean by "does." Most of the physical boxes at SETI@Home do more than one job. Bruno is the upload server, and if you are trying to upload, that's the function you care about. Most everything at SETI@Home "does" both projects (and hosts BETA, and probably BOINC Alpha, etc.) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 | [quote]...No, it feels that the number of connection attempts is nominal, and it's the ability of the servers to service them which has inexplicably dropped far below its normal level.[/quote] Yep. But "Motion seconded" makes a change from "I agree" or "Me too". I have no doubt that it's a small & obvious problem with a rather simple fix. But that's the way all problems are - after they're discovered & resolved. Hindsight is such a wonderful thing (over 15 years fixing electronics has taught me that). Grant Darwin NT |
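
Richard Haselgrove's guess that 208.68.243.254 sits inside the 208.68.240.0/22 netrange, while 64.71.140.42 belongs to someone else, can be checked mechanically. The following is only a minimal Python sketch using the addresses quoted in the thread; the helper name `hop_in_range` is hypothetical and not part of any BOINC or SETI@home tooling.

```python
import ipaddress

# Netrange quoted in the thread for the SETI@home address block.
SETI_RANGE = "208.68.240.0/22"


def hop_in_range(hop: str, netrange: str) -> bool:
    """Return True if the hop address lies inside the CIDR netrange."""
    return ipaddress.ip_address(hop) in ipaddress.ip_network(netrange)


if __name__ == "__main__":
    # The two final hops from the traceroute discussed above.
    for hop in ("64.71.140.42", "208.68.243.254"):
        print(hop, "inside", SETI_RANGE, "->", hop_in_range(hop, SETI_RANGE))

    # Size of a /22 block: 2 ** (32 - 22) addresses.
    print("addresses in block:", ipaddress.ip_network(SETI_RANGE).num_addresses)
```

This says nothing about which physical box answers at either address; it only confirms the address arithmetic behind the guess.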
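
On the question of how big "the packet called 'RST'" is: as 1mp0£173 explains, RST is one bit among the six classic TCP control flags, and a TCP segment with no options and no payload adds 20 bytes to a 20-byte IP header. A rough Python sketch of that layout follows. The port and sequence numbers are made-up illustration values, the checksum is left at zero, and `build_tcp_header` and `decode_flags` are hypothetical helper names, so this is not something to put on a wire.

```python
import struct

# The six classic TCP control bits, held in byte 13 of the header.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

MIN_IP_HEADER_LEN = 20  # IPv4 header without options


def build_tcp_header(src_port, dst_port, seq, ack, flags):
    """Pack a 20-byte TCP header with no options (checksum left as 0)."""
    data_offset = 5 << 4  # header length of 5 32-bit words, reserved bits 0
    window, checksum, urgent = 0, 0, 0
    return struct.pack(
        "!HHIIBBHHH",
        src_port, dst_port, seq, ack, data_offset, flags,
        window, checksum, urgent,
    )


def decode_flags(tcp_header):
    """Return the names of the control bits set in a TCP header."""
    flag_byte = tcp_header[13]
    return [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]


if __name__ == "__main__":
    # Illustration values only: a server on port 80 resetting a client.
    header = build_tcp_header(
        80, 50000, 0, 1, TCP_FLAGS["RST"] | TCP_FLAGS["ACK"]
    )
    print("flags set:", decode_flags(header))
    print("bytes on the wire (IP + TCP):", MIN_IP_HEADER_LEN + len(header))
```

The same decoding applies to the SYN, SYN+ACK and FIN packets mentioned in the capture discussion, since they differ only in which bits of that one byte are set.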
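
For the point that "ping" is ICMP rather than TCP or UDP, here is a small Python sketch of what an echo request looks like as bytes: a type field of 8, a code of 0, a checksum, an identifier, a sequence number, and a payload. The 32-byte payload matches the "32 bytes of data" in BilBg's ping; the identifier and sequence values are arbitrary, the function names are hypothetical, and the sketch only builds the packet without sending it (sending normally needs a raw socket and elevated privileges).

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for "echo request"


def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int,
                       payload: bytes = b"\x00" * 32) -> bytes:
    """Build an ICMP echo request: 8-byte header followed by the payload."""
    # First pass with a zero checksum, then repack with the real value.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0,
                         identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum,
                         identifier, sequence)
    return header + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=0x1234, sequence=1)
    print("type:", packet[0], "length:", len(packet))
    # A correctly checksummed packet sums to zero when re-checked.
    print("checksum verifies:", internet_checksum(packet) == 0)
```

Because a router can be configured to rate-limit or discard exactly this packet type, a lost echo reply does not by itself show that TCP traffic on the same path is being dropped.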
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.