Closer to being out of the woods on this one...

Message boards : Number crunching : Closer to being out of the woods on this one...

kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 574819 - Posted: 24 May 2007, 7:19:35 UTC

Well, well, well.
The kitties appear to have been busy whilst I was away at work today.
Got home and saw some progress, so I rebooted the whole farm. Uploads took off and completed. Way cool. Reinstated the app info xml files. Restarted Boinc. Still working. Way cooler.
Downloads were a bit flaky, so I reset the router MTU from 1500 down to 1400. Shouldn't make a difference on my DSL connection, 1500 is recommended by most TCP advisory programs, but what the heck.
Lordy, lordy, finally was able to download even the Beta results on some of my rigs! Ultimate coolness.
My quaddy (remember my quaddy?) has almost finished playing with 30 to 50 hour Astropulse WUs and finally thought it was time to request some Seti Main work........and got some! Quaddy should be back crunching Seti Main soon.
And the kitties smile!
Wonder where the next wicked witch in this little fairy tale will come from? According to Matt's last post, splitter capacity is about maxed out, and we are almost out of classic WUs to process pending Multibeam data. Hope that transition goes more smoothly than a few of the other recent ones have. Oh well, no pain, no gain.
High fives all around for the Seti staff!!

And the kitties say...'Stay tuned for the next episode of.....As the Seti Turns'.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 574819
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 574824 - Posted: 24 May 2007, 7:49:02 UTC

And the Cricket Graph seems to be showing a slow decline in traffic.
Could this mean we are really making a tidbit of progress in filling caches and having fewer hosts requesting work? I think maybe!!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 574824
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19078
Credit: 40,757,560
RAC: 67
United Kingdom
Message 574840 - Posted: 24 May 2007, 10:48:03 UTC - in response to Message 574824.  
Last modified: 24 May 2007, 10:48:28 UTC

And the Cricket Graph seems to be showing a slow decline in traffic.
Could this mean we are really making a tidbit of progress in filling caches and having fewer hosts requesting work? I think maybe!!

And the work ready to send is going up.

Thanks to Scarecrow for graph.
ID: 574840
Profile Dennis Lathem
Joined: 3 Dec 06
Posts: 27
Credit: 1,126,010
RAC: 0
United States
Message 574850 - Posted: 24 May 2007, 12:23:42 UTC

All but one of my six machines are now working again. My newest and most powerful just does not seem to be able to communicate in any way with SETI. I am running the Chicken-optimized app, but I am running it on all but one other machine. I have perhaps two dozen small work units that have been waiting to report for more than a week. It is doing no work because it cannot get any. I check the logs and see repeated failures to communicate. Funny, when SETI first came up for a few hours, this machine connected, received all those small work units, crunched them, and has not been able to communicate since then.

ID: 574850
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 574852 - Posted: 24 May 2007, 12:27:01 UTC - in response to Message 574850.  

All but one of my six machines are now working again. My newest and most powerful just does not seem to be able to communicate in any way with SETI. I am running the Chicken-optimized app, but I am running it on all but one other machine. I have perhaps two dozen small work units that have been waiting to report for more than a week. It is doing no work because it cannot get any. I check the logs and see repeated failures to communicate. Funny, when SETI first came up for a few hours, this machine connected, received all those small work units, crunched them, and has not been able to communicate since then.



Have you rebooted the machine? I found that this kick-started all of my rigs last night that had stalled uploads.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 574852
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 574853 - Posted: 24 May 2007, 12:28:38 UTC
Last modified: 24 May 2007, 12:31:55 UTC

Dennis, check to make sure:

The project isn't suspended

The comms are set to "network always available" and not suspended,

The project is set to "allow new work"

any other setting which might be stopping it.
ID: 574853
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 575076 - Posted: 25 May 2007, 0:12:32 UTC - in response to Message 574819.  
Last modified: 25 May 2007, 0:19:33 UTC

so I reset the router MTU from 1500 down to 1400. Shouldn't make a difference on my DSL connection, 1500 is recommended by most TCP advisory programs, but what the heck.


Well, my take is that the MTU should be 1500.
It seems that the machine on 208.68.240.16 has an MTU of 1476

i:\program files\boinc>ping 208.68.240.16 -f -l 1450
Pinging 208.68.240.16 with 1450 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.


i:\program files\boinc>ping 208.68.240.16 -f -l 1448
Pinging 208.68.240.16 with 1448 bytes of data:
Reply from 208.68.240.16: bytes=1448 time=192ms TTL=53
Reply from 208.68.240.16: bytes=1448 time=192ms TTL=53


MTU = ping payload size + 28 (20-byte IP header + 8-byte ICMP header), so the MTU is 1448 + 28 = 1476
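
For anyone who wants to repeat that arithmetic, here is a minimal Python sketch. It assumes an IPv4 header without options (20 bytes) and an 8-byte ICMP echo header, which is where the 28 comes from. The function names and the `fake_probe` stand-in are made up for illustration; a real probe would have to run something like `ping -f -l <size>` and check whether a reply came back.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo request header


def mtu_from_ping_payload(payload_bytes: int) -> int:
    """Return the MTU implied by the largest ping payload that got through unfragmented."""
    return payload_bytes + IPV4_HEADER + ICMP_HEADER


def largest_unfragmented_payload(probe, low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload for which probe(payload) is True.

    probe is a hypothetical callable that sends one don't-fragment ping of the
    given payload size and returns True if a reply came back.
    """
    if not probe(low):
        raise ValueError("even the smallest payload did not get through")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid fits, try larger
        else:
            high = mid - 1   # mid needs fragmenting, try smaller
    return low


def fake_probe(payload_bytes: int) -> bool:
    """Stand-in for a real ping: pretends the path MTU is 1476."""
    return mtu_from_ping_payload(payload_bytes) <= 1476


best = largest_unfragmented_payload(fake_probe)
print(best, mtu_from_ping_payload(best))  # 1448 1476
```

The search assumes that if a payload of some size gets through, every smaller payload does too, which is what you would expect from a fixed path MTU.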

This seems suspicious. Why is bruno using 1476 when the rest of the world uses 1500? This will almost double the number of packets to deal with and could explain the serious difficulty I still have in transferring anything from a couple of crunching Linux web servers. They use state-based firewalls (the Fedora Core 6 default), and a side effect is the filtering of incoming state-related RST packets, which I thought could be a reason for the extremely poor performance. So I explicitly allowed 208.68.240.16 through. Even after this, it still communicates successfully only 1% of the time, and no, I'm not going to change the MTU of 1500 on a production web server!
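
To put a rough number on the "almost double" worry, the sketch below counts how many IPv4 fragments a packet turns into when it has to cross a link with a smaller MTU. It assumes a 20-byte IPv4 header without options and the standard rule that every fragment except the last carries a payload that is a multiple of 8 bytes. `fragment_count` is a hypothetical helper, not anything from BOINC or the SETI servers.

```python
import math

IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)


def fragment_count(packet_len: int, link_mtu: int) -> int:
    """Number of IPv4 fragments needed to carry packet_len bytes over link_mtu."""
    if packet_len <= link_mtu:
        return 1  # fits as-is, no fragmentation
    # Payload per fragment must be a multiple of 8 bytes (except the last one).
    per_fragment = ((link_mtu - IPV4_HEADER) // 8) * 8
    if per_fragment <= 0:
        raise ValueError("link MTU too small to carry any payload")
    payload = packet_len - IPV4_HEADER
    return math.ceil(payload / per_fragment)


print(fragment_count(1500, 1476))  # 2
print(fragment_count(1476, 1476))  # 1
```

Under those assumptions, every full 1500-byte packet becomes two packets on a 1476-byte link: one carrying 1456 bytes of payload and one carrying the remaining 24.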

Andy.
ID: 575076
Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 575077 - Posted: 25 May 2007, 0:14:19 UTC
Last modified: 25 May 2007, 0:14:52 UTC

Looks like a good observation, Andy.

Please re-post it in the Staff Blog area, it'll have a better chance of being read by the right people.

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 575077
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 575080 - Posted: 25 May 2007, 0:20:33 UTC - in response to Message 575077.  

OK... doing it now... :-)
ID: 575080
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 575114 - Posted: 25 May 2007, 1:49:31 UTC - in response to Message 575076.  



This seems suspicious. Why is bruno using 1476 when the rest of the world uses 1500?


Traffic to our router is tunneled to a router in Palo Alto. The overhead of the tunnel is 24 bytes more than the normal TCP/IP over ethernet overhead. Fragmentation shouldn't be too much of an issue because nearly every modern TCP/IP stack uses MTU discovery to size outgoing packets properly.

What might have been an issue is that the interface on bruno was set to have an MTU of 1500. I've adjusted that to 1476, which did cause us to jump up to 50 Mbps for a few minutes, but we're back down in the 20s again.
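
The tunnel numbers line up with the ping test earlier in the thread. A small sketch of the arithmetic, assuming the 24 bytes of tunnel overhead come straight off a standard 1500-byte Ethernet MTU. The post does not name the tunnel protocol, so the constant is just the figure quoted above, and the helper names are hypothetical.

```python
ETHERNET_MTU = 1500    # standard Ethernet MTU in bytes
TUNNEL_OVERHEAD = 24   # extra bytes per packet, as quoted in the post
PING_OVERHEAD = 28     # 20-byte IPv4 header + 8-byte ICMP header (assumption: no IP options)


def inner_mtu(outer_mtu: int, overhead: int) -> int:
    """Largest packet that fits inside the tunnel without fragmentation."""
    if overhead >= outer_mtu:
        raise ValueError("overhead leaves no room for payload")
    return outer_mtu - overhead


def max_ping_payload(mtu: int) -> int:
    """Largest don't-fragment ping payload that fits in the given MTU."""
    return mtu - PING_OVERHEAD


tunnel_mtu = inner_mtu(ETHERNET_MTU, TUNNEL_OVERHEAD)
print(tunnel_mtu)                    # 1476
print(max_ping_payload(tunnel_mtu))  # 1448
```

That matches the observation that a 1448-byte ping got a reply while a 1450-byte one needed fragmenting.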

Eric

@SETIEric@qoto.org (Mastodon)

ID: 575114
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 575125 - Posted: 25 May 2007, 2:20:55 UTC - in response to Message 575114.  



This seems suspicious. Why is bruno using 1476 when the rest of the world uses 1500?


Traffic to our router is tunneled to a router in Palo Alto. The overhead of the tunnel is 24 bytes more than the normal TCP/IP over ethernet overhead. Fragmentation shouldn't be too much of an issue because nearly every modern TCP/IP stack uses MTU discovery to size outgoing packets properly.

What might have been an issue is that the interface on bruno was set to have an MTU of 1500. I've adjusted that to 1476, which did cause us to jump up to 50 Mbps for a few minutes, but we're back down in the 20s again.

Eric


Eric, thanks for the clear explanation - I think you solved it! All my machines are now connecting and reporting without timeouts. :-)

Either the tunnel had to fragment, or the tunnel traffic had to...
Fragmentation does add significant transfer overhead. I wonder, is the tunnel absolutely necessary, given the load that it's under? There might be another way.

If not, then perhaps a net-facing MTU of 1500 could be maintained if the machines responsible for the tunnel used an MTU of 1524. If maintaining the tunnel is their primary purpose, then it might be worth taking a 'fragmentation hit' internally when talking to peers on other business with MTU 1500...
Just some thoughts...

Andy.
ID: 575125
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 575155 - Posted: 25 May 2007, 3:25:32 UTC - in response to Message 575114.  

. . . I've adjusted that to 1476, which did cause us to jump up to 50 Mbps for a few minutes, but we're back down in the 20s again.

Eric

Eric -
I can't explain the graphs flattening out again, but three of my machines had major problems until right after you made the above change. They are now connecting without a single retry. You may indeed have fixed it.
- Regards
ID: 575155
Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 575171 - Posted: 25 May 2007, 4:07:33 UTC - in response to Message 575114.  
Last modified: 25 May 2007, 4:08:52 UTC


Traffic to our router is tunneled to a router in Palo Alto. The overhead of the tunnel is 24 bytes more than the normal TCP/IP over ethernet overhead. Fragmentation shouldn't be too much of an issue because nearly every modern TCP/IP stack uses MTU discovery to size outgoing packets properly.

What might have been an issue is that the interface on bruno was set to have an MTU of 1500. I've adjusted that to 1476, which did cause us to jump up to 50 Mbps for a few minutes, but we're back down in the 20s again.

Eric

I concur with the previous posters, uploads and reporting are downright zippy now compared to before, when they would go past 100% and stall (sometimes forever).

Should decrease average bandwidth usage for the same amount of net transfer, too, so it's a win/win.

Thanks for looking into it,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 575171
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 575175 - Posted: 25 May 2007, 4:17:20 UTC - in response to Message 575114.  
Last modified: 25 May 2007, 4:17:44 UTC

I've adjusted that to 1476, which did cause us to jump up to 50 Mbps for a few minutes, but we're back down in the 20s again.

Eric


Eric..........

I have seen a dramatic improvement since right after you made this change.
Thanks to you and all the crew for the work you have done recently.



Boinc....Boinc....Boinc....Boinc....
ID: 575175
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 575198 - Posted: 25 May 2007, 5:32:28 UTC

Yay, all my machines are working smoothly again! Thanks Eric!
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 575198
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 575203 - Posted: 25 May 2007, 6:05:57 UTC
Last modified: 25 May 2007, 6:06:16 UTC

OK....so should I change my router back to the MTU setting of 1500 which I would consider to be the correct setting?? Or to 1476?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 575203
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 575219 - Posted: 25 May 2007, 7:26:58 UTC - in response to Message 575203.  

OK....so should I change my router back to the MTU setting of 1500 which I would consider to be the correct setting?? Or to 1476?


Should be 1500, but that also depends on your provider... as Eric says, the servers should negotiate MTU, so it should drop to 1476 by itself to avoid fragmentation. I guess if all the machine does is crunch, then set the MTU for the *machine* to 1476 to save a couple of packets of negotiation.

If you adjust it on the router and it doesn't know about MTU negotiation, then all your packets everywhere may be fragmented - to and from the router, on your side and net side, as everywhere else you visit wants to use 1500 MTU...

There's a lot of shoulds and maybes here! I don't know the configuration of every network on the net, but 1500 is the standard size.

A quick google on mtu optimisation might help!

Andy.
ID: 575219
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 575222 - Posted: 25 May 2007, 7:37:56 UTC

Well, after consulting with the kitties, I set the router back to an MTU of 1500. It's what I had it set at from day 1, and with my DSL connection, what the TCP optimizers say to use. Mebbe with the changes made to Seti's settings it will work properly again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 575222
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 575223 - Posted: 25 May 2007, 7:40:55 UTC
Last modified: 25 May 2007, 7:46:22 UTC

Follow up......
What's up with the Cricket Graph?
Lots of funny looking spikes the last few hours.
Looks like something is struggling along here.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 575223
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 575229 - Posted: 25 May 2007, 8:06:23 UTC - in response to Message 575222.  

Well, after consulting with the kitties, I set the router back to an MTU of 1500. It's what I had it set at from day 1, and with my DSL connection, what the TCP optimizers say to use. Mebbe with the changes made to Seti's settings it will work properly again.


Kitties saved the day... you prompted me to check it out and pose a question that Eric saw, checked out and fixed the problem! sic vita est!

File transfers with SETI now running like a dream! :-)))))
ID: 575229


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.