Labyrinth of Light (Oct 06 2011)

Message boards : Technical News : Labyrinth of Light (Oct 06 2011)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1441
Credit: 213,689
RAC: 0
United States
Message 1159571 - Posted: 6 Oct 2011, 22:20:37 UTC

Hey gang. I've been back in the lab for a few days. Figured I'd say hi and mention a couple things.

The HE (Hurricane Electric) problems are indeed getting weirder and more multi-faceted. We know the router itself needs more memory. Getting memory isn't the problem. Getting access to the router is. Knowing this, one hopeful option is to get ourselves off the current link and move entirely back to campus infrastructure, now that there's enough bandwidth to handle us. But there are so many parties involved on all fronts that, as always, this sort of thing is moving at a snail's pace. Meanwhile, one of the routers in our chain, unrelated to us but still affecting us, was the victim of a DDoS attack the other day. Another reason we need to simplify our setup already.

Note that there have been other issues affecting general connectivity. For example, our MySQL schedule database swelled too large because db_purge wasn't running for a while, so it no longer fit in memory and started slowing everything down. This is clearing up on its own at the moment. There were also some scheduler bugs that were introduced but have since been mostly, if not entirely, fixed. Meanwhile we turned off "resend lost results" until the smoke clears a bit.

We're also weighing our options for improving the science database throughput. The solutions include (and aren't mutually exclusive) moving entirely to solid state disks (which I find a little scary), changing the schema of our signal tables to bifurcate into good/uninteresting signals (which will vastly reduce lookups and what we need to keep in memory, but will require major changes to all our backend code), and perhaps just adding another disk enclosure with SATA drives.
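To make the good/uninteresting split concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the project's actual schema: the table names (`signal`, `signal_good`, `signal_uninteresting`), the `score` column, and the cutoff value are all hypothetical and only illustrate how one large table could be bifurcated so that lookups touch the smaller "good" table.

```python
import sqlite3

# Hypothetical cutoff: the real criterion for a "good" signal is not given in this thread.
SCORE_THRESHOLD = 10.0


def split_signals(conn, threshold=SCORE_THRESHOLD):
    """Move rows from one 'signal' table into 'good' and 'uninteresting' tables."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS signal_good "
            "(id INTEGER PRIMARY KEY, score REAL)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS signal_uninteresting "
            "(id INTEGER PRIMARY KEY, score REAL)"
        )
        conn.execute(
            "INSERT INTO signal_good SELECT id, score FROM signal WHERE score >= ?",
            (threshold,),
        )
        conn.execute(
            "INSERT INTO signal_uninteresting SELECT id, score FROM signal WHERE score < ?",
            (threshold,),
        )
        conn.execute("DELETE FROM signal")


def demo():
    """Build a tiny in-memory table, split it, and return the two row counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signal (id INTEGER PRIMARY KEY, score REAL)")
    conn.executemany(
        "INSERT INTO signal (id, score) VALUES (?, ?)",
        [(1, 25.0), (2, 3.5), (3, 12.0), (4, 0.1)],
    )
    split_signals(conn)
    good = conn.execute("SELECT COUNT(*) FROM signal_good").fetchone()[0]
    dull = conn.execute("SELECT COUNT(*) FROM signal_uninteresting").fetchone()[0]
    return good, dull
```

In a layout like this, queries that only care about candidate signals read `signal_good`, which is the part that would need to stay in memory. Every piece of backend code that currently reads the single table would have to be pointed at one table or the other.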

Meanwhile I just started another informative mass e-mail. It's going out now verrrry slowly (due to recent campus mail configuration changes). If you're curious, here it is.

By the way, that Secret Chiefs 3 US/Canada tour was super fun, and I'm about to head out on a shorter one in Europe (Iceland/France/England). There may be other similar tours on my plate in the new year (Western US, Australia, South America). Sorry about the absence, but I'll be back in November and then not going anywhere for a couple of months, I think.

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

ID: 1159571
Claggy
Project Donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4623
Credit: 46,348,740
RAC: 2,934
United Kingdom
Message 1159577 - Posted: 6 Oct 2011, 22:29:51 UTC - in response to Message 1159571.  

Welcome back Matt, and thanks for the update,

Claggy

ID: 1159577
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15116
Credit: 45,483,599
RAC: 46,377
United States
Message 1159581 - Posted: 6 Oct 2011, 22:50:34 UTC - in response to Message 1159571.  

Thanks for dropping us a line and it's great to hear you're having fun outside of your duties at SETI@Home.

ID: 1159581
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11140
Credit: 83,767,042
RAC: 46,192
United Kingdom
Message 1159587 - Posted: 6 Oct 2011, 23:24:17 UTC

Hi Matt, welcome back.

Do you have any idea why the RAM which has been adequate for the last three years has suddenly become too little?

ID: 1159587
Jim_S
Joined: 23 Feb 00
Posts: 4633
Credit: 37,422,602
RAC: 19,081
United States
Message 1159588 - Posted: 6 Oct 2011, 23:34:19 UTC

Howdy Matt,
Welcome back for a while, and thanks for the update.

Jim



I Desire Peace and Justice, Jim Scott (Mod-Ret.)

ID: 1159588
Gary Charpentier
Crowdfunding Project Donor
Volunteer tester
Joined: 25 Dec 00
Posts: 18642
Credit: 21,466,367
RAC: 20,107
United States
Message 1159625 - Posted: 7 Oct 2011, 1:44:16 UTC - in response to Message 1159571.  

Getting memory isn't the problem. Getting access to the router is.

This sounds like you know that there are empty sockets just waiting if you could get by the security guard.

As to that DDoS, you know the rumors will be that it was ET doing it!

ID: 1159625
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 65
United States
Message 1159626 - Posted: 7 Oct 2011, 1:46:33 UTC

One lesson needs to be learned about SSDs before you install them: they need to be backed up religiously. If the drive fails, the data is gone and there is no recovery other than the backups.




BOINC WIKI

ID: 1159626
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 536
Credit: 2,377,393
RAC: 667
Canada
Message 1159646 - Posted: 7 Oct 2011, 2:47:39 UTC

Perhaps it would be a good idea to put the goodsearch/goodshop link on the donation page (since it wasn't mentioned in the email).
Bob




Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957

ID: 1159646
Harland Dains
Joined: 20 Jul 99
Posts: 2
Credit: 3,554,048
RAC: 0
United States
Message 1159658 - Posted: 7 Oct 2011, 3:48:03 UTC

Nice to have you back, you were missed. The project was heartbroken without you and decided to act out.


ID: 1159658
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45918
Credit: 815,238,516
RAC: 124,857
United States
Message 1159696 - Posted: 7 Oct 2011, 6:27:05 UTC

Great to see your face in da place, Matt.

Glad you had fun on your tour.

See ya in November.

And thanks, as always, for yet another informative post.

Meow!


Cats.....what more does one need?

Have made friends in this life.
Most were cats.

ID: 1159696
Dimly Lit Lightbulb 😀
Project Donor
Volunteer tester
Joined: 30 Aug 08
Posts: 14363
Credit: 2,924,128
RAC: 3,704
United Kingdom
Message 1161066 - Posted: 10 Oct 2011, 21:38:17 UTC

Welcome back Matt, glad you had fun on the tour, and thanks for the news.

ID: 1161066
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29579
Credit: 49,101,998
RAC: 17,183
Germany
Message 1161165 - Posted: 11 Oct 2011, 7:33:36 UTC

Thanks for the update Matt.

It's good to see you back.


With each crime and every kindness we birth our future.

ID: 1161165
Pato
Joined: 27 Nov 06
Posts: 4
Credit: 977,574
RAC: 328
Australia
Message 1161692 - Posted: 12 Oct 2011, 23:33:34 UTC

This would explain why I haven't been able to connect for just over a week!

ID: 1161692
Merlin SyStems
Joined: 2 Oct 08
Posts: 8
Credit: 13,639,383
RAC: 0
Netherlands
Message 1161806 - Posted: 13 Oct 2011, 7:10:18 UTC

Thanks for the update Matt

ID: 1161806
David S
Project Donor
Volunteer tester
Joined: 4 Oct 99
Posts: 17042
Credit: 20,948,006
RAC: 6,218
United States
Message 1161863 - Posted: 13 Oct 2011, 13:43:45 UTC

Good to have you back, Matt, and thanks for the news update. I can't believe you posted this a whole week ago and I just saw it today! I must be slipping...


David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


ID: 1161863
richardw66
Volunteer tester
Joined: 13 Nov 00
Posts: 7
Credit: 411,889
RAC: 0
Australia
Message 1162098 - Posted: 14 Oct 2011, 4:50:22 UTC - in response to Message 1161863.  
Last modified: 14 Oct 2011, 5:10:40 UTC

Hi Matt

There seems to be some routing issue in HE as well.

I can connect from one location on one ISP but not from another.

From the ISP where I cannot connect (Optus), the route dies at HE:

traceroute to setiboinc.ssl.berkeley.edu (208.68.240.20), 64 hops max, 52 byte packets
1 10.0.1.1 (10.0.1.1) 1.066 ms 1.906 ms 0.580 ms
2 10.63.0.1 (10.63.0.1) 23.003 ms 9.393 ms 9.953 ms
3 riv3-ge0-2.gw.optusnet.com.au (198.142.160.241) 10.112 ms 9.969 ms 29.102 ms
4 riv5-ge5-0.gw.optusnet.com.au (211.29.126.29) 8.199 ms 11.851 ms 9.064 ms
5 203.208.190.125 (203.208.190.125) 162.840 ms 168.029 ms 166.840 ms
6 pos3-2.sngtp-ar2.ix.singtel.com (203.208.182.205) 184.330 ms
xe-0-0-0-0.plapx-cr2.ix.singtel.com (203.208.183.161) 171.129 ms
pos3-2.sngtp-ar2.ix.singtel.com (203.208.182.205) 168.020 ms
7 paix.he.net (198.32.176.20) 190.950 ms 181.960 ms 181.067 ms
8 * * *
9 * * *
10 * * *

PING setiboinc.ssl.berkeley.edu (208.68.240.20): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1


The ISP that will connect (TPG):

traceroute to setiboinc.ssl.berkeley.edu (208.68.240.20), 64 hops max, 40 byte packets
1 192.168.1.1 (192.168.1.1) 1.377 ms 0.412 ms 0.323 ms
2 10.20.21.49 (10.20.21.49) 80.781 ms 17.424 ms 17.383 ms
3 202.7.173.185 (202.7.173.185) 19.036 ms 17.518 ms 17.013 ms
4 syd-nxg-men-crt2-ge-7-1-0.tpgi.com.au (202.7.162.37) 15.958 ms 16.214 ms 15.787 ms
5 10gigabitethernet1-3.core1.sjc1.he.net (72.52.93.37) 229.097 ms 170.447 ms 177.884 ms
6 10.122.122.18 (10.122.122.18) 250.467 ms 171.295 ms 175.204 ms
7 64.71.140.42 (64.71.140.42) 166.409 ms 203.883 ms 223.715 ms
8 * 208.68.243.254 (208.68.243.254) 172.653 ms *


PING setiboinc.ssl.berkeley.edu (208.68.240.20): 56 data bytes
64 bytes from 208.68.240.20: icmp_seq=0 ttl=55 time=210.032 ms
64 bytes from 208.68.240.20: icmp_seq=1 ttl=55 time=205.536 ms
64 bytes from 208.68.240.20: icmp_seq=2 ttl=55 time=208.649 ms

According to HE's Looking Glass page, some of their locations cannot route to setiboinc at all and some can. I have sent a message to their Looking Glass support about this; hopefully they find the problem.
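For anyone comparing traces like the two above, here is a minimal Python sketch that scans pasted traceroute output and reports the last hop that answered, which is where the route "dies". The function name `last_responding_hop` and the regular expressions are my own illustration, assuming the common `hop host (ip) time ms` layout shown in this post. It is not a tool used by the project or by HE.

```python
import re

# A hop line starts with the hop number; continuation lines for the same hop do not.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
# A responding router appears as "name (a.b.c.d)"; timeouts appear as "*".
HOST = re.compile(r"(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")


def last_responding_hop(traceroute_output):
    """Return (hop_number, host, ip) for the last hop that replied, or None."""
    last = None
    for line in traceroute_output.splitlines():
        hop = HOP_LINE.match(line)
        if not hop:
            continue  # header line or a continuation line without a hop number
        host = HOST.search(hop.group(2))
        if host:
            last = (int(hop.group(1)), host.group(1), host.group(2))
    return last
```

Running it on the failing trace and on the working one makes it easy to see whether both stop inside the same network or whether one gets further.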


ID: 1162098
geyser
Joined: 7 Oct 04
Posts: 8
Credit: 23,198,507
RAC: 13,123
United States
Message 1162109 - Posted: 14 Oct 2011, 5:38:11 UTC - in response to Message 1162098.  

I just realized that I have the same routing problem so I have not been able to get new work units:

traceroute setiboinc.ssl.berkeley.edu
traceroute to setiboinc.ssl.berkeley.edu (208.68.240.20), 30 hops max, 60 byte packets
1 209-162-130-1.cortland.com (209.162.130.1) 30.930 ms 31.492 ms 32.430 ms
2 ser-117-109.cortland.com (207.229.117.109) 33.397 ms 34.082 ms 34.780 ms
3 10gigabitethernet1-3.core1.sea1.he.net (206.81.80.40) 62.591 ms 63.785 ms 65.230 ms
4 10gigabitethernet9-1.core1.sjc2.he.net (72.52.92.157) 62.980 ms 10gigabitethernet1-2.core1.pdx1.he.net (72.52.92.10) 64.669 ms 10gigabitethernet9-1.core1.sjc2.he.net (72.52.92.157) 64.142 ms
5 10gigabitethernet3-2.core1.pao1.he.net (72.52.92.69) 60.417 ms 10gigabitethernet7-1.core1.sjc2.he.net (72.52.92.13) 59.658 ms 61.337 ms
6 * * *
7 * * *
.
.
.
30 * * *


ID: 1162109
rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 13336
Credit: 154,761,920
RAC: 117,876
United Kingdom
Message 1162113 - Posted: 14 Oct 2011, 6:53:11 UTC

There are TWO threads at the top of NUMBER CRUNCHING. Have a look there, because one gives a workaround for the KNOWN problems within Hurricane Electric.


Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ID: 1162113
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 448
Credit: 241,828,576
RAC: 91,221
Australia
Message 1163625 - Posted: 19 Oct 2011, 6:38:50 UTC - in response to Message 1159571.  
Last modified: 19 Oct 2011, 6:39:41 UTC

Thanks to Matt and the team for keeping things going.

Meanwhile we turned off "resend lost results" until the smoke clears a bit.

Just wondering: how much extra load does lost results processing put on the servers? And is it possible to switch it back on yet?

Unlike some, I only keep a relatively small cache (1 day, or at least, what BOINC estimates to be 1 day), so my GPUs ran out of work. I sometimes shuffle work-units between CPUs/GPUs to maximise efficiency, but occasionally something goes wrong and I lose all modified work-units. Like this time.

Sure, my host can work on other projects and I have SETI work again now that the weekly maintenance seems to be over, but it bothers me that (after working hard to minimise my invalid/erroneous results count) some 130+ work-units won't be processed for at least 6 weeks. Doesn't having the work-units hang around on the server cause issues as well?

Anyway, I hope the 'resend lost results' processing can be switched on again soon, but if not, I'll understand.
Soli Deo Gloria

ID: 1163625
Fatsie
Joined: 21 Jul 99
Posts: 2
Credit: 1,876,634
RAC: 0
Belgium
Message 1164174 - Posted: 21 Oct 2011, 9:13:47 UTC

Thank you, Matt!

Just wondering, have you considered moving storage-intensive tasks to PCI-based flash storage? We have had great success using it for data warehousing.


ID: 1164174


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.