Drive (Aug 11 2011)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1139085 - Posted: 11 Aug 2011, 23:07:27 UTC

Okay, we didn't fix the HE connections problem, but are getting closer to understanding what's going on. Basically our router down at the PAIX keeps getting a corrupted routing table. We reboot it, which flushes the pipes, but this only "evolves" the issue: people who couldn't connect before now can, but people who could connect before now cannot, or people don't see any change in behavior. This is likely due to a mixture of: (a) low memory on this old router, (b) our ridiculously high, constant rate of traffic, and perhaps also (c) a broken default route.

We're looking into (c) at the moment, and solving (a) may be far too painful (we don't have easy access to this router, which is a donated box mounted in donated rack space 30 miles away). So I've been arguing that we need to deal with (b) first, i.e. reduce our rate of traffic.

Part of reducing our traffic means breaking open our splitter code. Basically, one of the seven beams down at Arecibo has been busted for a while, thus causing a much-higher-than-normal rate of noisy workunits. We've come up with a way to detect a busted beam automatically in the splitter (so it won't bother creating workunits for said beam) but this means cracking open the splitter. This is a delicate procedure, as you can really screw things up if the splitter is broken, and it usually needs oversight from Eric, who is the only one qualified to bless any changes to it. Of course, Eric has been busy with a zillion other things, so this kept getting kicked down the street. But at this point we all feel this needs to happen, which should reduce general traffic loads, and maybe clear up other problems - like our seemingly overworked router facing HE being unable to handle the load.

Of course, it doesn't help that we're all bogged down in a wave of grant proposals and conferences, and I'm having to write a bunch of notes as part of a major brain dump since I'm leaving for two months (starting two weeks from now). I'll be on the road (all over eastern North America in September, all over Europe in October) playing keyboards/guitar with the band Secret Chiefs 3. It's been a crazy month thus far getting ready for that.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1139085
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1139093 - Posted: 11 Aug 2011, 23:24:26 UTC - in response to Message 1139085.  
Last modified: 11 Aug 2011, 23:33:38 UTC

Thanks for the update Matt,

Another idea is to penalise more heavily those hosts that are producing invalid work on an app version. Hosts with Fermi GPUs running non-Fermi apps spring to mind.

ID: 1139093
Profile Byron Leigh Hatch @ team Carl Sagan
Volunteer tester

Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Message 1139094 - Posted: 11 Aug 2011, 23:32:04 UTC - in response to Message 1139085.  

Thanks for the update Matt, and thanks to the SETI@home crew for all your long hours of hard work ... Best Wishes Byron.
ID: 1139094
Profile arkayn
Volunteer tester

Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1139104 - Posted: 11 Aug 2011, 23:54:02 UTC

Thanks for the update Matt.

Maybe we should start a drive to fund a new replacement router.

ID: 1139104
Profile HAL9000
Volunteer tester

Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1139123 - Posted: 12 Aug 2011, 0:34:36 UTC - in response to Message 1139104.  

Thanks for the update Matt.

Maybe we should start a drive to fund a new replacement router.

Maybe we should shoot for replacing the on-campus routers as well, in the event high bandwidth could be utilized.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group
ID: 1139123
Profile Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 25 Dec 00
Posts: 31115
Credit: 53,134,872
RAC: 32
United States
Message 1139139 - Posted: 12 Aug 2011, 1:06:03 UTC
Last modified: 12 Aug 2011, 1:10:13 UTC

Thanks for the update.

While cleaning up the work units is important, that is at best a temporary patch.

You said donated rack space and router. Are you saying if the donated router was replaced the rack space donation would go away? Or is the rack space such that no other box will fit? Just trying to get the political angle if any sorted.

Another question that comes to mind: is the box on the latest version of software from the manufacturer? Perhaps an update for this has been issued and it needs to be flashed.

[edit]Isn't that box a gig router? Shouldn't it be able to handle 10X the traffic (routing table) it is now getting? RAM failure?
ID: 1139139
Profile Jeff Mercer

Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1139230 - Posted: 12 Aug 2011, 3:39:52 UTC

Once again, Thanks For The Update Matt !
ID: 1139230
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1139235 - Posted: 12 Aug 2011, 3:43:43 UTC - in response to Message 1139085.  

I'll be on the road (all over eastern North America in September, all over Europe in October) playing keyboards/guitar with the band Secret Chiefs 3. It's been a crazy month thus far getting ready for that.

Les Claypool? Faith No More? Who knew you were so cool, Matt? Let me know when you make it to Ozzy Osbourne and I'll book the first flight out there and be your groupie, carrying your instruments. ;-D
ID: 1139235
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13882
Credit: 208,696,464
RAC: 304
Message 1139276 - Posted: 12 Aug 2011, 5:10:50 UTC - in response to Message 1139085.  

Part of reducing our traffic means breaking open our splitter code. Basically, one of the seven beams down at Arecibo has been busted for a while, thus causing a much-higher-than-normal rate of noisy workunits.

How many units per 1,000 (or 10,000 etc) are noisy?
I myself haven't seen many noisy Work Units- a couple of times I've had a group of 5-8 that only ran for 10-20 secs before finishing- but this is out of 2,500-3,000 Work Units in the cache at that time. What I have been seeing is a lot more shorties than in the past- i.e. WUs that take only 3 min or less to run on my video card. The work mix, apart from a burst of almost nothing but shorties a week or two ago, appears to be a mix of some longer running WUs but a much higher percentage of shorter running WUs, with very few middle-of-the-range WUs at all (although over the last few days there have been a few more mid runtime WUs than there have been).

Also the fact that my caches have only been full a couple of times over the last month would be contributing to the extended periods of high traffic- the frequent glitches that have been bringing the system down, or limiting WU production, or limiting its allocation, mean that the caches just haven't had a chance to re-fill.
Many of the faster systems would be having the same problem- the cache just isn't filling up as something occurs that stops it from getting work- but it's still processing work at a significant rate. So when Seti comes up again, it's wiped out the gains it made in building up its cache since the last outage.

I suspect that if the system was able to remain up without any slowdowns in work production or allocation between weekly outages then after 2 weeks the traffic would drop off considerably as all the faster machines would have finally been able to fill their caches.
Darwin NT
ID: 1139276

Joined: 24 Apr 10
Posts: 49
Credit: 8,347,432
RAC: 0
United States
Message 1140675 - Posted: 15 Aug 2011, 3:23:06 UTC

What kind of router are we speaking of here? What is in place now? What might a good replacement be?
ID: 1140675
Profile edjcox

Joined: 20 May 99
Posts: 96
Credit: 5,878,353
RAC: 0
United States
Message 1140698 - Posted: 15 Aug 2011, 4:59:42 UTC

Do they have any real IT engineers working on this effort?

It seems I see a pattern of hit/miss efforts to fix problems that are never really pinned down.

Before the righteous all chime in here, let's seriously look at what's going on. There are a lot of good companies who would contribute hardware and maybe even communications hardware and bandwidth. It just seems the people within the project, while doing yeoman's work for little reward, are simply not getting the support, funding, and outside help this project deserves.

We seem to have but a few who even bother to communicate what's going down, and that will shortly meet a two month hiatus as Matt hits the road for his music tour.

Where are the principals on this team, and why can't they manage to set up a regular weekly update and a regular status update and troubleshooting direction?

A lot of us out here are available for more than CPU cycles if we're communicated with. We might even be able to dissect the system if someone would take the time to block chart a schematic and layout of the systems and the hardware.

Those of you that are offended by my viewpoints need not throw electrons, as frankly I am not a happy SETI at Home camper anymore. I'm aware of the limited manpower, limited funds, and so on. I also expect that this project get handled by people with their best work.

A prime example is the dated information on the website and how it's not current at all.

What, for example, has come from the data acquired a few weeks back? Has it even been analysed at all? How about some feedback on the front page...

Sorry, with my three PCs working on SETI and Einstein I am getting a better feel looking for gravitational waves than I am looking for evidence of communications between ETs.

Never engage stupid people at their level, they then have the home court advantage.....
ID: 1140698
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22652
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1140843 - Posted: 15 Aug 2011, 18:08:03 UTC

Matt, Thanks for all you've been doing to keep S@H running.

Europe in October - must keep my eyes open and try to get my ears filled.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1140843
Profile Francis Noel

Joined: 30 Aug 05
Posts: 452
Credit: 142,832,523
RAC: 94
Message 1141266 - Posted: 16 Aug 2011, 20:43:06 UTC
Last modified: 16 Aug 2011, 20:45:05 UTC

Touring with SC3 !

What can I say ? Congratulations on having a kick-ass life, Matt. I am genuinely happy for you and I truly hope you enjoy the experience.

I love it when Good Things happen to Good People.

This is beautiful.
ID: 1141266

Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1141275 - Posted: 16 Aug 2011, 21:14:56 UTC

I don't know if it's even possible, but during the software blanking stage of cleaning the tapes up before splitting them, is it even possible to prevent APs that are 100% blanked from even being sent out? That's 16MB of data transfer that can be saved for every WU affected by it. My machines are nowhere near power crunchers, but I still pick up a handful of those WUs back-to-back every now and then.

I guess something along the lines of a CSV file that says blanking starts at this offset and lasts for this many bytes, so when the splitters come along, they look at the CSV table and see what sections of the tape to skip entirely (obviously only sections where a whole WU will be 100% blanked).
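
For illustration, here is a minimal Python sketch of the CSV idea above. It is not the actual splitter code: the `offset,length` column layout, the function names, and the default workunit size are all assumptions made up for this example. It reads the blanking table and keeps only the workunit offsets that are not 100% blanked.

```python
import csv
import io

# Hypothetical placeholder for the raw data consumed per workunit.
WU_BYTES = 8 * 1024 * 1024


def load_blanked_ranges(csv_text):
    """Parse 'offset,length' rows into a sorted list of (start, end) byte ranges."""
    ranges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and comments
        start, length = int(row[0]), int(row[1])
        ranges.append((start, start + length))
    return sorted(ranges)


def fully_blanked(wu_start, wu_end, ranges):
    """Return True if [wu_start, wu_end) is covered by blanked ranges with no gap."""
    covered_to = wu_start
    for start, end in ranges:
        if start > covered_to:
            break  # ranges are sorted, so nothing later can fill this gap
        covered_to = max(covered_to, end)
        if covered_to >= wu_end:
            return True
    return False


def workunits_to_split(tape_bytes, ranges, wu_bytes=WU_BYTES):
    """List the start offsets of whole workunits that still contain unblanked data."""
    return [
        off
        for off in range(0, tape_bytes - wu_bytes + 1, wu_bytes)
        if not fully_blanked(off, off + wu_bytes, ranges)
    ]
```

With a toy 400-byte "tape", 100-byte workunits, and a table of `0,100`, `100,50`, `300,100`, the first and last workunits would be skipped, while the second (only half blanked) and third (not blanked) would still be split.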

Just a thought.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1141275
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1143602 - Posted: 21 Aug 2011, 21:40:12 UTC - in response to Message 1139093.  
Last modified: 21 Aug 2011, 21:40:52 UTC

Thanks for the update Matt,

Another idea is to penalise more heavily those hosts that are producing invalid work on an app version. Hosts with Fermi GPUs running non-Fermi apps spring to mind.


Here's an example of an (Anonymous) host which should be penalised for producing invalid results:


It has a Phenom II X6 1055T CPU and three Cayman GPUs. Most of its MB CPU tasks are inconclusive/invalid, as are its ATI Astropulse GPU tasks.
It has 765 AP tasks in its list: 24 of those are valid, 60 invalid, and 673 pending, most of which only took a few hundred seconds if that, and it still has a Max tasks per day of 100 for ATI AP.
The counts for CPU Multibeam are hardly any better, and that also has a Max tasks per day of 100.
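
To make the kind of penalty I mean concrete, here is a rough Python sketch. It is purely hypothetical and is not how the BOINC scheduler is actually coded: the class name, the halve-on-invalid rule, and the floor and ceiling values are assumptions for illustration. The only number taken from above is the starting quota of 100 tasks per day.

```python
from dataclasses import dataclass


@dataclass
class HostAppQuota:
    """Hypothetical per-host, per-app-version daily task quota."""

    max_per_day: int = 100  # starting value, as in the "Max tasks per day of 100" above
    floor: int = 1          # assumed minimum, so a repaired host can recover
    ceiling: int = 100      # assumed maximum

    def record(self, valid: bool) -> int:
        """Update the quota after one validated result and return the new value."""
        if valid:
            # Slow recovery: one extra task per day for each valid result.
            self.max_per_day = min(self.ceiling, self.max_per_day + 1)
        else:
            # Heavy penalty: halve the quota for each invalid result.
            self.max_per_day = max(self.floor, self.max_per_day // 2)
        return self.max_per_day
```

Under these assumed rules, a host returning mostly invalid results would be down to one task per day after seven invalid results in a row, instead of keeping a quota of 100.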

ID: 1143602
Profile Hauser Investments, inc

Joined: 22 Nov 02
Posts: 1
Credit: 2,794,720
RAC: 0
United States
Message 1143853 - Posted: 22 Aug 2011, 15:45:32 UTC

ID: 1143853
Profile Jeff Mercer

Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1144112 - Posted: 22 Aug 2011, 23:47:18 UTC

Hello to all. Since the project is down at the moment, and I am out of work, I think I'll just back out of the program for a short time. Got a lot of things going on here at the moment, and so, I'm shutting down for a while. All work units that I had are now, "CRUNCHED" and sent back in. I'll be back though ! Thanks !
ID: 1144112
Profile Jason Safoutin
Volunteer tester

Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 1144152 - Posted: 23 Aug 2011, 2:12:22 UTC

I have been crunching with SETI@home on and off for a while now. Whenever there were major issues that brought the project down, these guys (Matt et al.) always pulled through to get the project up and running again, whether it be sooner or later. I have faith that they will pull through this problem as well.

It's a shame there are not more donations both financially and in terms of hardware. So sad to see a good program go underfunded. If I ever run into a lot of money, you can bet I would make a large donation to the project.
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 1144152
Profile Jeff Mercer

Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1144168 - Posted: 23 Aug 2011, 2:48:01 UTC - in response to Message 1144152.  

Well Said....... Well said INDEED !!!!
ID: 1144168

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 1144305 - Posted: 23 Aug 2011, 14:48:57 UTC

A little off topic, but not really: Matt seems like a great guy but he isn't the sole technologist on the seti project (ok, maybe he is the soul one, given his musical talents). Is there no one else willing to keep the message boards up to date with more or less current information about the project? Doing so keeps the project alive and meaningful to the volunteers. I fear that we will go through a drought of information soon with Matt away, and changes when he returns.
ID: 1144305

©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.