Drive (Aug 11 2011)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Okay, we didn't fix the HE connections problem, but are getting closer to understanding what's going on. Basically our router down at the PAIX keeps getting a corrupted routing table. We reboot it, which flushes the pipes, but this only "evolves" the issue: people who couldn't connect before now can, but people who could connect before now cannot, or people don't see any change in behavior. This is likely due to a mixture of: (a) low memory on this old router, (b) our ridiculously high, constant rate of traffic, and perhaps also (c) a broken default route. We're looking into (c) at the moment, and solving (a) may be far too painful (we don't have easy access to this router, which is a donated box mounted in donated rack space 30 miles away). So I've been arguing that we need to deal with (b) first, i.e. reduce our rate of traffic. Part of reducing our traffic means breaking open our splitter code. Basically, one of the seven beams down at Arecibo has been busted for a while, thus causing a much-higher-than-normal rate of noisy workunits. We've come up with a way to detect a busted beam automatically in the splitter (so it won't bother creating workunits for said beam) but this means cracking open the splitter. This is a delicate procedure, as you can really screw things up if the splitter is broken - and it usually needs oversight from Eric, who is the only one qualified to bless any changes to it. Of course, Eric has been busy with a zillion other things, so this kept getting kicked down the street. But at this point we all feel this needs to happen, which should reduce general traffic loads, and maybe clear up other problems - like our seemingly overworked router facing HE being unable to handle the load. Of course, it doesn't help that we're all bogged down in a wave of grant proposals and conferences, and I'm having to write a bunch of notes as part of a major brain dump since I'm leaving for two months (starting two weeks from now). 
I'll be on the road (all over eastern North America in September, all over Europe in October) playing keyboards/guitar with the band Secret Chiefs 3. It's been a crazy month thus far getting ready for that. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt. Another idea is to penalise hosts more heavily that are producing invalid work on an app version; hosts with Fermi GPUs running non-Fermi apps spring to mind. Claggy |
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Thanks for the update Matt, and thanks to the SETI@home crew for all your long hours of hard work ... Best Wishes Byron. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Thanks for the update Matt. Maybe we should shoot for replacing the on-campus routers as well, in the event higher bandwidth could be utilized. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Gary Charpentier Joined: 25 Dec 00 Posts: 31015 Credit: 53,134,872 RAC: 32 |
Thanks for the update. While cleaning up the work units is important, that is at best a temporary patch. You said donated rack space and router. Are you saying if the donated router was replaced the rack space donation would go away? Or is the rack space such that no other box will fit? Just trying to get the political angle, if any, sorted. Another question that comes to mind: is the box on the latest version of software from the manufacturer? Perhaps an update for this has been issued and it needs to be flashed. [edit] Isn't that box a gig router? Shouldn't it be able to handle 10X the traffic (routing table) it is now getting? RAM failure? |
Jeff Mercer Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 |
Once again, Thanks For The Update Matt ! |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"I'll be on the road (all over eastern North America in September, all over Europe in October) playing keyboards/guitar with the band Secret Chiefs 3. It's been a crazy month thus far getting ready for that." Les Claypool? Faith No More? Who knew you were so cool Matt? Let me know when you make it to Ozzy Osbourne and I'll book the first flight out there and be your groupie, carrying your instruments. ;-D |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Part of reducing our traffic means breaking open our splitter code. Basically, one of the seven beams down at Arecibo has been busted for a while, thus causing a much-higher-than-normal rate of noisy workunits." How many units per 1,000 (or 10,000 etc) are noisy? I myself haven't seen many noisy Work Units - a couple of times I've had a group of 5-8 that only ran for 10-20 secs before finishing - but this is out of 2,500-3,000 Work Units in the cache at that time. What I have been seeing is a lot more shorties than in the past - i.e. WUs that take only 3 min or less to run on my video card. The work mix, apart from a burst of almost nothing but shorties a week or two ago, appears to be a mix of some longer running WUs but a much higher percentage of shorter running WUs, with very few middle of the range WUs at all (although over the last few days there have been a few more mid runtime WUs than there have been). Also the fact that my caches have only been full a couple of times over the last month would also be contributing to the extended periods of high traffic - the frequent glitches that have been bringing the system down, or limiting WU production, or limiting its allocation mean that the caches just haven't had a chance to re-fill. Many of the faster systems would be having the same problem - the cache just isn't filling up as something occurs that stops it from getting work - but it's still processing work at a significant rate. So when Seti comes up again, it's wiped out the gains it made in building up its cache since the last outage. I suspect that if the system was able to remain up without any slowdowns in work production or allocation between weekly outages then after 2 weeks the traffic would drop off considerably as all the faster machines would have finally been able to fill their caches. Grant Darwin NT |
PKnowles Joined: 24 Apr 10 Posts: 49 Credit: 8,347,432 RAC: 0 |
What kind of router are we speaking of here? What is in place now? What might a good replacement be? |
edjcox Joined: 20 May 99 Posts: 96 Credit: 5,878,353 RAC: 0 |
Do they have any real IT engineers working on this effort? It seems I see a pattern of hit-or-miss efforts to fix problems that are never really pinned down. Before the righteous all chime in here, let's seriously look at what's going on. There are a lot of good companies who would contribute hardware and maybe even communications hardware and bandwidth. It just seems the people within the project, while doing yeoman's work for little reward, are simply not getting the support, funding, and outside help this project deserves. We seem to have but a few who even bother to communicate what's going down, and that will shortly meet a two month hiatus as Matt hits the road for his music tour. Where are the principals on this team, and why can't they manage to set up a regular weekly update and a regular status update and troubleshooting direction? A lot of us out here are available for more than CPU cycles if we're communicated with. We might even be able to dissect the system if someone would take the time and block chart a schematic and layout of the systems and the hardware. Those of you that are offended by my viewpoints need not throw electrons, as frankly I am not a happy SETI at Home camper anymore. I'm aware of the limited manpower, limited funds, and so on. I also expect that this project get handled by people with their best work. A prime example is the dated information on the website and how it's not current at all. What, for example, has come from the data acquired a few weeks back? Has it even been analysed at all? How about some feedback on the front page... Sorry, with my three PCs working on SETI and Einstein I am getting a better feel looking for gravitational waves than I am looking for evidence of communications between ETs. Never engage stupid people at their level, they then have the home court advantage..... |
rob smith Joined: 7 Mar 03 Posts: 22541 Credit: 416,307,556 RAC: 380 |
Matt, Thanks for all you've been doing to keep S@H running. Europe in October - must keep my eyes open and try to get my ears filled. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Francis Noel Joined: 30 Aug 05 Posts: 452 Credit: 142,832,523 RAC: 94 |
Touring with SC3 ! Dude... What can I say ? Congratulations on having a kick-ass life, Matt. I am genuinely happy for you and I truly hope you enjoy the experience. I love it when Good Things happen to Good People. This is beautiful. mambo |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I don't know if it's feasible, but during the software blanking stage of cleaning the tapes up before splitting them, is it possible to prevent APs that are 100% blanked from even being sent out? That's 16MB of data transfer that can be saved for every WU affected by it. My machines are nowhere near power crunchers, but I still pick up a handful of those WUs back-to-back every now and then. I guess something along the lines of a CSV file that says blanking starts at this offset and lasts for this many bytes, so when the splitters come along, they look at the CSV table and see what sections of the tape to skip entirely (obviously only sections where a whole WU will be 100% blanked). Just a thought. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
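A rough sketch of the CSV idea in the post above, for illustration only. It is not the real splitter: the two-column `offset,length` layout (in bytes, no header row), the fixed workunit size, and both function names are assumptions made up for this example.

```python
import csv
import io

WU_BYTES = 8 * 1024 * 1024  # hypothetical workunit size; not taken from the project


def load_blanked_ranges(csv_text):
    """Parse 'offset,length' rows into sorted, merged (start, end) byte ranges."""
    ranges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # ignore blank lines
        offset, length = int(row[0]), int(row[1])
        ranges.append((offset, offset + length))
    ranges.sort()
    merged = []
    for start, end in ranges:
        # Join ranges that touch or overlap so coverage checks stay simple.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fully_blanked_workunits(ranges, tape_bytes, wu_bytes=WU_BYTES):
    """Return indices of workunits that lie entirely inside a blanked range."""
    skipped = []
    for index in range(tape_bytes // wu_bytes):
        wu_start = index * wu_bytes
        wu_end = wu_start + wu_bytes
        if any(start <= wu_start and wu_end <= end for start, end in ranges):
            skipped.append(index)
    return skipped


# Toy numbers: a 50-byte "tape" cut into 10-byte workunits.
ranges = load_blanked_ranges("0,10\n10,15\n40,5\n")
print(fully_blanked_workunits(ranges, tape_bytes=50, wu_bytes=10))  # [0, 1]
```

Only workunits that are blanked end to end are skipped; a partly blanked one (like the last workunit in the toy example) would still be sent, matching the "100% blanked" condition in the post.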
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt. Here's an example of an (Anonymous) host which should be penalised for producing invalid results: hostid=3378825. It has a Phenom II X6 1055T CPU and three Cayman GPUs; most of its MB CPU tasks are inconclusive/invalid, as are its ATI Astropulse GPU tasks. It has 765 AP tasks in its list: 24 of those are valid, 60 invalid, and 673 pending, most of which only took a few hundred seconds if that, and it still has a Max tasks per day of 100 for ATI AP. The counts for CPU Multibeam are hardly any better, and that also has a Max tasks per day of 100. Claggy |
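To put numbers on the host described above, here is a minimal sketch of one possible penalty rule. It is purely hypothetical and is not how the BOINC scheduler actually adjusts quotas: the function names and the "scale the quota by the share of valid results" rule are assumptions for illustration. Pending tasks are ignored because they have not been judged yet.

```python
def invalid_share(valid, invalid):
    """Fraction of already-validated tasks that were judged invalid."""
    decided = valid + invalid
    if decided == 0:
        return 0.0  # nothing validated yet, so nothing to hold against the host
    return invalid / decided


def penalised_daily_quota(base_quota, valid, invalid, floor=1):
    """Hypothetical rule: scale the daily quota by the share of valid results."""
    decided = valid + invalid
    if decided == 0:
        return base_quota
    # Integer arithmetic keeps the result exact; never drop below the floor.
    return max(floor, base_quota * valid // decided)


# Figures from the post: 24 valid, 60 invalid, Max tasks per day of 100.
print(round(invalid_share(24, 60), 2))     # 0.71
print(penalised_daily_quota(100, 24, 60))  # 28
```

Under this made-up rule the host would drop from 100 tasks a day to 28, and a host returning nothing but invalid results would fall to the floor of 1.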
Hauser Investments, inc Joined: 22 Nov 02 Posts: 1 Credit: 2,794,720 RAC: 0 |
ok |
Jeff Mercer Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 |
Hello to all. Since the project is down at the moment, and I am out of work, I think I'll just back out of the program for a short time. Got a lot of things going on here at the moment, and so, I'm shutting down for a while. All work units that I had are now, "CRUNCHED" and sent back in. I'll be back though ! Thanks ! |
Jason Safoutin Joined: 8 Sep 05 Posts: 1386 Credit: 200,389 RAC: 0 |
I have been crunching with SETI@home on and off for a while now. Whenever there were major issues that brought the project down, these guys (Matt et al.) always pulled through to get the project up and running again, whether it be sooner or later. I have faith that they will pull through this problem as well. It's a shame there are not more donations both financially and in terms of hardware. So sad to see a good program go underfunded. If I ever run into a lot of money, you can bet I would make a large donation to the project. "By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3 |
Jeff Mercer Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 |
Well Said....... Well said INDEED !!!! |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
A little off topic, but not really: Matt seems like a great guy but he isn't the sole technologist on the seti project (ok, maybe he is the soul one, given his musical talents). Is there no one else willing to keep the message boards up to date with more or less current information about the project? Doing so keeps the project alive and meaningful to the volunteers. I fear that we will go through a drought of information soon with Matt away, and changes when he returns. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.