Splitter off? WHY?
Author | Message |
---|---|
Josh Linscott Joined: 10 Sep 03 Posts: 121 Credit: 32,250 RAC: 0 |
Server status (Program / Host / Status): data-driven web pages / klaatu / Running; scheduler / kryten / Running; feeder / kryten / Running; file_deleter / koloth / Running; transitioner1 / klaatu / Running; transitioner2 / klaatu / Running; transitioner3 / koloth / Running; transitioner4 / koloth / Running; sah_validate1 / koloth / Running; sah_validate2 / koloth / Running; sah_validate3 / kosh / Running; sah_validate4 / kosh / Running; sah_assimilator / galileo / Running; sah_splitter / galileo / Running; sah_splitter2 / milkyway / Not running; sah_splitter3 / philmor / Running |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
From the very same page that told you the splitters were off: "Sah_splitter: Reads tapes (or tape images on disk) containing raw telescope data and creates workunits for the BOINC/SETI@home clients. At least one needs to be running to produce work, and that's usually enough. If some are not running, it's probably because we are doing fine with less, and are keeping excess production down to prevent the upload/download disks from getting hit too hard." > Server status > Program Host Status > data-driven web pages klaatu Running > scheduler kryten Running > feeder kryten Running > file_deleter koloth Running > transitioner1 klaatu Running > transitioner2 klaatu Running > transitioner3 koloth Running > transitioner4 koloth Running > sah_validate1 koloth Running > sah_validate2 koloth Running > sah_validate3 kosh Running > sah_validate4 kosh Running > sah_assimilator galileo Running > sah_splitter galileo Running > sah_splitter2 milkyway Not running > sah_splitter3 philmor Running |
MikeSW17 Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
> Server status > Program Host Status > data-driven web pages klaatu Running > sah_splitter2 milkyway Not running > sah_splitter3 philmor Running From the same page.... The splitters read tapes of approximately 33000 blocks. milkyway is at block 33070 (or so).... It's reached the end of the tape and is waiting for another.... (Educated guess :-) ) /edit Oops, it wasn't that :( [Re-education required] - it's started running again and moved on to 13ja05ab milkyway 33121. It must have hung or stopped somehow and been kicked off again. It will still reach the end of the tape soonish.... |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> > Server status > > Program Host Status > > data-driven web pages klaatu Running > > sah_splitter2 milkyway Not running > > sah_splitter3 philmor Running > From the same page.... > The splitters read tapes of approximately 33000 blocks. milkyway is at block 33070 (or so).... It's reached the end of the tape and waiting for another.... > (Educated guess :-) ) Actually, the same page says that they don't run all the splitters all the time; they turn them off and on to stay a little bit ahead of the demand for work. |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
/tmp filled up on milkyway. Had to stop splitter to clear it up. Then restarted it. Really no big deal. FYI, we don't need to "switch" tapes. The tape images are already read onto disk in advance and the splitter finishes one file and quickly starts chewing on the next unread file. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
MikeSW17 Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
> /tmp filled up on milkyway. Had to stop splitter to clear it up. Then restarted it. Really no big deal. > FYI, we don't need to "switch" tapes. The tape images are already read onto disk in advance and the splitter finishes one file and quickly starts chewing on the next unread file. > - Matt Learn something new every day. Thanks. BTW, the classic page shows about 9 tapes currently being split. Does this mean that classic has 9 splitter processes, or is that a list of the disk files being processed or waiting to be processed? Put another way, when classic shuts down, will BOINC need/have more splitters? [Assumes some/most of the classic hardware is earmarked for transfer to BOINC?] |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
The 9 tapes on the classic status page refer to the disk images online that are ready to be split (or have been split but haven't been deleted yet). The machines used to split workunits for classic are the same ones used for BOINC, so they are competing for CPU cycles (and internal bandwidth, for that matter). So, in essence, we will gain more splitter power when classic gets shut off. But we will want to add more CPUs to the fray soon enough. - Matt |
AdRoc Joined: 3 Apr 99 Posts: 5 Credit: 10,443 RAC: 0 |
Why aren't the tapes split in order? For example, I see that there are tapes from as far back as Dec 19th, and then the dates seem to jump around non-sequentially: 19 December 2004 (ab); 25 December 2004 (aa); 27 December 2004 (aa); 27 December 2004 (ab); 11 January 2005 (ab); 12 January 2005 (aa); 13 January 2005 (ab); 16 January 2005 (aa); 18 January 2005 (aa). Why not split like below? 19 December 2004 (aa); 20 December 2004 (aa); 21 December 2004 (aa); 22 December 2004 (aa); . . . Just curious. |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
> Why aren't the tapes split in order There is a little rhyme to this reason. I have boxes of tapes (about half a year's worth of backlogged data) next to my desk. If the splitter stacker needs to be filled again, I grab 9 tapes at semi-random, do some preliminary data checking, and then throw them in the stacker. BUT! If I notice I have a bunch of tapes all recorded on sequential days, I try to mix it up a bit. Why? Well, if one of the tapes has a bunch of RFI on it, there's a good chance the tape from the next day has some RFI as well. RFI (as in radio frequency interference) tends to make workunits full of junk, so they only get crunched on for a few minutes before the client throws them back at us. This causes an increase in our server activity. Not much, but still an increase. As well, it makes our users upset. So mixing up the tapes chronologically vastly reduces the chance that we'll be making RFI workunits for days on end. Instead we'll send out healthy workunits between the unhealthy ones, so we won't be DOS'ing ourselves forever and users won't be stuck chewing on several bad workunits in a row. - Matt |
Dave Mickey Joined: 19 Oct 99 Posts: 178 Credit: 11,122,965 RAC: 0 |
Ahhhh, very good! I take 2 good points from this: 1) That the sah "machine" that we might tend to imagine is in fact composed of such things as Matt grabbing a bunch of tapes from a box by his desk, pretty much at random 2) There is, as yet, value in having humans use their brains in deciding what the computers should do...... getting too many RFI-laced units in a row is indeed a bummer after very carefully adjusting cache levels. Carry (and crunch) on! Dave |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Funny thing.. Turns out the splitters on milkyway and philmor haven't been working since we came up from the power outage yesterday. So we've actually been getting by with just one splitter (on galileo). So.. we're going to have to resplit some tapes. No big deal - I have yet to figure out what the fallout will be, but the worst is that users will get spurious "file not found" type errors when trying to download work until this clears up, or users will get workunits with no work in them and return them right away. Either case amounts to more server load for a while, but that's about it. Shouldn't mess with credit or science. At any rate, *sigh*. I'm posting this here just so people hopefully won't freak out if they see tapes being resplit on the server pages. Maybe I'll cough up a tech news item once I have a better idea what's going to happen. - Matt |
Josh Linscott Joined: 10 Sep 03 Posts: 121 Credit: 32,250 RAC: 0 |
Ok, thank you for the info! |
Alan M. MacRobert Joined: 10 Apr 99 Posts: 13 Credit: 4,385,900 RAC: 1 |
> I have boxes of tapes (about half a > year's worth of backlogged data)... Matt, thanks for the explanation. A question: If there's half a year of backlogged unsplit data, while S@H users are recrunching the same work units more times than is useful, why not split the tapes, clear out the backlog, and put more of us crunchers to good use? I've wondered about this for years. Are the splitters a bottleneck? Dave Anderson once said no -- but then what is? Thanks again, Alan MacRobert |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
> Matt, thanks for the explanation. A question: If there's half a year of backlogged unsplit data, while S@H users are recrunching the same work units more times than is useful, why not split the tapes, clear out the backlog, and put more of us crunchers to good use? I've wondered about this for years. Are the splitters a bottleneck? Dave Anderson once said no -- but then what is? Good question. Like all things around here, there is a tangled web of dependencies that is usually too difficult to fully explain. But here's the gist of it. First and foremost, the data recorder down at Arecibo is working 24/7. In case you haven't noticed, our data servers are not. Every minute our servers are off and people can't get data, that's one minute we fall behind on splitting. Over the years we've been off for many, many days (last weekend for 3, that time those people stole copper wire down the hill and broke our internet connection for a week, not to mention weekly 30-60 outages for maintenance - it all adds up). Of course, there are many times we need to do database repair. These outages, to fix broken indexes or corrupt tables or simply add rows to tables, etc., result in a "behind the scenes" outage. All the data servers are up serving data, but we are unable to split data - so we fall behind. Yes, this means users get overly redundant data at this time - we decided more people would rather have data to chew on than another full outage, which typically results in message board posts of the sort "Berkeley is down again - those idiots have no idea what they are doing, blah blah blah". There are many other pieces to the puzzle. For instance, there's a "garbage collector" in SETI classic which fails from time to time (doesn't restart after a power outage, for example), which causes a backlog in the workunit queue and therefore leaves the splitter idle until the queue clears. 
Even if things are going well, demand in SETI@home classic is so high that we can't split fast enough, which means the redundancy level is more than scientifically necessary, but more so means that our servers in general are hit with a greater load - more results coming in that need to be validated and shoveled into the database - and if the database is busy the splitters might run slower, etc. This all gets fixed in BOINC - we only send out as much work as we need to, and beyond that we encourage users to run other projects to keep their CPUs busy. DESPITE ALL THIS, there have been many times during the course of SETI@home when every system was working properly and we caught up on splitting. As well, there are times when Arecibo is not recording as well, or the data is so littered with RFI that we don't split it. So it's all a game of give and take. An important fact to remember (or learn) is that we started recording SETI@home data in December of 1998, and the project started in May 1999, so we had a six-month backlog at the onset. Okay.. I better finish my breakfast and get up to the lab. - Matt |
Mike Joined: 17 Feb 01 Posts: 34257 Credit: 79,922,639 RAC: 80 |
Hi Thanks for this information Matt. greetz from Germany Mike With each crime and every kindness we birth our future. |
D.J. Schweitz Joined: 29 Oct 02 Posts: 157 Credit: 871,078 RAC: 0 |
Matt, a question for you from someone on my team. Figured since you were already doing such a great job of answering other WU questions I would jump in. A member has 130 classic WU's left to crunch, and wonders about the "ethics" of dumping them in order to uninstall classic and install BOINC. I've heard that some classic WU's get sent out as many as 40 times, whereas with BOINC it's more like 5 or 6. Will his dumping classic WU's compromise the science of those WU's? Are those same classic WU's sent out as BOINC WU's? Did you enjoy your breakfast? |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
> Will his dumping classic WU's compromise the science of those WU's? > Are those same classic WU's sent out as BOINC WU's? At this point in time (and for a while) the workunits sent to classic and BOINC have been the same (good for validating that classic and BOINC return the same scientific results). So dumping classic workunits does no harm to the science. Hope that encourages your team member(s) to move to BOINC sooner. > Did you enjoy your breakfast? It was adequate. Will have Chinese food for lunch, and probably something light for dinner so I'm not burping into the microphone at my gig tonight. - Matt |
D.J. Schweitz Joined: 29 Oct 02 Posts: 157 Credit: 871,078 RAC: 0 |
> > Will his dumping classic WU's compromise the science of those WU's? > > Are those same classic WU's sent out as BOINC WU's? > At this point in time (and for a while) the workunits sent to classic and BOINC have been the same (good for validation that classic and BOINC return the same scientific results). So there is no harm in science for dumping classic workunits. Hope that encourages your team member(s) to move to BOINC sooner. > > Did you enjoy your breakfast? > It was adequate. Will have chinese food for lunch, and probably something light for dinner so I'm not burping into the microphone at my gig tonight. Thanks for the quick reply Matt, am headed out for sushi myself :-P |
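
Editor's sketch: Matt's reason for not splitting tapes in date order (a tape with RFI is often followed by another RFI-heavy tape the next day, so he mixes them up) can be illustrated with a few lines of Python. This is only a rough illustration, not the project's procedure; Matt describes picking tapes by hand. The function names, the two-day minimum gap, the retry count, and the plain-text tape labels are all assumptions made for the example. The dates are the ones AdRoc listed.

```python
import random
from datetime import date


def adjacent_close_pairs(order, min_gap_days=2):
    """Count neighbouring tapes recorded fewer than min_gap_days apart."""
    return sum(
        abs((b[1] - a[1]).days) < min_gap_days
        for a, b in zip(order, order[1:])
    )


def mix_tapes(tapes, min_gap_days=2, tries=1000, seed=None):
    """Shuffle tapes so same-day or next-day tapes are rarely neighbours.

    tapes is a list of (label, recording_date) tuples. The function tries
    random shuffles and keeps the order with the fewest close neighbours,
    stopping early if it finds an order with none.
    """
    rng = random.Random(seed)
    best = list(tapes)
    best_score = adjacent_close_pairs(best, min_gap_days)
    for _ in range(tries):
        if best_score == 0:
            break
        candidate = list(tapes)
        rng.shuffle(candidate)
        score = adjacent_close_pairs(candidate, min_gap_days)
        if score < best_score:
            best, best_score = candidate, score
    return best


# Hypothetical labels; the dates come from the list quoted in the thread.
TAPES = [
    ("19 Dec 2004 (ab)", date(2004, 12, 19)),
    ("25 Dec 2004 (aa)", date(2004, 12, 25)),
    ("27 Dec 2004 (aa)", date(2004, 12, 27)),
    ("27 Dec 2004 (ab)", date(2004, 12, 27)),
    ("11 Jan 2005 (ab)", date(2005, 1, 11)),
    ("12 Jan 2005 (aa)", date(2005, 1, 12)),
    ("13 Jan 2005 (ab)", date(2005, 1, 13)),
    ("16 Jan 2005 (aa)", date(2005, 1, 16)),
    ("18 Jan 2005 (aa)", date(2005, 1, 18)),
]

if __name__ == "__main__":
    for label, _ in mix_tapes(TAPES, seed=1):
        print(label)
```

Random retries are used here instead of an exact scheduling algorithm because a stacker holds only 9 tapes, so a handful of shuffles is normally enough to find an order with no same-day or next-day neighbours.
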
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.