Message boards :
Technical News :
Grembo Zavia (Nov 08 2007)
Message board moderation
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
As noted yesterday in my tech news item, we had some database plans this morning. First, a brief SETI@home project outage to clean up some logs. That was quick and harmless. We then kept the assimilators offline so we could add signal table indexes on the science database. Jeff is continuing to develop and optimize the signal candidate "nitpicker" - short for "near time persistency checker," i.e. the thing that continually looks for persistent, and therefore interesting, signals in our reduced data. The new indexes will be a great help. Of course, there were other things afoot to make the above a little more complicated. The science replica database server hung up again this morning. We found this was due to the automounter losing some important directories. Why the hell does this happen? The mounts time out naturally, but the automounter fails to remount them the next time they are needed. Seems like a major Linux bug to me, as it's happening on all our systems to some extent. I adjusted the automounter timeouts from 5 minutes to 30 days. Doing so already helped on one other test system. Meanwhile, back on the farm... we're sending out some junky data that overflows quickly, so that's been swamping our servers with twice the usual load. Annoying, but we'll just let nature take its course and get through the bad spots. This has the positive by-product of giving us a heavy-load test to see how our servers currently perform under increased strain... except with the simultaneous aforementioned index build, the extra splitter activity was gumming everything up. We have the splitters offline as I write this. Hopefully we'll be able to get them back online before we run out of work. If not, then so be it. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
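The automounter fix Matt describes amounts to raising the autofs map timeout. A minimal sketch of what such a change looks like in the master map (the mount point and map file name here are illustrative, not SETI's actual configuration):

```shell
# /etc/auto.master -- raise the autofs timeout from the 5-minute default
# to 30 days, expressed in seconds (mount point and map name are made up)
/mnt/science   /etc/auto.science   --timeout=2592000
```

After editing the master map, the automounter needs a reload (e.g. `service autofs reload` on Linux distributions of that era) before the new timeout takes effect on fresh mounts.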
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Thanks for keepin' us informed Matt . . . fingers *crossed* @ this moment re: Servers . . . Best of Luck Sir! Keep up the Great Work Berkeley. BOINC Wiki . . . Science Status Page . . . |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
UPDATE: The primary science database, without our permission, is sending over the new index we created to the replica. No big deal, except this is blocking the splitters which in turn means no new work is being created. Even more mysterious is that, after sending the new index, it began sending it *again*. Some kind of bug? Infinite loop? Yet another inexplicable database behavior that is noted as a feature lost in a giant pile of impenetrable documentation? You make the call! - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
perpetual eh . . . [edit]: koenji hyakkei pretty good Matt, pretty good (tight too!!!) BOINC Wiki . . . Science Status Page . . . |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
ANOTHER UPDATE: Bob just explained to me what's going on. It's not an infinite loop - just sending the index over in 4 pieces as it is broken up over 4 dbspaces. So in about 3 hours the dam should break... - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
I wonder if the NTPC app should run on an OLAP database replica? Typically, major production systems run transactions on one database and replicate it over to another database that has a lot more indexes. That way, reporting/analysis does not affect the performance of transactions. Of course, this requires a lot more disk space and fiber to refresh the database periodically. In any event, while an index is being created, no rows can be modified in the tablespace for that specific table. Depending on the locking algorithm, all the database write requests will either queue up or time out. Select statements should be fine. On a 170GB SQL Server database, I saw an index take 30 minutes to create. It's hard to compare the experience with SETI though. Trust me, if that DB was on RAID 5 it would take 2 days to create that index. |
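The point about index builds blocking writers while reads go on can be sketched with a toy example. This uses SQLite rather than the Informix database SETI actually runs, and the `signal` table and column names are made up, but the before/after query plans show why a lookup-heavy job like the nitpicker wants those indexes:

```python
import sqlite3
import time

# Toy stand-in for a signal table (SETI's real schema lives in Informix).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE signal (id INTEGER PRIMARY KEY, power REAL, chirp REAL)")
con.executemany(
    "INSERT INTO signal (power, chirp) VALUES (?, ?)",
    ((i * 0.5, i * 0.01) for i in range(100_000)),
)
con.commit()

# Without an index, a lookup on `power` has to scan every row.
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM signal WHERE power = 250.0"
).fetchone()[-1]

# Building the index holds a write lock for its duration; concurrent
# writers would queue up or time out, just as described above.
t0 = time.perf_counter()
con.execute("CREATE INDEX idx_signal_power ON signal (power)")
build_secs = time.perf_counter() - t0

# With the index in place, the same lookup becomes an index search.
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM signal WHERE power = 250.0"
).fetchone()[-1]

print(plan_before)   # a full-table SCAN
print(plan_after)    # a SEARCH ... USING INDEX idx_signal_power
print(f"index built in {build_secs:.3f}s")
```

The build time grows with the table; on a multi-terabyte science database the same operation stretches into hours, which is why the splitters were gummed up while it ran.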
eaglescouter Send message Joined: 28 Dec 02 Posts: 162 Credit: 42,012,553 RAC: 0 |
ANOTHER UPDATE: Still no work available for my clients :( It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :) |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
As usual Matt, your keeping us in the loop with your continued technical posts is most appreciated! Gives us some background on the trials and tribulations as a counterpoint to the posts we see just complaining because things are not running smoothly. A lot of things to juggle and balance out to keep the whole thing working! "Time is simply the mechanism that keeps everything from happening all at once." |
Sirius B Send message Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
As usual Matt, your keeping us in the loop with your continued technical posts is most appreciated! Gives us some background on the trials and tribulations as a counterpoint to the posts we see just complaining because things are not running smoothly. A lot of things to juggle and balance out to keep the whole thing working! ...and instead of complaining, I use the downtime to shut down & service my machines. By the time I've done this, work should be available for them on reboot. Thanks for the insights Matt, keep 'em coming. |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
ANOTHER UPDATE: Hmmm. It looks like you have well over 1000 unprocessed wu's on your machines, many unprocessed for over a month. Life would be happier for you if you 1) added a secondary project at, say, a 10% resource share to tide you over if there is a Seti glitch, 2) reduced your queue to a bare minimum, say 1 day's worth of work, so that you don't constipate the system for the rest of us (i.e. large caches really increase the project's latency for no real benefit), and 3) consumed a little less caffeine. Just a suggestion. |
Rev. Tim Olivera Send message Joined: 15 Jan 06 Posts: 20 Credit: 1,717,714 RAC: 0 |
It must be Friday, I see not one of our systems is doing any Seti work!! You can count on it: when there's a long weekend, the Seti server will be down! Bet the farm and the house on that, it's a sure thing! And I'm not a betting man but I would bet that!! The funny thing is I have a web server built in 1997, a dual Pentium Pro 200MHz system with two 9GB SCSI hard drives running FreeBSD 2.something, that has never been off and never shut down, working for 3,000+ days with not one breakdown, but those $100,000 SUN systems with 40 hard drives and 10GB of RAM you guys run are down every other weekend?? What's up with that?? Oh well, I said I would bitch if I came in and found our systems on and doing nothing, so there's my bitch. Rev. Tim Olivera |
ML1 Send message Joined: 25 Nov 01 Posts: 21221 Credit: 7,508,002 RAC: 20 |
... The funny thing is I have a web server built in 1997, a dual Pentium Pro 200MHz system with two 9GB SCSI hard drives running FreeBSD 2.something that has never been off and never shut down, working for 3,000+ days with not one breakdown, but ... I guess there's only ever one user on there, and you have sold your soul to god? (Still, 10 years uptime is still very good!) Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
John Neale Send message Joined: 16 Mar 00 Posts: 634 Credit: 7,246,513 RAC: 9 |
It must be Friday, I see not one of our systems is doing any Seti work!! ... those $100,000 SUN systems with 40 hard drives and 10GB of RAM you guys run are down every other weekend?? What's up with that?? "Reverend", you sure know how to use irreverent language. And checking your posts, you have been known to complain before. Often. You might help your cause if you maintained a small cache of work units, and if you had a back-up project. Since your main interest is SETI, you could even consider SETI Beta as your backup; it does have work at the moment. (These points have been made by others. Often.) The SETI servers are not down at the moment. (I have managed to snare a few work units during the past 10 hours.) As explained by Matt, they're straining under a heavy load brought about by a convergence of adverse circumstances. They therefore cannot keep up with the demand. As Martin points out, demand is the essential difference between your setup and that of SETI@home. |
-=SuperG=- Send message Joined: 3 Apr 99 Posts: 63 Credit: 89,161,651 RAC: 23 |
It must be Friday, I see not one of our systems is doing any Seti work!! ... those $100,000 SUN systems with 40 hard drives and 10GB of RAM you guys run are down every other weekend?? What's up with that?? He has a point though. Seems to be the long weekends when these strange problems occur. Of course I can't prove that.. just seems that way. Anyways, there is no point in complaining about these little outages. The servers will be up when they are up. Some things just take time. I upgraded one of my PCs to a C2Q 6600 from a P4 2.53GHz, then found out that the splitters were down.. lol.. ran out of cached WUs in less than 2 hours. Anyways, good luck getting that index copied over. Hopefully you can get those splitters back up for a few hours today so I can download another 4 days worth of WUs. Cheers Boinc Wiki "Great spirits have always encountered violent opposition from mediocre minds." -Albert Einstein |
Mentor397 Send message Joined: 16 May 99 Posts: 25 Credit: 6,794,344 RAC: 108 |
He has a point though. Seems to be the long weekends when these strange problems occur. Of course I can't prove that.. just seems that way. ... Hopefully you can get those splitters back up for a few hours today so I can download another 4 days worth of WUs. I was lucky. I ran my system out of work deliberately two days ago to make sure my upgrade to the latest BOINC was flawless (prolly unnecessary, but I've had a bad experience once). I'd just managed to get the cache full a day before this came up, so I should be good. I wonder, though, about the people who complain the most. It's true, I am nearly religiously against other projects (haven't found one I liked, am not going to join another 'just' to keep busy), but why complain about the project not working flawlessly? It's hard to keep things together on the edge of technology, with a very limited budget and very limited time. What do you expect? Even the computers on the space station go down once in a while. - Jim |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
What ever, as my kids might say... ... but if you want to have some fun check out scarecrow's graphs. It is like a roller coaster ride! It appears CentralCommand has released the blockage. |
-=SuperG=- Send message Joined: 3 Apr 99 Posts: 63 Credit: 89,161,651 RAC: 23 |
What ever, as my kids might say... Are you sure it was CentralCommand that released the blockage?? I hear these people guarantee their work.. :P Sry.. just my attempt at humour.. :) Boinc Wiki "Great spirits have always encountered violent opposition from mediocre minds." -Albert Einstein |
Keck_Komputers Send message Joined: 4 Jul 99 Posts: 1575 Credit: 4,152,111 RAC: 1 |
You might help your cause if you maintained a small cache of work units, and if you had a back-up project. Since your main interest is SETI, you could even consider SETI Beta as your backup; it does have work at the moment. (These points have been made by others. Often.) SETI Beta is probably not a good choice for a backup project if your main project is SETI. Both servers tend to be down at the same times, even though that is not the case this time. BOINC WIKI BOINCing since 2002/12/8 |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Unfortunately, his point seems to be that the project does deliver the level of reliability that they promise. There is a reason BOINC can carry a multi-day cache: projects run on a shoestring with hand-me-down servers are going to break from time to time, and they're going to stay down for extended periods when the failures are big. So, they (BOINC) built a client that can keep crunching through outages. As others have suggested, they also built a client that can attach to multiple projects, so if you are passionate about crunching you can crunch more than one project. Unfortunately, Rev. Tim seems to be passionate about not tuning his installations to make sure he doesn't run out of work. He'd rather have the project spend money they don't have on new servers (sure, the SUN machines were expensive, but they were bought long ago), and on staff to be on-call 24/7. The SETI servers don't need 99.999% reliability to keep us happily crunching away. 80% is probably good enough if we just do a little BOINC tuning. |
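The "little BOINC tuning" in question is mostly the work cache. A sketch of the relevant fragment of a `global_prefs_override.xml` (element names as used by BOINC clients of this era; the values are illustrative, not recommendations from the project):

```xml
<global_preferences>
  <!-- keep roughly one day of work on hand, plus a small extra buffer:
       enough to ride out short outages without hoarding work units -->
  <work_buf_min_days>1.0</work_buf_min_days>
  <work_buf_additional_days>0.25</work_buf_additional_days>
</global_preferences>
```

Combined with a low-resource-share backup project, a modest buffer like this keeps a host crunching through most outages without adding to the project's turnaround latency.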
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.