Grembo Zavia (Nov 08 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 674281 - Posted: 8 Nov 2007, 21:25:23 UTC

As noted yesterday in my tech news item, we had some database plans this morning. First, a brief SETI@home project outage to clean up some logs; that was quick and harmless. We then kept the assimilators offline so we could add signal table indexes on the science database. Jeff is continuing work on developing/optimizing the signal candidate "nitpicker" - short for "near time persistency checker", i.e. the thing that continually looks for persistent, and therefore interesting, signals in our reduced data. The new indexes will be a great help.
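
For illustration, an index build of this sort might look roughly as follows in SQL; the table and column names here are invented, not the project's actual signal schema:

    -- Hypothetical sketch only; the real signal tables and columns differ.
    -- An index like this lets the nitpicker pull all signals in a given
    -- time range or sky region without scanning the whole signal table.
    CREATE INDEX spike_time_idx ON spike (time, ra, decl);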

Of course, there were other things afoot to make the above a little more complicated. The science replica database server hung up again this morning. We found this was due to the automounter losing some important directories. Why the hell does this happen? The mounts time out naturally, but the automounter fails to remount them the next time they are needed. Seems like a major Linux bug to me, as it's happening on all our systems to some extent. I adjusted the automounter timeouts from 5 minutes to 30 days; doing so already helped on one other test system.
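
A sketch of that timeout change, assuming a stock Linux autofs setup; the mount point and map name below are invented, not the actual server config:

    # /etc/auto.master - illustrative entry only; the real maps differ
    # Before: automounted filesystems expired after 5 minutes (300 seconds)
    #/data   /etc/auto.data  --timeout=300
    # After: 30 days (2592000 seconds), so mounts effectively never expire
    /data    /etc/auto.data  --timeout=2592000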

Meanwhile, back on the farm... we're sending out some junky data that overflows quickly, so that's been swamping our servers with twice the usual load. Annoying, but we'll just let nature take its course and get through the bad spots. This has the positive by-product of giving us a heavy-load test to see how our servers currently perform under increased strain... except, with the simultaneous aforementioned index build, the extra splitter activity was gumming everything up. We have the splitters offline as I write this. Hopefully we'll be able to get them back online before we run out of work. If not, then so be it.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 674281
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 674329 - Posted: 8 Nov 2007, 23:22:04 UTC

Thanks for keepin' us informed Matt . . . fingers *crossed* @ this moment re: Servers . . . Best of Luck Sir!

Keep up the Great Work Berkeley.


BOINC Wiki . . .

Science Status Page . . .
ID: 674329
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 674378 - Posted: 9 Nov 2007, 0:15:05 UTC

UPDATE:

The primary science database is, without our permission, sending the new index we created over to the replica. No big deal, except this is blocking the splitters, which in turn means no new work is being created. Even more mysterious: after sending the new index, it began sending it *again*. Some kind of bug? Infinite loop? Yet another inexplicable database behavior that is noted as a feature, lost in a giant pile of impenetrable documentation? You make the call!

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 674378
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 674383 - Posted: 9 Nov 2007, 0:22:06 UTC
Last modified: 9 Nov 2007, 0:28:48 UTC


perpetual eh . . .

[edit]:
Koenji Hyakkei pretty good Matt, pretty good (tight too!!!)


BOINC Wiki . . .

Science Status Page . . .
ID: 674383
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 674402 - Posted: 9 Nov 2007, 0:36:54 UTC

ANOTHER UPDATE:

Bob just explained to me what's going on. It's not an infinite loop - the server is just sending the index over in four pieces, as it's broken up across four dbspaces. So in about three hours the dam should break...
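
For context, this is roughly how an index ends up split across four dbspaces in Informix (whose term "dbspace" suggests it's the engine in use here); every name and boundary below is invented for illustration:

    -- Hypothetical sketch of an index fragmented across four dbspaces;
    -- replication then ships it to the replica one fragment at a time.
    CREATE INDEX signal_time_idx ON signal (time)
      FRAGMENT BY EXPRESSION
        (time < 10000000) IN signal_dbs1,
        (time >= 10000000 AND time < 20000000) IN signal_dbs2,
        (time >= 20000000 AND time < 30000000) IN signal_dbs3,
        REMAINDER IN signal_dbs4;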

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 674402
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 674455 - Posted: 9 Nov 2007, 2:44:44 UTC - in response to Message 674402.  

I wonder if the NTPC app should run on an OLAP database replica. Typically, major production systems run transactions on one database and replicate it over to another database that has a lot more indexes. That way, reporting/analysis does not affect the performance of transactions. Of course, this requires a lot more disk space and fiber to refresh the database periodically.

In any event, while an index is being created, no rows can be modified in the tablespace for that specific table. Depending on the locking algorithm, all the database write requests will either queue up or time out. Select statements should be fine.
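
As an aside, some engines can avoid that write-blocking entirely; a minimal sketch in SQL Server's dialect (the system mentioned below), assuming Enterprise edition, which supports online builds - the table and column names are made up:

    -- Online index build (SQL Server 2005+ Enterprise edition):
    -- writers can keep modifying the table while the index is created.
    CREATE INDEX ix_signal_time ON dbo.signal (time_recorded)
        WITH (ONLINE = ON);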

On a 170GB SQL Server database, I saw an index take 30 minutes to create. It's hard to compare that experience with SETI's, though. Trust me, if that DB was on RAID 5, it would take two days to create that index.
ID: 674455
eaglescouter
Joined: 28 Dec 02
Posts: 162
Credit: 42,012,553
RAC: 0
United States
Message 674508 - Posted: 9 Nov 2007, 4:24:23 UTC - in response to Message 674402.  

[quote]ANOTHER UPDATE:

Bob just explained to me what's going on. It's not an infinite loop - the server is just sending the index over in four pieces, as it's broken up across four dbspaces. So in about three hours the dam should break...

- Matt[/quote]

Still no work available for my clients :(
It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :)
ID: 674508
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 674536 - Posted: 9 Nov 2007, 7:12:05 UTC

As usual Matt, your keeping us in the loop with your continued technical posts is most appreciated! Gives us some background on the trials and tribulations as a counterpoint to the posts we see just complaining because things are not running smoothly. A lot of things to juggle and balance out to keep the whole thing working!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 674536
Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 674583 - Posted: 9 Nov 2007, 11:09:33 UTC - in response to Message 674536.  

[quote]As usual Matt, your keeping us in the loop with your continued technical posts is most appreciated! Gives us some background on the trials and tribulations as a counterpoint to the posts we see just complaining because things are not running smoothly. A lot of things to juggle and balance out to keep the whole thing working![/quote]

...and instead of complaining, I use the downtime to shut down & service my machines. By the time I've done this, work should be available for them on reboot.

Thanks for the insights Matt, keep 'em coming.
ID: 674583
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 674584 - Posted: 9 Nov 2007, 11:19:46 UTC - in response to Message 674508.  

[quote][quote]ANOTHER UPDATE:

Bob just explained to me what's going on. It's not an infinite loop - the server is just sending the index over in four pieces, as it's broken up across four dbspaces. So in about three hours the dam should break...

- Matt[/quote]

Still no work available for my clients :([/quote]


Hmmm. It looks like you have well over 1,000 unprocessed WUs on your machines, many unprocessed for over a month.

Life would be happier for you if you 1) added a secondary project at, say, a 10% resource share to tide you over if there is a SETI glitch, 2) reduced your queue to a bare minimum, say one day's worth of work, so that you don't constipate the system for the rest of us (i.e. large caches really increase the project's latency for no real benefit), and 3) consumed a little less caffeine. Just a suggestion.
ID: 674584
Rev. Tim Olivera
Joined: 15 Jan 06
Posts: 20
Credit: 1,717,714
RAC: 0
United States
Message 674594 - Posted: 9 Nov 2007, 12:16:18 UTC - in response to Message 674281.  

It must be Friday - I see not one of our systems is doing any SETI work!! You can count on it: when there's a long weekend, the SETI server will be down! Bet the farm and the house on that, it's a sure thing! And I'm not a betting man, but I would bet that!! The funny thing is, I have a web server built in 1997, a dual Pentium Pro 200MHz system with two 9GB SCSI hard drives running FreeBSD 2.something, that has never been off and never shut down and has been working for 3,000+ days with not one breakdown, but those $100,000 Sun systems with 40 hard drives and 10GB of RAM you guys run are down every other weekend?? What's up with that?? Oh well, I said I would bitch if I came in and found our systems on and doing nothing, so there's my bitch.

Rev. Tim Olivera

ID: 674594
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20267
Credit: 7,508,002
RAC: 20
United Kingdom
Message 674609 - Posted: 9 Nov 2007, 13:02:51 UTC - in response to Message 674594.  
Last modified: 9 Nov 2007, 13:03:48 UTC

[quote]... The funny thing is, I have a web server built in 1997, a dual Pentium Pro 200MHz system with two 9GB SCSI hard drives running FreeBSD 2.something, that has never been off and never shut down and has been working for 3,000+ days with not one breakdown, but ...[/quote]

I guess there's only ever one user on there,

and you have sold your soul to god?


(Still, 10 years' uptime is very good!)

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 674609
John Neale
Volunteer tester
Joined: 16 Mar 00
Posts: 634
Credit: 7,246,513
RAC: 9
South Africa
Message 674633 - Posted: 9 Nov 2007, 14:39:49 UTC - in response to Message 674594.  

[quote]It must be Friday - I see not one of our systems is doing any SETI work!! You can count on it: when there's a long weekend, the SETI server will be down! Bet the farm and the house on that, it's a sure thing! And I'm not a betting man, but I would bet that!! The funny thing is, I have a web server built in 1997, a dual Pentium Pro 200MHz system with two 9GB SCSI hard drives running FreeBSD 2.something, that has never been off and never shut down and has been working for 3,000+ days with not one breakdown, but those $100,000 Sun systems with 40 hard drives and 10GB of RAM you guys run are down every other weekend?? What's up with that?? Oh well, I said I would bitch if I came in and found our systems on and doing nothing, so there's my bitch.

Rev. Tim Olivera[/quote]


"Reverend", you sure know how to use irreverent language. And checking your posts, you have been known to complain before. Often.

You might help your cause if you maintained a small cache of work units, and if you had a back-up project. Since your main interest is SETI, you could even consider SETI Beta as your backup; it does have work at the moment. (These points have been made by others. Often.)

The SETI servers are not down at the moment. (I have managed to snare a few work units during the past 10 hours.) As explained by Matt, they're straining under a heavy load brought about by a convergence of adverse circumstances. They therefore cannot keep up with the demand. As Martin points out, demand is the essential difference between your setup and that of SETI@home.
ID: 674633
-=SuperG=-
Joined: 3 Apr 99
Posts: 63
Credit: 89,161,651
RAC: 23
Canada
Message 674663 - Posted: 9 Nov 2007, 16:10:11 UTC - in response to Message 674633.  

[quote][quote]It must be Friday - I see not one of our systems is doing any SETI work!! You can count on it: when there's a long weekend, the SETI server will be down! Bet the farm and the house on that, it's a sure thing! And I'm not a betting man, but I would bet that!! The funny thing is, I have a web server built in 1997, a dual Pentium Pro 200MHz system with two 9GB SCSI hard drives running FreeBSD 2.something, that has never been off and never shut down and has been working for 3,000+ days with not one breakdown, but those $100,000 Sun systems with 40 hard drives and 10GB of RAM you guys run are down every other weekend?? What's up with that?? Oh well, I said I would bitch if I came in and found our systems on and doing nothing, so there's my bitch.

Rev. Tim Olivera[/quote]

"Reverend", you sure know how to use irreverent language. And checking your posts, you have been known to complain before. Often.

You might help your cause if you maintained a small cache of work units, and if you had a back-up project. Since your main interest is SETI, you could even consider SETI Beta as your backup; it does have work at the moment. (These points have been made by others. Often.)

The SETI servers are not down at the moment. (I have managed to snare a few work units during the past 10 hours.) As explained by Matt, they're straining under a heavy load brought about by a convergence of adverse circumstances. They therefore cannot keep up with the demand. As Martin points out, demand is the essential difference between your setup and that of SETI@home.[/quote]



He has a point though. Seems to be the long weekends when these strange problems occur. Of course I can't prove that.. just seems that way.

Anyways, there is no point in complaining about these little outages. The servers will be up when they are up. Some things just take time.

I upgraded 1 of my PCs to a C2Q 6600 from a P4 2.53GHz. Then found out that the splitters were down.. lol.. ran out of cached WUs in less than 2 hours.

Anyways, good luck getting that index copied over. Hopefully you can get those splitters back up for a few hours today so I can download another 4 days worth of WUs.

Cheers

Boinc Wiki




"Great spirits have always encountered violent opposition from mediocre minds." -Albert Einstein
ID: 674663
Mentor397
Joined: 16 May 99
Posts: 25
Credit: 6,794,344
RAC: 108
United States
Message 674709 - Posted: 9 Nov 2007, 18:17:10 UTC - in response to Message 674663.  

[quote][quote]You might help your cause if you maintained a small cache of work units, and if you had a back-up project. Since your main interest is SETI, you could even consider SETI Beta as your backup; it does have work at the moment. (These points have been made by others. Often.)

The SETI servers are not down at the moment. (I have managed to snare a few work units during the past 10 hours.) As explained by Matt, they're straining under a heavy load brought about by a convergence of adverse circumstances. They therefore cannot keep up with the demand. As Martin points out, demand is the essential difference between your setup and that of SETI@home.[/quote]

He has a point though. Seems to be the long weekends when these strange problems occur. Of course I can't prove that.. just seems that way.

Anyways, there is no point in complaining about these little outages. The servers will be up when they are up. Some things just take time.

I upgraded 1 of my PCs to a C2Q 6600 from a P4 2.53GHz. Then found out that the splitters were down.. lol.. ran out of cached WUs in less than 2 hours.

Anyways, good luck getting that index copied over. Hopefully you can get those splitters back up for a few hours today so I can download another 4 days worth of WUs.[/quote]

I was lucky. I ran my system out of work deliberately two days ago to make sure my upgrade to the latest BOINC was flawless (probably unnecessary, but I've had a bad experience once). I'd just managed to get the cache full a day before this came up, so I should be good.

I wonder, though, about the people who complain the most. It's true, I am nearly religiously against other projects (haven't found one I liked, and am not going to join another 'just' to keep busy), but why complain about the project not working flawlessly? It's hard to keep things together on the edge of technology, with a very limited budget and very limited time. What do you expect? Even the computers on the space station go down once in a while.

- Jim

ID: 674709
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 674786 - Posted: 9 Nov 2007, 20:32:59 UTC

Whatever, as my kids might say...

... but if you want to have some fun, check out Scarecrow's graphs. It's like a roller coaster ride! It appears CentralCommand has released the blockage.
ID: 674786
-=SuperG=-
Joined: 3 Apr 99
Posts: 63
Credit: 89,161,651
RAC: 23
Canada
Message 674846 - Posted: 9 Nov 2007, 22:16:35 UTC - in response to Message 674786.  
Last modified: 9 Nov 2007, 22:17:07 UTC

[quote]Whatever, as my kids might say...

... but if you want to have some fun, check out Scarecrow's graphs. It's like a roller coaster ride! It appears CentralCommand has released the blockage.[/quote]


Are you sure it was CentralCommand that released the blockage?? I hear these people guarantee their work.. :P



Sry.. just my attempt at humour.. :)

Boinc Wiki




"Great spirits have always encountered violent opposition from mediocre minds." -Albert Einstein
ID: 674846
Keck_Komputers
Volunteer tester
Joined: 4 Jul 99
Posts: 1575
Credit: 4,152,111
RAC: 1
United States
Message 675132 - Posted: 10 Nov 2007, 7:25:33 UTC - in response to Message 674633.  

[quote]You might help your cause if you maintained a small cache of work units, and if you had a back-up project. Since your main interest is SETI, you could even consider SETI Beta as your backup; it does have work at the moment. (These points have been made by others. Often.)[/quote]

SETI Beta is probably not a good choice for a backup project if your main project is SETI. Both servers tend to be down at the same times, even though that is not the case this time.
BOINC WIKI

BOINCing since 2002/12/8
ID: 675132
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 675829 - Posted: 11 Nov 2007, 4:54:11 UTC - in response to Message 674663.  


[quote]He has a point though. Seems to be the long weekends when these strange problems occur. Of course I can't prove that.. just seems that way.[/quote]

Unfortunately, his point seems to be that the project does deliver the level of reliability that they promise.

There is a reason BOINC can carry a multi-day cache: projects run on a shoestring with hand-me-down servers are going to break from time to time, and they're going to stay down for extended periods when the failures are big.

So, they (BOINC) built a client that can keep crunching through outages. As others have suggested, they also built a client that can attach to multiple projects, so if you are passionate about crunching you can crunch more than one project.

Unfortunately, Rev. Tim seems to be passionate about not tuning his installations to make sure he doesn't run out of work. He'd rather have the project spend money it doesn't have on new servers (sure, the Sun machines were expensive, but they were bought long ago), and on staff to be on call 24/7.

The SETI servers don't need 99.999% reliability to keep us happily crunching away. 80% is probably good enough if we just do a little BOINC tuning.

ID: 675829
