And Up Again (Sep 11 2008)

Author	Message
Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 807115 - Posted: 11 Sep 2008, 22:08:04 UTC

So we hit that brick wall again with the science database - that is, when we try to create a new index it works fine on the primary server but then clogs up sending the new index pages to the secondary. This clog locks up the database, the splitters grind to a halt, the assimilators grind to a halt, i.e. fun for everybody! We thought we were out of the woods yesterday afternoon but checking in at 1am last night (this morning?) I saw this all happening again, so I gave things a swift kick and went to bed.

This morning, once we were all here at the lab, we decided to just bite the bullet this time and shut down all the splitters/assimilators and let the clog work through naturally on its own, which it did. We also took the down time to do an "update statistics" on one signal table (this helps re-sort current indexes for speedier lookups) and add disk space for said indexes. I just turned things back on, we'll be catching up for a while, etc.

I did do some qlogic card testing today which got us over my "information gathering and training" hurdle so we can upgrade the remaining two servers with old OS's in the coming weeks. We also got our homemade NAS configured so that we may get the old NetApp rack out of the closet maybe next week. It's still working quite reliably, but it's taking up a third of our closet space, a seventh of our power, but delivering only 2 TB of raw disk space. Not really efficient, and we have a lot of servers waiting to get into the closet already.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

squishymaster Joined: 21 Oct 03 Posts: 34 Credit: 784,496 RAC: 0	Message 807125 - Posted: 11 Sep 2008, 22:24:18 UTC

Thanks for the update Matt. I wonder about this problem that keeps coming up. Is it because there isn't enough hard drive space? Or because the servers aren't fast enough?

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 807127 - Posted: 11 Sep 2008, 22:28:13 UTC - in response to Message 807125.

"Is it because there isn't enough hard drive space? Or because the servers aren't fast enough?"

Still unclear to me. Bob's looking into it. Could be we have simple configuration screws to tighten, or that the server acting as the primary science database also has 48 drives, so we use up a lot of extra resources transferring raw data to/from there. Maybe it's the RAID configuration on the database/raw data drives (maximized for storage, not for speed).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

squishymaster Joined: 21 Oct 03 Posts: 34 Credit: 784,496 RAC: 0	Message 807130 - Posted: 11 Sep 2008, 22:43:10 UTC

So it would seem that the problem is too many results coming in too quickly? Wouldn't it be possible to create a bottle neck that would only allow so many results coming in at a time and the rest would have to wait until the way was clear?

Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 807132 - Posted: 11 Sep 2008, 22:47:11 UTC

. . . nice work from all @ Berkeley - Thanks for the Posting Matt

BOINC Wiki . . . Science Status Page . . .

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 807150 - Posted: 11 Sep 2008, 23:19:10 UTC - in response to Message 807130.

"So it would seem that the problem is too many results coming in too quickly? Wouldn't it be possible to create a bottle neck that would only allow so many results coming in at a time and the rest would have to wait until the way was clear?"

Nope. We can insert results probably 10 times as fast as we do now. In fact, the bottleneck there would be our bandwidth for workunits going out (you have to download a workunit to ultimately send a result). The current issues deal with stuff on the system outside of creating workunits and inserting results - like storing 8 TB of raw data there, creating indexes for future scientific analysis, etc. This is a problem, but not exactly due to too many results.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

KWSN THE Holy Hand Grenade! Volunteer tester Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0	Message 807485 - Posted: 12 Sep 2008, 17:40:00 UTC

got the following when I tried to update and report 5 WU's:

9/12/2008 10:33:39 AM|SETI@home|Scheduler request failed: HTTP internal server error

This with client 5.10.30... not the client that usually gives this error.

Hello, from Albany, CA!...

KWSN THE Holy Hand Grenade! Volunteer tester Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0	Message 808642 - Posted: 15 Sep 2008, 21:45:33 UTC

.xml stats export apparently not working again... no updates on the stats sites for the last two days.

Hello, from Albany, CA!...

speedimic Volunteer tester Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0	Message 808649 - Posted: 15 Sep 2008, 22:22:00 UTC

Looking at the server status page, it seems they are working on it...

mic.
