And Up Again (Sep 11 2008)

Message boards : Technical News : And Up Again (Sep 11 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 807115 - Posted: 11 Sep 2008, 22:08:04 UTC

So we hit that brick wall again with the science database - that is, when we try to create a new index it works fine on the primary server but then clogs up sending the new index pages to the secondary. This clog locks up the database, the splitters grind to a halt, the assimilators grind to a halt, i.e. fun for everybody!

We thought we were out of the woods yesterday afternoon, but when I checked in at 1am last night (this morning?) I saw this all happening again, so I gave things a swift kick and went to bed. This morning, once we were all here at the lab, we decided to just bite the bullet this time: shut down all the splitters/assimilators and let the clog work through naturally on its own, which it did. We also used the downtime to do an "update statistics" on one signal table (this helps re-sort current indexes for speedier lookups) and to add disk space for said indexes. I just turned things back on, we'll be catching up for a while, etc.

I did do some QLogic card testing today, which got us over my "information gathering and training" hurdle, so we can upgrade the remaining two servers with old OSes in the coming weeks. We also got our homemade NAS configured so that we may get the old NetApp rack out of the closet, maybe next week. The NetApp is still working quite reliably, but it takes up a third of our closet space and a seventh of our power while delivering only 2 TB of raw disk space. Not really efficient, and we have a *lot* of servers waiting to get into the closet already.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 807115

squishymaster
Joined: 21 Oct 03
Posts: 34
Credit: 784,496
RAC: 0
United States
Message 807125 - Posted: 11 Sep 2008, 22:24:18 UTC

Thanks for the update Matt. I wonder about this problem that keeps coming up. Is it because there isn't enough hard drive space? Or because the servers aren't fast enough?
ID: 807125

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 807127 - Posted: 11 Sep 2008, 22:28:13 UTC - in response to Message 807125.  

"Is it because there isn't enough hard drive space? Or because the servers aren't fast enough?"

Still unclear to me. Bob's looking into it. Could be we have simple configuration screws to tighten, or that the server acting as the primary science database also has 48 drives, so we use up a lot of extra resources transferring raw data to/from there. Maybe it's the RAID configuration on the database/raw data drives (maximized for storage, not for speed).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 807127

squishymaster
Joined: 21 Oct 03
Posts: 34
Credit: 784,496
RAC: 0
United States
Message 807130 - Posted: 11 Sep 2008, 22:43:10 UTC

So it would seem that the problem is too many results coming in too quickly? Wouldn't it be possible to create a bottleneck that would only allow so many results to come in at a time, with the rest having to wait until the way was clear?
ID: 807130

Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 807132 - Posted: 11 Sep 2008, 22:47:11 UTC


. . . nice work from all @ Berkeley - Thanks for the Posting Matt
ID: 807132

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 807150 - Posted: 11 Sep 2008, 23:19:10 UTC - in response to Message 807130.  

"So it would seem that the problem is too many results coming in too quickly? Wouldn't it be possible to create a bottleneck that would only allow so many results coming in at a time and the rest would have to wait until the way was clear?"

Nope. We can insert results probably 10 times as fast as we do now. In fact, the bottleneck there would be our bandwidth for workunits going out (you have to download a workunit to ultimately send a result). The current issues deal with stuff on the system outside of creating workunits and inserting results - like storing 8 TB of raw data there, creating indexes for future scientific analysis, etc. This *is* a problem, but not exactly due to too many results.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 807150

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 807485 - Posted: 12 Sep 2008, 17:40:00 UTC

Got the following when I tried to update and report 5 WUs:

9/12/2008 10:33:39 AM|SETI@home|Scheduler request failed: HTTP internal server error

This is with client 5.10.30, which is not the client that usually gives this error.

Hello, from Albany, CA!...
ID: 807485

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 808642 - Posted: 15 Sep 2008, 21:45:33 UTC

The .xml stats export is apparently not working again: no updates on the stats sites for the last two days.

Hello, from Albany, CA!...
ID: 808642

speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 808649 - Posted: 15 Sep 2008, 22:22:00 UTC

Looking at the server status page, it seems they are working on it...
mic.


ID: 808649

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.