Returning (Jul 14 2008)

Message boards : Technical News : Returning (Jul 14 2008)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 782781 - Posted: 14 Jul 2008, 23:07:47 UTC

So the second half of last week was spent trying to figure out why our database server was so painfully slow. Bob, Jeff, Eric, and I were scratching our heads, trying this and that to diagnose and fix this mysterious problem. Everything was fine before the Tuesday outage, nothing changed during the outage, but upon restarting the project we couldn't handle very much load.

We were quick to blame MySQL, as it has had random episodes in the past of secretive bookkeeping causing us grief. We ruled this out. We started blaming the "credited job" table, which grows without bound. This is the table keeping track of which user did which workunit. We do nothing but insert into this table (no random access selects), so why would that be a problem? Nevertheless we turned off inserts (going back to writing similar info to flat files for later parsing), to no avail.
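
The flat-file fallback mentioned above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the CSV layout, the file name, and the helper names `log_credit` and `parse_credits` are all hypothetical. Note that, per the post, switching to flat files did not fix the slowdown.

```python
import csv
import os
import tempfile

def log_credit(path, userid, workunitid):
    """Append one (user, workunit) record to a flat file.

    Hypothetical helper: appending to a file does not touch the database.
    """
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([userid, workunitid])

def parse_credits(path):
    """Read the flat file back, e.g. for a later bulk load into a table."""
    with open(path, newline="") as f:
        return [(int(u), int(w)) for u, w in csv.reader(f)]

# Hypothetical file name and made-up IDs, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "credited_job.log")
log_credit(log_path, 101, 5001)
log_credit(log_path, 102, 5001)
log_credit(log_path, 101, 5002)

parsed = parse_credits(log_path)
print(parsed)
```

The design trade-off is that the records are not queryable until someone parses and loads them, which is presumably acceptable for a table that is only ever inserted into.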

Maybe it was hardware? Did a disk fail? Is a disk about to fail? We ruled all that out as well, which brought the focus back to MySQL and the dozens of server tunables that we have tweaked for various reasons over the years. Did we go too far with some of those variables? We convinced ourselves that wasn't it.

Of course, in hindsight the ultimate culprit seems obvious: the filesystem where all the data is kept. Just because the hardware seems okay and I/O rates are normal doesn't mean the filesystem is happy. So the focus went back to "credited job", since this table is constantly growing and is therefore a big ol' file, much bigger than anything else. A file that keeps growing during all the other inserts and updates that happen while the project is running will likely become interleaved and fragmented to the nth degree. Since dropping indexes carries no risk of data loss, we dropped the credited job indexes, and that alone broke the dam. Well, jeez.

We're still catching up from the backlog, but MySQL is performing incredibly well at this point. This is good, as we're hoping to release Astropulse before the end of the week. More on that later.

Happy Bastille Day, by the way.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 782781
Profile Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 782785 - Posted: 14 Jul 2008, 23:14:44 UTC
Last modified: 14 Jul 2008, 23:19:37 UTC

Good to hear you found the cause of the problems, but by dropping the table does that mean that if someone actually finds anything you won't know who to credit with it?
[edit] Or is the info stored somewhere else also?
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 782785
Profile Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 782786 - Posted: 14 Jul 2008, 23:25:23 UTC - in response to Message 782785.  

Good to hear you found the cause of the problems, but by dropping the table does that mean that if someone actually finds anything you won't know who to credit with it?
[edit] Or is the info stored somewhere else also?

Actually, he indicates they just dropped an index, which only helps find the records faster. All the data is still there. Updating indexes on a highly fragmented file can be a performance pig, so not maintaining them should be hugely beneficial. And when it comes time to pull the lucky WU-Owner information, the data can be accessed without an index, or an index can be built then. (Maybe once they have enough disk to give it its own filesystem to avoid the fragmentation in the first place. :-))
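
As a rough illustration of that point, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the project's MySQL server. The table layout, column names, and index name are assumptions made up for the example; only the general behavior (dropping an index keeps the rows, and the index can be rebuilt later) is what it is meant to show.

```python
import sqlite3

# In-memory SQLite stands in for the project's MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: which user was credited for which workunit.
cur.execute("CREATE TABLE credited_job (userid INTEGER, workunitid INTEGER)")
cur.execute("CREATE INDEX credited_job_wu ON credited_job (workunitid)")

# Two made-up users per workunit, 1000 workunits.
rows = [(user, wu) for wu in range(1000) for user in (wu % 7, wu % 11 + 100)]
cur.executemany("INSERT INTO credited_job VALUES (?, ?)", rows)

count_before = cur.execute("SELECT COUNT(*) FROM credited_job").fetchone()[0]

# Dropping the index removes only the lookup structure, not the rows.
cur.execute("DROP INDEX credited_job_wu")
count_after = cur.execute("SELECT COUNT(*) FROM credited_job").fetchone()[0]

# Lookups still work without the index; they fall back to a full table scan.
owners = sorted(
    r[0]
    for r in cur.execute(
        "SELECT userid FROM credited_job WHERE workunitid = ?", (42,)
    )
)

# The index can be rebuilt later, when the lookup is actually needed.
cur.execute("CREATE INDEX credited_job_wu ON credited_job (workunitid)")
conn.commit()

print(count_before, count_after, owners)
```

On a table with hundreds of millions of rows the full scan would of course be slow, which is why building the index at the time of need is the other option mentioned.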
ID: 782786
Profile Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 782787 - Posted: 14 Jul 2008, 23:28:37 UTC - in response to Message 782786.  


Actually, he indicates they just dropped an index, which only helps find the records faster. All the data is still there. Updating indexes on a highly fragmented file can be a performance pig, so not maintaining them should be hugely beneficial. And when it comes time to pull the lucky WU-Owner information, the data can be accessed without an index, or an index can be built then. (Maybe once they have enough disk to give it its own filesystem to avoid the fragmentation in the first place. :-))

That's what I get for "speed reading" the post! I missed the word "indexes" hehe. Thanks for pointing it out.
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 782787
Profile staze

Joined: 8 May 99
Posts: 9
Credit: 34,186,653
RAC: 21
United States
Message 782791 - Posted: 14 Jul 2008, 23:55:08 UTC

So, I guess the question becomes either... could you place that index on another drive/partition, and therefore give it room to grow, without fragmenting everything else... or, have you tried raw device support within mysql, and forgo the filesystem altogether?

How big did the index get? We're obviously talking millions (if not hundreds of millions) of entries...
ID: 782791
Profile Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 143
Credit: 6,652,341
RAC: 0
Canada
Message 782797 - Posted: 15 Jul 2008, 0:18:00 UTC

Glad to see you guys have made some huge progress in sorting the problem out.

A big round of applause to you all, you all do a great job.

Being a server admin myself, I know how it feels to spend ages finding problems that sometimes turn out to be the simplest of things.
I actually switched into technician mode over the weekend and troubleshot a few "dead" motherboards lying around the house and from work. Got most fixed; the causes turned out to be A) a bent pin on a CPU, B) a small metallic object stuck in a RAM slot, and C) a blown capacitor. No one from technical had bothered to troubleshoot them thoroughly enough.
While I have probably bored everyone senseless, it just goes to show that with a little time and patience, small mundane things lead to big successes.

Keep up the good work guys.
ID: 782797
Profile Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 782812 - Posted: 15 Jul 2008, 2:31:37 UTC - in response to Message 782791.  

So, I guess the question becomes either... could you place that index on another drive/partition, and therefore give it room to grow, without fragmenting everything else... or, have you tried raw device support within mysql, and forgo the filesystem altogether?

How big did the index get? We're obviously talking millions (if not hundreds of millions) of entries...

Actually, I think what Matt is indicating here is not just that the index itself was fragmented, but that with the index and data interleaved with everything else on the system, the drives ended up seeking almost randomly while trying to get multiple indexes for this single file updated. All the extra seeking is what killed the performance. A lot of seeking will cause your I/Os per second to plummet. I don't recall if Matt has previously indicated that they had a separate filesystem for indexes, but even that might not help enough once the indexes got large enough. The fact that these indexes would be interleaved with all the others could still cause performance problems like this.
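
To put rough numbers on why seeking hurts so much, here is a back-of-the-envelope model. The timing figures are generic assumptions for a spinning disk, not measurements from the SETI@home servers, and the helper `iops` is a hypothetical name; the model simply treats each request as paying a fixed seek, rotational, and transfer cost.

```python
def iops(seek_ms, rotational_ms, transfer_ms):
    """Operations per second when every request pays these costs."""
    return 1000.0 / (seek_ms + rotational_ms + transfer_ms)

# Assumed figures for a spinning disk; not measurements from the project.
TRANSFER_MS = 0.1      # time to move one small block
SEEK_MS = 8.0          # average head seek
ROTATIONAL_MS = 4.0    # average wait for the platter to come around

# Contiguous file: the head is already in place, so only transfer time counts.
sequential = iops(0.0, 0.0, TRANSFER_MS)

# Fragmented file: nearly every block costs a seek plus rotational latency.
scattered = iops(SEEK_MS, ROTATIONAL_MS, TRANSFER_MS)

slowdown = sequential / scattered
print(f"sequential: {sequential:.0f}/s, scattered: {scattered:.0f}/s, "
      f"slowdown: {slowdown:.0f}x")
```

Under these assumed numbers the scattered case is about two orders of magnitude slower, which is consistent with a database that looks healthy on paper but cannot keep up with load.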

The best thing might be to keep this table on its own filesystem (on separate drives) and not even bother with more than a basic index to enforce uniqueness, if any. I don't recall that there's anything in the system that needs to read anything out of this file. (And Matt implied there isn't when he stated "we do nothing but insert.")

I don't know that a RAW filesystem would even have helped too much, because the pieces of the indexes would still end up scattered across all the drivespace allocated and strewn hither-n-yon.

Good sleuthing gents! Glad you got a handle on it. Incidentally, wouldn't this improve NitPicker's performance too? ;-)
ID: 782812
rq2000

Joined: 19 May 99
Posts: 662
Credit: 1,041,579
RAC: 0
United States
Message 782814 - Posted: 15 Jul 2008, 2:45:55 UTC

I wanted to say THANKS for all of your HARD WORK getting the site back up and running. It was not an easy task, I am certain of that; I only hope that everyone still has all their hair and hasn't pulled themselves bald over the past week. Excellent JOB!!! I did notice that once it was back online, the speed of uploading WUs appeared to be dramatically increased. I am not sure if that was because most people weren't aware it was back up and running, so there wasn't a BATTLE FOR BANDWIDTH, or just a result of the tweaking done while trying to fix all of the problems. Either way it seemed to help. So thank you again.
ID: 782814
Profile AndyW Project Donor
Volunteer tester
Joined: 23 Oct 02
Posts: 5862
Credit: 10,957,677
RAC: 18
United Kingdom
Message 782834 - Posted: 15 Jul 2008, 6:35:34 UTC

It's amazing how something that has always worked suddenly doesn't, with no obvious sign that anything broke in the first place. It's horrible when a fix becomes "poke & hope" too!
ID: 782834
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 782879 - Posted: 15 Jul 2008, 14:53:20 UTC - in response to Message 782791.  

So, I guess the question becomes either... could you place that index on another drive/partition, and therefore give it room to grow, without fragmenting everything else... or, have you tried raw device support within mysql, and forgo the filesystem altogether?

How big did the index get? We're obviously talking millions (if not hundreds of millions) of entries...


According to the science page, that's about 453 million WU's...
.

Hello, from Albany, CA!...
ID: 782879
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 782882 - Posted: 15 Jul 2008, 15:42:52 UTC

Nice work Eric, Jeff, Josh, Matt, Bob, and everyone who has their hands in the project both here and at SETI Beta. Good news to hear Astropulse might be here by the end of the week. Very exciting news indeed. Keep up the good work all of ya!
ID: 782882
Profile popandbob
Volunteer tester

Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 782950 - Posted: 16 Jul 2008, 2:27:13 UTC - in response to Message 782879.  

According to the science page, that's about 453 million WU's...


Multiplied by ~3 names each, due to an average of ~3 results for each workunit (I took a guess at 3, so it may be wrong).
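
Spelled out as arithmetic, that estimate looks like the snippet below. Both inputs come from this thread: the workunit count is the figure quoted from the science page, and the multiplier is the poster's own guess, so the result is a rough order of magnitude only.

```python
workunits = 453_000_000      # figure quoted from the science page in this thread
results_per_workunit = 3     # the poster's guess, not a measured average

# One row per credited result, under the assumption above.
estimated_rows = workunits * results_per_workunit
print(f"{estimated_rows:,} rows")
```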

~BoB


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
ID: 782950
Ehran

Joined: 21 Dec 03
Posts: 4
Credit: 894,870
RAC: 0
Canada
Message 783728 - Posted: 18 Jul 2008, 3:16:35 UTC

Would this be why I've been unable to download new work units for over a week?

Not a lot of buttons to poke at on this end, and I've only gotten a single batch of work units in, oh, ten days or so now. No idea what caused it to cough up some WUs for me, though, which is a bit frustrating.
ID: 783728
Profile Chadwick

Joined: 13 Feb 04
Posts: 2
Credit: 758,217
RAC: 0
United States
Message 784409 - Posted: 19 Jul 2008, 17:14:04 UTC
Last modified: 19 Jul 2008, 17:14:51 UTC

Hello!

I hope off topic posts are ok. I didn't see anywhere else where this might be applicable.

I was wondering if anyone has looked into writing a SETIatHome app for the iPod Touch 2.0 / iPhone 2.0?

That would be awesome if my iPod could crunch some numbers while I'm not using it!

Thanks,
Chadwick Jones
Magnolia, Arkansas
USA
ID: 784409
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 784453 - Posted: 19 Jul 2008, 19:18:30 UTC - in response to Message 784409.  

Hello!

I hope off topic posts are ok. I didn't see anywhere else where this might be applicable.

The "Number Crunching" forum is where app discussions usually take place, but about 90% of posts here are off-topic and allowed.
I was wondering if anyone has looked into writing a SETIatHome app for the iPod Touch 2.0 / iPhone 2.0?

That would be awesome if my iPod could crunch some numbers while I'm not using it!

Thanks,
Chadwick Jones
Magnolia, Arkansas
USA

The Boincoid project on Sourceforge is for cell phones and includes both BOINC and S@H. Although its genesis was as a Google Android project, IIRC the developers are attempting to make both that version and a pure Sun Java version which might run on your phone.
Joe
ID: 784453
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 784763 - Posted: 20 Jul 2008, 16:25:17 UTC - in response to Message 784453.  

Hello!

I hope off topic posts are ok. I didn't see anywhere else where this might be applicable.

The "Number Crunching" forum is where app discussions usually take place, but about 90% of posts here are off-topic and allowed.
I was wondering if anyone has looked into writing a SETIatHome app for the iPod Touch 2.0 / iPhone 2.0?

That would be awesome if my iPod could crunch some numbers while I'm not using it!

Thanks,
Chadwick Jones
Magnolia, Arkansas
USA

The Boincoid project on Sourceforge is for cell phones and includes both BOINC and S@H. Although its genesis was as a Google Android project, IIRC the developers are attempting to make both that version and a pure Sun Java version which might run on your phone.
Joe

The standard warning applies. Cell phones do everything they can to ensure long battery life. This includes dropping the CPU speed by a factor of 10 to 100 and not having a Floating Point Unit. This will increase processing time by a factor of up to 10,000.
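
The figures in that warning can be tied together with a little arithmetic. The post states the clock slowdown range and the overall upper bound; the floating point emulation penalty is not stated, so the value below is back-calculated under the assumption that the two factors simply multiply.

```python
clock_slowdown = (10, 100)   # range stated in the post
total_max = 10_000           # upper bound stated in the post

# Implied worst-case penalty for doing floating point in software.
# Derived here by assuming the factors multiply; not stated in the post.
implied_fp_penalty = total_max // clock_slowdown[1]
print(f"implied floating point penalty: up to {implied_fp_penalty}x")
```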


BOINC WIKI
ID: 784763
Profile Chadwick

Joined: 13 Feb 04
Posts: 2
Credit: 758,217
RAC: 0
United States
Message 785458 - Posted: 22 Jul 2008, 16:13:31 UTC

Thanks for the info!

Apple just recently released the retail SDK for iPod Touch 2.0. The app would need to be written for the watered-down version of OS X that runs on the iPod... I forget the code name. It cannot run any kind of Java applet, but it would probably be easy to port over.

I wish I had more programming knowledge to try to write it myself. Hopefully someone will write an app for it, though. :)
ID: 785458

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.