Sad Trombone (Nov 25 2009)


Message boards : Technical News : Sad Trombone (Nov 25 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 949811 - Posted: 25 Nov 2009, 23:34:12 UTC

Okay then. The MySQL commit behavior we were testing was an absolute failure, though for expected reasons (not enough disk I/O, even with the solid-state drives). It was worth a shot, but we've fallen back to the old commit behavior for now.

However, this caused a lot of backend processes to clog up, including the transitioners, which ultimately meant the splitters burned through all kinds of raw data files before they realized we had more than enough work on disk. This could have been bad (it could have filled up our workunit storage server), but luckily it didn't even come close to doing that.
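A toy model can make that overshoot concrete. It assumes the splitter follows the "keep a cushion of unsent results" pattern used by BOINC's sample work generator: it only looks at how many results are waiting to be sent, and keeps splitting raw data files while that count is below a threshold. A workunit the transitioner hasn't processed yet has no results, so a clogged transitioner hides work from that check. This is a hypothetical sketch; the function, parameters and numbers are invented for illustration and are not the actual SETI@home splitter logic.

```python
def simulate(steps: int, cushion: int, wus_per_file: int, transition_rate: int):
    """Toy model: a splitter gated on visible unsent results, fed through a transitioner.

    All parameters are hypothetical; nothing here comes from the real SETI@home code.
    Returns (files_split, workunits_without_results, unsent_results).
    """
    files_split = 0
    pending_wus = 0      # workunits on disk that the transitioner hasn't handled yet
    unsent_results = 0   # results the splitter can actually count
    for _ in range(steps):
        # Splitter: split another raw data file while visible unsent work is below the cushion.
        if unsent_results < cushion:
            files_split += 1
            pending_wus += wus_per_file
        # Transitioner: create results for at most `transition_rate` workunits per step.
        handled = min(pending_wus, transition_rate)
        pending_wus -= handled
        unsent_results += handled
    return files_split, pending_wus, unsent_results


# Healthy backend: the first file's results appear at once, so splitting stops.
print(simulate(steps=10, cushion=100, wus_per_file=256, transition_rate=1000))  # (1, 0, 256)
# Clogged transitioner: the splitter keeps seeing too little work and keeps splitting.
print(simulate(steps=10, cushion=100, wus_per_file=256, transition_rate=20))    # (5, 1080, 200)
```

The model never sends results to hosts, so it only captures the overshoot: with the slow transitioner the splitter goes through five files where one would have done, and over a thousand workunits pile up on disk.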

Anyway, we reverted this morning and all the dams broke for a while... until we ran out of work to send out. Turns out the last 10 files I brought up from Arecibo are all broken. Fwa wa wa waaaaa. This is particularly frustrating, as I was busting my hump trying to get enough work online before the long holiday weekend, and now we have zero. So it'll be up to me and Jeff to check in over the next few days and kick the pipeline along. We'll be out of real work to send out until this evening at the earliest, and will quite probably hit long periods of no work throughout the weekend. Fine.

In better news, we did the last bits to get the Astropulse signal table fully copied over to another database fragment, losing only a few rows here and there (as opposed to many thousands, as originally thought). Work will resume on Monday to swap the old and new fragments, and hopefully the science database will be much happier.

That's it for now.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

James Sotherden
Joined: 16 May 99
Posts: 8947
Credit: 36,236,685
RAC: 46,147
United States
Message 949855 - Posted: 26 Nov 2009, 4:11:29 UTC

Matt, it stinks that you have to check in and do some kicking over the holidays. There's nothing I hate worse than having to think about work on a long weekend.

Have a great Thanksgiving.
____________

Old James

Ageless
Joined: 9 Jun 99
Posts: 12341
Credit: 2,639,120
RAC: 1,240
Netherlands
Message 949865 - Posted: 26 Nov 2009, 6:37:12 UTC - in response to Message 949811.
Last modified: 26 Nov 2009, 6:44:00 UTC

So it'll be up to me and Jeff to check in over the next few days and kick the pipeline along.

Are you crazy? Go on, have the same extra-long weekend that everyone else has (medical, fire and police people excluded). I am sure people's computers can do without work for a while, and the people in charge of said computers have plenty of projects to choose from to get around any workflow problems SETI has.

Have a fine weekend and a Happy Thanksgiving, Matt (and Jeff). :-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Julie
Project donor
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 22360
Credit: 3,960,060
RAC: 5,684
Belgium
Message 949867 - Posted: 26 Nov 2009, 6:57:08 UTC

Have a happy Thanksgiving, Matt and Jeff :-)
Really glad you keep us so well updated!

T. Bell
Joined: 27 Mar 06
Posts: 20
Credit: 229,303
RAC: 0
Australia
Message 949874 - Posted: 26 Nov 2009, 8:01:31 UTC

Dang. Well, that's a real kick in the nuts. Just as well I buffered up 10 days' worth of workunits to keep my lappy busy for now ;-)

Happy Thanksgiving by the way ;)

-- Tom :)
____________

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 12,136,780
RAC: 13,881
United Kingdom
Message 949885 - Posted: 26 Nov 2009, 11:01:24 UTC - in response to Message 949865.

Are you crazy? Go on, have the same extra-long weekend that everyone else has (medical, fire and police people excluded)


What? Let 'em have all that time off? Surely Matt and Co. really are our Fire, Police, Ambulance services all rolled into one?

Only kidding; have a great time while all the rest of us stare into blank screens! ;-)
____________

Peterjansson20
Joined: 12 Oct 00
Posts: 1
Credit: 124,856
RAC: 0
Sweden
Message 949894 - Posted: 26 Nov 2009, 13:23:55 UTC - in response to Message 949811.

I hope it will work better for you and that you don't lose any more data.

Good luck!
Peter
____________

Ramuh
Joined: 9 Jul 99
Posts: 3
Credit: 3,039,201
RAC: 0
Netherlands
Message 949895 - Posted: 26 Nov 2009, 13:28:08 UTC

You're talking a lot about MySQL problems on that new monster of yours, and in case you weren't aware, I just thought you should know that MySQL scales horribly on a large number of cores. It can barely scale to 8, so there's virtually no chance that you're getting good results on 24.

Sun has an "official" writeup of a scaling attempt here: http://blogs.sun.com/mrbenchmark/entry/scaling_mysql_on_a_256

Their suggestion? Run several instances of MySQL on the same server. Which is a bit meh, but if you insist on running MySQL on a high-performance project like this, it might be worth looking into splitting up the most trafficked tables into separate instances. Then you could also enable the safe commit behavior individually on the critical servers.

You've also mentioned running into replica problems on jocelyn after a primary crash, which is very common. Here's a trick to get around it, at least temporarily.

After the crash, feed jocelyn the following through the MySQL console:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='[NEXT FILE]', MASTER_LOG_POS=4;
START SLAVE;

Replace [NEXT FILE] with the name of the first binary log file that mork started writing after the crash, typically mysql-bin.002342 or something similar. You can get the name by running SHOW MASTER STATUS; on mork (or SHOW BINARY LOGS; if it has already rotated to a later file). This skips the corrupted end of the previous binary log and restarts replication from the new file.

Then verify that it's running with SHOW SLAVE STATUS; (Slave_IO_Running and Slave_SQL_Running should both say Yes).

Note that you *cannot be sure* that the replica and the primary are now consistent unless you are using safe commits; however, they should still be consistent enough for non-science use.
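If you script this rather than typing it, the only fiddly part is working out the next file name. The sketch below assumes the primary rotated exactly once, on restart (MySQL starts a new binary log file each time the server starts), so the file to resume from is the successor of the one the replica was reading when the crash happened (Master_Log_File in SHOW SLAVE STATUS). It quotes the file name, since CHANGE MASTER TO expects a string, and stops the replication threads first. The function names are hypothetical; confirm the name against SHOW BINARY LOGS on the primary before running anything.

```python
import re


def next_binlog_name(current: str) -> str:
    """Return the binary log file that follows `current` in MySQL's numbered series."""
    match = re.fullmatch(r"(.+)\.(\d+)", current)
    if match is None:
        raise ValueError(f"not a numbered binary log name: {current!r}")
    base, seq = match.groups()
    # Keep the zero-padded width of the suffix (six digits in MySQL's default naming).
    return f"{base}.{int(seq) + 1:0{len(seq)}d}"


def resync_statements(last_file_before_crash: str) -> list[str]:
    """Hypothetical helper: statements to paste into the replica's MySQL console."""
    next_file = next_binlog_name(last_file_before_crash)
    return [
        "STOP SLAVE;",
        # Position 4 is the first event, right after the 4-byte magic number that opens every binary log.
        f"CHANGE MASTER TO MASTER_LOG_FILE='{next_file}', MASTER_LOG_POS=4;",
        "START SLAVE;",
    ]


for statement in resync_statements("mysql-bin.002341"):
    print(statement)
```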
____________

C
Joined: 3 Apr 99
Posts: 240
Credit: 6,694,192
RAC: 999
United States
Message 949906 - Posted: 26 Nov 2009, 15:22:29 UTC

Looks as if everything quit completely last night around 2300 PST. Let it go for the day, Matt, and enjoy Thanksgiving Day with friends and family. We'll survive...

C
____________

Join Team MacNN

cncr04s
Joined: 25 Oct 00
Posts: 6
Credit: 296,024
RAC: 0
United States
Message 950064 - Posted: 27 Nov 2009, 4:35:34 UTC

SETI always seems to be offline. I was enticed by emails to come back, but it seems the same thing that made me leave in the first place is happening again. I've been with SETI for a long time (9 years), and I'm sad to see so many problems with servers going down lately... Can't you guys find someone smart enough to fix stuff, and buy equipment that lasts? My server machine is still in top shape after 6 years. I'd be willing to help SETI in my spare time if only I lived that far west, and I'm sure others would too, so don't rant to me about "costs" and the lack of funding.

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24715
Credit: 522,925
RAC: 30
United States
Message 950066 - Posted: 27 Nov 2009, 4:38:06 UTC - in response to Message 950064.

They have no money for anything; the entire budget comes from donations at the moment. The servers are mostly donated...
____________


BOINC WIKI

Bearcat
Joined: 10 Sep 99
Posts: 106
Credit: 10,778,506
RAC: 0
United States
Message 950079 - Posted: 27 Nov 2009, 5:22:07 UTC - in response to Message 950064.

... I was enticed by emails to come back, but it seems the same thing that made me leave in the first place is happening again. ...


Same here. I was with SETI from 1999 to 2004 and received a "we need your help" email a few months ago. Unfortunately I can only donate my computer time, not money, and it seems (at least to me) that SETI either needs more donations or is spending its existing donations unwisely.

It is certainly none of my business, but the thought crossed my mind: is there any information available about the funding situation and where the money goes? It also seems people are just interested in keeping their computers going to gain credit instead of actually finding ET. Isn't this what it's all about? And how are we going to do that without getting new data from Arecibo? It seems pointless to continue and to waste millions of kilowatt-hours.

/end of rant + start of apology


____________

GreggyBee
Volunteer tester
Joined: 9 Mar 01
Posts: 203
Credit: 1,600,521
RAC: 0
Message 950133 - Posted: 27 Nov 2009, 10:10:31 UTC - in response to Message 950079.

... I was enticed by emails to come back, but it seems the same thing that made me leave in the first place is happening again. ...


It also seems people are just interested in keeping their computers going to gain credit instead of actually finding ET. Isn't this what it's all about?


For some people, yes. What you need to remember is that S@H is now only one part of a larger distributed computing platform called BOINC. A lot of people are only interested in the number-crunching game, not the individual merits of any one project. Don't believe me? Just check out the shoutbox on BOINCstats: there's no loyalty to individual projects, just credit-chasers boasting about their latest multi-million-credit day on BOINC. S@H has become a victim of its own pioneering.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 950265 - Posted: 27 Nov 2009, 20:16:30 UTC - in response to Message 950064.

SETI always seems to be offline. I was enticed by emails to come back, but it seems the same thing that made me leave in the first place is happening again.

Then you didn't understand what was going on back then, and you don't understand it now.

Setting aside the funding issue, somewhat.

BOINC projects are supposed to do big science on very small budgets. That means they don't have things like redundant server clusters, multihomed sites and all of the other (expensive) measures that hide the occasional outage at sites like Amazon or Space.com.

If you have work in your cache, a server outage is a minor inconvenience. It's interesting to know about, but that's all.
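The arithmetic behind the cache argument is simple: a host only goes idle once its cache runs dry, so an outage shorter than the cache costs it nothing. A minimal sketch, assuming a cache that is full when the outage starts, a host that crunches nonstop, and no downloads until the servers come back (the function name and the numbers are hypothetical):

```python
def idle_hours(cache_hours: float, outage_hours: float) -> float:
    """Hours a host sits without work during a server outage.

    Hypothetical assumptions: the cache is full when the outage starts, the host
    crunches continuously, and it cannot fetch new work until the outage ends.
    """
    # The cache covers the first `cache_hours` of the outage; anything beyond that is idle time.
    return max(0.0, outage_hours - cache_hours)


# A two-day cache rides out a 30-hour outage without going idle.
print(idle_hours(cache_hours=48.0, outage_hours=30.0))  # 0.0
# A four-hour cache runs dry early and leaves the host idle for the rest.
print(idle_hours(cache_hours=4.0, outage_hours=30.0))   # 26.0
```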

The work isn't time critical, so getting reported later is no big deal.

I wish people would start realizing how well the entire system (client and server) works when none of the individual components are 99.99% reliable.

____________

ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8518
Credit: 4,202,497
RAC: 1,666
United Kingdom
Message 950307 - Posted: 27 Nov 2009, 23:58:09 UTC - in response to Message 950265.
Last modified: 28 Nov 2009, 0:00:35 UTC

... The work isn't time critical, so getting reported later is no big deal.

I wish people would start realizing how well the entire system (client and server) works when none of the individual components are 99.99% reliable.

Considering the overall environment, the vast numbers (data AND users) and the project goals, I'm utterly amazed at how successful and how well all of this works in the first place.

A true labor of dedication from Matt, Dr A and a few others over the last decade and more!

And all despite some flat-Earth political nitwit forbidding all funding in case they might find God?!!! (Or some such?...)


Meanwhile, a failed network switch and a few hours downtime over their big festive break is all part of the fun.

The big BOINC crunch continues, uninterrupted.

Who forsook their turkey to go into the lab to fix that one, I wonder?!...


... So when do all the BOINC scheduler problems and all the other minor niggles get fixed? ;-)

(Or to rephrase: Any interested volunteers to the rescue?)


It's all part of the experiment and the development!

Happy Thanksgiving crunchin'!
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Odysseus
Volunteer tester
Joined: 26 Jul 99
Posts: 1786
Credit: 3,831,826
RAC: 287
Canada
Message 950435 - Posted: 28 Nov 2009, 9:22:59 UTC - in response to Message 950079.

Last year’s budget

____________

Wingless Wonder
Joined: 14 May 99
Posts: 14
Credit: 12,156,933
RAC: 0
United States
Message 950447 - Posted: 28 Nov 2009, 10:37:37 UTC

In response to some less-than-positive comments about the state of SETI@home, its servers, the inability to obtain work, etc.: I volunteer the use of my computer because I feel like it. No one is twisting my arm to do so. I maintain a two-day work cache in anticipation of the occasional outages, and I also participate in other BOINC projects. I rarely run out of work, and if I did, it would be no big deal.

I don't feel that the folks at SETI@home owe me some sort of debt of gratitude. On the contrary, I feel privileged to be able to participate in any of the BOINC projects.

Robert Waite
Joined: 23 Oct 07
Posts: 2217
Credit: 5,757,723
RAC: 2,383
Canada
Message 950829 - Posted: 29 Nov 2009, 18:27:17 UTC

No worries
A signal may travel for thousands of years to reach Earth.
No need to squawk about a few days of downtime.

Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 15993
Credit: 690,597
RAC: 0
United States
Message 952264 - Posted: 4 Dec 2009, 9:41:01 UTC


Eh, Matt, a big hug for you & your updates
from Joanne & me

____________
BOINC Wiki . . .

Science Status Page . . .

Copyright © 2014 University of California