Sad Trombone (Nov 25 2009)

Message boards : Technical News : Sad Trombone (Nov 25 2009)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 949811 - Posted: 25 Nov 2009, 23:34:12 UTC

Okay then. The MySQL commit behavior we were testing was an absolute failure - though for expected reasons (not enough disk I/O, even with the solid state drives). It was worth a shot, but we fell back to the old commit behavior for now.

However, this caused a lot of backend processes to clog up including the transitioners, which ultimately meant the splitters burned through all kinds of raw data files before they realized we had more than enough work on disk. This could have been bad, i.e. filled up our workunit storage server, but luckily it didn't even come close to doing that.

Anyway, we reverted this morning and all the dams broke for a while... until we ran out of work to send out. Turns out the last 10 files I brought up from Arecibo are all broken. Fwa wa wa waaaaa. This is particularly frustrating as I was busting my hump trying to get enough work online before the long holiday weekend, and now we have zero. So it'll be up to me and Jeff to check in over the next few days and kick the pipeline along. We'll be out of real work to send out until this evening at the earliest, and will quite probably hit long periods of no work throughout the weekend. Fine.

In better news, we did the last bits to get the Astropulse signal table fully copied over to another database fragment - only losing a few rows here and there (as opposed to many thousands as originally thought). Work will resume on Monday to swap the old and new fragments, and hopefully the science database will be much happier.

That's it for now.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 949811 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 949820 - Posted: 25 Nov 2009, 23:50:42 UTC

Well......
Hope you find time to enjoy your Thanksgiving anyway Matt and Jeff.
The crunchers will wait if need be.

So now ya gotta troubleshoot the recorder at Arecibo again on top of everything else, eh?

Sad song indeed. Best of luck sorting it.

Happy Thanksgiving to you and yours.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 949820 · Report as offensive
Profile James Sotherden
Avatar

Send message
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 949855 - Posted: 26 Nov 2009, 4:11:29 UTC

Matt, that stinks you have to check in and do some kicking on the holidays. There's nothing I hate worse than having to think about work on a long weekend.

Have a great Thanksgiving.

Old James
ID: 949855 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 949865 - Posted: 26 Nov 2009, 6:37:12 UTC - in response to Message 949811.  
Last modified: 26 Nov 2009, 6:44:00 UTC

So it'll be up to me and Jeff to check in over the next few days and kick the pipeline along.

Are you crazy? Go on, have that same extra long weekend that everyone else has - medical, fire and police people excluded - as I am sure people's computers can do without work for a while, while the people being boss of said computers have got plenty of projects to choose from to overcome any workflow problems Seti has.

Have a fine weekend and a Happy Thanksgiving, Matt (and Jeff). :-)
ID: 949865 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Avatar

Send message
Joined: 28 Oct 09
Posts: 34060
Credit: 18,883,157
RAC: 18
Belgium
Message 949867 - Posted: 26 Nov 2009, 6:57:08 UTC

Have a happy Thanksgiving Matt and Jeff:-)
Really glad you keep us so well updated!
ID: 949867 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Avatar

Send message
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 949885 - Posted: 26 Nov 2009, 11:01:24 UTC - in response to Message 949865.  

Are you crazy? Go on, have that same extra long weekend that everyone else has - medical, fire and police people excluded -


What? Let 'em have all that time off? Surely Matt and Co. really are our Fire, Police, Ambulance services all rolled into one?

Only kidding - have a great time while all the rest of us stare into blank screens! ;-)

ID: 949885 · Report as offensive
Peterjansson20

Send message
Joined: 12 Oct 00
Posts: 1
Credit: 124,856
RAC: 0
Sweden
Message 949894 - Posted: 26 Nov 2009, 13:23:55 UTC - in response to Message 949811.  

I hope it will work better for you and
you do not lose any more data.

Good luck!
Peter
ID: 949894 · Report as offensive
Profile Saicere

Send message
Joined: 9 Jul 99
Posts: 7
Credit: 6,717,907
RAC: 0
Netherlands
Message 949895 - Posted: 26 Nov 2009, 13:28:08 UTC

You're talking a lot about MySQL problems on that new monster of yours, and in case you weren't aware, I just thought you should know that MySQL scales horribly on a large number of cores. It can barely scale to 8, so there's virtually no chance that you're getting good results on 24.

Sun has an "official" writeup of a scaling attempt here: http://blogs.sun.com/mrbenchmark/entry/scaling_mysql_on_a_256

Their suggestion? Run several instances of MySQL on the same server. Which is a bit meh, but if you insist on running MySQL on a high-performance project like this, it might be worth looking into splitting up the most trafficked tables into separate instances. Then you could also enable the safe commit behavior individually on the critical servers.

You've also mentioned running into replica problems on jocelyn after a primary crash, which is very common. Here's a trick to get around it, at least temporarily.

After the crash, feed jocelyn the following through the MySQL console:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='[NEXT FILE]', MASTER_LOG_POS=4;
START SLAVE;

Replace [NEXT FILE] with the name of the first binary log file on mork that it started writing after the crash, typically mysql-bin.002342 or something similar. You can get the name by running SHOW MASTER STATUS; on mork. This will skip the corrupted end of the previous binary log, and restart replication from the new file.

Then verify that it's running with a SHOW SLAVE STATUS;

Note that you *cannot be sure* that the state on the replica and the primary is now consistent unless you are using safe commits; however, it should still be consistent enough for non-science use.
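
For anyone who wants to script the trick above, here is a minimal Python sketch. It only builds the statements and checks a status row; it does not connect to anything. The function names are made up for illustration, and the leading STOP SLAVE is my own assumption (the replica threads may still be running when you get to it). The column names Slave_IO_Running and Slave_SQL_Running are the ones SHOW SLAVE STATUS reports.

```python
import re

# Binary log names look like "mysql-bin.002342": a base name, a dot, digits.
BINLOG_NAME = re.compile(r"^[A-Za-z0-9_\-]+\.\d+$")


def build_resync_statements(next_binlog_file):
    """Return the statements to run on the replica, in order.

    next_binlog_file is the first binary log the primary started writing
    after the crash. Position 4 is where the first event sits in a MySQL
    binary log, right after the 4-byte file header.
    """
    # Refuse anything that does not look like a binary log name, since the
    # value is pasted straight into a statement.
    if not BINLOG_NAME.match(next_binlog_file):
        raise ValueError("unexpected binary log name: %r" % next_binlog_file)
    return [
        "STOP SLAVE;",  # assumption: make sure the replica threads are stopped
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=4;"
        % next_binlog_file,
        "START SLAVE;",
    ]


def replication_is_running(status_row):
    """Check one row of SHOW SLAVE STATUS, given as a column -> value dict."""
    return (
        status_row.get("Slave_IO_Running") == "Yes"
        and status_row.get("Slave_SQL_Running") == "Yes"
    )


if __name__ == "__main__":
    for statement in build_resync_statements("mysql-bin.002342"):
        print(statement)
```

Both threads reporting "Yes" only tells you replication has restarted, not that the replica matches the primary, so the warning above about consistency still applies.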
ID: 949895 · Report as offensive
C

Send message
Joined: 3 Apr 99
Posts: 240
Credit: 7,716,977
RAC: 0
United States
Message 949906 - Posted: 26 Nov 2009, 15:22:29 UTC

Looks as if everything quit completely last night around 2300 PST. Let it go for the day, Matt, and enjoy Thanksgiving Day with friends and family. We'll survive...

C

Join Team MacNN
ID: 949906 · Report as offensive
cncr04s

Send message
Joined: 25 Oct 00
Posts: 6
Credit: 296,024
RAC: 0
United States
Message 950064 - Posted: 27 Nov 2009, 4:35:34 UTC

Seti is always offline it seems, I was enticed by emails to come back, but it seems the same thing is happening that made me leave in the first place. I've been with seti for a long time (9 years), and I'm sad to see so many problems with servers going down lately... can't you guys find someone smart enough to fix stuff, and buy lasting equipment? My server machine is still in top shape after 6 years. I'd be willing to help seti in my spare time, if only I lived that far west. I'm sure others would too, so don't rant to me about "costs" and the lack of funding.
ID: 950064 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 950066 - Posted: 27 Nov 2009, 4:38:06 UTC - in response to Message 950064.  

Seti is always offline it seems, I was enticed by emails to come back, but it seems the same thing is happening that made me leave in the first place. I've been with seti for a long time (9 years), and I'm sad to see so many problems with servers going down lately... can't you guys find someone smart enough to fix stuff, and buy lasting equipment? My server machine is still in top shape after 6 years. I'd be willing to help seti in my spare time, if only I lived that far west. I'm sure others would too, so don't rant to me about "costs" and the lack of funding.

They have no money for anything - the entire budget is from donations at the moment. Servers are mostly donated...


BOINC WIKI
ID: 950066 · Report as offensive
Bearcat

Send message
Joined: 10 Sep 99
Posts: 106
Credit: 10,778,506
RAC: 0
United States
Message 950079 - Posted: 27 Nov 2009, 5:22:07 UTC - in response to Message 950064.  

... ,I was enticed by emails to come back, but it seems the same thing is happening that made me leave in the first place. ...


Same here. I have been with SETI from 1999 to 2004 and received a "we need your help" email a few months ago. Unfortunately I can only donate my computer time, but not money, and it seems (at least to me) that SETI either needs more donations or the existing donations are spent unwisely.

It is certainly none of my business, but the thought crossed my mind: is there info available about the funding situation and where the money goes? It also seems people are just interested in keeping the computers going to gain credit instead of actually finding ET. Isn't this what it's all about? And how are we going to do this without getting new data from Arecibo? It seems pointless to continue and to waste millions of kilowatt-hours.

/end of rant + start of apology


ID: 950079 · Report as offensive
Profile GreggyBee
Volunteer tester
Avatar

Send message
Joined: 9 Mar 01
Posts: 203
Credit: 1,600,521
RAC: 0
Message 950133 - Posted: 27 Nov 2009, 10:10:31 UTC - in response to Message 950079.  

... ,I was enticed by emails to come back, but it seems the same thing is happening that made me leave in the first place. ...


It also seems people are just interested in keeping the computers going to gain credit instead of actually finding ET. Isn't this what it's all about?


For some people, yes; what you need to remember is that S@H is now only one part of a larger distributed computing project called BOINC. A lot of people are only interested in the number-crunching game, not the individual merits of any one project. Don't believe me? Just check out the shoutbox on Boincstats: there's no loyalty to individual projects, just credit-chasers boasting about the latest multi-million credit day on BOINC. S@H has become a victim of its own pioneering.
ID: 950133 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 950265 - Posted: 27 Nov 2009, 20:16:30 UTC - in response to Message 950064.  

Seti is always offline it seems, I was enticed by emails to come back, but it seems the same thing is happening that made me leave in the first place

Then you didn't understand what was going on then, and you don't understand now.

Setting aside the funding issue, somewhat.

BOINC projects are supposed to do big science on very small budgets. That means that they don't do things like redundant server clusters, multihomed sites and all of the (expensive) things that hide the occasional outage that you might see with Amazon or Space.com.

If you have work in your cache, a server outage is a minor inconvenience. It's interesting to know about, but that's all.

The work isn't time critical, so getting reported later is no big deal.

I wish people would start realizing how well the entire system (client and server) works when none of the individual components are 99.99% reliable.

ID: 950265 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21099
Credit: 7,508,002
RAC: 20
United Kingdom
Message 950307 - Posted: 27 Nov 2009, 23:58:09 UTC - in response to Message 950265.  
Last modified: 28 Nov 2009, 0:00:35 UTC

... The work isn't time critical, so getting reported later is no big deal.

I wish people would start realizing how well the entire system (client and server) works when none of the individual components are 99.99% reliable.

Considering the overall environment, the vast numbers (data AND users) and the project goals, I'm utterly amazed at how successful and how well all of this works in the first place.

Truly a dedication from Matt & Dr A and a few others for the last decade and more!

And all despite some flat-Earth political nit-twit forbidding all funding in case they might find God?!!! (Or some such?...)


Meanwhile, a failed network switch and a few hours downtime over their big festive break is all part of the fun.

The big Boinc crunch continues, uninterrupted.

Who forsook their turkey to go into the lab to fix that one, I wonder?!...


... So when do all the Boinc scheduler problems and all the other minor niggles get fixed? ;-)

(Or to rephrase: Any interested volunteers to the rescue?)


It's all part of the experiment and the development!

Happy Thanksgiving crunchin'!
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 950307 · Report as offensive
Odysseus
Volunteer tester
Avatar

Send message
Joined: 26 Jul 99
Posts: 1808
Credit: 6,701,347
RAC: 6
Canada
Message 950435 - Posted: 28 Nov 2009, 9:22:59 UTC - in response to Message 950079.  

ID: 950435 · Report as offensive
Profile Wingless Wonder

Send message
Joined: 14 May 99
Posts: 14
Credit: 12,157,146
RAC: 0
United States
Message 950447 - Posted: 28 Nov 2009, 10:37:37 UTC

In response to some less-than-positive comments about the state of SETI@home, its servers, not being able to obtain work, etc., I myself volunteer use of my computer because I feel like it. No one is twisting my arm to do so. I maintain a two-day work cache in anticipation of the occasional outages, along with participation in other BOINC projects. I rarely run out of work. If I did, no big deal.

I don't feel that the folks at SETI@home owe me some sort of debt of gratitude. On the contrary, I feel privileged to be able to participate in any of the BOINC projects.
ID: 950447 · Report as offensive
Profile Robert Waite
Avatar

Send message
Joined: 23 Oct 07
Posts: 2417
Credit: 18,192,122
RAC: 59
Canada
Message 950829 - Posted: 29 Nov 2009, 18:27:17 UTC

No worries
A signal may travel for thousands of years to reach Earth.
No need to squawk about a few days of down time.

ID: 950829 · Report as offensive
Profile Dr. C.E.T.I.
Avatar

Send message
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 952264 - Posted: 4 Dec 2009, 9:41:01 UTC


eh Matt - a big hug for you & your updates
from joanne & i

BOINC Wiki . . .

Science Status Page . . .
ID: 952264 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.