Why is BOINC so finicky?

Message boards : Number crunching : Why is BOINC so finicky?

Yogurtron
Joined: 31 Jul 03
Posts: 1
Credit: 29,910
RAC: 0
United States
Message 33164 - Posted: 6 Oct 2004, 7:24:18 UTC

First of all, I want to say, this is not a flame thread, and I know the people at SETI and the other BOINC projects are doing their best.

But I am confused: how come most BOINC projects (SETI, LHC, and Predictor, the ones I have noticed having the problem so far) have so many problems? I mean, I know that BOINC is still in some... semi-beta form, but most of these downtimes have descriptions saying that they are replacing actual hardware. I mean, how hard is BOINC to process on the server side, that it actually overloads the hardware?

The software problems I can probably chalk up to it still being in some form of semi-beta... but I mean, why are there problems so often? I dunno, it just seems weird how often these problems arise.

But yeah, I just wanted clarification to satisfy my curiosity about just why BOINC has such frequent problems.

PS. Still, keep up the good work, guys (and that isn't sarcastic)
ID: 33164
Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 6,366,949
RAC: 0
United States
Message 33178 - Posted: 6 Oct 2004, 9:21:09 UTC

A good question actually. Since seti has so many more users than the other projects, it would be easy to say 'so many users just clog the system'. But I don't think this is the correct answer. First of all, as Matt posted in the "database replica thread", they don't have all that big of a budget. This means they can't get the super enterprise-level stuff that might help. They are using a software RAID, for crying out loud... I'm not sure what kind of money the other projects have.

With Predictor, it isn't so much hardware problems as vendor problems. Dell seems to have 'lost' their server, plus they are still upgrading to BOINC v4 and improving their science client applications, which are software issues.

With LHC, I suspect they just didn't have the hardware to support the latest rush. They had a bunch of work units that finished really quickly (some within mere seconds), and this obviously put a huge strain on the hardware for a project which has just come out of beta. They probably should have upgraded their hardware before coming out of beta, but they are Swiss, so we will forgive this error because they make excellent chocolate :) Right now they are simply out of work units. Everything they wanted to have processed has been processed. Until they analyze the data, they can't generate new work units. Their admins have stated that this may frequently be the case with their project.

CPDN, on the other hand, has HUGE work units (each one takes 3 weeks or more on a good CPU), so their database and bandwidth demands are much lower, and they have had much better uptime because of it.
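To put rough numbers on the work-unit-size point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each host makes one server request per finished work unit (report the result, fetch the next) and crunches around the clock; the host count and runtimes are hypothetical, not real project figures.

```python
# Back-of-the-envelope scheduler load. Assumption (illustrative only): one
# server request per completed work unit, hosts crunching 24 hours a day.

SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_second(hosts: int, wu_runtime_s: float) -> float:
    """Average server requests per second across all hosts."""
    # Each host finishes (and reports) one work unit every wu_runtime_s seconds.
    return hosts / wu_runtime_s


if __name__ == "__main__":
    hosts = 10_000  # hypothetical number of active hosts
    # Hypothetical runtimes: a WU that ends in seconds, a few-hour WU,
    # and a CPDN-style three-week WU.
    for label, runtime in [
        ("10-second WU", 10),
        ("5-hour WU", 5 * 3600),
        ("3-week WU", 21 * SECONDS_PER_DAY),
    ]:
        rate = requests_per_second(hosts, runtime)
        print(f"{label:>12}: {rate:10.4f} req/s, {rate * SECONDS_PER_DAY:14.1f} req/day")
```

With the same 10,000 hosts, load scales inversely with runtime: work units that end in 10 seconds mean about 1,000 requests per second, while three-week work units mean fewer than 500 requests per day.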

Those excuses being made, I think there are a few design issues with the server-side part of BOINC. I haven't looked at the code very much, but from what I can tell, EVERYTHING runs off of one central database. Most of the science stuff probably has to, but one area that I think should be split off is the message boards. LHC implied in one of their posts that the message boards actually put a fair amount of stress on the database. If the message boards were a completely separate database, the load would be reduced, but (possibly more importantly) when the project went down or got overloaded, users could still get to the message boards and see what is going on.

There is my 2 cents worth - from a recently graduated software engineer with virtually no practical experience. If you think I'm smart and want to hire me, get in touch! If you think I'm stupid, don't tell anyone else! :)


---------------------------------------
- A member of The Knights Who Say NI!
Possibly the best stats site in the universe:
http://boinc-kwsn.no-ip.info
ID: 33178
geckomind
Joined: 27 Nov 03
Posts: 5
Credit: 5,063
RAC: 0
Germany
Message 33191 - Posted: 6 Oct 2004, 10:09:14 UTC
Last modified: 6 Oct 2004, 10:18:25 UTC

Hi there!

Interesting analysis... Sounds quite logical to me. Why the heck are they running the boards under the same server/database anyway? Wouldn't it be easier to set up a separate thingy with phpBB or vBulletin out o' the box?

I mean, I don't know stuff. It just strikes me, that's all...


Greetings from the University of Bonn, Germany!
ID: 33191
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 33196 - Posted: 6 Oct 2004, 10:43:08 UTC - in response to Message 33191.  

> Hi there!
>
> Interesting analysis... Sounds quite logical to me. Why the heck are they
> running the boards under the same server/database anyway? Wouldn't it be
> easier to set up a separate thingy with phpBB or vBulletin out o' the box?
>
> I mean, I don't know stuff. It just strikes me, that's all...

Well, not ALL of it has to be out of one database. And they may be doing some more partitioning later. But, to take just one simple example, your work needs your account data ... and so does the web site for the "Your Account" page ...

Now that they have done some more moving of things around, we can hope for better responses ... In the LONG term, we may even see multiple databases set up with replication, so that the web site can be hosted completely separately from the rest of the project, with changes to the account data being replicated forward to the science database that the BOINC Work Manager connects to for accumulating the results.
For BOINC Documentation: Click Me!


ID: 33196
geckomind
Joined: 27 Nov 03
Posts: 5
Credit: 5,063
RAC: 0
Germany
Message 33197 - Posted: 6 Oct 2004, 10:48:38 UTC

Yeah.... That sounds about right. It's just that I'm used to web projects where the forum data is totally separate from the other stuff. OK, you have to sign up twice and stuff, but it makes the actual project you're doing much more stable.

My only experience is with MySQL stuff for PHP applications, and even there it is sometimes a good idea to separate things...

Cheers!
It's dog eat dog, rat eat rat

Kroc-style - Boom, like that


GeckoMind.net
ID: 33197
Scott Brown
Joined: 5 Sep 00
Posts: 110
Credit: 59,739
RAC: 0
United States
Message 33224 - Posted: 6 Oct 2004, 13:26:59 UTC

@Toby

A nice clear post, but...

"Since seti has so many more users than the other projects, it would be easy to say 'so many users just clog the system'."

technically it is not the number of users but the number of hosts that are the basis of the 'clog the system' argument.

ID: 33224
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 33236 - Posted: 6 Oct 2004, 14:13:43 UTC - in response to Message 33224.  

> @Toby
>
> A nice clear post, but...
>
> "Since seti has so many more users than the other projects, it would be easy
> to say 'so many users just clog the system'."
>
> technically it is not the number of users but the number of hosts that are the
> basis of the 'clog the system' argument.

Actually, it is the number of users, who have a number of hosts, that all return an even greater number of results ...

ID: 33236
Troy_ND
Joined: 23 Aug 02
Posts: 8
Credit: 1,879,844
RAC: 0
United States
Message 33237 - Posted: 6 Oct 2004, 14:18:19 UTC
Last modified: 6 Oct 2004, 14:19:29 UTC

The way I imagine the SETI/BOINC project, and possibly other BOINC projects: imagine trying to build a huge 747 passenger plane with only a $5,000 (USD) yearly budget. Eventually you'll be able to get the plane built how you want it to be, but it takes time.

Now I know I'm probably exaggerating a little, and the $5,000/year is simply a number I made up, but it helps explain why things might not always go how the project teams would like them to go.

I'm sure a lot of the problems they've been having here would just go away if they could get a huge quad Xeon (HT) processor enterprise server with 2-4 GB of RAM, along with nice, large, and very fast 15K SCSI drives in a hardware array. But that's tough to do if you don't have the budget for it. Coming from an IT background, I figure that if the SETI project team is able to run a database server with all the connections SETI/BOINC has on a software array, and probably not a very high-end server, then WAY TO GO, SETI/BOINC Team!!! :)

Troy
ID: 33237
Scott Brown
Joined: 5 Sep 00
Posts: 110
Credit: 59,739
RAC: 0
United States
Message 33262 - Posted: 6 Oct 2004, 15:57:29 UTC - in response to Message 33236.  

> > @Toby
> >
> > A nice clear post, but...
> >
> > "Since seti has so many more users than the other projects, it would be easy
> > to say 'so many users just clog the system'."
> >
> > technically it is not the number of users but the number of hosts that are the
> > basis of the 'clog the system' argument.
>
> Actually, it is the number of users, who have a number of hosts, that all
> return an even greater number of results ...

Paul,

Nope...the number of users is irrelevant. For example, a single user with 1,000 hosts would provide roughly the same load as 1,000 users with 1 host each. Number of users is not even relevant for a minimum host count (i.e., at least one host per user) since one can sign up for an account and never actually crunch anything.
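Scott's point can be sketched numerically. If server load is modeled as the total number of results returned per day, it is a sum over hosts, and the account that owns each host never enters the calculation. The per-host rate of 4 results per day is a hypothetical figure chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    owner: str              # user account the host is attached to
    results_per_day: float  # hypothetical average results returned per day


def daily_results(hosts: list[Host]) -> float:
    """Total results per day: a sum over hosts; the owner field is never used."""
    return sum(h.results_per_day for h in hosts)


def count_users(hosts: list[Host]) -> int:
    """Number of distinct accounts owning at least one host."""
    return len({h.owner for h in hosts})


if __name__ == "__main__":
    # One user with 1,000 hosts vs. 1,000 users with one host each.
    one_user = [Host("alice", 4.0) for _ in range(1000)]
    many_users = [Host(f"user{i}", 4.0) for i in range(1000)]
    for name, hosts in [("1 user x 1000 hosts", one_user), ("1000 users x 1 host", many_users)]:
        print(f"{name}: {count_users(hosts)} users, {daily_results(hosts):.0f} results/day")
```

Both configurations print 4000 results/day, and an account with no hosts attached contributes nothing at all.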

ID: 33262
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 33283 - Posted: 6 Oct 2004, 16:53:01 UTC

I'll tell you why BOINC is so finicky:

Not enough hardware, not enough staff, not enough air conditioning.

This isn't a complaint, or a cry for help/sympathy. This is an obvious artifact of us here at the lab trying to run classic SETI@home (which has older, but much better hardware as it has to handle currently half a million active users), while trying to start up a whole other SETI@home on far less hardware.

We were hoping the rise of BOINC and the "fall" of classic SETI@home would be much smoother, in that BOINC could take over the nicer hardware as classic SETI@home wound down and needed it less. Not exactly happening that way.

As well, we can't just add hardware for several reasons. One - we don't have the money to obtain the hardware. Two - we barely have the time to set up and integrate the hardware we already have. And Three - above all else, we only have one server closet, and it is completely maxed out as far as power, space, and air conditioning.

So.. as users slowly ramp up on BOINC (which is a good thing), we can only adjust so much to handle each new crisis.

Okay.. I better get up to the lab for at least a few hours today to start the project back up. Then I'm off to a couple gigs where I'll make a lot more money (another sign of BOINC budgetary constraints - I make more as a musician than a systems administrator, so I gotta take the music gigs when they come up).

- Matt
BOINC/SETI@home
ID: 33283
John Cropper
Joined: 3 May 00
Posts: 444
Credit: 416,933
RAC: 0
United States
Message 33291 - Posted: 6 Oct 2004, 17:15:59 UTC - in response to Message 33283.  

> I'll tell you why BOINC is so finicky:
>
> Not enough hardware, not enough staff, not enough air conditioning.

Hardware: See if you can get some money from "Uncle Arnold"...yeah, right!

>
> This isn't a complaint, or a cry for help/sympathy. This is an obvious
> artifact of us here at the lab trying to run classic SETI@home (which has
> older, but much better hardware as it has to handle currently half a million
> active users), while trying to start up a whole other SETI@home on far less
> hardware.
>
> We were hoping the rise of BOINC and the "fall" of classic SETI@home would be
> much smoother, in that BOINC could take over the nicer hardware as classic
> SETI@home wound down and needed it less. Not exactly happening that way.
>

In the world of technology, things seldom perform as designed. Perhaps pushing the issue to the client side with a more aggressive cutover should be considered.

> As well, we can't just add hardware for several reasons. One - we don't have
> the money to obtain the hardware. Two - we barely have the time to set up and
> integrate the hardware we already have. And Three - above all else, we only
> have one server closet, and it is completely maxed out as far as power, space,
> and air conditioning.

Power: Put more gerbils on the wheel. ;o)
Space: Can't you get rid of the cot and just put a sleeping bag on the roof? :o)
AC: Add a wall vent and get [insert opposing political party here] to suck the air from the room.

>
> So.. as users slowly ramp up on BOINC (which is a good thing), we can only
> adjust so much to handle each new crisis.
>

It's a shame you don't have a FULLY distributed model that would allow CLIENTS to perform some of the tasks (splitting, for instance) that are needed to support the project when key components take a dirt nap. (Yeah, I know EVERYTHING can't be done outside, mainly for data integrity and security purposes). Such a model would allow the project to become more self-supporting, especially if the client could switch back and forth between tasks based on project needs.


ID: 33291
HachPi
Joined: 2 Aug 99
Posts: 481
Credit: 21,807,425
RAC: 21
Belgium
Message 33299 - Posted: 6 Oct 2004, 17:38:03 UTC - in response to Message 33283.  
Last modified: 6 Oct 2004, 17:39:53 UTC

Matt :

1. BOINC is, in my humble opinion, NOT FINICKY!!!
2. Most of the REASONABLE people who have some science experience at university labs KNOW how difficult it is to run projects on low budgets.
3. We KNOW the TEAM is doing its UTMOST; none of us could ask for more. We are VERY PROUD to have such a bunch of guys on the job.
4. The only criticism I had concerned the PR in the early period (NOT in the last two or three weeks; info is important, even for us, to get going).

Greetings from Belgium, we are HERE TO STAY AND TO SUPPORT!!!

> I'll tell you why BOINC is so finicky:
>
> Not enough hardware, not enough staff, not enough air conditioning.
>
> This isn't a complaint, or a cry for help/sympathy. This is an obvious
> artifact of us here at the lab trying to run classic SETI@home (which has
> older, but much better hardware as it has to handle currently half a million
> active users), while trying to start up a whole other SETI@home on far less
> hardware.
>
> We were hoping the rise of BOINC and the "fall" of classic SETI@home would be
> much smoother, in that BOINC could take over the nicer hardware as classic
> SETI@home wound down and needed it less. Not exactly happening that way.
>
> As well, we can't just add hardware for several reasons. One - we don't have
> the money to obtain the hardware. Two - we barely have the time to set up and
> integrate the hardware we already have. And Three - above all else, we only
> have one server closet, and it is completely maxed out as far as power, space,
> and air conditioning.
>
> So.. as users slowly ramp up on BOINC (which is a good thing), we can only
> adjust so much to handle each new crisis.
>
> Okay.. I better get up to the lab for at least a few hours today to start the
> project back up. Then I'm off to a couple gigs where I'll make a lot more
> money (another sign of BOINC budgetary constraints - I make more as a musician
> than a systems administrator, so I gotta take the music gigs when they come
> up).
>
> - Matt
> BOINC/SETI@home
ID: 33299
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 33357 - Posted: 6 Oct 2004, 20:49:35 UTC - in response to Message 33262.  

> Nope...the number of users is irrelevant. For example, a single user with
> 1,000 hosts would provide roughly the same load as 1,000 users with 1 host
> each. Number of users is not even relevant for a minimum host count (i.e., at
> least one host per user) since one can sign up for an account and never
> actually crunch anything.

I was not clear (not that it matters); I should have been more specific, as in:

Number of users times the number of hosts times the number of results, or:

U * H * R = Collapse ...

ID: 33357
haddock29
Joined: 18 Sep 99
Posts: 36
Credit: 26,012,417
RAC: 0
France
Message 33371 - Posted: 6 Oct 2004, 21:37:32 UTC - in response to Message 33164.  

> First of all, I want to say, this is not a flame thread, and I know the people
> at SETI and the other BOINC projects are doing their best.
>
> But I am confused: how come most BOINC projects (SETI, LHC, and Predictor, the
> ones I have noticed having the problem so far) have so many problems?
> I mean, I know that BOINC is still in some... semi-beta form, but most of
> these downtimes have descriptions saying that they are replacing actual
> hardware. I mean, how hard is BOINC to process on the server side, that it
> actually overloads the hardware?
>
> The software problems I can probably chalk up to it still being in some form
> of semi-beta... but I mean, why are there problems so often? I dunno, it just
> seems weird how often these problems arise.
>
> But yeah, I just wanted clarification to satisfy my curiosity about just why
> BOINC has such frequent problems.
>
> PS. Still, keep up the good work, guys (and that isn't sarcastic)
>

The question may be: "Why is SETI classic so efficient, and SETI BOINC so difficult to start?"
BOINC seems to be a good environment for CPDN (a limited number of very large WUs); maybe it is not a good solution for SETI.


ID: 33371
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 33402 - Posted: 6 Oct 2004, 22:23:00 UTC
Last modified: 6 Oct 2004, 22:38:10 UTC

[sarcasm mode on]
It's the old "university-taught theory meets cold, hard practice".
[sarcasm mode off]

Almost nothing I learned at university as the theoretically best approach survived the practical test in real-life conditions. You always have to adapt the theoretical plan to real-life situations. Maybe it's time to rethink the much too complicated, database-thrashing validation mechanisms. Just throwing hardware at it will only postpone the problems to a later phase. Ever heard of the "KISS principle"? It's the Keep It Small & Simple approach, which often works best. But that's, of course, not academic ;)



greetz, Uli
ID: 33402
Papa Zito
Joined: 7 Feb 03
Posts: 257
Credit: 624,881
RAC: 0
United States
Message 33464 - Posted: 7 Oct 2004, 2:23:09 UTC

I thought KISS was Keep It Simple, Stupid.

Ah well.

I have a solution to your space and air conditioning problems. Move the SETI computer stuff to Alaska, and store them in a large shack.

See, I contribute...




------------------------------------


The game High/Low is played by tossing two nuclear warheads into the air. The one whose bomb explodes higher wins. This game is usually played by people of low intelligence, hence the name High/Low.
ID: 33464
Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 6,366,949
RAC: 0
United States
Message 33488 - Posted: 7 Oct 2004, 3:19:13 UTC - in response to Message 33464.  

> I thought KISS was Keep It Simple, Stupid.

Me too

> I have a solution to your space and air conditioning problems. Move the SETI
> computer stuff to Alaska, and store them in a large shack.

Oh... They finally got internet up there? Or were you proposing using IP over avian carrier? Or maybe Elk?

ID: 33488
JAF
Joined: 9 Aug 00
Posts: 289
Credit: 168,721
RAC: 0
United States
Message 33498 - Posted: 7 Oct 2004, 4:08:28 UTC

OK, I'm going to ramble a bit. I see the key, major, and regular sponsors mentioned on the main page. I'm a little surprised the hardware needed to support BOINC SETI isn't sitting on shelves at some of those sponsors. It would seem to be a good public relations donation to get this project up to the speed it should be at.

I also see a real lack of sponsorship by some of the companies that profit the most (in no particular order): Intel, AMD, and Microsoft. I know these three companies make a nice profit from SETI - just look at the "top computers" list.

Just my opinion, but I think it is accurate.
ID: 33498
Papa Zito
Joined: 7 Feb 03
Posts: 257
Credit: 624,881
RAC: 0
United States
Message 33517 - Posted: 7 Oct 2004, 5:14:58 UTC - in response to Message 33488.  

> > I thought KISS was Keep It Simple, Stupid.
>
> Me too
>
> > I have a solution to your space and air conditioning problems. Move the SETI
> > computer stuff to Alaska, and store them in a large shack.
>
> Oh... They finally got internet up there? Or were you proposing using IP
> over avian carrier? Or maybe Elk?

I'm surprised at you. A member of the KWSN should see the obvious solution.

Our data comes and goes in packets, right? What better way to encapsulate a packet than... a coconut?

Migratory coconuts, my friend. And for faster processing, we might want to employ some swallows.

ID: 33517
Stephen Balch
Joined: 20 Apr 00
Posts: 141
Credit: 13,912
RAC: 0
United States
Message 33535 - Posted: 7 Oct 2004, 6:33:44 UTC - in response to Message 33517.  


>
> Migratory coconuts, my friend. And for faster processing, we might want to
> employ some swallows.
>
An African or European swallow?

"I want to go dancing on the moon, I want to frolic in zero gravity!....", and now, I might be able to go someday! Thanks, SpaceShipOne and crew!
ID: 33535


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.