Why classic SETI@home is closing down and other facts of life.


Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 209002 - Posted: 10 Dec 2005, 8:57:02 UTC

I keep seeing in these forum threads various valid complaints. Here are some reasons off the top of my head that SETI@home Classic is shutting down, and other facts of life that may very well address your particular concern:

1. SETI@home Classic has no funding at this point. Hasn't had it for years now. Costs about $500,000/year at a minimum to run the thing. SETI@home as we know it now is coasting on fumes until (hopefully) more funding somehow appears. BOINC has funding. Therefore, putting SETI@home on BOINC has given it at least some life in the past two years, and is really the only chance for any kind of future.

2. SETI@home Classic was supposed to be a 3 or 4 year project to begin with. So it's well past its proposed lifespan with no money added to keep it going.

3. The science in SETI@home Classic is basically over. We collected more than enough data with the current instrument. We have a new data recorder close to finished and a new BOINC client will be in the works to analyze this data. To keep Classic going would mean compiling a new Classic client to analyze this data. It's been a loooong time since a new Classic client has been built. The code is stale, and the build machines are ancient and painful to use (if they even exist anymore).

4. The SETI@home Classic backend is a tangled mess. There have been many problems over the years, most of which were invisible to the participants. None of these problems were fatal to the project or its science, but they have resulted in an obnoxious web of ridiculous dependencies, confusing configurations, and unwieldy databases. I am practically drooling dreaming of the day when we get to turn all that stuff off and be done with it already. The BOINC backend is sooooo much easier to deal with.

5. The current crises are just par for the course in the history of our SETI project and public resource computing. Things break, deep breaths are taken, and they get fixed eventually. This isn't good for making our participants happy of course, but we do our best with what we got and so far our user base has stuck with us through the painful periods (thank you!!!).

6. BOINC was written so you can connect to other projects when there are server issues with a specific project. This is a good thing. SETI@home Classic has no such ability.

7. BOINC credit, while not perfect (though we're working on that), is much more fair in that it represents actual work done, and is valid between projects which do all kinds of different work. There is no way to translate Classic credit to BOINC credit, and so this will never happen. Classic credits will be noted in a separate field in a user profile (and will be eventually sync'ed up again after Classic shuts down).

8. Though I don't have any accurate numbers to back this up, I personally feel that so far SETI@home/BOINC has had more uptime on average than Classic. We had science database crashes every week at times back in the day, several whole weeks when we were down for database recovery or because somebody stole a cable, network bandwidth issues that brought us down for months. And these were just the public-facing downtime events (among but a few).

9. There is no tech support on staff. I end up with dozens of e-mails a day from people who figured out how to reach me. If I dealt with all these, that would occupy about 15-20% of my time. I don't have this time and neither does anybody else around here. Many of these e-mails go unanswered. Sad but true, and I personally find this painful but part of the big picture.

10. Yes, we can do better in the PR department. See #9 above. Don't have the staff or the money to add the staff. And it's not so easy to add news items to the page. I can't be bothered to go into detail; I leave it as an exercise for the reader to figure out why.

11. The staff is small. Me and Jeff are continually up to our necks dealing with everything. We've both been here working on SETI long before SETI@home came around, so we both are well versed in every aspect of the "big picture" around here. Bob, the main database guy, actually only works half time. Court is busy dealing with various long term network/systems projects that Jeff and I can't handle since we're diagnosing, debugging, programming, or maybe actually getting science done. David and Rom (and other various programmers) strictly work on BOINC code. Eric works overtime on other non-SETI projects when he's not building the next SETI@home client. Dan, the project director, is spending a lot of time building spectrometers for other projects because that's where the money is. Outside of current academics (Kevin and Josh) working on other applications of SETI data, and students helping Dan build hardware, that's it here at the lab. No administrative staff, no tech writers. When it comes time to fill out a new grant proposal, we all drop everything and work on that, for example.

12. If anybody complains elsewhere about any of the above, please be kind and point them to this post. People have the right to be upset with us since they are kindly donating their resources to us. However, there is a lot of misinformation or misunderstanding about this project and I hope I cleared some of it up.

Now I'm off to bed. Going to LA tomorrow. Be back Sunday night.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8219
Credit: 21,836,461
RAC: 11,536
United Kingdom
Message 209006 - Posted: 10 Dec 2005, 9:08:20 UTC

Matt,

Thank you, we now have somewhere to direct the nay-sayers to.
____________
Only two things are infinite: the universe and human stupidity, and I am not sure about the former. - Albert Einstein

Profile MikeSW17
Volunteer tester
Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 209016 - Posted: 10 Dec 2005, 9:28:55 UTC

Well done Matt. Thank you very much for so concisely explaining the big picture - It's a real help to see all the issues together.

____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,566,345
RAC: 44,094
Australia
Message 209070 - Posted: 10 Dec 2005, 11:24:39 UTC


Would it be possible for someone to make this a sticky thread & put it right at the top of the list?
____________
Grant
Darwin NT.

Profile Thierry Van Driessche
Volunteer tester
Joined: 20 Aug 02
Posts: 3083
Credit: 140,740
RAC: 0
Belgium
Message 209073 - Posted: 10 Dec 2005, 11:27:14 UTC - in response to Message 209070.


Would it be possible for someone to make this a sticky thread & put it right at the top of the list?

Very good suggestion Grant. Done.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,566,345
RAC: 44,094
Australia
Message 209075 - Posted: 10 Dec 2005, 11:28:15 UTC - in response to Message 209073.

Very good suggestion Grant. Done.

Thanks.

Now all we need is people to actually read it...
*fingers crossed*
____________
Grant
Darwin NT.

Profile CoolBlue87GT
Joined: 27 Dec 03
Posts: 54
Credit: 49,337
RAC: 0
United States
Message 209096 - Posted: 10 Dec 2005, 12:16:12 UTC

Hi Matt.

I'm a former database programmer who knows about endless hours of thankless work.

Thank you for the updated information. Good idea posting it.

I know the problems will be fixed in time, that's fine.

Thanks again.
____________

Steve MacKenzie
Volunteer tester
Joined: 2 Jan 00
Posts: 142
Credit: 3,343,946
RAC: 1,594
United States
Message 209109 - Posted: 10 Dec 2005, 12:46:13 UTC

Does anyone know if the donations made by a lot of the crunchers have helped?
Well of course they helped. But are these donations helping to get Matt and everyone what they need going forward?

How has the donation stream been going? Was it just a quick burst of $? Or is there a stream running? It might be nice if there were a donations box on the home page letting us know how it's going - A goal chart or something.

If I had a vote, I'd have the old timers Matt & Jeff decide where "They" think the donations should be spent and get the most bang for the buck.

Steve.
____________

Profile m.mitch
Volunteer tester
Joined: 27 Jun 01
Posts: 337
Credit: 79,856
RAC: 0
Australia
Message 209173 - Posted: 10 Dec 2005, 14:18:33 UTC - in response to Message 209002.

I keep seeing in these forum threads various valid complaints. Here are some reasons off the top of my head that SETI@home Classic is shutting down, and other facts of life that may very well address your particular concern:
........ [big snip] .......
12. If anybody complains elsewhere about any of the above, please be kind and point them to this post. People have the right to be upset with us since they are kindly donating their resources to us. However, there is a lot of misinformation or misunderstanding about this project and I hope I cleared some of it up.

Now I'm off to bed. Going to LA tomorrow. Be back Sunday night.

- Matt


Have a good gig 8-)

Thanks Matt

Mike
____________



Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 7954
Credit: 4,013,973
RAC: 865
United Kingdom
Message 209279 - Posted: 10 Dec 2005, 16:03:20 UTC

Matt,

Thanks for a good summary.

Sounds exactly like the University Academia that I've seen in various labs at various times, even for big "Flagship" cutting edge projects. Hang in there!


Have a good fun gig and enjoy the life,

(We'll try keeping the few noisy ranters simmering quietly :-) )

Cheers,
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 45,007,582
RAC: 13,691
United Kingdom
Message 209283 - Posted: 10 Dec 2005, 16:11:38 UTC

May I add two pennyworth from the UK?

Please correct what follows if it needs it, but gently - I'm a newbie here, part of the influx from Classic (and hence part of the problem - sorry!)

I've just analysed my BOINC message log for the last 21 hours, and in that time my machine has finished 4 Seti WUs and attempted 240 Seti uploads - that's on average 12 attempted uploads for each of the 20 previous WUs still stuck in the queue.

Collectively, we're mounting a classic (sorry, no pun intended) denial-of-service attack on poor old Kryten!

No wonder something in the Cogent / router / kryten nexus is throwing a hissy fit.

If we, the users, carry on like this, the problem will continue to grow almost exponentially for the next ten days, until the last WU in the longest (legal) cache has joined the clamour.

So, in the short term, wouldn't it be better if everyone voluntarily chose to 'suspend network activity' for the time being? This applies especially to the geeks with the largest cache / longest upload queue. That way, Matt could come back from LA tomorrow to a cool, quiet server closet....

In the long term, this translates to a BOINC problem, not a Seti problem (contrary to what some people have been asserting in the BOINC fora). If the BOINC upload algorithm could be modified so that only one WU per project could be in the 'uploading' or 'retry [upload] in hh:mm:ss' states, and others could go into a new 'queued' state until the first one finishes, then a minor or temporary glitch wouldn't grow out of control like this one appears to have done.
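
The proposal above can be sketched roughly as follows. This is only an illustration of the idea, not BOINC code: the class name `UploadScheduler`, its methods, and the state labels are all hypothetical, and the real client tracks far more state than this.

```python
from collections import defaultdict, deque


class UploadScheduler:
    """Hypothetical scheduler that allows at most one active upload per project."""

    def __init__(self):
        self.active = {}                  # project -> result being uploaded or retried
        self.queued = defaultdict(deque)  # project -> results waiting their turn

    def submit(self, project, result):
        """Register a finished result and return its state: 'uploading' or 'queued'."""
        if project not in self.active:
            self.active[project] = result
            return "uploading"
        self.queued[project].append(result)
        return "queued"

    def finished(self, project):
        """Call when the active upload succeeds; promote and return the next result."""
        self.active.pop(project, None)
        if self.queued[project]:
            self.active[project] = self.queued[project].popleft()
            return self.active[project]
        return None

    def connections_wanted(self):
        """Number of results that would currently be contacting the servers."""
        return len(self.active)
```

With 20 results stuck for a single project, `connections_wanted()` stays at 1 instead of 20, and the remaining 19 only start uploading one at a time as each predecessor succeeds.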

Hope that helps.

Richard Haselgrove

Angie
Joined: 2 Jun 04
Posts: 3
Credit: 92,756
RAC: 0
United States
Message 209311 - Posted: 10 Dec 2005, 16:41:18 UTC

What makes this whole thing extremely frustrating to where I'd like to use language here that I won't is that I wanted to hit 20k seti classic units before they closed on the 15th. I'm at 19,071 and haven't been able to process any of those or any boinc units since around December 1st? I think I'll just stick with crunching numbers for united devices til sometime next year.
____________

Profile kinhull
Volunteer tester
Joined: 3 Oct 03
Posts: 1029
Credit: 636,475
RAC: 0
United Kingdom
Message 209319 - Posted: 10 Dec 2005, 16:50:57 UTC - in response to Message 209311.

What makes this whole thing extremely frustrating to where I'd like to use language here that I won't is that I wanted to hit 20k seti classic units before they closed on the 15th. I'm at 19,071 and haven't been able to process any of those or any boinc units since around December 1st? I think I'll just stick with crunching numbers for united devices til sometime next year.


No matter when they close down Classic, someone will miss achieving a personal goal/target.
____________
Join TeamACC

Sometimes I think we are alone in the universe, and sometimes I think we are not. In either case the idea is quite staggering.

Profile John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 209327 - Posted: 10 Dec 2005, 16:57:06 UTC - in response to Message 209283.
Last modified: 10 Dec 2005, 16:58:01 UTC

May I add two pennyworth from the UK?

So, in the short term, wouldn't it be better if everyone voluntarily chose to 'suspend network activity' for the time being?


UK user or not, I agree ;-))

This was suggested before in other threads, but many of the BOINC Community run multiple projects. So, disabling BOINC's network access will cut off all BOINC subscribed projects.

As a BOINC user subscribing to SETI only I have suspended my network access as suggested. Hopefully this small contribution will help remove part of the DoS attack outlined above.

Keep on crunching as well!!
____________
It's good to be back amongst friends and colleagues



Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 7954
Credit: 4,013,973
RAC: 865
United Kingdom
Message 209368 - Posted: 10 Dec 2005, 17:36:30 UTC - in response to Message 209283.
Last modified: 10 Dec 2005, 17:39:08 UTC

... Collectively, we're mounting a classic (sorry, no pun intended) denial-of-service attack on poor old Kryten!

Kryten has a good ole big head on it to take it!!

There's also "exponential backoff" code in the Boinc clients so that failed connection attempts back-off an ever longer time period to avoid pestering Kryten with failed attempts too often.
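
For anyone unfamiliar with the term, here is a minimal sketch of exponential backoff with a random spread. It is not taken from the BOINC source: the function names and the `base` and `cap` values (one minute and four hours) are assumptions chosen purely for illustration.

```python
import random


def backoff_ceiling(failures, base=60.0, cap=4 * 3600.0):
    """Upper bound in seconds on the wait after `failures` consecutive failures.

    base and cap are illustrative values, not BOINC's actual constants.
    """
    if failures < 1:
        return 0.0
    # Double the ceiling on every failure, but never exceed the cap.
    return min(cap, base * 2 ** (failures - 1))


def backoff_delay(failures, base=60.0, cap=4 * 3600.0):
    """Random wait below the ceiling, so clients do not all retry in lockstep."""
    return random.uniform(0.0, backoff_ceiling(failures, base, cap))
```

The ceiling doubles with each failed attempt until it hits the cap, and the random draw below the ceiling spreads retries from many clients over time.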

It's all in the code and in some ways is part of the experiment and development.

So, in the short term, wouldn't it be better if everyone voluntarily chose to 'suspend network activity' for the time being? This applies especially to the geeks with the largest cache / longest upload queue. That way, Matt could come back from LA tomorrow to a cool, quiet server closet....

You could do that but considering percentages, I think it would make little difference.

Note that Boinc is designed to be resilient for situations such as this. If you have a second project to crunch for, then let Boinc crunch that for a while. If Boinc stalls on s@h due to lack of WUs, then "long Term Debt" will build up and s@h will catch up when the WUs flow once more.

Berkeley are very aware of the present problems and the biggest problem is user fear and misunderstanding. Let Boinc do its stuff and watch the fun. All will clear when the bottleneck is cleared at Berkeley, and they've extended the deadlines so that no credit and no WUs should be lost. A few WU results (or a few thousand) queued up on your machine awaiting return are no problem. They'll clear ok when Berkeley can accept them.

(And real Geeks that understand this system run with small cache sizes ;-) )


Leave Boinc to do its stuff as programmed. All will clear fine long before there is any real breakage.

Hope that helps,

Happy crunchin',
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,472,330
RAC: 5,301
United States
Message 209376 - Posted: 10 Dec 2005, 17:43:44 UTC - in response to Message 209327.

Instead of suspending all network activity, when you have multiple projects on a computer, you can suspend SETI. While this won't shut off timed unit upload attempts, it will stop the number of uploads from increasing and will stop any download requests as it stops the 'need more work' downloads.

The other thing I've done as well is reduce the resource share for SETI compared to the two other projects I have set up (Climate and Einstein). Those projects have significantly less downtime -- I suspect for two reasons -- one the much smaller user base (less than 1/4 to 1/5 of the computers supported), two, I suspect they have a better funding base relative to the load they support. Also, with Climate, they implemented a client approach which reduces the load as they have long range 'units' (measured hours per unit being something like 30 to 40 times a current SETI unit), and have a staged upload 'trickle' of interim status -- that reduces the download activity a lot. The Einstein units are about double the size of the current SETI unit, so I suspect that also helps that project.

The only thing I'd take exception to in Matt's EXCELLENT summary, is relative downtime -- at least over the past 6 months, and probably since SETI BOINC started. But that is something of an 'apples/oranges' comparison -- perhaps Matt was comparing the first 18 months of SETI classic to the first 18 months of SETI BOINC. But users make the apples/oranges comparison because that is what they 'see' over the last 18 months.

The sad thing here is the current problems with SETI BOINC are resulting in 'opt out' for SETI Classic expatriates, instead of moving over to SETI BOINC (AND the other worthy projects as well). So in a sense the current project problem is reducing ramp up effects over in the other projects as SETI classic expatriates are not all that likely to simply jump from Classic to Einstein or Climate -- they try over here, see a currently non-functioning environment (at least for the past week) and go away.




This was suggested before in other threads, but many of the BOINC Community run multiple projects. So, disabling BOINC's network access will cut off all BOINC subscribed projects.



____________

Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 3,575,865
RAC: 26
Norway
Message 209381 - Posted: 10 Dec 2005, 17:48:12 UTC - in response to Message 209283.

Please correct what follows if it needs it, but gently - I'm a newbie here, part of the influx from Classic (and hence part of the problem - sorry!)

...

In the long term, this translates to a BOINC problem, not a Seti problem (contrary to what some people have been asserting in the BOINC fora). If the BOINC upload algorithm could be modified so that only one WU per project could be in the 'uploading' or 'retry [upload] in hh:mm:ss' states, and others could go into a new 'queued' state until the first one finishes, then a minor or temporary glitch wouldn't grow out of control like this one appears to have done.


Welcome. :)


As Martin has already mentioned, BOINC already includes "exponential backoff" to spread connection attempts over a longer time interval.

To further improve on this, new code that backs off all files in a project when there are problems uploading or downloading was added to the BOINC core client on 30.09.2005, but during alpha-testing various problems surfaced, so it was temporarily removed again in v5.2.2.

Once the problems are ironed out, I expect the fixed code to be added back in, but it likely won't be released before v5.4.x.

Jack Gulley
Joined: 4 Mar 03
Posts: 423
Credit: 526,566
RAC: 0
United States
Message 209385 - Posted: 10 Dec 2005, 17:53:32 UTC

Actually, suspending network activity is not necessary. BOINC by design avoids creating its own DDoS in times like this. Each client backs off a random amount of time before trying to send each result after a failure to connect. Even Seti Classic backed off for an hour. (Unlike some of the add-on software.) But the requests have always been spread out.

As far as network (Internet) traffic goes, from the looks of the graphs, Seti Classic's 20Mbps share of the load has dropped down to about 15Mbps during the past week. So it looks like about 25% of the active Classic crunchers have already been shut down. But that has never been part of BOINC's problem.

The current problems with the BOINC/Seti upload/download servers are not a network (Internet) bandwidth problem. From Ethereal traces my client has had no trouble quickly establishing a connection, every time. And as I see it, that may be part of the current problem. The server may be trying to handle too many connections at the same time. Only in the past day have I seen the server start to refuse the initial connection attempt and tell it to go away, which the client does not seem to handle well because it still takes a long time to time out.

And as has been pointed out, if everyone backs off and things start working, how will they know if the next change corrects the problem(s)? Right now they have a very good stress test of their system going.

As I understand the recent posts, Matt has moved the results download over to a different server, and that part of the process still has some problems getting results out, indicating the real problems are farther back in the "plumbing".

Profile kzhorse
Joined: 30 Jun 03
Posts: 113
Credit: 2,476,310
RAC: 0
United States
Message 209451 - Posted: 10 Dec 2005, 19:00:08 UTC

I don't really know what the problem is, but all my machines have a steady flow of work. The only problem I see is that I can't send the finished work back, so when they get caught up mine will get sent and I will get credit.
I should have enough hard drive space on most of them that they can go for at least another week or so.

Good luck with whatever you do, Scott
____________
" "

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 45,007,582
RAC: 13,691
United Kingdom
Message 209453 - Posted: 10 Dec 2005, 19:02:05 UTC - in response to Message 209381.


As Martin has already mentioned, BOINC already includes "exponential backoff" to spread connection-attempts over longer time-interval.


Doesn't quite seem to work like that. I have one WU which has attempted upload 77 times since 4 December. It reached a maximum 'back-off' of 3 hours 59 minutes in less than a day, and since then it has been held for a random time between 0 and 4 hours. One 'back-off' on day five was for less than two minutes.
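
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes every stuck result has already reached the cap and then waits a uniformly random time between 0 and 4 hours (so 2 hours on average), and it ignores the time each attempt itself takes. The function name and the `per_project` switch are hypothetical, used only to compare the two designs.

```python
def expected_attempts(stuck_results, hours, max_backoff_hours=4.0, per_project=False):
    """Rough expected number of upload attempts from one host over `hours`."""
    mean_wait = max_backoff_hours / 2.0  # mean of a uniform(0, max) wait
    # With per-project back-off only one result retries at a time.
    retrying = 1 if per_project and stuck_results > 0 else stuck_results
    return retrying * hours / mean_wait


print(expected_attempts(20, 21))                    # per-result back-off: 210.0
print(expected_attempts(20, 21, per_project=True))  # per-project back-off: 10.5
```

The per-result estimate of about 210 attempts is in the same range as the 240 attempts over 21 hours with 20 stuck WUs reported earlier in this thread, while a per-project back-off under the same assumptions would produce roughly 10.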

As Jack Gulley says,


The server may be trying to handle too many connections at the same time.


I still think that the fact that the back-off is per result, rather than per project, is a design weakness in BOINC which is contributing to 'too many connections'. If Matt asks for more evidence when he gets back, I'll gladly turn things back on: but in the meantime, I think we've got enough of a problem and I'll avoid contributing more to the network congestion than is strictly necessary.

Richard


Copyright © 2014 University of California