Confessions of a SETI@home Cruncher

Greg Beach
Joined: 7 Jun 99
Posts: 23
Credit: 4,978,313
RAC: 0
Canada
Message 1006558 - Posted: 20 Jun 2010, 13:16:30 UTC

Whenever we have a major unplanned outage, someone invariably announces that they are leaving the project. For what it's worth, I thought it was time for someone to announce that they are staying, and to offer an alternate perspective.

There are casual crunchers and hardcore crunchers. After 11 uninterrupted years contributing to SETI@home, I am definitely not a casual cruncher, but I'm not as serious as some of the hardcore crunchers here. I guess you could call me a softcore (insert porn joke here) cruncher.

As a softcore cruncher, I too have been frustrated with the outages over the last few years, and I have to admit I have considered quitting. There's nothing more frustrating to a hard/softcore cruncher than a machine sitting idle. Here are a few tips that have helped me minimize the frustration:

  • Minor outages typically last 1-2 days and major outages 3-4 days. Hardcore crunchers may want to carry enough cache to ride out the major outages and avoid the frustration of having their rigs run dry (see the sketch after this list). Casual crunchers will probably want to stick with the default settings.
  • There are loads of worthy projects out there that would be more than happy to keep your rigs supplied with work whenever SETI@home has problems.
  • Like many others, I started as a SETI@home member when there was no BOINC. It has taken me a while, but I no longer see myself as a SETI@home member; I am a BOINC member. By focusing on my global RAC rather than my SETI@home RAC, it became easier to begin contributing to other projects, ensuring that my machine is always supplied with work.
  • Like many hard/softcore crunchers, I run optimized clients. This is my choice. Any complications due to server upgrades, configuration changes, etc. that arise as a result of that choice are my problem, not the fault of the SETI@home team, and I will not blame them.
  • Remember you're a volunteer, not a customer or an employee. We do not pay for SETI@home, nor do we make a living off it (how cool would that be?). They do not owe us workunits to keep our machines busy.
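
For anyone who wants to size a cache along these lines, here is a minimal sketch of a global_prefs_override.xml for the BOINC data directory. The element names are standard BOINC preference overrides, but the day counts are purely illustrative; pick values that match your own tolerance for outages:

    <global_preferences>
        <!-- keep at least this many days of work on hand -->
        <work_buf_min_days>4.0</work_buf_min_days>
        <!-- fetch a little extra on each scheduler request -->
        <work_buf_additional_days>0.5</work_buf_additional_days>
    </global_preferences>

After saving the file, "Read local prefs file" in the BOINC Manager's Advanced menu should pick up the change without a restart; casual crunchers can simply skip the file and keep the web defaults.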


Another thing to consider when there is an extended outage is that the small SETI@home staff support hundreds of thousands of active users in their spare time. They do have lives outside SETI@home that rightfully take precedence, and we should be grateful for the work they do. Here is a quote from a post by Matt on 16 Jun 2010 on the Technical News page that I think sums up the story from the other side:

"I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outage/emergencies. In my case, I currently have four regular separate sources of income with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose, I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep and I didn't want to start up the scheduler processes without others' input as they were still effectively on the operating table. We're pretty much given up any hope for 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects."


While I appreciate the work done by the SETI@home staff, they could improve their communication during an outage. Something as simple as a short note at the top of the server status page, saying they know about the problem and when they expect everything to be back to normal, would take only a few seconds of their time and would go a long way toward minimizing the frustration felt by the most dedicated crunchers. It might even reduce the length of the server-panic threads.

While it's unfortunate when any cruncher, let alone a long-time cruncher, leaves the project, I understand their decision and hope that in the future they will consider returning.

Anyway, sorry for droning on for so long, but that's my 2 cents.
ID: 1006558
Norwich Gadfly
Joined: 29 Dec 08
Posts: 100
Credit: 488,414
RAC: 0
United Kingdom
Message 1006579 - Posted: 20 Jun 2010, 14:30:43 UTC - in response to Message 1006573.  

I agree with both Greg and Chris. As well as this project, I run World Community Grid and am never out of work, despite having a cache of only two days. I turn round the vast majority of my S@H units in a day or two, but when I look at my pending units I see many of my wingmen have vast work caches yet take ages to complete them, and often either don't complete them at all or complete them with errors. It seems to me that such behaviour makes the work shortage worse.

Perhaps the project would run more smoothly if the maximum cache size were reduced to say three days, and the deadlines tightened from the ridiculously long six weeks to say three weeks.

Looking forward to some flames in response...
ID: 1006579
Dirk Villarreal Wittich
Joined: 25 Apr 00
Posts: 2098
Credit: 434,834
RAC: 0
Holy See (Vatican City)
Message 1006587 - Posted: 20 Jun 2010, 14:49:19 UTC - in response to Message 1006579.  

Perhaps the project would run more smoothly if the maximum cache size were reduced to say three days, and the deadlines tightened from the ridiculously long six weeks to say three weeks.


I guess the reason for doing/not doing it is that there are still lots of PCs with old or un-upgraded settings and hardware.
I would say that the aim of the project is to give as many people as possible a chance to participate, even with old-fashioned PCs.
Not everybody is willing and able to purchase a new PC, graphics card, motherboard or whatever you think of every now and then.

ID: 1006587
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1006590 - Posted: 20 Jun 2010, 15:08:26 UTC

Have you ever noticed how many thoughtful posts on these threads come from Canada?

ID: 1006590
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1006592 - Posted: 20 Jun 2010, 15:15:50 UTC
Last modified: 20 Jun 2010, 15:16:17 UTC

I think it comes from having an elephant living next door. It becomes part of your nature to tiptoe around.
ID: 1006592
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1006593 - Posted: 20 Jun 2010, 15:17:02 UTC - in response to Message 1006579.  

I agree with both Greg and Chris. As well as this project, I run World Community Grid and am never out of work, despite having a cache of only two days. I turn round the vast majority of my S@H units in a day or two, but when I look at my pending units I see many of my wingmen have vast work caches yet take ages to complete them, and often either don't complete them at all or complete them with errors. It seems to me that such behaviour makes the work shortage worse.

Perhaps the project would run more smoothly if the maximum cache size were reduced to say three days, and the deadlines tightened from the ridiculously long six weeks to say three weeks.

Looking forward to some flames in response...

This is a subject that has been debated many times around here, and although some might consider me a "power cruncher", I quite agree with you. Smaller caches and shorter return times would definitely reduce the stress on the servers by shortening validation times and thus shrinking the database. Database problems appear to be one of the main causes of server trouble.

It's a weird feedback-loop situation. The servers and their software appear to be inherently unstable and prone to failure, so people up their caches to compensate for the outages. This places more stress on the servers, which makes them more prone to failure, so more people up their caches, which puts more stress on the servers, and so on ad infinitum. Call it the SETI Spiral (a toy model follows below). Projects like Malaria Control and some WCG projects do quite well with a 2 to 3 week return time.
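
To make the loop concrete, here is a toy simulation of the spiral. Every constant in it is invented purely for illustration; nothing here is measured from the real servers:

    # Toy model of the "SETI Spiral" -- all constants are made up.
    def seti_spiral(days=30, cache_days=2.0):
        load = 1.0                               # relative server load
        for _ in range(days):
            outage_risk = min(1.0, 0.05 * load)  # more load -> more outages
            cache_days += 2.0 * outage_risk      # outages -> users pad caches
            load = 1.0 + 0.1 * cache_days        # bigger caches -> more load
        return cache_days, load

    print(seti_spiral())  # both values creep upward over the 30 days

Run it and cache size and server load grow together; neither ever gets a chance to relax back to nominal, which is the whole point of the spiral.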

For the record, I only run a 3-day cache for this very reason, and I do not really consider myself a "power cruncher". I don't run any heavy battleships, just a flotilla of 2 light cruisers and a couple of frigates. But I do alright :-)

Brodo
ID: 1006593
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1006603 - Posted: 20 Jun 2010, 15:56:53 UTC

[lots of text warning]

I do also agree with the above. I consider myself to be a "softcore" cruncher, since I've only had about 6 months of idle CPUs (that was around 2003ish; I forgot to reinstall Classic after a Windows re-install), and I do intend to ride the current issues out and see what the "smooth" outcome is... if we ever get to a smooth outcome.

Personally, I prefer crunching Astropulse over Multibeam, and I have only been able to do that by omitting Multibeam from my app_info.xml (a stripped-down sketch follows below). It takes a few hours to get all four cores crunching, and another day or two to build a 4-day cache, simply because of the scarcity of Astropulse work and the perfect timing needed on a work request.
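
For readers who haven't used the anonymous-platform mechanism, an AP-only app_info.xml looks roughly like the sketch below. The app name and executable name are placeholders; they must match whichever optimized build you actually install:

    <app_info>
        <app>
            <name>astropulse_v505</name>
        </app>
        <file_info>
            <name>ap_5.05_optimized.exe</name>
            <executable/>
        </file_info>
        <app_version>
            <app_name>astropulse_v505</app_name>
            <version_num>505</version_num>
            <file_ref>
                <file_name>ap_5.05_optimized.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
    </app_info>

With no Multibeam <app> section present, the scheduler has nothing to send you but AP work, which is exactly why the cache fills so slowly.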

AP-only was working quite well for me for the past 2-3 months. Only once or twice did I end up with four idle cores, but that lasted less than a day and I managed to get more work. During the past week, though, I gave up on AP-only because I went five consecutive days without a single WU, and that was with a combination of letting BOINC do its thing and me mashing the 'update' button on occasion. I got myself a 4-day cache of MB with ease and the cores are busy again.

I am well aware that the purpose of BOINC is to tie multiple projects into one central application, and that no project (or at least SAH) guarantees there will be work at all times. This is why I have never complained about anything that wasn't trivial. I choose to run only this one project, and I have tried to keep a decent cache to ride out the storms over the years. My 4-day cache setting usually ended up being about 5 days instead, and for the first time in well over 6 years, I actually ran out of work within the past few months, and multiple times since then.

Still, I wait and see what happens next. No threats about leaving (or actually leaving), because I use my computer for video games and my employment income. The idle CPU time gets used for crunching--the way it's supposed to be. I understand some hardcore people go and spend ludicrous amounts of money to see what kind of RAC they can get, and more power to them. I would choose to spend that money in a different way.

Somewhere along the line, they have to realize that it has been the premise since day one of distributed computing that 24/7 availability is just not going to happen. There will be downtime, and there will be back-end problems that take days to fix. If you're worried about running out of work, grab another project and give it a tiny resource share; that way, when this one goes down, the other will keep you busy until this one comes back. If you're like me and choose to be exclusive to just one project, then you'll just go idle.

I've been running optimized apps since October 2008, and even as I type this, they still work for me. I get work, the server doesn't tell me that I'm at a quota limit, and my crunched tasks validate against the stock app about 99.995% of the time. The small percentage of inconclusive validations is usually because my wingman was a CUDA cruncher that -9'ed, and when you pull up their task list--all 1000+ tasks--they have a high rate of errors. Or because tasks that I turned in within days of issue have to wait six weeks for someone to time out so they can be re-issued. I'm not a credit hound, but WUs waiting in limbo eat up valuable disk space on the servers.

The deadlines are just way too long. Yes, when the precision of MB tasks was doubled a few months ago, the deadline was also doubled, and I think that causes a problem. Even with twice the work in every WU, the deadline could stay at 3 weeks. BOINC is supposed to make sure every task can be turned in before the deadline, so unless you are still running a computer that takes three weeks to do a single MB, there shouldn't be a problem. The main problem with the long deadlines is the disk space on the servers that gets eaten up by holding pending tasks for that much longer.

Anyone else notice how a year ago, four million in 'returned and awaiting validation' was a cause for serious concern? In the past year, 'results in the field' have only gone up about 50%, while 'returned and awaiting' has more than doubled. That is terabytes of disk space eaten up waiting for the so-called "hit-and-run" hosts to time out sometime next month (a rough estimate follows below). Reducing the number of "active" tasks would probably reduce the server load by a large amount, and with it the number of problems we are experiencing.
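
As a back-of-envelope check on the "terabytes" claim (both numbers below are assumptions, not figures from the status page):

    # Rough estimate only -- the count and file size are assumed, not measured.
    awaiting = 8_000_000           # if last year's ~4 million has since doubled
    wu_bytes = 370 * 1024          # assumed average multibeam workunit size
    print(f"{awaiting * wu_bytes / 1e12:.1f} TB")  # about 3.0 TB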

I guess that's enough of my comments on the matter.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1006603
Lane42

Joined: 17 May 99
Posts: 59
Credit: 227,150,556
RAC: 11
United States
Message 1006606 - Posted: 20 Jun 2010, 16:05:05 UTC - in response to Message 1006593.  

Looking at some of my pendings, I have a wingman with a Core 2 Duo and a (very slow) GS GPU who is holding a cache of over 4k tasks... if he's lucky he'll only finish half of that. How can this happen?
ID: 1006606
tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1006608 - Posted: 20 Jun 2010, 16:33:12 UTC
Last modified: 20 Jun 2010, 16:34:26 UTC

I have an Opteron 1210 CPU at 1.8 GHz running Linux and 5 BOINC projects. I have 8 SETI WUs in a pending state. But I am also running SETI, much more slowly, on an OpenSolaris virtual machine via VirtualBox; it has 3 WUs in a pending state. How come? All of my wingmen have faster CPUs, and some even have GPUs; I have no GPU. The only reason I can see is that people load many more WUs onto their faster machines. I have a 0.25-day cache and am never out of work.
Tullio
ID: 1006608
Greg Beach
Joined: 7 Jun 99
Posts: 23
Credit: 4,978,313
RAC: 0
Canada
Message 1006648 - Posted: 20 Jun 2010, 19:24:45 UTC - in response to Message 1006573.  

While I appreciate the work done by the SETI@home staff, they could improve their communication during an outage.


They are under no obligation to do so outside of their working hours. The fact that they do, and drive in on long journeys, is a credit to their personal dedication.


I didn't mean to suggest that they should monitor 24/7. As I mentioned, they have lives outside SETI@home that must take precedence, as they should. I only meant to suggest that once they are aware of a problem, one of the first steps they might want to take would be to update the server status page.
ID: 1006648
Greg Beach
Joined: 7 Jun 99
Posts: 23
Credit: 4,978,313
RAC: 0
Canada
Message 1006651 - Posted: 20 Jun 2010, 19:43:04 UTC - in response to Message 1006579.  

I agree with both Greg and Chris. As well as this project, I run World Community Grid and am never out of work, despite having a cache of only two days. I turn round the vast majority of my S@H units in a day or two, but when I look at my pending units I see many of my wingmen have vast work caches yet take ages to complete them, and often either don't complete them at all or complete them with errors. It seems to me that such behaviour makes the work shortage worse.

Perhaps the project would run more smoothly if the maximum cache size were reduced to say three days, and the deadlines tightened from the ridiculously long six weeks to say three weeks.

Looking forward to some flames in response...


You agreed with me so I'm not going to flame you. Sorry to disappoint you. :D

In fact, I will agree with you and second your suggestion for smaller maximum cache sizes and tighter deadlines.

ID: 1006651
TheFreshPrince a.k.a. BlueTooth76
Joined: 4 Jun 99
Posts: 210
Credit: 10,315,944
RAC: 0
Netherlands
Message 1006670 - Posted: 20 Jun 2010, 20:40:39 UTC

Wouldn't it be possible to send WUs to computers that have about the same return rate (or whatever you call it)?

Then slower computers would have more time to return, and faster computers would validate more quickly because their wingmen are also fast.
That would also lower the disk space used on the servers.

Another option is to give higher credit when a result is returned more quickly.
Then people would use a smaller cache.

Just some ideas...
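
The first idea is essentially a matchmaking scheduler. A sketch of it (nothing like BOINC's real feeder/scheduler code) might rank hosts by average turnaround and hand both copies of a workunit to hosts in the same speed band:

    # Sketch of turnaround-based wingman matching -- not BOINC's actual code.
    # Hosts are (name, average turnaround in days); the values are invented.
    hosts = [("fast_gpu", 0.5), ("old_p4", 14.0), ("quad", 1.2), ("laptop", 6.0)]

    def pair_by_turnaround(hosts):
        ranked = sorted(hosts, key=lambda h: h[1])  # fastest first
        it = iter(ranked)
        return list(zip(it, it))                    # adjacent hosts become wingmen

    for a, b in pair_by_turnaround(hosts):
        print(a[0], "<->", b[0])
    # fast_gpu <-> quad
    # laptop <-> old_p4

The worst-case wait for validation then scales with the slower member of each band rather than with the slowest host in the whole project.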
Rig name: "x6Crunchy"
OS: Win 7 x64
MB: Asus M4N98TD EVO
CPU: AMD X6 1055T 2.8(1,2v)
GPU: 2x Asus GTX560ti
Member of: Dutch Power Cows
ID: 1006670
Norwich Gadfly
Joined: 29 Dec 08
Posts: 100
Credit: 488,414
RAC: 0
United Kingdom
Message 1006678 - Posted: 20 Jun 2010, 21:31:17 UTC - in response to Message 1006670.  

Wouldn't it be possible to send WUs to computers that have about the same return rate (or whatever you call it)?

Then slower computers would have more time to return, and faster computers would validate more quickly because their wingmen are also fast.
That would also lower the disk space used on the servers.

Another option is to give higher credit when a result is returned more quickly.
Then people would use a smaller cache.

Just some ideas...


I love both those ideas! I can just picture all the "my score is bigger than yours" brigade reducing their caches from 10 days to zero!

ID: 1006678
Robert Waite
Joined: 23 Oct 07
Posts: 2417
Credit: 18,192,122
RAC: 59
Canada
Message 1006681 - Posted: 20 Jun 2010, 21:40:00 UTC

I crunch for SETI@home only.
If there's an outage and I've run out of work, I shut my computer off.
There's no point in getting hysterical over technical problems, and throwing a hissy fit isn't going to make the problem go away.


I do not fight fascists because I think I can win.
I fight them because they are fascists.
Chris Hedges

A riot is the language of the unheard. -Martin Luther King, Jr.
ID: 1006681
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1006703 - Posted: 20 Jun 2010, 22:51:19 UTC - in response to Message 1006678.  

Wouldn't it be possible to send WUs to computers that have about the same return rate (or whatever you call it)?

Then slower computers would have more time to return, and faster computers would validate more quickly because their wingmen are also fast.
That would also lower the disk space used on the servers.

Another option is to give higher credit when a result is returned more quickly.
Then people would use a smaller cache.

Just some ideas...


I love both those ideas! I can just picture all the "my score is bigger than yours" brigade reducing their caches from 10 days to zero!

Sad to disappoint you, but I would still keep my 10-day cache, as it is not just about RAC but about getting more work done. A 0-day cache = many days not crunching, many days not looking for ET. That's just me; for me, pendings are credits in the bank.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1006703
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 1006704 - Posted: 20 Jun 2010, 22:56:59 UTC

SETI was obviously my 1st project; I started with Classic, then moved to BOINC. Since then, as seen by my sig, I've attached to many projects, as they all have a different meaning to me. My 2 computers crunch regardless of whether a project is up or down, but that's because I believe in all the research that the different projects offer. SETI will always remain my favorite, WCG is a strong 2nd, and Einstein would have to be 3rd. I also crunch different projects with my team depending on what is picked as Project of the Month. If it's something I don't want to do, I'll ramp up something else. I'm by no means a heavy cruncher: no GPUs, and lowly CPUs. I'm still chugging along with my 2.8 GHz P4 w/HT, and when I got my laptop, it started doing its fair share of work as well. Would I like to buy a new computer? Only if I want my wife to kill me!!! LOL
ID: 1006704
Cameron
Joined: 27 Nov 02
Posts: 110
Credit: 5,082,471
RAC: 17
Australia
Message 1006763 - Posted: 21 Jun 2010, 2:50:16 UTC

I too am a BOINC Community Member.
I consider myself a Dedicated Cruncher:
-- Long-Term Perseverance
-- CPUs Contributing to One Project or Another
-- Courteous Return of Finished Work

SETI@Home is probably my favourite Project (because it's the First).
Einstein@Home and MilkyWay@Home are my favourite Backup Projects.


ID: 1006763
mertin

Joined: 27 Jun 10
Posts: 1
Credit: 897
RAC: 0
Argentina
Message 1017945 - Posted: 20 Jul 2010, 15:11:47 UTC - in response to Message 1006579.  

The deadlines are long simply because BOINC in general is supposed to run when the computer is turned on but idle, which is nowhere near 24/7. I guess most people don't care much about it and just rejoice in the fact that they're contributing to humanity with their spare CPU time. And that's OK.

Maybe BOINC could arrange to group all the hardcore crunchers together so that time is not wasted waiting for the "lazy" idle-CPU / low-spec hardware to validate their super-fast crunching... Was this ever implemented, or even discussed at all? I'm new to this.
ID: 1017945
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1017948 - Posted: 20 Jul 2010, 15:29:15 UTC - in response to Message 1017945.  

Hi Mertin, welcome to SETI,

Maybe BOINC could arrange to group all the hardcore crunchers together so that time is not wasted waiting for the "lazy" idle-CPU / low-spec hardware to validate their super-fast crunching


I think if you look at some of the completed tasks, you will notice that most of the time it is the low-spec hardware that is waiting for the super crunchers to finish their work. This is mostly because the super crunchers run such high caches to make sure they don't run out during an outage.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1017948
Peter M. Ferrie
Volunteer tester

Joined: 28 Mar 03
Posts: 86
Credit: 9,967,062
RAC: 0
United States
Message 1017953 - Posted: 20 Jul 2010, 15:33:10 UTC - in response to Message 1017945.  

ID: 1017953