Message boards : Number crunching : Confessions of a SETI@home Cruncher
Author | Message |
---|---|
Greg Beach Joined: 7 Jun 99 Posts: 23 Credit: 4,978,313 RAC: 0 |
Whenever we have a major unplanned outage, there is invariably someone who announces they are leaving the project. For what it's worth, I thought it was time for someone to announce that they are staying, and to offer an alternate perspective. There are casual crunchers and hardcore crunchers. After 11 uninterrupted years contributing to SETI@home I am definitely not a casual cruncher, but I'm not as serious as some of the hardcore crunchers here. I guess you could call me a softcore (insert porn joke here) cruncher. As a softcore cruncher I too have been frustrated over the last few years with the outages, and I have to admit I have considered quitting. There's nothing more frustrating to a hard/softcore cruncher than a machine that is sitting idle. Here are a few tips that helped me minimize the frustration:
"I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outage/emergencies. In my case, I currently have four regular separate sources of income with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose, I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep and I didn't want to start up the scheduler processes without others' input as they were still effectively on the operating table. We've pretty much given up any hope for 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects." While I appreciate the work done by the SETI@home staff, they could improve their communication during an outage. Something as simple as a short note at the top of the server status page, indicating that they know about the problem and when they expect everything to be back to normal, would take only a few seconds of their time and go a long way toward minimizing the frustration experienced by the most dedicated crunchers. It might even reduce the length of the server panic threads. While it's unfortunate when any cruncher, let alone a long-time cruncher, leaves the project, I understand their decision and hope that in the future they will consider returning. Anyway, sorry for droning on for so long, but that's my 2 cents. |
Norwich Gadfly Joined: 29 Dec 08 Posts: 100 Credit: 488,414 RAC: 0 |
I agree with both Greg and Chris. As well as this project, I run World Community Grid and am never out of work, despite having a cache of only two days. I turn round the vast majority of my S@H units in a day or two, but when I look at my pending units I see many of my wingmen have vast work caches but take ages to complete them, and often either don't complete them at all or complete them with errors. It seems to me that such behaviour makes the work shortage worse. Perhaps the project would run more smoothly if the maximum cache size were reduced to, say, three days, and the deadlines tightened from the ridiculously long six weeks to, say, three weeks. Looking forward to some flames in response... |
Dirk Villarreal Wittich Joined: 25 Apr 00 Posts: 2098 Credit: 434,834 RAC: 0 |
"I agree with both Greg and Chris. [...] Perhaps the project would run more smoothly if the maximum cache size were reduced to say three days, and the deadlines tightened from the ridiculously long six weeks to say three weeks." I guess the reason for doing/not doing it is that there are still lots of PCs with old or un-upgraded settings and hardware. I would say that the aim of the project is to give as many people as possible a chance to participate, even with old-fashioned PCs. Not everybody is willing and able to purchase a new PC, graphics card, motherboard or whatever else you can think of every now and then. |
Bill Walker Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
Have you ever noticed how many thoughtful posts on these threads come from Canada? |
Aurora Borealis Joined: 14 Jan 01 Posts: 3075 Credit: 5,631,463 RAC: 0 |
I think it comes from having an elephant living next door. It becomes part of your nature to tip toe around. |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
"I agree with both Greg and Chris. [...] It seems to me that such behaviour makes the work shortage worse." This is a subject that has been debated many times around here, and although some might consider me a "power cruncher" I quite agree with you. Smaller caches and reduced return times would definitely reduce the stress on the servers by reducing validation times, thus reducing the size of the database. Database problems appear to be one of the main causes of server problems. It's a weird feedback loop: the servers and their software appear to be inherently unstable and prone to failure, so people up their caches to compensate for the outages; this places more stress on the servers, which makes them more prone to failure, so more people up their caches, which puts more stress on the servers, and so on ad infinitum. Call it the SETI Spiral. Projects like Malaria Control and some WCG projects do quite well with a 2 to 3 week return time. For the record, I only run a 3-day cache for this very reason, and I don't really consider myself a "power cruncher". I don't run any heavy battleships, just a flotilla of 2 light cruisers and a couple of frigates. But I do alright :-) Brodo |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
[lots of text warning] I also agree with the above. I consider myself to be a "softcore" cruncher, since I have only had about 6 months of idle CPUs (around 2003, when I forgot to reinstall the classic client after a Windows re-install), and I intend to ride the current issues out and see what the "smooth" outcome is... if we ever get to a smooth outcome. Personally, I prefer crunching Astropulse over Multibeam, and have only been able to do that by omitting Multibeam in my app_info. It takes a few hours to get all four cores crunching, and another day or two to build a 4-day cache, simply because of the scarcity of Astropulse work and the need for perfect timing on a work request. AP-only was working quite well for me for the past 2-3 months. Only once or twice did I end up with four idle cores, but that lasted less than a day and I managed to get more to work on. During the past week, though, I gave up on AP-only because I went five consecutive days without a single WU, and that was with a combination of letting BOINC do its thing and me mashing the 'update' button on occasion. I got myself a 4-day cache of MB with ease and the cores are busy again. I am well aware that the purpose of BOINC is to tie multiple projects into one central application, and that no project (or at least SAH) guarantees there will be work at all times. This is why I have never complained about anything that wasn't trivial. I choose to run only this one project, and have tried to keep a decent cache to ride out the storms over the years. My 4-day cache setting usually ended up being about 5 days instead, and for the first time in well over 6 years, I did actually run out of work within the past few months, and multiple times since then. Still, I wait and see what happens next. No threats about leaving (or actually leaving), because I use my computer for video games and my employment income. The idle CPU time gets used for crunching--the way it's supposed to be.
I understand some hardcore people go and spend ludicrous amounts of money to see what kind of RAC they can get, and more power to them; I would choose to spend that money in a different way. Somewhere along the line, though, everyone has to accept what has been true since day one of distributed computing: 24/7 availability is just not going to happen. There will be downtime, and there will be back-end problems that take days to fix. If you're worried about running out of work, grab another project and give it a tiny resource share; that way, when this one goes down, the other will keep you busy until this one comes back. If you're like me and choose to be exclusive to just one project, then you'll just go idle. I've been running optimized apps since October 2008, and even as I type this, they still work for me. I get work, the server doesn't tell me that I'm at a quota limit, and my crunched tasks validate with the stock app about 99.995% of the time. The small percentage of inconclusive validations is because my wingman was a CUDA cruncher that -9'ed, and when you pull up their task list--all 1000+ tasks--they have a high rate of errors. Or there are tasks that I turned in within days of their being issued which then have to wait six weeks for someone to time out so they can be re-issued. I'm not a credit hound, but WUs waiting in limbo eat up valuable disk space on the servers. The deadlines are just way too long. Yes, when the precision of MB tasks was doubled a few months ago, the deadline was also doubled, and I think that causes a problem: even with twice the work in every WU, the deadline could have stayed at 3 weeks. BOINC is supposed to make sure every task can be turned in before the deadline, so unless you are still running a computer that takes three weeks to do a single MB, there shouldn't be a problem. The main problem with the long deadlines is the disk space on the servers that gets eaten up by having to hold pending tasks for that much longer.
Anyone else notice how a year ago, four million results in 'returned and awaiting validation' was a cause for serious concern? Over the past year, 'results in the field' has only gone up about 50%, while 'returned and awaiting validation' has more than doubled. That is terabytes of disk space eaten up, waiting for the so-called "hit-and-run" hosts to time out sometime next month. Reducing the number of "active" tasks would probably reduce the server load by a large amount, thus reducing the number of problems that we are experiencing. I guess that's enough of my comments on the matter. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
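For readers wondering how the AP-only setup described above works: BOINC's anonymous-platform mechanism reads an app_info.xml file from the project directory and only requests work for the applications listed in it. A minimal sketch follows; the application name, file name, and version number here are illustrative assumptions and must match the optimized binary actually installed:

```xml
<app_info>
    <!-- Listing only Astropulse means the scheduler sends no Multibeam work.
         Names and version numbers below are examples, not authoritative. -->
    <app>
        <name>astropulse_v505</name>
    </app>
    <file_info>
        <name>ap_5.05_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse_v505</app_name>
        <version_num>505</version_num>
        <file_ref>
            <file_name>ap_5.05_windows_intelx86.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```

The file lives in the SETI@home project directory under the BOINC data directory; with it present, the client reports only the listed apps to the scheduler, which is why work requests return nothing when no Astropulse tasks are available to send.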
Lane42 Joined: 17 May 99 Posts: 59 Credit: 227,150,556 RAC: 11 |
Looking at some of my pendings, I have a wingman with a Core 2 Duo and a (very slow) GS GPU who has a cache of over 4k tasks... if he's lucky he'll only finish half of that. How can this happen? |
tullio Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I have an Opteron 1210 CPU at 1.8 GHz running Linux and 5 BOINC projects. I have 8 SETI WUs in a pending state. But I am also running SETI on a virtual machine with OpenSolaris via VirtualBox, much more slowly. It has 3 WUs in a pending state. How come? All of my wingmen have faster CPUs, and some even have GPUs; I have no GPU. The only reason I can see is that people load many more WUs onto their faster machines. I have a 0.25-day cache and am never out of work. Tullio |
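A cache as small as Tullio's 0.25 days can be set either through the project's web preferences or locally: the BOINC client reads a global_prefs_override.xml file from its data directory, which overrides the web-based general preferences. A minimal sketch, assuming only the work-buffer settings need overriding:

```xml
<global_preferences>
    <!-- Keep roughly a six-hour work buffer, with no extra days on top -->
    <work_buf_min_days>0.25</work_buf_min_days>
    <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>
```

After editing the file, the running client can be told to re-read it with `boinccmd --read_global_prefs_override` (or via the equivalent option in the BOINC Manager), so no restart is needed.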
Greg Beach Joined: 7 Jun 99 Posts: 23 Credit: 4,978,313 RAC: 0 |
"While I appreciate the work done by the SETI@home staff they could improve their communication during an outage." I didn't mean to suggest that they should monitor 24/7. As I mentioned, they have a life outside SETI@home that must take precedence, as it should. I only meant to suggest that once they are aware of a problem, one of the first steps they might want to take would be to update the server status page. |
Greg Beach Joined: 7 Jun 99 Posts: 23 Credit: 4,978,313 RAC: 0 |
"I agree with both Greg and Chris. [...] Perhaps the project would run more smoothly if the maximum cache size were reduced to say three days, and the deadlines tightened from the ridiculously long six weeks to say three weeks." You agreed with me so I'm not going to flame you. Sorry to disappoint you. :D In fact, I will agree with you and second your suggestion for smaller maximum cache sizes and tighter deadlines. |
TheFreshPrince a.k.a. BlueTooth76 Joined: 4 Jun 99 Posts: 210 Credit: 10,315,944 RAC: 0 |
Wouldn't it be possible to send WUs to computers that have about the same return rate (or whatever you call it)? Then slower computers will have more time to return their work, and faster computers will validate quicker because their wingmen are also fast. That would also lower the disk space used on the servers. Another option is to give higher credit when a result is returned quicker; then people would use a smaller cache. Just some ideas... Rig name: "x6Crunchy" OS: Win 7 x64 MB: Asus M4N98TD EVO CPU: AMD X6 1055T 2.8(1,2v) GPU: 2x Asus GTX560ti Member of: Dutch Power Cows |
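Neither of these ideas is implemented in BOINC's actual scheduler. As a purely hypothetical sketch (every name below is invented for illustration), matching wingmen by similar speed could be as simple as bucketing hosts by their average turnaround time and issuing the replicas of a workunit within one bucket:

```python
from collections import defaultdict

def bucket_hosts(hosts, bucket_days=2.0):
    """Group hosts by average turnaround time so that replicas of a
    workunit can be issued to similarly fast hosts.
    `hosts` is a list of (host_id, avg_turnaround_days) tuples."""
    buckets = defaultdict(list)
    for host_id, turnaround in hosts:
        # Hosts whose turnaround falls in the same bucket_days-wide band
        # land in the same bucket and would be paired as wingmen.
        buckets[int(turnaround // bucket_days)].append(host_id)
    return dict(buckets)

# A fast pair, a medium pair, and one very slow host end up separated:
print(bucket_hosts([(1, 0.5), (2, 1.9), (3, 6.0), (4, 7.5), (5, 30.0)]))
# → {0: [1, 2], 3: [3, 4], 15: [5]}
```

The trade-off such a scheme would face, as the replies below point out, is that it concentrates the slowest hosts together, so their workunits would pend even longer against equally slow wingmen.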
Norwich Gadfly Joined: 29 Dec 08 Posts: 100 Credit: 488,414 RAC: 0 |
"Wouldn't it be possible to send WU's to computers that have about the same return-rate (or whatever you call it)." I love both those ideas! I can just picture all the "my score is bigger than yours" brigade reducing their caches from 10 days to zero! |
Robert Waite Joined: 23 Oct 07 Posts: 2417 Credit: 18,192,122 RAC: 59 |
I crunch for SETI@Home only. If there's an outage and I've run out of work, I shut my computer off. There's no point in getting hysterical over technical problems and throwing a hissy fit isn't going to make the problem go away. I do not fight fascists because I think I can win. I fight them because they are fascists. Chris Hedges A riot is the language of the unheard. -Martin Luther King, Jr. |
hiamps Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0 |
"Wouldn't it be possible to send WU's to computers that have about the same return-rate (or whatever you call it)." Sad to disappoint you, but I would still keep my 10-day cache, as it is not just about RAC but about getting more work done. A 0-day cache = many days not crunching, many days not looking for ET. That's just me; for me, pendings are credits in the bank. Official Abuser of Boinc Buttons... And no good credit hound! |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
SETI was obviously my 1st project; I started with Classic, then moved to BOINC. Since then, as seen by my sig, I've attached to many projects, as they all have a different meaning to me. My 2 computers crunch regardless of whether a project is up or down, but that's because I believe in all the research that the different projects offer. SETI will always remain my favorite, WCG is a strong 2nd, and Einstein would have to be 3rd. I also crunch different projects with my team depending on what is picked as Project of the Month. If it's something I don't want to do, I'll ramp up something else. I'm by no means a heavy cruncher: no GPUs, and lowly CPUs. I'm still chugging along with my 2.8GHz P4 w/HT, and since I got my laptop, it does its fair share of work as well. Would I like to buy a new computer? Only if I want my wife to kill me!!! LOL |
Cameron Joined: 27 Nov 02 Posts: 110 Credit: 5,082,471 RAC: 17 |
I too am a BOINC Community Member. I consider myself a Dedicated Cruncher -- Long Term Perseverance -- CPUs Contributing to One Project or Another -- Courteous Return of Finished Work. SETI@Home is probably my favourite project (because it was the first). Einstein@Home and MilkyWay@Home are my favourite backup projects. |
mertin Joined: 27 Jun 10 Posts: 1 Credit: 897 RAC: 0 |
The deadlines are long simply because BOINC in general is supposed to run when the computer is turned on but idle, which is nowhere near 24/7. I guess most people don't care much about it and just rejoice in the fact that they're contributing to humanity with their spare CPU time. And that's OK. Maybe BOINC could be organized to group all the hardcore crunchers together, so that time is not wasted waiting for the "lazy" idle-CPU / low-spec hardware to validate their super-fast crunching... Was this ever implemented or even discussed at all? I'm new to this. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Hi Mertin, welcome to SETI. "Maybe BOINC could organize to group all the hardcore crunchers together so that time is not wasted in waiting for the "lazy" idle cpu / low spec hardware validate their super fast crunching" I think if you look at some of the tasks done, you will notice that most of the time it is the low-spec hardware that is waiting for the super crunchers to finish their work. This is mostly because the super crunchers run such high caches in order to make sure they don't run out during an outage. PROUD MEMBER OF Team Starfire World BOINC |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.