Panic Mode On (10) Server problems
Author | Message |
---|---|
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
john deneer Send message Joined: 16 Nov 06 Posts: 331 Credit: 20,996,606 RAC: 0 |
Uploads started working again approx. one minute ago. Reporting too .... John. |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 66362 Credit: 55,293,173 RAC: 49 |
WUs are uploading here too, and on reporting I'm not seeing any problems, as I have nothing to report right now. :D Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I think my connection to seti has been blocked for at least 4h if not 24h. Anybody also in misery? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"I think my connection to seti has been blocked for at least 4h if not 24h. Anybody also in misery?" It was working fine for me 4h ago, so I'd go with your shorter estimate rather than your longer one - but yes, no uploads since then (haven't tried downloads). Server status page is frozen at 27 Oct 2008 1:30:21 UTC, so I guess it's just going to be another one of those Monday mornings in Berkeley..... |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I think my connection to seti has been blocked for at least 4h if not 24h. Anybody also in misery?" Downloads on Seti Main are working, but aren't on Beta. Claggy. Edit: Actually, Beta downloads are working; I've managed to download one WU, but the other nine just drop their connections. |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I retried all my pending uploads and they all failed. I then did a manual update and got no new work (presumably because the uploads aren't complete). So they can't run, it seems, more than about 48h without some sort of process-stopping glitch, usually network-related perhaps. Is this because all of us chicks are chirping in unison for attention, or is there something fundamentally incorrectly engineered? I do remember the promise of Nirvana when something was done to move the network bandwidth peak from about 50 to 100 Mbps. Nirvana was fleeting in that case. So what is 'wrong', assuming that you agree something is actually wrong? |
Zebra3 Send message Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
I get up every morning and update my minifarm of PCs to transfer my work from overnight. Occasionally we have outages at Berkeley, unfortunately, for various reasons that are out of their control. It seems to happen more on the weekends when no one is there to repair the failure...Murphy's Law...but &#$*!% happens! The crew of volunteers that manage the project can't be there 24/7 as they do have lives outside of the project. The way I deal with Seti@home is to keep my cache at a reasonable level so I will always have WUs and let the project do the rest. If I wake up like I have the last few mornings and things are not working at 100%, I do what I normally do and go back to bed. The sun will rise tomorrow and maybe all will be well, but if it doesn't, worrying about Seti will be the least of my problems!!! |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Last contact to the server: 27 Oct 2008 - 09:17:56 UTC |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
"I get up every morning and update my minifarm of PCs to transfer my work from overnight. Occasionally we have outages at Berkeley, unfortunately, for various reasons that are out of their control. It seems to happen more on the weekends when no one is there to repair the failure...Murphy's Law...but &#$*!% happens! The crew of volunteers that manage the project can't be there 24/7 as they do have lives outside of the project. The way I deal with Seti@home is to keep my cache at a reasonable level so I will always have WUs and let the project do the rest. If I wake up like I have the last few mornings and things are not working at 100%, I do what I normally do and go back to bed. The sun will rise tomorrow and maybe all will be well, but if it doesn't, worrying about Seti will be the least of my problems!!!" Please, give us a break. It is likely that many of us do about the same thing you are boasting about. And chanting the hoary Rosary about limited manpower has gotten to a point that makes the hairs on my back stand up. Repeating the obvious becomes tedious. My point is that we are constantly having network connection issues. Note that we are on the 10th edition of this thread. I'm sure over time, there have been many reasons for the failures. But I am merely asking what the main source of the problem is today. If we agree there is a problem, and the source is understood, then wouldn't it make sense to fix it so that the limited manpower could be used for something more useful, and so that our distributed computing system runs more productively? |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
In about 15 minutes the boyz should be back in the lab and the server kicking shall commence.......hopefully it is something that can be put back into action before tomorrow's maintenance outage. I would guess that Matt might report what the actual problem is if he posts in technical news this afternoon. Until then.........just keep crunching..... "Time is simply the mechanism that keeps everything from happening all at once." |
Zebra3 Send message Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
Thank you very much, PhonAcq, for that biting response to my post. I am glad I did not have my coffee handy or I'm sure my monitor would be in need of cleaning...lol. In response to it I will only offer this comment. If even half of the 1.5 million of us crunching WUs donated just a few dollars to the project that you are harping about, we could have newer, better and more stable equipment and these outages would be nonexistent. The project only has so much cash to use and the rest must come from generous donations. We can only give what we have, which I understand is tough these days. If nothing else..a donation gives you a bright green star so you stand out from the madding crowd...lol. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"In about 15 minutes the boyz should be back in the lab and the server kicking shall commence......." Spot on, Mark - mine have started to go already. |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"In about 15 minutes the boyz should be back in the lab and the server kicking shall commence......." Yup....I am kicking all of mine through as we speak..... And am getting downloads as well. "Time is simply the mechanism that keeps everything from happening all at once." |
Zebra3 Send message Joined: 22 Oct 01 Posts: 186 Credit: 13,658,148 RAC: 0 |
"In about 15 minutes the boyz should be back in the lab and the server kicking shall commence......." I am also up and going as well..another day of sparring behind me...everyone have a good day!! |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"In about 15 minutes the boyz should be back in the lab and the server kicking shall commence......." Yourself as well sir.... "Time is simply the mechanism that keeps everything from happening all at once." |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
"Thank you very much, PhonAcq, for that biting response to my post. I am glad I did not have my coffee handy or I'm sure my monitor would be in need of cleaning...lol. In response to it I will only offer this comment. If even half of the 1.5 million of us crunching WUs donated just a few dollars to the project that you are harping about, we could have newer, better and more stable equipment and these outages would be nonexistent. The project only has so much cash to use and the rest must come from generous donations. We can only give what we have, which I understand is tough these days. If nothing else..a donation gives you a bright green star so you stand out from the madding crowd...lol." Coffee in face: I have that effect on people. And "that's a good thing" some would say. 1.5 million?: Try 155K active users, that is, the count of the users who have recently (1 month?) contributed. The official count is actually about 900K, of which there seem to be many, many departed souls. Still, 155K users is a mighty impressive number. "I need more money" chant: What project have you ever worked on that didn't need more money?? You aren't actually saying anything by repeating it over and over again. (tautology intended) Donation amour propre: Kind of off point on this thread, isn't it? (I had to learn a new Frenchy phrase for this bullet!) My desired result: A critical analysis of why things aren't better, yielding specific action plans and a path to more efficient and productive use of existing resources. Oh, yes, the end of world hunger, too. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"My point is that we are constantly having network connection issues. Note that we are on the 10th edition of this thread. I'm sure over time, there have been many reasons for the failures." ... and the point that I've been trying to make is that the "failures" really aren't. The BOINC client and the BOINC servers act together as a system. There are features in the client to cache new work, and to cache completed work. The caching allows BOINC to run on machines that are not connected 100% of the time, and the caching allows BOINC to work even when the servers are not 99.999% reliable. It is interesting to watch what the BOINC client does through the logs, and it is interesting to see what's happening with the servers at Berkeley, but overall, it's like kissing your sister. It's nice, but it doesn't mean anything. If we demonstrate through our complaints that a successful BOINC project needs to spend enough money to have 99.999% reliability, then we also demonstrate that one of the key concepts behind BOINC is false -- we're telling the world that you can't do big computing on a very small budget. |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I agree that the boinc server/client is fault tolerant in the spirit of your description. Fine. What I consider a failure is when someone has to kick a server or a network box to get it going again (like what happened this morning, I surmise), or when some other human intervention has to occur. The random breakdowns that require Matt's fast fingers to fix should be analyzed and, ideally, remedied, so that each such resource drain is eliminated (for good) in turn. In this vein, a planned service, such as our Tuesday Time-outs, is also a failure, but is obviously accepted as part of the current operational/engineering plan. I'm personally not as concerned about it because it is predictable and I suspect that it could be nearly eliminated in the future with sufficient planning, funding, and/or cleverness. But in a real sense, it is a band-aid that isn't getting better with time. So, if what you are actually saying is that the generalized boinc admin(s)/server/client system is fault tolerant, that is probably closer to the truth, but it doesn't say much. Conversely, it would be nice to know to what level boinc is indeed reliable, stripping away its fault tolerance protocols. The number of Berkeley-related connection errors I see in my logs indicates that the actual reliability of Berkeley's implementation of boinc is running very low. Error correcting protocols are always inefficient and sub-optimal, whether you are talking memory, communications, or engineering systems. So it is always best to have high intrinsic reliability (or signal strength, or whatever) so that the error correction can be minimized. Understanding the sources of reliability loss at Berkeley should lead to steps to improve its underlying reliability. For example, each missed upload request leads to a sequence of subsequent requests as part of the fault tolerant protocols. This impacts the servers, network, and clients, each to some degree. Multiplied by the 300K or so hosts, this leads to a lot of unproductive 'work'. Wouldn't we all be better off getting rid of this type of error, if we can? Regarding boinc's underlying premise that you allude to, I don't pay much attention to it frankly. I view boinc as a development engineering system, and as such it should reflect the best engineering (albeit with limited resources) that can be developed. Boinc is not science to me, because Computer Science is almost always better described as Computer Engineering. However, the application of the boinc engine to seti has the potential of producing science. The problem here is that we have run for years now and have not generated a scientific result. I don't mean finding ET, but rather I mean a critical analysis of the data processed, contrasted to relevant theories as appropriate, with a clear statement of testable conclusions. Null results are still results, but they have to be analyzed scientifically. Physics majors may remember the importance of the Michelson-Morley experiment, which itself was a null result that provided upper bounds on the existence of the ether. So at best, seti is in the middle(?) of the first phase of an ambitious science project, but we haven't actually completed any 'big computing' yet (i.e. let's not tell the world that we have). =================== sorry everyone, too many thoughts in one place due to coffee overload. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Regarding boinc's underlying premise that you allude to, I don't pay much attention to it frankly." It wasn't an allusion, it was a statement based on the various papers available at http://boinc.berkeley.edu/trac/wiki/BoincPapers. The first goal listed in this paper is "Reduce the barriers of entry to public resource computing." I'll let you read the paper if you wish, it explains a lot. ... and while I agree that it'd be nice if the BOINC servers at SETI@Home didn't have to be "kicked" periodically, it seems to me that the problem is that the servers are running at a pretty high load all the time. Certainly, demand on other resources (especially bandwidth) often exceeds what is available. Usually, problems like this are solved by getting more resources: bigger, faster servers with more storage, faster networks, a higher-speed connection from the Lab all the way to the 'net -- and more than one connection. Plus a couple more "Matts" to get it all integrated. Certainly, if you wanted to serve up something like Amazon.com, where downtime means missed orders, that's what you'd do. When you have a client that runs on each PC, you get the opportunity to relax the requirements on the server side. It becomes less important to have 99.99% reliability. So, while I agree with you that it'd be nice (or "will be nice") when things are running more smoothly, I'd like to see it because it'll be easier on Matt and Jeff and Eric, rather than because it's any kind of requirement. SETI is the flagship BOINC project, and it is certainly the poster child for "less is more" -- but BOINC is also a work in progress. Overall, it seems to work -- even with all of the shortcomings, and even with the less than 100% reliable infrastructure. |
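
For anyone who wants rough numbers on PhonAcq's point that each missed upload leads to a sequence of further requests, multiplied across 300K or so hosts, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 60 s first delay, the doubling, and the 4 h cap are not BOINC's actual backoff values, and `attempts_during_outage` and `wasted_requests` are hypothetical helper names, not part of any BOINC code.

```python
def attempts_during_outage(outage_s, base_s=60.0, factor=2.0, cap_s=4 * 3600.0):
    """Count the failed upload attempts one client makes while the server is down.

    The client tries immediately, then waits base_s, base_s * factor, ...
    (never more than cap_s) between tries. All defaults are made-up values.
    """
    elapsed = 0.0
    delay = base_s
    attempts = 0
    while elapsed < outage_s:
        attempts += 1                      # this try fails: server still down
        elapsed += delay                   # wait before the next try
        delay = min(delay * factor, cap_s)
    return attempts


def wasted_requests(hosts, outage_s, **backoff):
    """Total failed requests if every host has one pending upload."""
    return hosts * attempts_during_outage(outage_s, **backoff)


if __name__ == "__main__":
    one_hour = 3600
    print("doubling backoff:", attempts_during_outage(one_hour))
    print("fixed 60 s retry:", attempts_during_outage(one_hour, factor=1.0))
    print("300K hosts, doubling:", wasted_requests(300_000, one_hour))
```

Under these assumed numbers, a one-hour outage costs each client 6 failed tries with doubling backoff versus 60 with a fixed one-minute retry, so the retry policy matters about as much as the outage itself. It says nothing about what actually caused the outage this morning.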