Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (118)
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
We might be able to raise the money, but how can we help with the time? And how do we raise/funnel the money to fund another programmer/system admin? I assume the director would have to be willing to hire, and I assume we would need to come up with more than a single year's worth of funding. Tom A proud member of the OFA (Old Farts Association). |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304
|
The splitters are down..... again..... This is going to be a long weekend of ups & downs. The splitter output is still being limited by the database size. For a while now it's been running in very short bursts. Most of my requests don't result in any work, but when I do get work I get plenty of it for a couple of requests, enough to keep the Linux system fed and the Windows system full to the server-side limits. Grant Darwin NT |
|
pututu Send message Joined: 21 Jul 16 Posts: 12 Credit: 10,108,801 RAC: 6
|
There are 14 servers of various 2X quad/hex core machines running. How much do they cost in electricity annually? Considering that these servers are in California (Berkeley?), where electricity rates are relatively high, it would be good to know the actual annual electricity cost. My gut feeling is that if we can consolidate this to a few (Ivy Bridge, Haswell, Broadwell or EPYC) servers, the payback should be favorable. From reading some of the posts below, it seems there are two or three main items that could help reduce the current outage/panic/server issues: 1. Task/WU management and distribution (e.g. setting shorter deadlines, limiting the number of tasks downloaded depending on task return time, etc.) 2. Hardware (bandwidth, CPU speed, storage, RAM, etc.) 3. Maybe a software upgrade/overhaul (which goes together with taking advantage of the newer hardware capabilities?) From my limited experience and exposure to SETI@home, this is what I think: item 1 seems like something that can be optimized relatively easily compared to items 2 and 3, perhaps as a short-term solution. It may not cost a lot of money and time compared to items 2 or 3, or I'm completely out of touch. However, after optimizing task management/distribution, the project will likely hit a wall again, because it will eventually be hardware-limited if SETI@home decides to attract more donors. My humble one cent. :) |
|
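A back-of-envelope calculation can put a rough number on the electricity question above. Every figure here is an assumption for illustration only (14 servers, ~300 W average draw per box, ~$0.20/kWh commercial California rate); none of these numbers come from the project itself.

```python
# Rough annual electricity cost estimate for the server closet.
# ASSUMPTIONS (not project data): 14 servers, ~300 W average draw each,
# and a California commercial rate of ~$0.20/kWh.

servers = 14
avg_watts_per_server = 300      # assumed average draw, not measured
rate_usd_per_kwh = 0.20         # assumed rate, varies by tariff

hours_per_year = 24 * 365
kwh_per_year = servers * avg_watts_per_server * hours_per_year / 1000
annual_cost = kwh_per_year * rate_usd_per_kwh

print(f"{kwh_per_year:,.0f} kWh/year, roughly ${annual_cost:,.0f}/year")
```

Under these assumptions that works out to roughly $7,000-$7,500 a year, which gives a sense of how long a consolidation onto fewer, newer servers would take to pay for itself.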
AllgoodGuy Send message Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266
|
There are 14 servers of various 2X quad/hex core machines running. How much electricity cost do they consume annually? Considering that these servers are in California (Berkeley?), the electricity rates are relatively high... My gut feeling is that if we can consolidate this to a few (Ivy-Bridge, Haswell, Broadwell or EPYC) servers, the payback should be favorable?
These are definitely higher costs than most areas, which is counterintuitive given California's power plan, as the state is actually paying Arizona to take power off its grid due to overproduction. One thing that is very debatable, in my most humble of opinions, is whether Berkeley could even interest a soul in working there for a mere $100K/yr in one of the top five highest cost-of-living areas in the nation, possibly the world. Perhaps some undergrad attending the university wishing to enter an internship. We might as well be talking about funding a solar/wind generation site for the university, for cost reduction and for redundancy when the state or PG&E forces brownouts. Perhaps we can gain interest from the panicked crowds in the global climate debates? :) That would be a long-term reduction of costs for the university as a whole. #1 is a very feasible suggestion, and most of that work is already done if you look at the breakdown per host on a per-plan-class basis (it needs a bit of tweaking, but it is a good start). It would be feasible to limit outgoing work based on a machine's own track record, and that should be easy to implement with little overhead. For the sake of the crowd who wants to allow the largest number of members the ability to crunch numbers, I would suggest we set limits based on a security measure.
That is, don't allow hardware on the system which will not run a supported operating system that is actively receiving fixes, or disallow hardware which falls outside that scope by a time limit of no more than 2 years beyond such software limits. That just makes sense, and allows the system a higher floor, which corresponds with actual growth in the field. There is something to be said about the soft bigotry of low expectations here. The system as a whole should not cut off its nose to spite its face. We want people to participate at maximum potential, but when the entire system suffers because of our desire to allow some to run antiquated hardware, we shoot for mediocrity, not excellence. Edit* That said, if one sets limits too low on their equipment, or doesn't turn on their computer very often, that is a choice the person makes. Setting a much lower turnaround limit is not an impractical move. If storage space, memory, and software limitations were not an issue, we wouldn't even be having this discussion. We are, however, having it. These policies have led us to limits set not by the hardware, but by our own permissiveness in policy. This is having serious ramifications system-wide, and it is correctable. We can talk about increasing the hardware in a dream list, and come up with other measures at a cost. We can also decide to change a policy and avoid the cost, while at the same time correcting the system and extending the life of these systems as they exist. My humble opinion is that policy > hardware changes, at least in the foreseeable timeframe. This can let some people make rational purchase decisions for the future based on sensible, planned needs. My stinky opinion, for what it's worth. |
|
Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50
|
Tnx...makes sense...just finding it funny that it took 3 years... :)
It just took 3 years for you to notice. I guess you didn't watch your queue all the time during those 3 years to make sure no non-SoG task slipped through. I look through my tasks each morning and at least a couple of times throughout the day. On my slow cruncher, that would catch such strangers. Humans may rule the world...but bacteria run it... |
rob smith ![]() Send message Joined: 7 Mar 03 Posts: 22922 Credit: 416,307,556 RAC: 380
|
Simple - just make a donation to SETI@home via your preferred mechanism; if you don't say it is for a special fundraiser, it goes into the general pool for running the project. The funding for a new member of staff should not be for one year, but for a number of years - short-term staff (including interns and research students) are OK for doing a specific, well-defined task, but what the project needs is a stable core of staff onto which one adds the short-term staff. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
It does help keep our sinuses clear though... :) Tom A proud member of the OFA (Old Farts Association). |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
For what it is worth, of the projects I participate in, SETI@home has the most relaxed "due" schedule. Many of my other projects allow a week or less per task. An experiment might be to reduce the deadline to, say, 2 weeks and see if the load drops off because tasks are not sitting idle in the DB. If possible, could we split the deadlines, making the GPU tasks say a week or under? Tom A proud member of the OFA (Old Farts Association). |
Gene ![]() Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118
|
Tom M proposes reducing the deadline to 2 weeks. I would vote for a less "ambitious" adjustment to the deadlines. I do observe that the AstroPulse tasks are issued with a 26-day deadline, as compared to the 60-day deadline for everything else. If the deadline were reduced to perhaps 40 or 50 days and allowed to remain there a couple of months (i.e. long enough to stabilize to some sort of equilibrium), that ought to give the project some hard data on the effects on database issues and resend statistics. Then decide whether it was a mistake - and revert to previous values; or decide it was a positive move and, perhaps, continue adjusting deadlines in similar small steps. |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304
|
An experiment might be reduce the deadline to say 2 weeks
See all of my previous posts on reducing deadlines. Grant Darwin NT |
|
Speedy Send message Joined: 26 Jun 04 Posts: 1649 Credit: 12,921,799 RAC: 89
|
Tasks are being removed after they have sat in your validated list for a period of time. But I am unsure whether the purge queue on the server status page is working properly, because I have not seen it rise over 300,000 in probably almost a week. I just find it interesting that all my completed results are being deleted. I also see that we are back up to over 6 million results out in the field
|
rob smith ![]() Send message Joined: 7 Mar 03 Posts: 22922 Credit: 416,307,556 RAC: 380
|
I'm republishing this chart I posted a month ago. The x-axis is time (days pending), the y-axis is the percentage remaining to be validated. At 10 days pending there are still between 50% & 60% of the tasks waiting; at 30 days this drops to between 20% and 30%; and by 50 days we are down into the noise. The peculiar gap between 30 and 40 days was the "Christmas Debacle", when nothing really got validated. So what does this mean? A deadline reduction to 10 days would shove the resends through the roof, and turn a large number of currently useful-if-slow hosts into slow-but-useless hosts - they just wouldn't return their data in time, and that data forms a very large proportion of the total. A deadline reduction to 30 days would still push the resends through the roof, and again turn a large number of currently useful-if-slow hosts into slow-but-useless hosts, and that data forms a fair proportion of the total. A deadline of 40 days would see between 10% and 20% resends, getting less hostile to the slow-but-productive hosts. A deadline of 50 days, and the resends drop to ~5%, much less hostile to the slow-but-productive hosts. I would suggest that the sweet spot may be deadlines around 40-50 days, where the impact on the slowest hosts is probably about as low as one can reasonably expect. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
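The chart above can be turned into a rough resend model. The data points below are just the midpoints of the ranges quoted in the post (50-60% at 10 days, 20-30% at 30, 10-20% at 40, ~5% at 50); the straight-line interpolation between them is my assumption, not the actual chart.

```python
# (day, % of tasks still pending validation) - midpoints of the ranges
# quoted in the post above; the 100% anchor at day 0 is assumed.
points = [(0, 100.0), (10, 55.0), (30, 25.0), (40, 15.0), (50, 5.0)]

def pending_pct(days):
    """Estimated % of tasks still unvalidated after `days` (linear interp)."""
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if d0 <= days <= d1:
            return p0 + (p1 - p0) * (days - d0) / (d1 - d0)
    return points[-1][1]  # beyond 50 days: down in the noise

# A deadline of D days would turn roughly pending_pct(D) of tasks into resends.
for deadline in (10, 20, 30, 40, 50):
    print(f"{deadline:2d}-day deadline -> ~{pending_pct(deadline):.0f}% resends")
```

Note that this crude model reproduces the "about 40% at twenty days" figure quoted later in the thread, which is what makes the 40-50 day range look like the sweet spot.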
Richard Haselgrove ![]() Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
It would be interesting to know - but we can probably only speculate - under what circumstances a slow-but-useful computer returns a valid task after 50 days.
1) It really is that slow!
2) It only crunches part time.
3) It crunches, but only a small proportion of its time is spent on SETI.
4) It broke down, but the owner took time to source and install replacement parts.
5) It broke down, but the owner didn't notice.
6) It was caught by a driver problem, but the owner didn't know how to handle it.
7) It spends a lot of time out of reach of the internet.
And so on. I'm sure we can think of many more. |
Stephen "Heretic" ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
|
For what it is worth of the projects I participate in, Seti@Home has the most relaxed "due" schedule. Many of my other projects allow a week or less per task.
. . Tom! Are you after my nick? Talk like that might get you excommunicated. ... Stephen :) |
Stephen "Heretic" ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
|
TomM proposes/suggests reducing the deadline to 2 weeks. I would vote for a less "ambitious" adjustment to the deadlines. I do observe that the AstroPulse tasks are issued with a 26-day deadline, as compared to the 60-day deadline for everything else. If the deadline were reduced to perhaps 40 or 50 days and allowed to remain there a couple of months (i.e. long enough to stabilize to some sort of equilibrium) that ought to give the project some hard data on the effects on database issues and resend statistics. Then decide whether it was a mistake - and revert to previous values; or, decide it was a positive move and, perhaps, continue adjusting deadlines in similar small steps.
. . I would vote for 28 days myself. I remain convinced the project would be perfectly viable with an even shorter deadline, but in the spirit of compromise 28 days seems way more than sufficient. Stephen . . |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304
|
So what does this mean?
I don't believe that to be the case. The reason there are so many systems that take so long to return work is the existing long deadlines. We are allowing it to occur. And even so, the average turnaround time at present is only 34 hours! Even the slowest of the slow systems can return a WU within 2 days. Even allowing them to spend much of their time not actually processing work, or working on another project, they can still return the longest-to-process WU within a week. But people do have issues - power, comms, system etc. So we set deadlines at 4 weeks. In that time, the slowest of the slow that spends most of its time powered off will still be able to return several WUs. And even if there are floods, fires, storms etc. that make it impossible for systems to return work within a week, that 28-day deadline means people will still be able to return finished work before it times out. Grant Darwin NT |
Stephen "Heretic" ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
|
I would suggest that the sweet-spot may be deadlines around 40-50 days, where the impact on the slowest hosts is probably about as low as one can reasonably expect. . . Except that the majority of that 'delay' on the slow hosts is not due to their low productivity so much as their oversized caches. The reason they sit on tasks for 50 days is not because it takes them that long to process a task, but because WUs sit in their 'in progress' status for weeks on end before they get around to processing them. Shortening the deadline and if necessary reducing their work fetch limits would eliminate that unnecessary period of WUs sitting in purgatory. To avoid large numbers of time outs and system imposed work allocation limits they would have to actually administer their hosts more responsibly and reduce their caches to a size that matches their level of productivity. What a shame that would be ... Stephen :( |
Richard Haselgrove ![]() Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
... oversized caches ...
Now there's a challenge! I'll have a look through some of my pendings later, and see how many of my wingmates fall into that category. |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304
|
. . Except that the majority of that 'delay' on the slow hosts is not due to their low productivity so much as their oversized caches. The reason they sit on tasks for 50 days is not because it takes them that long to process a task, but because WUs sit in their 'in progress' status for weeks on end before they get around to processing them.
In theory, if a WU is processed at all it should be done within 20 days (10+10 cache settings).* Anything longer than that, if still returned by that host, would most likely be due to outside factors (system, power, comms etc. issues), or a recently connected very slow host, possibly with more than one project at 10+10 cache settings, still figuring things out. *Unless bunkering or other such user manipulation is at play. Grant Darwin NT |
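The arithmetic behind that "20 days in theory" figure: BOINC's cache is configured as a minimum work buffer plus an additional buffer, each capped at 10 days, so a worst-case FIFO host starts its oldest queued task up to 20 days after download. Adding the 34-hour average turnaround quoted earlier gives a crude upper bound on return time (crude because the average turnaround already includes queue time, so this over-counts slightly).

```python
# Worst-case BOINC cache: "store at least N days" + "up to an additional
# N days", each assumed at the 10-day value quoted in the thread.
min_buffer_days = 10
extra_buffer_days = 10
cache_days = min_buffer_days + extra_buffer_days

# Project-wide average turnaround quoted earlier in the thread (34 hours).
avg_turnaround_days = 34 / 24

# Crude upper bound on return time for a fully cached host; the real figure
# is lower, since average turnaround already includes time spent queued.
theoretical_return_days = cache_days + avg_turnaround_days
print(f"Worst-case cache delay: {cache_days} days")
print(f"Rough upper bound on return time: ~{theoretical_return_days:.1f} days")
```

This is why a ~28-day deadline still leaves headroom even for a host that keeps both buffers maxed out.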
rob smith ![]() Send message Joined: 7 Mar 03 Posts: 22922 Credit: 416,307,556 RAC: 380
|
Just take a look at the graph before making ANY assumption: "having no effect" and "because of long deadlines" are both totally and utterly WRONG. The truth, which some do not accept, is that SETI@home has a POLICY of supporting a very wide range of computer performance, and of human activity such as holidays, forgetting to stop a host, infrequent processing and so on. Twenty days would mean about 40% of the tasks sent out would have to be resent, and, as these are probably on hosts that only do a very small number of tasks per year, that means alienating a very large proportion of the user base, which according to many reports is shrinking - do you want to decimate that base overnight? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
©2026 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.