The beginning of the end?

CJOrtega

Joined: 15 May 99
Posts: 186
Credit: 1,126,273
RAC: 0
United States
Message 82181 - Posted: 23 Feb 2005, 15:12:06 UTC

The SETI Classic data server has been down since about 1630 PST yesterday, and this notice is on the home page:

NOTICE: SETI@home is in the process of switching to new software called BOINC, which lets you run other projects (like Climateprediction.net and Einstein@home) on your computer as well as SETI@home. Please visit our new SETI@home/BOINC web site.

Getting closer to the deluge?


ID: 82181 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82187 - Posted: 23 Feb 2005, 15:30:46 UTC

Could be -- seems there is an unacknowledged wire problem -- (the status message continues to say 'server up, dropping connections' -- when in fact it has been 'server may or may not be up, but if up, is dropping ALL connections').

Then again, I can't get BOINC SETI data packets either.

I figured the BOINC SETI tech folks, who have the excitement of an ongoing beta with new problems to resolve and the knowledge that SETI classic is moribund and its users 'old hat', might be a tad more proactive about detailing the actual problem and resolving it...

Barry Schnur
ID: 82187 · Report as offensive
Jan Inge

Joined: 24 Sep 02
Posts: 21
Credit: 1,655,076
RAC: 0
Norway
Message 82199 - Posted: 23 Feb 2005, 16:15:25 UTC

The text used to be:
"BOINC users: for more information about the BOINC/SETI@home project, click here."
So something is going on. But I think it was Matt that said in a post here that when they are ready they will send out a mail to everyone. It may just be a notice so more users will switch now, and they don't get everyone switching at the same time.

ID: 82199 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82202 - Posted: 23 Feb 2005, 16:40:25 UTC - in response to Message 82199.  

> So something is going on. But I think it was Matt that said in a post here
> that when they are ready they will send out a mail to everyone. It may just be
> a notice so more users will switch now, and they don't get everyone switching
> at the same time.
>
Though at the moment even with the encouragement of the classic data server being essentially dead for the past 16 hours, switching over doesn't seem to be an option -- that is, I've not been able to download work for BOINC either since last evening.

Barry Schnur
ID: 82202 · Report as offensive
CJOrtega

Joined: 15 May 99
Posts: 186
Credit: 1,126,273
RAC: 0
United States
Message 82208 - Posted: 23 Feb 2005, 16:50:07 UTC - in response to Message 82202.  

> > So something is going on. But I think it was Matt that said in a post here
> > that when they are ready they will send out a mail to everyone. It may just be
> > a notice so more users will switch now, and they don't get everyone switching
> > at the same time.
> >
> Though at the moment even with the encouragement of the classic data server
> being essentially dead for the past 16 hours, switching over doesn't seem to
> be an option -- that is, I've not been able to download work for BOINC either
> since last evening.
>
> Barry Schnur
>

I just checked the logs in BoincView, and see that I am not having any trouble sending and receiving data to/from the Seti/Boinc servers. Must be something local to your system.


ID: 82208 · Report as offensive
Scott Simontis
Joined: 6 May 03
Posts: 8
Credit: 178,350
RAC: 0
United States
Message 82766 - Posted: 26 Feb 2005, 21:09:44 UTC - in response to Message 82208.  


> > > that when they are ready they will send out a mail to everyone. It may just be
> > > a notice so more users will switch now, and they don't get everyone switching
> > > at the same time.
I think that the project isn't quite ready yet. There are still a few things that need to be worked out, and the servers need to be able to handle all the users, and I am not sure if they could right now. They will also have to do something to avoid outages. I know mistakes happen, but why would you not have a server hooked up to a UPS under any conditions?
ID: 82766 · Report as offensive
Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 82772 - Posted: 26 Feb 2005, 21:21:48 UTC - in response to Message 82766.  

> I know mistakes happen, but why would you not have
> a server hooked up to a UPS under any conditions?

One problem was one of the newest DB servers wasn't on UPS as it was in "testing mode" (hadn't been moved to server rack yet). Note: no additional systems can be moved to the actual server closet as it is maxed out for power.

Another problem was they needed to carry out specific tasks on each of the systems to shut them down when power went out...and they could not complete all these steps on each server before one of the servers' UPSes ran out of juice.

An additional problem is money. They don't have top of the line equipment (UPS, diesel backup generators, etc) as the project runs on a shoestring budget and donations.

ID: 82772 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82819 - Posted: 26 Feb 2005, 23:37:14 UTC - in response to Message 82766.  

> They will also have to do something to avoid outages.

Actually, I'm not sure how much needs to be done here.

Compare E-Mail and the web.

If I type "http://setiweb.ssl.berkeley.edu" into my browser and they are down, I don't get the page.

If I send a message to someone @ssl.berkeley.edu and they're down, well, my mail server will deliver it when they're back. They can be down for an hour and I would not even notice.

BOINC is not like the web, it's like E-Mail.

So the UPS does not need to keep everything running for a couple of hours, it just needs to keep power up long enough for a safe shutdown (with a safety margin). A generator might make some of the more compulsive people happier, but BOINC doesn't need it because the power will come back on, the servers will be back up, and everything done while they were down will get through when they're back.
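
To put the e-mail comparison in concrete terms, here is a minimal sketch of store-and-forward delivery with exponential backoff. It is only an illustration of the idea, not the actual BOINC client code: the function names, the 60-second base delay, the 4-hour cap and the `send` callable are all assumptions made up for the example, and the waiting is simulated rather than slept.

```python
import random


def backoff_delays(base=60.0, cap=4 * 3600.0, attempts=8, jitter=0.0, rng=None):
    """Yield retry delays in seconds: base, 2*base, 4*base, ... capped at `cap`.

    All defaults are illustrative assumptions, not values taken from BOINC.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        # Optional jitter spreads clients out so they do not all retry at once.
        yield delay + rng.uniform(0, jitter * delay)


def deliver(pending, send, delays):
    """Keep trying to send queued items; whatever fails stays queued.

    `send` is a hypothetical callable that returns True on success.
    Returns the total (simulated) time spent waiting between rounds.
    """
    waited = 0.0
    delays = iter(delays)
    while True:
        # Try everything still queued; keep only the items that failed.
        pending[:] = [item for item in pending if not send(item)]
        if not pending:
            return waited
        delay = next(delays, None)
        if delay is None:
            return waited  # out of retries for now; items remain queued
        waited += delay
```

Nothing is lost while the server is down: the results just sit in the queue until a later round gets through, which is why a short outage goes unnoticed by most clients.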
ID: 82819 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82821 - Posted: 26 Feb 2005, 23:38:49 UTC - in response to Message 82772.  


> Another problem was they needed to carry out specific tasks on each of the
> systems to shut them down when power went out...and they could not complete
> all these steps on each server before one of the servers' UPSes ran out of
> juice.

Several suggested that the UPS died before the servers shut down, but I'm not sure I actually saw anything from the project on that.
ID: 82821 · Report as offensive
Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 82825 - Posted: 26 Feb 2005, 23:46:35 UTC - in response to Message 82821.  

> > Another problem was they needed to carry out specific tasks on each of the
> > systems to shut them down when power went out...and they could not complete
> > all these steps on each server before one of the servers' UPSes ran out of
> > juice.
>
> Several suggested that the UPS died before the servers shut down, but I'm not
> sure I actually saw anything from the project on that.

It's in the tech news report.
You will be assimilated...bunghole!

ID: 82825 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82831 - Posted: 26 Feb 2005, 23:56:44 UTC - in response to Message 82819.  

> So the UPS does not need to keep everything running for a couple of hours, it
> just needs to keep power up long enough for a safe shutdown (with a safety
> margin). A generator might make some of the more compulsive people happier,
> but BOINC doesn't need it because the power will come back on, the servers
> will be back up, and everything done while they were down will get through
> when they're back.


Not so much a generator -- but either higher power UPS for each server to provide more time for proper shutdown, or, perhaps, given that there are a fair number of different servers in the room, a power backup configuration (like the larger rack based APC systems) which can handle power outages for the room centrally and provide more effective shut down sequences.

Admittedly, this is a higher cost approach -- but it is NOT a case of needing a generator (overkill in my view as well), but rather adequate power coverage.

I'd also be concerned that the root cause of the circuit breaker failure is (at least as far as the available reports are concerned) not yet known.

Barry Schnur
ID: 82831 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82873 - Posted: 27 Feb 2005, 1:01:20 UTC - in response to Message 82831.  

> Not so much a generator -- but either higher power UPS for each server to
> provide more time for proper shutdown, or, perhaps, given that there are a
> fair number of different servers in the room, a power backup configuration
> (like the larger rack based APC systems) which can handle power outages for
> the room centrally and provide more effective shut down sequences.

Remember that SETI is somewhat in flux: there is a small stack of machines for "classic" with their attendant infrastructure, and a small stack of machines for BOINC.

When Classic shuts down, all of the resources used for Classic become available, and that includes floor/rack space, UPSes, and even power into the closet.

They can probably replace batteries in serviceable UPSes, test them and bring 'em back in to help carry the BOINC load. They can refurbish some of the classic servers to carry database replicas, and do a whole bunch of other things that I'm sure the operational side of SETI/BOINC has already thought about.

There is no big reason (given budgets) to buy a whole bunch of stuff just to bridge a few weeks -- and no place to put 'em until the classic boxes are out of the way anyway.

I'm slightly embarrassed to say that during my last outage things were less than smooth -- mostly due to batteries that had aged since the last test. I've replaced all of the batteries since.
ID: 82873 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82880 - Posted: 27 Feb 2005, 1:10:33 UTC - in response to Message 82831.  


> Not so much a generator -- but either higher power UPS for each server to
> provide more time for proper shutdown, or, perhaps, given that there are a
> fair number of different servers in the room, a power backup configuration
> (like the larger rack based APC systems) which can handle power outages for
> the room centrally and provide more effective shut down sequences.

This is a little off-topic, but one of the best things I picked up for my operation here is an automatic transfer switch. APC has their "SU041" and there are other brands.

The switch has two power cords and a set of outlets. You plug each cord into a UPS, plug the load into the switch, and you're set.

I can unplug a UPS and time it so I know how long it takes to run flat: when the UPS dies the load switches to the other one (which is still plugged in) and I can test until the batteries are flat and still have half of my battery capacity (in the other UPS).

I don't know what Berkeley thinks of eBay, but you can find them there.
ID: 82880 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82882 - Posted: 27 Feb 2005, 1:11:46 UTC - in response to Message 82873.  

I understand the 'flux thing' -- Seti Classic will be killed off within the next 30 to 60 days (the theory being 30 days after emails have been sent out -- and those emails haven't been sent out). And that should free up equipment.

I don't know that either the Seti classic or Seti BOINC servers could be classified as 'small machines' though. I suspect we are talking mid sized Sun boxes with fairly hefty power requirements.

And clearly, they don't (or at least didn't) have the systems configured to shut themselves down gracefully in the event of a power outage.

UPS configuration with single 'regular' servers is a different kettle of fish compared to configuration with multiple interdependent servers, hence my suggestion regarding more full featured UPS subsystems.
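
As a rough sketch of why interdependent servers are harder: the shutdown has to follow the dependencies, with the machines that rely on others going down first. The host names and the dependency map below are purely hypothetical (nothing here describes the real SETI server layout); the example uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def shutdown_order(depends_on):
    """Return hosts in a safe shutdown order: dependents first.

    `depends_on` maps each host to the set of hosts it needs in order to run.
    """
    # TopologicalSorter emits prerequisites first, which is a valid boot order,
    # so a safe shutdown order is simply the reverse of it.
    boot_order = list(TopologicalSorter(depends_on).static_order())
    return boot_order[::-1]


# Hypothetical layout, for illustration only.
example = {
    "web": {"db"},
    "scheduler": {"db"},
    "db": {"storage"},
}
order = shutdown_order(example)
```

With that map, `storage` is always shut down last and `db` only after both `web` and `scheduler` are down. A rack-level UPS setup can drive a sequence like this from one place, which is the point of the suggestion.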

Heck, the full featured UPS subsystem could be restricted to the BOINC servers as they are considered more mission critical for the long haul.

I suspect you realize that 'ungraceful' shut downs can cause significant disruptions -- not just in terms of possible database corruption, but also physical damage to components. With higher end systems (Sun boxes), costs to recover go up dramatically, both in time/effort to recover and in costs for equipment replacement.

Barry Schnur


ID: 82882 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82884 - Posted: 27 Feb 2005, 1:13:31 UTC - in response to Message 82880.  


> I don't know what Berkeley thinks of eBay, but you can find them there.

Well Berkeley must think nicely of eBay -- after all, they have significant funding constraints so saving money might be very much a priority item.

Barry Schnur
ID: 82884 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82890 - Posted: 27 Feb 2005, 1:23:11 UTC - in response to Message 82882.  

> I don't know that either the Seti classic or Seti BOINC servers could be
> classified as 'small machines' though. I suspect we are talking mid sized Sun
> boxes with fairly hefty power requirements.

I said that the stack was small, and I didn't say that it was small relative to anything specific. It seems like it's 8 or 10 machines, and while the stack may be small, the individual boxes are pretty good sized.

We know the server room is small, we've seen the pictures.

I've been arguing that the UPS(es) need to be big enough to handle a graceful shutdown. You also need enough capacity to shut down, and then carry a complete reboot and shutdown if the power comes back and then immediately goes away, probably more than once. It follows that you want to power down sooner rather than later, to reserve capacity.
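
A back-of-the-envelope version of that budget, as a sketch only. Every number in the example (battery runtime, shutdown and boot times, the 25% margin) is an invented placeholder, not a measurement of any real server or UPS.

```python
def shutdown_deadline_s(runtime_s, shutdown_s, boot_s, extra_cycles=1, margin=0.25):
    """Seconds after power loss by which shutdown must start.

    Reserve = first shutdown + `extra_cycles` * (boot + shutdown),
    inflated by `margin`. Returns None if the battery cannot cover the reserve.
    """
    reserve = (shutdown_s + extra_cycles * (boot_s + shutdown_s)) * (1 + margin)
    slack = runtime_s - reserve
    return slack if slack >= 0 else None


# Placeholder figures: 20 min of battery, 2 min to shut down, 3 min to boot,
# and room for one extra power-returns-then-fails cycle.
deadline = shutdown_deadline_s(runtime_s=1200, shutdown_s=120, boot_s=180)
```

With those placeholder figures the reserve is 525 seconds, so the shutdown has to begin within 675 seconds of the outage. Each additional bounce you want to survive pushes the deadline earlier, which is the argument for powering down promptly.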
ID: 82890 · Report as offensive
Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 82900 - Posted: 27 Feb 2005, 2:03:15 UTC

> > I don't know what Berkeley thinks of eBay, but you can find them there.
>
> Well Berkeley must think nicely of eBay -- after all, they have significant
> funding constraints so saving money might be very much a priority item.

Unfortunately you are forgetting that they are part of the University of California. And in most state funded universities in this country, anything costing more than $100 must be put out for bid, a process that usually takes at least 6 months. So unless they can get a corporate or individual sponsor to donate the equipment, getting the needed equipment is a very hard thing to do.
ID: 82900 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 82932 - Posted: 27 Feb 2005, 4:00:37 UTC

I haven't seen one post addressing the most critical issue. They need a Generator and some small inexpensive UPSes. The small UPSes will keep the servers up until the generator kicks in. Once the generator starts all the beer refrigerators will keep the beer COLD. Come on, sheesh, they're in sunny California and can't just stick their beer outdoors.

tony
ID: 82932 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82936 - Posted: 27 Feb 2005, 4:27:54 UTC - in response to Message 82890.  

> I've been arguing that the UPS(es) need to be big enough to handle a graceful
> shutdown. You also need enough capacity to shut down, and then carry a
> complete reboot and shutdown if the power comes back and then immediately goes
> away, probably more than once. It follows that you want to power down sooner
> rather than later, to reserve capacity.

Right -- and for the number of systems they need to support here -- and will continue to need to support -- it makes sense to go with a UPS system rather than a collection of individual VA1500 or VA2200 boxes. Something like this sort of hardware:

http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SURT10000RMXLT%2D2TF5

The real pain with a 'collection' of UPSes instead of a 'support the rack' UPS is handling 'global' power outages -- just a lot of scurrying around.

Been there, done that.

Barry
ID: 82936 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 82937 - Posted: 27 Feb 2005, 4:29:36 UTC - in response to Message 82900.  

> Unfortunately you are forgetting that they are part of the University of
> California. And in most state funded universities in this country, anything
> costing more than $100 must be put out for bid, a process that usually takes
> at least 6 months. So unless they can get a corporate or individual sponsor to
> donate the equipment, getting the needed equipment is a very hard thing to do.

OK -- if they are constrained by red tape then you are quite right -- and things will break first, because compliance with red tape is often enough counterproductive.
ID: 82937 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.