Get Out of My House (Jan 18 2011)

Message boards : Technical News : Get Out of My House (Jan 18 2011)

morpheus
Joined: 5 Jun 99
Posts: 69
Credit: 38,695,374
RAC: 12,919
Germany
Message 1068205 - Posted: 19 Jan 2011, 12:15:11 UTC

Thanks for the update, Matt.
And good luck with Bruno.
.:morpheus:.
ID: 1068205

Chris S
Crowdfunding Project Donor
Volunteer tester
Joined: 19 Nov 00
Posts: 38190
Credit: 21,443,008
RAC: 27,636
United Kingdom
Message 1068210 - Posted: 19 Jan 2011, 12:39:27 UTC

Sounds like Synergy got there not a moment too soon! Good luck in sorting things out, we'll still be here looking forward to business as normal. :-)
Those are my principles, and if you don't like them ... well, I have others.
Groucho Marx 1895-1977

I also have mine, and if you don't like them ... tough, live with it.
Chris S 2016

Member of UCB Charter Hill Society
ID: 1068210

DJStarfox
Joined: 23 May 01
Posts: 1057
Credit: 802,388
RAC: 86
United States
Message 1068240 - Posted: 19 Jan 2011, 14:47:38 UTC - in response to Message 1068210.  

...we'll still be here looking forward to business as normal. :-)


This is business as usual/normal. LOL
ID: 1068240

Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,538,216
RAC: 0
United States
Message 1068342 - Posted: 19 Jan 2011, 21:34:34 UTC
Last modified: 19 Jan 2011, 21:35:02 UTC

Good luck!!! Whether with Synergy or another machine, hope things go relatively smoothly.

I have a question... Is it possible to separate uploads/downloads?

If the uploads could also be routed through the new lab Gbit line, this could leave the Hurricane Electric line purely for downloads...

??
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 1068342

Donald L. Johnson
Joined: 5 Aug 02
Posts: 8205
Credit: 4,373,372
RAC: 5,550
United States
Message 1068388 - Posted: 19 Jan 2011, 23:50:23 UTC - in response to Message 1068210.  

Sounds like Synergy got there not a moment too soon!

Or bruno, knowing synergy was on the way, held out as long as he could before crashing. Either way, bruno has been a good soldier, hope you can get him back in business again.
Donald
Infernal Optimist / Submariner, retired
ID: 1068388

Saaby900T
Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1068443 - Posted: 20 Jan 2011, 5:18:21 UTC

When does it look like this is going to get back online?
ID: 1068443

Geoff Gong
Joined: 11 Dec 99
Posts: 53
Credit: 1,540,658
RAC: 215
Australia
Message 1068447 - Posted: 20 Jan 2011, 5:36:53 UTC

Hi,
The Server Status page shows ONLY the AP splitters.
Lando and Vader are not running; both are doing other tasks.
Is the Server Status page affected?
ID: 1068447

edwartr
Joined: 2 May 00
Posts: 31
Credit: 65,148,334
RAC: 28,743
United States
Message 1068454 - Posted: 20 Jan 2011, 6:11:17 UTC - in response to Message 1068447.  
Last modified: 20 Jan 2011, 6:11:46 UTC

Make sure and check the date/time on the Server Status page:

[As of 18 Jan 2011 17:10:05 UTC]

It is 20 Jan 2011 06:11 UTC as I am posting this.
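
As a quick sanity check on how stale that snapshot is, here is a small Python sketch. The helper name status_age and the format string are assumptions made up for illustration; the two timestamps are the ones quoted above, and parsing "Jan" assumes an English locale.

```python
from datetime import datetime, timezone


def status_age(as_of, now, fmt="%d %b %Y %H:%M:%S"):
    """Return how old a status-page timestamp (given in UTC) is at time `now`."""
    stamp = datetime.strptime(as_of, fmt).replace(tzinfo=timezone.utc)
    return now - stamp


# Timestamp shown on the page vs. the time of this post.
age = status_age(
    "18 Jan 2011 17:10:05",
    datetime(2011, 1, 20, 6, 11, tzinfo=timezone.utc),
)
print(age)  # 1 day, 13:00:55
```

In other words, the page was roughly a day and a half out of date, so it says nothing about what is running right now.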
I gotta fever and the only prescription is more cowbell.
ID: 1068454

Adam Weichel
Joined: 30 Jul 02
Posts: 22
Credit: 12,044,735
RAC: 1,101
Canada
Message 1068533 - Posted: 20 Jan 2011, 14:34:57 UTC

It's good to hear that everything's correctable, Matt. Is there an updated hardware requirement list that's available? Looking to donate some more parts in the spring. :)
Computer nut, Distributed Computing freak, Jeeper and Dodge Ram driver.

Life is worth living... and worth discovering.

I run VMWare ESXi Free - why don't you?
ID: 1068533

Jaye Ellen
Joined: 29 Nov 08
Posts: 24
Credit: 12,176,953
RAC: 6,230
United States
Message 1068565 - Posted: 20 Jan 2011, 16:21:10 UTC - in response to Message 1068032.  

Keep up the excellent work, Matt, and let's try to revive Bruno before his untimely demise??? All in fun though. I was just wondering why my uploads were just sitting here, and now I know and can stop worrying ...

Jaye Ellen
ID: 1068565

Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 648
Credit: 225,305,347
RAC: 28,102
United States
Message 1068606 - Posted: 20 Jan 2011, 17:36:12 UTC

At least when Synergy (New Bruno) comes up it will be a true test to see if it can handle the load of the project with everyone uploading their completed WUs. I alone have over 4k to report.

Todd
ID: 1068606

ralphw
Volunteer tester
Joined: 7 May 99
Posts: 65
Credit: 6,107,594
RAC: 487
United States
Message 1068691 - Posted: 20 Jan 2011, 20:00:15 UTC - in response to Message 1068032.  

Bruno hardware failure - this suggests bruno (the upload server) is a single point of failure. Is it feasible to have two systems performing the same function here?

Perhaps bruno should have a companion, borat (I'm probably thinking of the wrong bruno here.)
ID: 1068691

Gary Charpentier
Crowdfunding Project Donor
Volunteer tester
Joined: 25 Dec 00
Posts: 18653
Credit: 21,505,879
RAC: 18,930
United States
Message 1068725 - Posted: 20 Jan 2011, 20:59:15 UTC - in response to Message 1068691.  

Bruno hardware failure - this suggests bruno (the upload server) is a single point of failure. Is it feasible to have two systems performing the same function here?

Perhaps bruno should have a companion, borat (I'm probably thinking of the wrong bruno here.)

Everything on the BOINC side of the house is a single point of failure. The only system that has a hot backup is the science database.

ID: 1068725

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 1068781 - Posted: 20 Jan 2011, 22:58:08 UTC - in response to Message 1068691.  

Bruno hardware failure - this suggests bruno (the upload server) is a single point of failure. Is it feasible to have two systems performing the same function here?

Perhaps bruno should have a companion, borat (I'm probably thinking of the wrong bruno here.)

BOINC is supposed to make it possible to do big science on a vanishingly small budget. That means redundant servers are often out of the question.

So instead of redundant servers so that something can always take work, we have a client that handles outages gracefully.

It has to be that way because the standard solution (throw money at the problem) is not available to BOINC projects.
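
For illustration, "handling outages gracefully" usually means retrying with a growing, randomized wait, so that clients do not all hammer the server the moment it comes back. The Python sketch below shows that general idea (randomized exponential backoff). It is not the actual BOINC client code; the function names, the 60-second base and the 4-hour cap are all assumptions.

```python
import random
import time


def backoff_delays(attempts, base=60.0, cap=4 * 3600.0, rng=None):
    """Yield one randomized wait (in seconds) per attempt.

    The ceiling doubles each time up to `cap`; the actual wait is drawn
    from the upper half of that range so clients spread out.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng.uniform(ceiling / 2, ceiling)


def upload_with_retry(send, attempts=5, sleep=time.sleep, rng=None):
    """Call send() until it succeeds, waiting longer after each failure.

    `send` is a hypothetical callable that raises ConnectionError while
    the upload server is down.
    """
    for delay in backoff_delays(attempts, rng=rng):
        try:
            return send()
        except ConnectionError:
            sleep(delay)
    raise ConnectionError("upload server still unreachable")
```

With a scheme like this, finished work simply waits on the volunteer's machine during an outage and trickles back in once the server is up again.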
ID: 1068781

Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 648
Credit: 225,305,347
RAC: 28,102
United States
Message 1069118 - Posted: 21 Jan 2011, 21:44:49 UTC - in response to Message 1068781.  

I would disagree with the statement that we can make do without redundant servers for this project. Considering that the number of hosts is now over 2 million for just this project, that is an incredible amount of computing power, and really was the spirit of the architectural design. The work still needs to go somewhere.

I don't think some people here realize the scope of the data returned and the impact of the loss of storage devices. To serve the project as the users have demanded (and some are very vocal), many factors need to be considered.

It isn't about throwing money at a problem in hopes of a resolution. Things break and need to be replaced over time - requiring money or donations to achieve the goal. Not much different than expecting to drive your new car with the same tires for 150k miles or not getting an oil change.

When your dataset increases, your load and time between failures also increases. When trying to make do with piecemeal equipment it can be very challenging to make a go of it.

Todd

Bruno hardware failure - this suggests bruno (the upload server) is a single point of failure. Is it feasible to have two systems performing the same function here?

Perhaps bruno should have a companion, borat (I'm probably thinking of the wrong bruno here.)

BOINC is supposed to make it possible to do big science on a vanishingly small budget. That means redundant servers are often out of the question.

So instead of redundant servers so that something can always take work, we have a client that handles outages gracefully.

It has to be that way because the standard solution (throw money at the problem) is not available to BOINC projects.


ID: 1069118

kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45964
Credit: 815,555,278
RAC: 123,669
United States
Message 1069139 - Posted: 21 Jan 2011, 22:47:28 UTC - in response to Message 1069118.  

Aye, Cap'n Todd....

I face challenges just keeping 8 crunching rigs online some days.

Power supplies age, motherboard components age, things change.
Adjustments to settings are required.

Of course, the kitties push the rigs pretty hard, so any change in tolerances can sometimes throw things outta whack.

I suspect that Eric, Matt, and crew sleep much better recently with the new servers at work.....

I know that even with the latest short outage, we are enjoying the longest streak of uptime in Seti history for many moons.

You and all other contributors have done well.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1069139

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 65
United States
Message 1069235 - Posted: 22 Jan 2011, 1:25:49 UTC - in response to Message 1069118.  

I would disagree with the statement that we can make do without redundant servers for this project. Considering that the number of hosts is now over 2 million for just this project, that is an incredible amount of computing power, and really was the spirit of the architectural design. The work still needs to go somewhere.

I don't think some people here realize the scope of the data returned and the impact of the loss of storage devices. To serve the project as the users have demanded (and some are very vocal), many factors need to be considered.

It isn't about throwing money at a problem in hopes of a resolution. Things break and need to be replaced over time - requiring money or donations to achieve the goal. Not much different than expecting to drive your new car with the same tires for 150k miles or not getting an oil change.

When your dataset increases, your load and time between failures also increases. When trying to make do with piecemeal equipment it can be very challenging to make a go of it.

Todd

Bruno hardware failure - this suggests bruno (the upload server) is a single point of failure. Is it feasible to have two systems performing the same function here?

Perhaps bruno should have a companion, borat (I'm probably thinking of the wrong bruno here.)

BOINC is supposed to make it possible to do big science on a vanishingly small budget. That means redundant servers are often out of the question.

So instead of redundant servers so that something can always take work, we have a client that handles outages gracefully.

It has to be that way because the standard solution (throw money at the problem) is not available to BOINC projects.


They do have some redundancy. The DB is mirrored in real time. They have RAID for their drive arrays. But having completely redundant servers is too expensive.


BOINC WIKI
ID: 1069235

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2582
Credit: 34,797,822
RAC: 19,803
United States
Message 1069546 - Posted: 22 Jan 2011, 18:08:51 UTC - in response to Message 1069118.  
Last modified: 22 Jan 2011, 18:11:31 UTC

[snip]

When your dataset increases, your load and time between failures also increases. When trying to make do with piecemeal equipment it can be very challenging to make a go of it.

Todd


It's been my experience that when the dataset increases, MTBF (Mean Time Between Failures) decreases... and your load increases by a factor of two or more (for a doubled dataset, your load quadruples... not saying that the increase is always exponential, though...)
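
A back-of-the-envelope way to see both effects, sketched in Python under stated assumptions: the components are identical and independent with constant failure rates (so failure rates add and the combined MTBF is the per-component MTBF divided by the component count), and load grows with the square of the dataset size, which is what "double the data, quadruple the load" describes (strictly speaking that is quadratic growth rather than exponential). The function names and the 500,000-hour figure are made up for illustration.

```python
def system_mtbf(component_mtbf_hours, n_components):
    """MTBF until the first failure among n identical, independent
    components with constant failure rates: rates add, so MTBF divides."""
    return component_mtbf_hours / n_components


def relative_load(dataset_scale, exponent=2):
    """Load relative to today if load grows as dataset_scale ** exponent.

    exponent=2 is an assumption matching 'double the data, four times the load'.
    """
    return dataset_scale ** exponent


for drives in (10, 20, 40):
    # Each doubling of the drive count halves the time to the next failure.
    print(drives, "drives ->", system_mtbf(500_000, drives), "hours between failures")

print("load at 2x data:", relative_load(2))  # 4
```

So more storage means something is failing more often, even if every individual part is as reliable as before.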
.
ID: 1069546
 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.