Long outage...

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1914728 - Posted: 24 Jan 2018, 7:17:44 UTC

The outage ran long today because we needed to run down to the data center to swap some bad drives for new ones and reboot a few of the machines to pick up kernel and MySQL updates.

Sorry for the delay.
@SETIEric@qoto.org (Mastodon)

ID: 1914728
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1914730 - Posted: 24 Jan 2018, 7:30:25 UTC

Thanks for the update, Eric. I was about to pack it in for the night. You pulled a long shift. Appreciated.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1914730
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1914731 - Posted: 24 Jan 2018, 7:30:48 UTC - in response to Message 1914728.  

Thanks for the update.
It's nice to know what's going on when things aren't working.
Grant
Darwin NT
ID: 1914731
Profile Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1914736 - Posted: 24 Jan 2018, 7:41:38 UTC

Thanks for the update. Maybe things will run smoother now ...
ID: 1914736
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1914741 - Posted: 24 Jan 2018, 7:52:53 UTC
Last modified: 24 Jan 2018, 7:56:10 UTC

Many of the Server Statuses are yet to update, and there are lots of Scheduler errors when reporting work.
It's going to take a while to recover from this outage...
Grant
Darwin NT
ID: 1914741
Profile Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1914755 - Posted: 24 Jan 2018, 8:41:55 UTC

Since we have not made contact yet I won't be going anywhere soon, but thanks for the update
ID: 1914755
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1914758 - Posted: 24 Jan 2018, 8:44:35 UTC
Last modified: 24 Jan 2018, 8:51:58 UTC

And now that we're starting to pick up work, downloads are stalling.
Tried both 208.68.240.127 and 208.68.240.119 in my hosts file.
No joy.

EDIT- tried 208.68.240.119 again and stalled downloads cleared.
Grant
Darwin NT
ID: 1914758
Profile Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1914761 - Posted: 24 Jan 2018, 8:54:33 UTC

Think they loaded work b4 clocking off?
ID: 1914761
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1914765 - Posted: 24 Jan 2018, 9:03:49 UTC - in response to Message 1914761.  
Last modified: 24 Jan 2018, 9:04:47 UTC

Think they loaded work b4 clocking off?

There were already around 60 files there to be split, and it usually takes about 2 hours after an outage for the splitters to get going again.
Grant
Darwin NT
ID: 1914765
Profile marsinph
Volunteer tester

Joined: 7 Apr 01
Posts: 172
Credit: 23,823,824
RAC: 0
Belgium
Message 1914766 - Posted: 24 Jan 2018, 9:05:53 UTC - in response to Message 1914758.  

Hello Grant, what is the name of 208.68.240.127?
I know .119: boinc2ssl.berkeley.edu (download server),
but not .127! So what is the name? (I will add it to my hosts file.)


And now that we're starting to pick up work, downloads are stalling.
Tried both 208.68.240.127 and 208.68.240.119 in my hosts file.
No joy.

EDIT- tried 208.68.240.119 again and stalled downloads cleared.

ID: 1914766
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1914770 - Posted: 24 Jan 2018, 9:18:09 UTC - in response to Message 1914766.  

Hello Grant, what is the name of 208.68.240.127?
I know .119: boinc2ssl.berkeley.edu (download server),
but not .127! So what is the name? (I will add it to my hosts file.)

Same name. There are 2 servers that handle downloads, and the load is meant to be shared between them, but if one has issues then you can't download from it and have to wait until the other server gets used.
Generally I keep the hosts entry commented out, and just use whichever address is working when download issues occur. In the past when things went wrong it's generally been 208.68.240.127 that has had the problems.
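
For anyone who would rather script this workaround than edit by hand, here is a minimal sketch in Python. It uses only the hostname and the two addresses quoted in this thread. The port (443), the timeout, and the function names tcp_reachable and hosts_lines are assumptions made for illustration, not anything the project publishes. Note that a successful TCP connection only shows the server is answering; it does not prove downloads from it won't stall.

```python
import socket

# Hostname and addresses as quoted in this thread.
DOWNLOAD_HOST = "boinc2ssl.berkeley.edu"
CANDIDATES = ["208.68.240.119", "208.68.240.127"]


def tcp_reachable(ip, port=443, timeout=5.0):
    """Return True if a TCP connection to ip:port opens within timeout.

    Port 443 is an assumption; adjust it if the server listens elsewhere.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def hosts_lines(candidates, hostname, probe=tcp_reachable):
    """Build hosts-file lines for hostname.

    The first address the probe accepts becomes the active entry;
    every other address is left commented out with '#'.
    """
    lines = []
    chosen = False
    for ip in candidates:
        if not chosen and probe(ip):
            lines.append(f"{ip} {hostname}")
            chosen = True
        else:
            lines.append(f"# {ip} {hostname}")
    return lines
```

Calling hosts_lines(CANDIDATES, DOWNLOAD_HOST) returns the lines to paste into the hosts file. The sketch deliberately does not write the file itself, since that needs administrator rights and the path differs between operating systems. Once downloads recover, comment the active line out again so the normal load sharing between the two servers applies.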
Grant
Darwin NT
ID: 1914770
rob smith Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22526
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1914780 - Posted: 24 Jan 2018, 10:32:00 UTC

Running down the hill isn't too bad, but running back up again :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1914780
Profile NorthCup

Joined: 6 Jun 99
Posts: 108
Credit: 50,093,984
RAC: 5
Germany
Message 1917075 - Posted: 5 Feb 2018, 14:12:31 UTC

The flagship project of distributed computing, Seti, limps like a sick horse; it stutters and seizes up. It drags itself from one dose of medicine to the next and faints regularly. What can we do for you? You already have money, hardware and full commitment from me, from us. What else do you need for Seti to become a racehorse? Greetings, Klaus
ID: 1917075
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1917089 - Posted: 5 Feb 2018, 15:21:13 UTC - in response to Message 1917075.  

The flagship project of distributed computing, Seti, limps like a sick horse; it stutters and seizes up. It drags itself from one dose of medicine to the next and faints regularly. What can we do for you? You already have money, hardware and full commitment from me, from us. What else do you need for Seti to become a racehorse? Greetings, Klaus
Enough staff.
ID: 1917089
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 1917114 - Posted: 5 Feb 2018, 16:23:55 UTC - in response to Message 1917089.  

With all due respect:

* I wish we could know more about the seti organization, but in general the first response organizations have to "production" problems is to "hire more staff". But truthfully in most contexts, setting wise management policies and priorities is more effective than more people. Please. Being a university project is no excuse for poor execution, if in fact that is what is happening.

* Looking at other DC projects recently, it appears that Seti has the most inhomogeneous, bespoke set of hardware in BOINC, and the software manageability may be worse. Perhaps, it is time to migrate to a more "planned" architecture. Temporarily scale back the project if necessary to enable growth in the future. I'm sure that would be a lot of work. But then work on the web pages used to market the project, because as it stands now some of those pages are so old or out of date that they give Seti a black eye.

* I really don't understand the GBT data transfer problem and its history. But how is it that we are depending on a resource (network bandwidth) that is well known to be too meager to meet the needs or demands? We should remember that Band Aids eventually fall off, frequently before the healing is complete.

* As for today, without any tasks available, why don't they just shut down the project a day early and catch up on some tasks that aren't getting done?
ID: 1917114
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1917175 - Posted: 5 Feb 2018, 20:01:41 UTC - in response to Message 1917114.  

Work is available today from both observatories. The GBT-Berkeley data connection is on a network that is faster than the commercial Internet.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1917175
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1917299 - Posted: 6 Feb 2018, 5:33:59 UTC - in response to Message 1917114.  

With all due respect:

* I wish we could know more about the seti organization, but in general the first response organizations have to "production" problems is to "hire more staff". But truthfully in most contexts, setting wise management policies and priorities is more effective than more people.

Not when you don't have the staff to actually carry out those policies & priorities. 1 person can't do the work of 5 or more when 5 or more are necessary to do what's needed to meet those policies & priorities.

Perhaps, it is time to migrate to a more "planned" architecture.

I agree, just replacing the current HDD storage with AFAs (All Flash Arrays) would cure a lot of problems IMHO. But where are we going to get the $500,000 from?

But how is it that we are depending on a resource (network bandwidth) that is well known to be too meager to meet the needs or demands?

A 100Gb/s backbone really can't be considered meager.
And getting things by network is more reliable than relying on people to load, unload and mail HDDs, which has even more points of failure than a network connection.
Grant
Darwin NT
ID: 1917299
Profile NorthCup

Joined: 6 Jun 99
Posts: 108
Credit: 50,093,984
RAC: 5
Germany
Message 1917439 - Posted: 7 Feb 2018, 12:24:49 UTC - in response to Message 1917089.  

The flagship project of distributed computing, Seti, limps like a sick horse; it stutters and seizes up. It drags itself from one dose of medicine to the next and faints regularly. What can we do for you? You already have money, hardware and full commitment from me, from us. What else do you need for Seti to become a racehorse? Greetings, Klaus
Enough staff.


I understand. In Germany volunteer helpers are available for such projects. Would that be a solution for your problem? Have you ever tried to recruit staff from the circle of Seti enthusiasts?

Greetings, Klaus
ID: 1917439



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.