Long outage...

Author	Message
Eric Korpela Volunteer moderator Project administrator Project developer Project scientist Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60	Message 1914728 - Posted: 24 Jan 2018, 7:17:44 UTC The outage ran long today because we needed to run down to the data center to swap some bad drives with new ones and reboot a few of the machines to pick up kernel and MySQL updates. Sorry for the delay. @SETIEric@qoto.org (Mastodon)

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1914730 - Posted: 24 Jan 2018, 7:30:25 UTC Thanks for the update, Eric. I was about to pack it in for the night. You pulled a long shift. Appreciated. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1914731 - Posted: 24 Jan 2018, 7:30:48 UTC - in response to Message 1914728. Thanks for the update. It's nice to know what's going on when things aren't working. Grant Darwin NT

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1914736 - Posted: 24 Jan 2018, 7:41:38 UTC Thanks for the update. Maybe things will run smoother now ...

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1914741 - Posted: 24 Jan 2018, 7:52:53 UTC Last modified: 24 Jan 2018, 7:56:10 UTC Many of the Server Statuses are yet to update, and there are lots of Scheduler errors when reporting work. It's going to take a while to recover from this outage... Grant Darwin NT

Stargate (SA) Volunteer tester Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1914755 - Posted: 24 Jan 2018, 8:41:55 UTC Since we have not made contact yet, I won't be going anywhere soon, but thanks for the update.

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1914758 - Posted: 24 Jan 2018, 8:44:35 UTC Last modified: 24 Jan 2018, 8:51:58 UTC And now we're starting to pick up work, downloads are stalling. Tried both 208.68.240.127 and 208.68.240.119 in my hosts file. No joy. EDIT: tried 208.68.240.119 again and the stalled downloads cleared. Grant Darwin NT

Stargate (SA) Volunteer tester Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1914761 - Posted: 24 Jan 2018, 8:54:33 UTC Think they loaded work before clocking off?

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1914765 - Posted: 24 Jan 2018, 9:03:49 UTC - in response to Message 1914761. Last modified: 24 Jan 2018, 9:04:47 UTC "Think they loaded work before clocking off?" There were already around 60 files there to be split, and it usually takes about 2 hours after an outage for the splitters to get going again. Grant Darwin NT

marsinph Volunteer tester Joined: 7 Apr 01 Posts: 172 Credit: 23,823,824 RAC: 0	Message 1914766 - Posted: 24 Jan 2018, 9:05:53 UTC - in response to Message 1914758. Hello Grant, what is the hostname for 208.68.240.127? I know that 119 is boinc2ssl.berkeley.edu (the download server), but not 127. What is its name? I will add it to my hosts file. "And now we're starting to pick up work, downloads are stalling. Tried both 208.68.240.127 and 208.68.240.119 in my hosts file. No joy. EDIT: tried 208.68.240.119 again and the stalled downloads cleared."

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1914770 - Posted: 24 Jan 2018, 9:18:09 UTC - in response to Message 1914766. "Hello Grant, what is the hostname for 208.68.240.127? I know that 119 is boinc2ssl.berkeley.edu (the download server), but not 127. What is its name? I will add it to my hosts file." Same name. There are 2 servers that handle downloads; the load is meant to be shared between them, but if one has issues then you can't download from it and have to wait until the other server gets used. Generally I keep that hosts file entry commented out, and just use whichever address is working when download issues occur. In the past when things went wrong it's generally been 208.68.240.127 that has had the problems. Grant Darwin NT
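
As a rough illustration of the hosts-file switch described in the post above, here is a minimal Python sketch that renders the two candidate entries with only the chosen address active and the other commented out. The hostname and both addresses are the ones quoted in the thread; the function name `render_hosts_entries` and the constants are hypothetical, and the sketch only builds the text. It does not check which server is healthy and does not modify the system hosts file.

```python
# Hostname and addresses as quoted in the thread; all names below are illustrative.
DOWNLOAD_HOST = "boinc2ssl.berkeley.edu"
DOWNLOAD_ADDRESSES = ("208.68.240.119", "208.68.240.127")


def render_hosts_entries(active=None):
    """Return hosts-file lines for the download server.

    The entry for `active` is left live; every other address is commented
    out with '#'. With active=None all entries are commented out, which
    leaves name resolution to DNS as usual.
    """
    if active is not None and active not in DOWNLOAD_ADDRESSES:
        raise ValueError(f"unknown download address: {active}")
    lines = []
    for address in DOWNLOAD_ADDRESSES:
        prefix = "" if address == active else "# "
        lines.append(f"{prefix}{address} {DOWNLOAD_HOST}")
    return lines


if __name__ == "__main__":
    # Pin downloads to the .119 address and keep .127 commented out.
    print("\n".join(render_hosts_entries("208.68.240.119")))
```

Calling `render_hosts_entries()` with no argument gives both lines commented out, which matches the habit of leaving the override disabled until download problems appear.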

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380	Message 1914780 - Posted: 24 Jan 2018, 10:32:00 UTC Running down the hill isn't too bad, but running back up again :-( Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?

NorthCup Joined: 6 Jun 99 Posts: 108 Credit: 50,093,984 RAC: 5	Message 1917075 - Posted: 5 Feb 2018, 14:12:31 UTC The lighthouse project of distributed computing, Seti, limps like a sick horse; it stutters and freezes up. It drags itself from one potion to the next and faints regularly. What can we do for you? You already have money, hardware and full commitment from me, from us. What else do you need for Seti to become a racehorse? Greetings, Klaus

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1917089 - Posted: 5 Feb 2018, 15:21:13 UTC - in response to Message 1917075. "The lighthouse project of distributed computing, Seti, limps like a sick horse; it stutters and freezes up. It drags itself from one potion to the next and faints regularly. What can we do for you? You already have money, hardware and full commitment from me, from us. What else do you need for Seti to become a racehorse? Greetings, Klaus" Enough staff.

PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1	Message 1917114 - Posted: 5 Feb 2018, 16:23:55 UTC - in response to Message 1917089. With all due respect: * I wish we could know more about the Seti organization, but in general the first response organizations have to "production" problems is to "hire more staff". But truthfully, in most contexts, setting wise management policies and priorities is more effective than more people. Please. Being a university project is no excuse for poor execution, if in fact that is what is happening. * Looking at other DC projects recently, it appears that Seti has the most inhomogeneous, bespoke set of hardware in BOINC, and the software manageability may be worse. Perhaps it is time to migrate to a more "planned" architecture. Temporarily scale back the project if necessary to enable growth in the future. I'm sure that would be a lot of work. But then work on the web pages used to market the project, because as it stands now some of those pages are so old or out of date that they give Seti a black eye. * I really don't understand the GBT data transfer problem and its history. But how is it that we are depending on a resource (network bandwidth) that is well known to be too meager to meet the needs or demands? We should remember that Band-Aids eventually fall off, frequently before the healing is complete. * As for today, without any tasks available, why don't they just shut down the project a day early and catch up on some tasks not getting done?

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1917175 - Posted: 5 Feb 2018, 20:01:41 UTC - in response to Message 1917114. Work is available today from both observatories. The GBT-Berkeley data connection is on a network that is faster than the commercial Internet. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1917299 - Posted: 6 Feb 2018, 5:33:59 UTC - in response to Message 1917114. "With all due respect: * I wish we could know more about the Seti organization, but in general the first response organizations have to 'production' problems is to 'hire more staff'. But truthfully, in most contexts, setting wise management policies and priorities is more effective than more people." Not when you don't have the staff to actually carry out those policies & priorities. 1 person can't do the work of 5 or more when 5 or more are necessary to do what's needed to meet those policies & priorities. "Perhaps it is time to migrate to a more 'planned' architecture." I agree; just replacing the current HDD storage with AFAs (All Flash Arrays) would cure a lot of problems IMHO. But where are we going to get the $500,000 from? "But how is it that we are depending on a resource (network bandwidth) that is well known to be too meager to meet the needs or demands?" A 100Gb/s backbone really can't be considered meager. And getting things by network is more reliable than relying on people to load & unload & mail HDDs, which has even more points of failure than using a network connection. Grant Darwin NT

NorthCup Joined: 6 Jun 99 Posts: 108 Credit: 50,093,984 RAC: 5	Message 1917439 - Posted: 7 Feb 2018, 12:24:49 UTC - in response to Message 1917089. "The lighthouse project of distributed computing, Seti, limps like a sick horse; it stutters and freezes up. It drags itself from one potion to the next and faints regularly. What can we do for you? You already have money, hardware and full commitment from me, from us. What else do you need for Seti to become a racehorse? Greetings, Klaus" "Enough staff." I understand. In Germany, volunteer helpers are available for such projects. Would that be a solution for your problem? Have you ever tried to recruit staff from the circle of Seti enthusiasts? Greetings, Klaus

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.