Scheduler is back up...


Profile dancer42
Volunteer tester
Joined: 2 Jun 02
Posts: 436
Credit: 1,160,474
RAC: 80
United States
Message 1312146 - Posted: 7 Dec 2012, 14:05:51 UTC - in response to Message 1312035.
Last modified: 7 Dec 2012, 14:06:26 UTC

Paganswyrd, I'm so glad you are willing to do all that work and buy all that equipment on a budget of $10k a year.

==================================================
Hey, that is my budget, and yet I have a star.

MLOL
____________

Profile Brother Frank
Joined: 10 Dec 11
Posts: 26
Credit: 15,142,410
RAC: 0
United States
Message 1312362 - Posted: 8 Dec 2012, 1:12:09 UTC - in response to Message 1311606.

I've been getting new work since yesterday. Then there was a blip for a while when I was not able to upload work easily. It must have gone on for most of the day today with my desktop with one Nvidia Ti 550 Card. Today has been better. My RAC really took a hit over the last two to three weeks, but for perhaps ten days I was not able to get any new work at all or upload any either. Soon after Eric's message about Servers being up but that there would still be some problems everything started to get better. I had turned most of my computers over to GPU Grid and Rosetta @ Home, and Milky Way, my backup projects.

Now I am slowly turning them back over to Seti at Home by not getting any new work from my alternates. I don't believe Seti @ Home is able to handle the workload reliably anymore, so I will move into and out of these alternates, and World Community Grid too for my little i3 notebook. I really believe that staffing issues and low priority at UC Berkeley are hurting the project a lot. Since that has been my experience over the past year, it's all a matter of spreading computing capability around some more so that disease-fighting and cure-discovery projects get some benefit. I don't want to get compulsive anymore about just running Seti work. If I just adjust my attitude more toward doing the best I can and not getting too unhappy when Seti at Home goes down, I'll be much happier.

I enjoy sharing on Google+ a great deal too, and writing back and forth to people there with similar interests (photography, nature, helping people to understand complex topics, science fiction, human interest stories, animals and conservation, and having a smile on my face and enjoying life, and my computers and love of the web and sharing).

I'll start making some donations to the Graphics Processors Users Group for Seti at Home equipment needs, and to Seti at Home directly too. Year after year it will add up, and I really want Seti at Home to succeed. I want to be positive and constructive in any criticism, and I will try to help others on Seti at Home as I gain experience in my own right with using this software and managing my computers using BOINC. One thing is for sure: I am getting more and more committed to Citizen Science and Citizen Computing Projects.
____________
Frank Elliott,Member of Carepages.com,a chronic illness support site. Was FrankLivingFully there.Free user name & pw needed. My Google+ Profile is:
https://profiles.google.com/u/0/10871372137584 Science,SF,Space,Astronomy,Medicine,Psyc Topics.

Profile betreger
Project donor
Joined: 29 Jun 99
Posts: 2672
Credit: 5,515,829
RAC: 6,532
United States
Message 1312365 - Posted: 8 Dec 2012, 1:35:31 UTC - in response to Message 1312362.

Brother Frank, good attitude.
____________

Profile Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7641
Credit: 7,076,095
RAC: 2,274
United States
Message 1313711 - Posted: 10 Dec 2012, 21:46:48 UTC - in response to Message 1312034.

Quite frankly I don't see how you get anything done! Priority #1 should be a fail-safe program that would involve some sort of emergency backup!!!! At least A.T.&T. sees to it that any program that involves as much as S.E.T.I. has many fail-safe options. You probably would have paid for it with all the downtime you have incurred since I've joined! Next you should get a programmer that knows what the heck he is doing! I cannot believe the amount of time lost due to software issues. S.E.T.I. has effectively been offline since the latter part of November. Can you imagine the uproar if A.T.&T. had been offline as much as S.E.T.I. has been? By the way, in case you did not know it, all of A.T.&T. is electronic, or in other words computers. All the area codes, first three digits, and then local numbers, are computer driven. There are no mechanical switches left!! So imagine what telephone service would be like if A.T.&T. allowed their computers to be as messed up as yours!?!? So you're working on a shoestring! Okay. I would rather you be down while you got a backup program (and got it together with software) than be teased! Sorry guys! But I don't let my P.C. fall into such a mess. But then I've worked with computers since 1980 and worked for Ma Bell for 26 of those years. I'm used to superior quality!


You're comparing apples to oranges and it's not fair to either organization. AT&T is a multi-billion dollar organization--Seti is run by a University on grants. Your comparison is not workable.
____________


N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 12674
Credit: 15,023,059
RAC: 9,635
United States
Message 1313979 - Posted: 11 Dec 2012, 14:42:36 UTC - in response to Message 1313711.

Also, Seti@home is not critical to anybody. If AT&T's computers crash, it affects billions of people who depend on it for personal communication, television, business, and even their very lives.

When S@H is down, it has no such effect on anybody.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


zork
Joined: 22 Jun 99
Posts: 5
Credit: 354,680
RAC: 1,159
Canada
Message 1314010 - Posted: 11 Dec 2012, 16:43:43 UTC

So, is the scheduler back up and running?

I see mention of the scheduler being up since last week, but I still have not received any work units since November, in spite of clicking the Update button dozens of times.

Thanks

Joseph Wilker
Joined: 3 Sep 99
Posts: 1
Credit: 3,887,626
RAC: 493
United States
Message 1314014 - Posted: 11 Dec 2012, 16:52:27 UTC - in response to Message 1314010.

I had received updates a couple of days ago, but it appears to be down again today. :(
____________

Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7198
Credit: 29,211,872
RAC: 35,971
United Kingdom
Message 1314027 - Posted: 11 Dec 2012, 17:10:59 UTC - in response to Message 1314014.

I had received updates a couple of days ago, but it appears to be down again today. :(


Notice on front page:

Weekly Outage and Initial Catch Up
Every Tuesday morning (Pacific time) we begin a four hour data distribution outage for database and systems maintenance. The upload/download servers will be offline during this time. Afterwards you may experience connectivity issues for several more hours as the servers catch up with demand.



Second notice down this week.
____________


Today is life, the only life we're sure of. Make the most of today.

zork
Joined: 22 Jun 99
Posts: 5
Credit: 354,680
RAC: 1,159
Canada
Message 1314047 - Posted: 11 Dec 2012, 23:38:59 UTC - in response to Message 1314027.
Last modified: 11 Dec 2012, 23:41:39 UTC

Weekly Outage and Initial Catch Up
Every Tuesday morning (Pacific time) we begin a four hour data distribution outage for database and systems maintenance. The upload/download servers will be offline during this time. Afterwards you may experience connectivity issues for several more hours as the servers catch up with demand.



I saw that, but there is no follow-up news that explains whether anything was resolved, other than a hope to be back up on Dec. 5.

One simple paragraph on the home page stating the current status would be much appreciated.

Second notice down this week.


That's not this week but is also from Dec. 5

Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7198
Credit: 29,211,872
RAC: 35,971
United Kingdom
Message 1314053 - Posted: 11 Dec 2012, 23:45:15 UTC
Last modified: 11 Dec 2012, 23:48:27 UTC

Resolutions are normally posted in the technical news.

Jeff Cobb posted this on the 6th.

We have recently come out of a painful outage. Last Thursday, 11/29, there was an unexpected power outage at Space Sciences Lab. It lasted some 20 minutes. Eric came over as quickly as he could to shut machines down, but he works in another building from where our machine room is, so the UPSs had run out their fairly short on-battery time by the time he got there. It was a perfect storm in that Matt and I (who work a few feet from the machine room) were both out.


Most machines came through OK, but three did not. Lando, an older administrative work horse (and splitter machine) appears to be dead. We have some spares from which to choose its replacement. More tragic was the fact that the master BOINC database, and its replica, suffered unrepairable corruption. This was an astonishing bit of bad luck. Both machines are on UPS and both machines have battery backed RAID controllers. One would think that all database logging would have at least made it to the RAID controller, but it obviously did not.

In order to recover the master database, we had to actually delete all of the underlying files and then recreate all of the databases from scratch before recovering from backup. A simple recovery from the backup did not work. After recreating the databases and then recovering from the backup, we ran all of the MySQL binary logs to recover up to a point in time just before the outage. Then we took a fresh backup of the database in case the next step did more harm than good. The next step was to run an extensive table check/repair on all tables in both the production and beta databases. All tables reported OK. Good! We then brought the projects up and used the fresh backup to restore the replica.
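The recovery order described above can be sketched as a checklist. This is only an illustration: the post does not say which tools were used, so the standard MySQL client programs (`mysql`, `mysqlbinlog`, `mysqldump`, `mysqlcheck`) are an assumption, and the file names and the function `recovery_plan` are hypothetical. Nothing is executed; the function just returns the steps in order.

```python
from datetime import datetime


def recovery_plan(backup_file, binlogs, stop_before, fresh_backup="fresh_backup.sql"):
    """Return the recovery steps in order as (description, argv) pairs.

    Nothing is executed here; the argv lists are only a checklist.
    """
    stop = stop_before.strftime("%Y-%m-%d %H:%M:%S")
    return [
        # Assumes the damaged files were already deleted and the empty
        # databases recreated, as the post describes.
        ("restore the last full backup",
         ["mysql", "-e", f"source {backup_file}"]),
        # mysqlbinlog prints SQL; in practice its output is piped into mysql.
        ("replay binary logs up to just before the outage",
         ["mysqlbinlog", f"--stop-datetime={stop}", *binlogs]),
        ("take a fresh backup before the risky step",
         ["mysqldump", "--all-databases", f"--result-file={fresh_backup}"]),
        ("check and repair all tables",
         ["mysqlcheck", "--all-databases", "--check", "--auto-repair"]),
    ]
```

Only after the last step reports every table OK would the projects be brought up and the fresh backup used to rebuild the replica.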

One might ask why we don't have machines automatically shut down in an on-battery situation. A good question with a lot of history. To make a long story short, our server complex has enough cross dependencies that if machines come down in the "wrong" order, other machines can hang. Plus some of our old UPSs would hiccup and cause a spurious shutdown (I'm not sure if our current crop have this problem). This was enough of a headache that we went with a very simple design. Our database machines would have battery backed RAID and be on UPS with no automatic shutdown. The theory was that the UPS would hold the machines for the duration of very short (one or two minute) power outages and, beyond that, the RAID controllers would save any pending IO. This very simple design has served us well but, as we see, not in all cases.

Eric came up with a good compromise. We will configure the BOINC replica database machine to immediately shut down (after stopping the database and unmounting its file system in case the shutdown hangs) upon detecting an on-battery condition. Nothing is dependent on this machine, so a spurious shutdown would not be a disaster. This should prevent a disaster of this magnitude from recurring.
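The compromise can be read as a simple rule: a machine shuts itself down on battery only if nothing else depends on it. A minimal sketch of that rule follows, assuming a hypothetical `Host` record and action names; it is not the project's actual configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    # Hypothetical: names of machines that can hang if this one goes down.
    dependents: list = field(default_factory=list)


def on_battery_actions(host):
    """Return the ordered actions a host takes when it detects an on-battery condition."""
    if host.dependents:
        # Other machines depend on this one, so a spurious shutdown is too
        # costly: stay up and rely on the UPS and battery-backed RAID.
        return []
    # Stop the database and unmount first, in case the shutdown itself hangs.
    return ["stop database", "unmount database file system", "shut down"]
```

With this rule the replica (no dependents) goes down cleanly at once, while the master keeps the older "ride it out" behaviour.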


That's not this week but is also from Dec. 5


As the outage is a weekly event it appears automatically every Tuesday. This week the outage notice was pushed down for some reason.
____________


Today is life, the only life we're sure of. Make the most of today.

zork
Joined: 22 Jun 99
Posts: 5
Credit: 354,680
RAC: 1,159
Canada
Message 1314063 - Posted: 12 Dec 2012, 0:00:12 UTC - in response to Message 1314053.

Thanks Bernie.

I think I saw that on the weekend but I was asking because I have not been able to connect since November, and that news is getting on six days old now.

It would be nice to have an "in your face" status message, so we can see if any of their attempts have made progress without searching the news or message boards.

I'll shut up now, but I'm just anxious to crunch, being newly back in the fold after years of not contributing.

Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7198
Credit: 29,211,872
RAC: 35,971
United Kingdom
Message 1314069 - Posted: 12 Dec 2012, 0:27:32 UTC - in response to Message 1314063.

Well, if you can't connect, I would guess the problem is at your end. I have several machines on here and on SETI-Beta; before the outage all were connecting fine, and they have started connecting again now.

Were you using a "proxy" during the scheduler problem? If so, it will have barred you by now, and it will work as well without one.

Do you know how to look at the BOINC logs? When you start BOINC, the first 20 lines or so give a good indication of whether something is wrong.


____________


Today is life, the only life we're sure of. Make the most of today.

Cherokee150
Joined: 11 Nov 99
Posts: 112
Credit: 25,680,227
RAC: 7,448
United States
Message 1314258 - Posted: 12 Dec 2012, 13:35:28 UTC - in response to Message 1314063.
Last modified: 12 Dec 2012, 13:36:30 UTC

One thing I see, Zork, is that you have recently added Einstein. Einstein usually overloads a system with work units that have an extremely short deadline. This makes it difficult to get work from other projects, especially if you haven't done any processing on those other projects for a while.

I suggest you first try the following in the sequence listed, as it may be a simple "jump start" fix.

1. Select the Projects tab in BOINC, select Einstein@Home, then click the Suspend button.
2. Make sure SETI@home is not suspended. If it is, select SETI@home and click the Resume button.
3. Make sure SETI is selected and then click the Update button.
4. Check the Event Log to see if you received any tasks.
5. If you do not receive SETI tasks, return to the Projects tab. Wait until the clock on SETI@home has run down, then try the Update button again.
6. If you still don't get any SETI tasks, repeat steps 3-5.
7. If you do receive new SETI tasks, or if, after a reasonable number of step 3-5 repetitions, you are unsuccessful, be sure to select Einstein@Home and click the Resume button.
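For anyone who prefers the command line, the same sequence can be sketched with `boinccmd`, whose `--project URL op` form accepts `suspend`, `resume` and `update`. The two project URLs below are assumptions (check yours with `boinccmd --get_project_status`), and `jump_start`, `run`, `got_seti_tasks` and `wait` are hypothetical names used only for illustration.

```python
SETI_URL = "http://setiathome.berkeley.edu/"    # assumed project URL
EINSTEIN_URL = "http://einstein.phys.uwm.edu/"  # assumed project URL


def project_cmd(url, op):
    """Build a boinccmd command line for one project operation."""
    return ["boinccmd", "--project", url, op]


def jump_start(run, got_seti_tasks, wait, max_tries=3):
    """Follow steps 1-7; return True if SETI tasks arrived.

    run(argv) executes a command, got_seti_tasks() checks the Event Log,
    and wait() pauses until the SETI countdown has run out.
    """
    run(project_cmd(EINSTEIN_URL, "suspend"))     # step 1
    run(project_cmd(SETI_URL, "resume"))          # step 2
    try:
        for _ in range(max_tries):                # step 6: repeat steps 3-5
            run(project_cmd(SETI_URL, "update"))  # step 3
            if got_seti_tasks():                  # step 4
                return True
            wait()                                # step 5
        return False
    finally:
        # Step 7: resume Einstein whether or not SETI work arrived.
        run(project_cmd(EINSTEIN_URL, "resume"))
```

Passing `subprocess.run` as `run` would execute the commands for real; the checks and the wait are left to the caller because how long the countdown lasts varies.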

If the above does not work, let us know and we can try some other possibilities.

Good luck! :)

zork
Joined: 22 Jun 99
Posts: 5
Credit: 354,680
RAC: 1,159
Canada
Message 1314308 - Posted: 12 Dec 2012, 17:09:36 UTC - in response to Message 1314069.

No proxy on this end.

My real issue has been that, for days, I have had no idea whether the issue has still been on the other end or mine.

zork
Joined: 22 Jun 99
Posts: 5
Credit: 354,680
RAC: 1,159
Canada
Message 1314309 - Posted: 12 Dec 2012, 17:13:06 UTC - in response to Message 1314258.

Thanks Cherokee150. This did it.

My work machine took two tries, but my home one did it on the first go.

I'm glad I finally knew to do something on my end other than click update.

Charles LaDue
Joined: 14 Aug 12
Posts: 4
Credit: 563,578
RAC: 51
Message 1314952 - Posted: 14 Dec 2012, 0:27:19 UTC

Whew....

Charles LaDue
Joined: 14 Aug 12
Posts: 4
Credit: 563,578
RAC: 51
Message 1314953 - Posted: 14 Dec 2012, 0:31:06 UTC - in response to Message 1312052.

Hmmm, is Seti being compared again against a commercial outfit with plenty of money? Is this being done again by a joker that in fact invented the personal home computer and therefore knows everything better than those know-nothings at the project?

When will people learn to read the front page, where it says: Keep your computer busy when SETI@home has no work - participate in other BOINC-based projects.

But no, "nag nag nag, complain complain complain. I know better than you guys how you should do things. I'll prove it by forking Seti into a full-fledged commercial outfit that does work." ... don't you?

Seti is open source. Nothing much that stands in your way to go out and prove that you can do it better. Well, nothing but your big mouth, that is.

And before you fall into the pitfall of UPS systems that people think should run for more than 20 minutes, read all about them. Now that you're warned, go and do it better. Hmmmm? :-)


Well said!

Jack Wolf
Joined: 10 Apr 12
Posts: 1
Credit: 65,323
RAC: 31
United States
Message 1315063 - Posted: 14 Dec 2012, 11:43:33 UTC
Last modified: 14 Dec 2012, 11:46:16 UTC

I hope I pull the rabbit out of the hat sometime soon and find E.T. :-)

Profile Cornhusker
Joined: 20 Apr 09
Posts: 20
Credit: 18,151,727
RAC: 20,683
United States
Message 1325719 - Posted: 8 Jan 2013, 6:59:39 UTC - in response to Message 1311662.

Agreed. All their power outages have made me reconsider any ideas I ever had of moving to California.

I wonder if they've considered running an extension cord out here to Nebraska.
____________

Number 6
Joined: 26 Nov 05
Posts: 2
Credit: 105,153
RAC: 0
New Zealand
Message 1336540 - Posted: 10 Feb 2013, 7:55:23 UTC

I have about 26 WUs ready to report and I keep getting "communication deferred" for ever increasing times on the Project tab of BOINC Manager.

In the Event Log I can see the following:

10/02/2013 6:39:23 p.m. | SETI@home | Scheduler request failed: Couldn't connect to server
10/02/2013 6:39:26 p.m. | | Project communication failed: attempting access to reference site
10/02/2013 6:39:29 p.m. | | Internet access OK - project servers may be temporarily down.

After a couple of hours I tried again and got this:

10/02/2013 8:48:08 p.m. | SETI@home | Sending scheduler request: Requested by user.
10/02/2013 8:48:08 p.m. | SETI@home | Reporting 26 completed tasks, requesting new tasks for CPU and NVIDIA
10/02/2013 8:48:50 p.m. | SETI@home | [error] No start tag in scheduler reply

What's going on here? Is there something wrong at my end, or are there still problems on the server end?
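One way to answer the "my end or theirs" question from the Event Log itself is to split each line on the `|` separators and look for the client's own verdict. This is a rough sketch: the three-field layout is taken from the lines pasted above, and `parse_event` and `client_blames_server` are hypothetical helper names, not part of BOINC.

```python
def parse_event(line):
    """Split one Event Log line into time, project (None if blank) and message."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return {"time": stamp, "project": project or None, "message": message}


def client_blames_server(lines):
    """True if the client itself reported working internet access but unreachable project servers."""
    events = [parse_event(line) for line in lines]
    return any("project servers may be temporarily down" in e["message"] for e in events)
```

On the first three lines above this returns True, because the client reached its reference site and so points at the project servers rather than the local connection.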



____________
-------------------------------------
By Hook or by Crook....We Will
-------------------------------------


Copyright © 2014 University of California