Oh, come on, is this normal?

scsimodo
Joined: 15 May 03
Posts: 39
Credit: 51,577
RAC: 0
Germany
Message 156983 - Posted: 25 Aug 2005, 21:50:56 UTC

I've been an enthusiastic S@H cruncher for more than 1 year. Now I'm happily crunching P@H and LHC. Just for nostalgic reasons I switched back to S@H for a few WUs.

Well, what I see is the following:

- Outages everywhere (for the past 2 days)
- Pending credits are far behind
- Performance problems

Is this normal? I've never seen this on other projects. OK, some time with no work, yes, but not the problems I see on Seti.

Did I just pick a bad time, or is this "business as usual"?

ID: 156983

[B^S] Paul@home
Volunteer tester
Joined: 20 Dec 99
Posts: 121
Credit: 1,885,420
RAC: 0
Ireland
Message 156986 - Posted: 25 Aug 2005, 21:54:04 UTC
Last modified: 25 Aug 2005, 21:54:25 UTC

This is a particularly bad time.. but there is also an unfortunate familiarity about it!!

There now appears to be light at the end of the tunnel (see the recent tech news + the odd post in this forum) so hopefully you'll get work again soon!

cheers,

Paul.
Wanna visit BOINC Synergy? Click my stats!

Join BOINC Synergy Team
ID: 156986

Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 6,366,949
RAC: 0
United States
Message 156987 - Posted: 25 Aug 2005, 21:54:54 UTC

Take a look around the message board and the technical news. Some unique problems have cropped up but the current outage should fix them if all goes according to plan.
A member of The Knights Who Say NI!
For rankings, history graphs and more, check out:
My BOINC stats site
ID: 156987

scsimodo
Joined: 15 May 03
Posts: 39
Credit: 51,577
RAC: 0
Germany
Message 156988 - Posted: 25 Aug 2005, 21:56:27 UTC - in response to Message 156986.  

This is a particularly bad time.. but there is also an unfortunate familiarity about it!!

There now appears to be light at the end of the tunnel



Well, the light at the end of the tunnel could be the train heading towards you :-))



ID: 156988

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 156989 - Posted: 25 Aug 2005, 21:57:04 UTC

Normal? No.

Then again, LHC is out of work at this moment and Predictor was down tonight (they are only just back up).

If some hardware wants to break down, it will.
What you shouldn't do is compare relatively small projects with Seti@Home. While LHC and PAH may have an increasing userbase, all they do is run that userbase, whereas Seti@Home runs two, Classic and Boinc.

Yet if you want to read around, there are threads enough on this subject. It may take you all week to get through them. Just don't be biased based upon another (smaller) project's up time.

And don't forget who made Boinc. ;)
If it weren't for Seti in the first place, you'd be knitting a wool sweater around now... :P
ID: 156989

ampoliros
Volunteer tester
Joined: 24 Sep 99
Posts: 152
Credit: 3,542,579
RAC: 5
United States
Message 156991 - Posted: 25 Aug 2005, 22:02:51 UTC
Last modified: 25 Aug 2005, 22:08:48 UTC

You caught the tail end (hopefully) of several problems and outages that have culminated in an extended outage. Some things were out of the SetiBOINC staff's hands (outage to replace a faulty breaker or router or RAID disk), others were problems with BOINC (antiques/orphans) that only surfaced because the SetiBOINC project is much larger and has run much longer than the other BOINC projects.

If you're talking about the 'waiting for validation' queue, there are some good discussions about it in this forum. For some perspective, SetiClassic is 50 million behind, but that's not the point. Much like any program, there are problems found which result in patches, service packs, upgrades, etc.

Unfortunately, in order to do this maintenance, the SetiBOINC staff must shut down the scheduler. Prior to this, the validation queue may have been behind but it was not affecting anyone's ability to do work. The only reason they are fixing this now in one big swoop is because the disks were filling up and it's easier to shut down the project and empty the disks than to add storage and reconfigure the services.

The problems that led to this outage are being fixed as well, so we are all hopeful that once this outage is over, everything will return to 'normal'.

Half of the BOINC projects are down anyway (for various reasons). BOINC is designed so that it's no big deal if some of them aren't working.

[edit]bad grammer[/edit]

7,049 S@H Classic Credits
ID: 156991

scsimodo
Joined: 15 May 03
Posts: 39
Credit: 51,577
RAC: 0
Germany
Message 156996 - Posted: 25 Aug 2005, 22:10:15 UTC - in response to Message 156989.  

Normal? No.

Then again, LHC is out of work at this moment and Predictor was down tonight (they are only just back up).

If some hardware wants to break down, it will.
What you shouldn't do is compare relatively small projects with Seti@Home. While LHC and PAH may have an increasing userbase, all they do is run that userbase, whereas Seti@Home runs two, Classic and Boinc.


LHC is out of work, but they announced it. P@H obviously has some problems with their scheduler (unfortunately on weekends). I have to live with it.

Now to SETI! Some hardware breakdowns, true! But as a non-active member of SETI I recall at least 3 RAID crashes, database corruption, ancient results that are not deleted, etc. No show stopper, but annoying nonetheless.

I don't want to blame SETI, I'm just curious (and a little bothered) about the current state of SETI.




ID: 156996

[B^S] Paul@home
Volunteer tester
Joined: 20 Dec 99
Posts: 121
Credit: 1,885,420
RAC: 0
Ireland
Message 156998 - Posted: 25 Aug 2005, 22:12:46 UTC

Backend services appear to be coming back up...
ID: 156998

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 157000 - Posted: 25 Aug 2005, 22:13:08 UTC

Aren't we all, scsimodo. ;)

At least I remember you from past times, and you're not even coming in to just flame everyone and their little sister, before gracefully leaving again. :)

Addaboy (?)

:)
ID: 157000

scsimodo
Joined: 15 May 03
Posts: 39
Credit: 51,577
RAC: 0
Germany
Message 157003 - Posted: 25 Aug 2005, 22:22:21 UTC - in response to Message 156991.  

You caught the tail end (hopefully) of several problems and outages that have culminated in an extended outage. Some things were out of the SetiBOINC staff's hands (outage to replace a faulty breaker or router or RAID disk), others were problems with BOINC (antiques/orphans) that only surfaced because the SetiBOINC project is much larger and has run much longer than the other BOINC projects...


I'm not complaining about the problems various BOINC projects have. I'm just curious about the massive outages at SETI. I'm aware that all projects suffer from low work, hardware failures, sporadic outages. That's one reason why BOINC is there. One project has problems, so crunch for the other.

Now I know I just hit a bad time at SETI. Well, I have to deal with it...

ID: 157003

scsimodo
Joined: 15 May 03
Posts: 39
Credit: 51,577
RAC: 0
Germany
Message 157008 - Posted: 25 Aug 2005, 22:28:15 UTC - in response to Message 157000.  

Aren't we all, scsimodo. ;)

At least I remember you from past times, and you're not even coming in to just flame everyone and their little sister, before gracefully leaving again. :)

Addaboy (?)

:)


Show me one, only ONE thread where I tried to flame!
Showing my anger, ok, but I never tried to start a flamewar or to disgrace someone (ok, except Guido Waldenmeier)




ID: 157008

Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 157015 - Posted: 25 Aug 2005, 22:32:42 UTC - in response to Message 156983.  

I've been an enthusiastic S@H cruncher for more than 1 year. Now I'm happily crunching P@H and LHC. Just for nostalgic reasons I switched back to S@H for a few WUs.

Well, what I see is the following:

- Outages everywhere (for the past 2 days)
- Pending credits are far behind
- Performance problems

Is this normal? I've never seen this on other projects. OK, some time with no work, yes, but not the problems I see on Seti.

Did I just pick a bad time, or is this "business as usual"?


Is this normal? Why, yes, it is very normal. In another thread azwoody asked if there has been a two week period that the SETI/BOINC project has gone without a problem . . . no one answered, and I don't remember that happening . . . so it seems that this is, indeed, normal.

That doesn't mean I'm leaving, or flaming anyone. And I am well aware that the Berkeley folks are running this operation "where no one has gone before", but I am convinced that this project is in a state (an extended state) of development, with new versions of BOINC and SETI@home coming out regularly, new and/or different hardware configurations, and like this instance, the occasional massive glitch that has to be dealt with before it crashes the whole mess.

Is this normal? Until the project achieves some long-term (six months at least) stability, this is, and will be, normal.

OK. Flame away.
ID: 157015

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 157016 - Posted: 25 Aug 2005, 22:33:00 UTC - in response to Message 157011.  

Server status page says that the services are up.

However, as usual, it's "dropping connections".

I can't get anything uploaded or downloaded and
I'm out of WUs since early this afternoon. (timezone GMT+1)

The upload/download servers are still (intentionally) off.
ID: 157016

Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 157018 - Posted: 25 Aug 2005, 22:38:16 UTC - in response to Message 157016.  

The upload/download servers are still (intentionally) off.


Then the Status page should say the scheduler is disabled, shouldn't it?
ID: 157018

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 157020 - Posted: 25 Aug 2005, 22:39:40 UTC - in response to Message 157015.  

Is this normal? Why, yes, it is very normal. In another thread azwoody asked if there has been a two week period that the SETI/BOINC project has gone without a problem . . . no one answered, and I don't remember that happening . . . so it seems that this is, indeed, normal.


Does anyone have logs that would prove, conclusively, that azwoody's statement is either true or not?

If I look at BOINC Stats the graphs certainly tend upward quite nicely, but is the resolution good enough to tell for sure?
ID: 157020

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 157022 - Posted: 25 Aug 2005, 22:40:23 UTC - in response to Message 157018.  

The upload/download servers are still (intentionally) off.


Then the Status page should say the scheduler is disabled, shouldn't it?

The scheduler is a different machine.
ID: 157022

Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 157027 - Posted: 25 Aug 2005, 22:45:14 UTC - in response to Message 157022.  
Last modified: 25 Aug 2005, 22:46:07 UTC

The upload/download servers are still (intentionally) off.

Then the Status page should say the scheduler is disabled, shouldn't it?

The scheduler is a different machine.


I'm sure you're right, but when I try to connect the message I get is, "8/25/2005 12:38:50 PM SETI@home No schedulers responded", yet the Status page says the "Scheduler" is running.

ID: 157027

[B^S] Paul@home
Volunteer tester
Joined: 20 Dec 99
Posts: 121
Credit: 1,885,420
RAC: 0
Ireland
Message 157030 - Posted: 25 Aug 2005, 22:48:22 UTC - in response to Message 157027.  



I'm sure you're right, but when I try to connect the message I get is, "8/25/2005 12:38:50 PM SETI@home No schedulers responded", yet the Status page says the "Scheduler" is running.




It is possible they are blocking comms to the scheduler... but want it running.

They may want it running for some reason (is it needed to pick up WUs that fail to get a quorum and need to be sent again... or does the feeder do that?)

cheers,

Paul.


ID: 157030

Shaktai
Volunteer tester
Joined: 16 Jun 99
Posts: 211
Credit: 259,752
RAC: 0
United States
Message 157033 - Posted: 25 Aug 2005, 22:51:10 UTC - in response to Message 157027.  

I'm sure you're right, but when I try to connect the message I get is, "8/25/2005 12:38:50 PM SETI@home No schedulers responded", yet the Status page says the "Scheduler" is running.


It will take a little while for things to settle down. After an extended outage like we had, there are probably a million boxes all trying at once to upload and download work. Give it three or four hours and just let your client manage the process. Be patient, the spice, er a, work will flow again.
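
For anyone wondering what "let your client manage the process" can look like in practice, here is a minimal sketch of randomized exponential backoff, the usual way to keep a crowd of clients from retrying in lockstep after an outage. It is not taken from the BOINC source; the function name `backoff_delays` and the `base` and `cap` values are assumptions picked purely for illustration.

```python
import random


def backoff_delays(attempts, base=60.0, cap=4 * 3600.0, rng=None):
    """Return randomized retry delays in seconds (hypothetical helper)."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The upper bound doubles with every failed attempt, up to the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Pick uniformly in [0, ceiling] so clients spread out instead of
        # all knocking on the server at the same moment.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    for number, delay in enumerate(backoff_delays(8), start=1):
        print(f"retry {number}: wait up to {delay:.1f} s")
```

With these assumed values the ceiling reaches the four-hour cap after eight failed attempts, which lines up roughly with the "three or four hours" of patience suggested above.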


Team MacNN - The best Macintosh team ever.
ID: 157033

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 157034 - Posted: 25 Aug 2005, 22:52:29 UTC - in response to Message 157027.  

The upload/download servers are still (intentionally) off.

Then the Status page should say the scheduler is disabled, shouldn't it?

The scheduler is a different machine.


I'm sure you're right, but when I try to connect the message I get is, "8/25/2005 12:38:50 PM SETI@home No schedulers responded", yet the Status page says the "Scheduler" is running.

I haven't read the code to see how the schedulers work, but I've seen Matt comment that there are a limited number of simultaneous CGIs allowed to run on each machine, and remember that every single cruncher out here probably has a reason to want to talk to the mother ship.

We'll see a couple of days of "ratty comms" as a result of the downtime.
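
To illustrate how a scheduler can be "running" while most clients still see "No schedulers responded", here is a toy sketch of a server with a fixed number of request slots. It is an assumption-laden model, not the SETI@home setup: `CappedServer`, `simulate`, and the slot counts are all made up for the example.

```python
import threading


class CappedServer:
    """Toy server that handles at most `max_slots` requests at a time."""

    def __init__(self, max_slots):
        self._slots = threading.BoundedSemaphore(max_slots)

    def try_begin(self):
        # Non-blocking: returns False right away when every slot is busy.
        return self._slots.acquire(blocking=False)

    def finish(self):
        # Free a slot once a request has been served.
        self._slots.release()


def simulate(clients, max_slots):
    """Count how many simultaneous clients get a slot and how many do not."""
    server = CappedServer(max_slots)
    accepted = sum(1 for _ in range(clients) if server.try_begin())
    return accepted, clients - accepted


if __name__ == "__main__":
    accepted, refused = simulate(clients=1000, max_slots=50)
    print(f"accepted: {accepted}, turned away: {refused}")
```

In this model the server is up the whole time, yet 950 of 1000 simultaneous callers are turned away, which is the kind of "ratty comms" you would expect until the backlog drains.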

ID: 157034
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.