Panic Mode On (33) Server problems



Message boards : Number crunching : Panic Mode On (33) Server problems

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,914,238
RAC: 13,528
United Kingdom
Message 1005638 - Posted: 18 Jun 2010, 8:49:38 UTC - in response to Message 1005633.

1. Which is where you (plural, not Andy alone) come in. Complain, comment, don't comply, tell them how to do it better. Tell it to the developers themselves; do not expect them to trawl through 40 threads looking for your post. And of course, comments like "dictatorship", "this is ****" and "test it better" are best left behind. On the latter, they're testing it here in a live environment just to see what could go wrong.

That's the reality, but it's bad practice. In the real computing world, whenever a major application or upgrade is launched, the developers should be proactively monitoring the rollout and catching issues as they arise. I still remember (with cold shivers down my spine) the Saturday night I migrated a live telesales database from Microsoft Access to SQL Server. I had to wait until the last call ended at 10pm, then perform the transfer. But I regarded it as a consequent duty to be on-site at 10am the following (Sunday) morning, when the sales lines opened again, to monitor that everything was running smoothly. It was - we didn't lose an order.

4. I actually like that. The biggest problem was always that people expected the claimed credit to be theirs, no matter what. OK, you won't get fun threads anymore about someone claiming 17 trillion credits, but let's be honest, the method of calculating claimed credit from time * benchmarks isn't in use here anymore.

Not true. The "claimed credit" shown on this project's website has been derived from the flopcounter for years, and is incredibly stable and reliable - except for the minute percentage of users still using the very earliest v5 clients or before.

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3266
Credit: 40,699,736
RAC: 60,126
Russia
Message 1005639 - Posted: 18 Jun 2010, 8:51:51 UTC - in response to Message 1005633.
Last modified: 18 Jun 2010, 8:59:10 UTC

Quota
Quota is only counting MB tasks but applying the quota limit to all tasks, so I cannot download AP tasks.
The quota is not being reset at the end of the 24-hour period. I think this is on PDT time, but it didn't happen at UTC today either, so my quota, and presumably others' too, is now into its third day.
(I expect more complaints about this, as most people seem to run 2- or 3-day caches.)

This is indeed the performance-limiting factor now.

Resetting the quota from close to infinity down to some default value when an error is encountered is OK (previously the default was 100*NUMBER_OF_CPU_CORES+100*5*NUMBERS_OF_GPUs AFAIK; the GPU part can differ). If it's only a random error like -12, the host will still have enough work to continue and to prove to the server that it's a good one. But if it's the first sign of a major host failure, the sooner work fetch is inhibited, the better. If we only decreased "close to infinity" by 1 for each failure, the quota mechanism would be ineffective at dealing with a broken host.

But currently everything suggests that the new quota was implemented with bugs. My own host still receives the message about reaching its quota (294 so far), but it has not downloaded anything even close to that number for the past few days. That is, the reset of the downloaded-tasks counter is broken.
It also looks as if the same quota is still applied to all app versions. I also get the quota-reached message on ATI GPU AP work requests. It's absolutely clear that this host couldn't have downloaded ~300 AP tasks in the last day under any conditions; in fact, it downloaded no AP tasks yesterday and none today...
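Raistmer's argument about how the quota should react to errors, and the counter-reset bug he reports, can be illustrated with a small model. This is a hypothetical Python sketch, not SETI@home or BOINC server code: the class and method names are invented, NEAR_INFINITE merely stands in for "close to infinity", and the 100-per-core / 500-per-GPU defaults are the "AFAIK" figures from the post, not confirmed server values.

```python
from dataclasses import dataclass

# Assumed constants, taken from the "AFAIK" figures in the post (the GPU part can differ).
TASKS_PER_CPU_CORE = 100
TASKS_PER_GPU = 100 * 5
NEAR_INFINITE = 1_000_000  # hypothetical stand-in for a quota that is "close to infinity"


def default_quota(cpu_cores: int, gpus: int) -> int:
    """Baseline daily quota: 100 tasks per CPU core plus 500 per GPU."""
    return TASKS_PER_CPU_CORE * cpu_cores + TASKS_PER_GPU * gpus


@dataclass
class HostQuota:
    """Hypothetical per-host daily quota with two alternative error policies."""
    cpu_cores: int
    gpus: int
    limit: int = NEAR_INFINITE
    sent_today: int = 0

    def on_error_reset(self) -> None:
        # Policy the post endorses: a single error drops the limit to the default.
        self.limit = min(self.limit, default_quota(self.cpu_cores, self.gpus))

    def on_error_decrement(self) -> None:
        # Policy the post rejects: subtract 1 per failure, which barely dents the limit.
        self.limit = max(1, self.limit - 1)

    def can_send(self) -> bool:
        return self.sent_today < self.limit

    def record_sent(self, n: int = 1) -> None:
        self.sent_today += n

    def daily_reset(self) -> None:
        # The step reported as broken: the downloaded-task counter must go back to 0.
        self.sent_today = 0


decrementing = HostQuota(cpu_cores=4, gpus=1)
for _ in range(10):
    decrementing.on_error_decrement()
print(decrementing.limit)  # 999990: ten failures leave the quota effectively unlimited

resetting = HostQuota(cpu_cores=4, gpus=1)
resetting.on_error_reset()
print(resetting.limit)  # 900: 4 cores * 100 + 1 GPU * 500
```

In this model, if daily_reset is never called, sent_today keeps growing across days, which would match the symptom of a quota message persisting into a third day.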

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,915,770
RAC: 492,914
United States
Message 1005641 - Posted: 18 Jun 2010, 8:58:30 UTC - in response to Message 1005636.

Oh Boy, do you have some reading ahead of you :-) (and commiserations on the Hayfever)

Ta, but I am not going to read all. There's this handy "Mark all threads read" button at the top. :-)

And hitting the 'ignore' button is gonna fix things, eh?

Nice attitude.

Sorry? Why attack me over this? I don't work here. I run Seti on some of my systems, and I try, completely voluntarily, to help people with their BOINC troubles. I am not responsible for the code or its introduction, and because I was installing a completely new system, for once I was away while new things were introduced around here.

Including this thread, there are 35 threads in this forum alone about this problem. I don't need to read them all. If that's not good enough for you, then tough!

But be that as it may, I'll stop posting and try to help clear things up a little: things I picked up, things as I see them from my perspective. Have a good rest of the day and continue in your struggle to anticipate changes.


Cute.........

I could respond in a way that would get me banned........really cutesey.

I have a valid complaint.......and if you cannot acknowledge that......

You might just as well just jump offa the same bridge as your Boinc companions.........

Don't EVEN give me any crap about voicing my thoughts on this matter.

You are in the wrong.

Have a nice day.

____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Profile Ageless
Joined: 9 Jun 99
Posts: 12127
Credit: 2,519,625
RAC: 353
Netherlands
Message 1005643 - Posted: 18 Jun 2010, 9:03:25 UTC

All, Mark knows best. He'll fix it.
____________
Jord

Loving awareness is free.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8218
Credit: 21,771,852
RAC: 13,132
United Kingdom
Message 1005646 - Posted: 18 Jun 2010, 9:04:07 UTC

1 & 2. The identification of apps should have been the first step with nothing more done until it was accurate.

Quota,
The quota should be per application, and therefore 1 & 2 apply.

3 & 4. Richard effectively answered that.
5. Because of Eric's modification of the flop-count method, credits for AP are at ~800cr.

Brodo answered what I was going to say about extra tasks downloaded, i.e. we no longer know how much is asked for.
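WinterKnight's point that the quota should be per application, together with the report that MB downloads were blocking AP requests, can be sketched as a quota keyed by application. This is a hypothetical Python model, not the BOINC scheduler's implementation; the class name, the "MB"/"AP" labels and the limits 241 and 20 are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict


class PerAppQuota:
    """Hypothetical per-application daily quota: each app has its own limit and counter."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)          # app label -> daily task limit
        self.sent_today = defaultdict(int)  # app label -> tasks sent since last reset

    def can_send(self, app: str) -> bool:
        # Only this app's own counter is checked, so MB downloads never block AP.
        # An app with no configured limit gets nothing.
        return self.sent_today[app] < self.limits.get(app, 0)

    def record_sent(self, app: str, n: int = 1) -> None:
        self.sent_today[app] += n

    def daily_reset(self) -> None:
        # Start a fresh 24-hour period for every app at once.
        self.sent_today.clear()


quota = PerAppQuota({"MB": 241, "AP": 20})
quota.record_sent("MB", 241)
print(quota.can_send("MB"))  # False: MB has hit its own limit
print(quota.can_send("AP"))  # True: AP is counted separately
```

The design choice is simply that no counter is shared: exhausting one application's allowance has no effect on whether another application's work can be sent.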


msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,915,770
RAC: 492,914
United States
Message 1005650 - Posted: 18 Jun 2010, 9:07:39 UTC - in response to Message 1005643.

All, Mark knows best. He'll fix it.

That was a simple comment from a simple mind, apparently.

Your making light of both me and the situation makes your attitude clear.



Profile Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 29472
Credit: 8,873,474
RAC: 27,165
United Kingdom
Message 1005663 - Posted: 18 Jun 2010, 9:44:08 UTC

Can we all just calm down and take 5 please. No-one is slighting anyone intentionally, just voicing opinions.

____________
Damsel Rescuer, Kitty Patron, Uli Fan, Julie Supporter, CAMRA
ES99 Admirer, Raccoon Friend, IFAW, PETA, 5% Badge


Profile [AF>france>pas-de-calais]symaski62
Volunteer tester
Joined: 12 Aug 05
Posts: 256
Credit: 95,094
RAC: 56
France
Message 1005664 - Posted: 18 Jun 2010, 10:01:16 UTC

18/06/2010 11:40:31 SETI@home Sending scheduler request: To fetch work.
18/06/2010 11:40:31 SETI@home Requesting new tasks for GPU
18/06/2010 11:40:36 SETI@home Scheduler request completed: got 0 new tasks
18/06/2010 11:40:36 SETI@home Message from server: Project has no jobs available


RED servers ^^


____________
SETI@Home Informational message -9 result_overflow
With an overall disability of 80%, it takes me a lot of effort to contribute to the community and to express myself; thank you for being understanding.

Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2411
Credit: 351,996
RAC: 51
Message 1005665 - Posted: 18 Jun 2010, 10:33:01 UTC - in response to Message 1005664.

Oh Dear:

WUs waiting to validate: 43000 and climbing. According to the status page, one of the validators is down.

So the trickle of work people are getting from the quota going up with valid tasks will be even smaller.
And it's still a few hours till the guys get in.

There are now so many small fires to put out, it starts looking like the forest is up in flames.

Interesting dilemma: will people be angrier if they fix on the fly (and introduce other problems, or we just can't see the fixes quickly enough to appease the community) or if they shut down for another day?

Seems that nerves are so frayed even the most patient of us are having a hard time.
____________
Carola
-------
I'm multilingual - I can misunderstand people in several languages!

zoom314
Joined: 30 Nov 03
Posts: 44514
Credit: 35,394,973
RAC: 9,343
Message 1005742 - Posted: 18 Jun 2010, 15:03:13 UTC

I'm on empty and so far Seti will not send me any new work. But then I gather it won't send work out unless one has the right app, whatever that is. Yet my useless quota kept going up, and my complaints have gone unanswered.

6/18/2010 7:50:26 AM SETI@home Sending scheduler request: To fetch work.
6/18/2010 7:50:26 AM SETI@home Requesting new tasks
6/18/2010 7:50:27 AM SETI@home Scheduler request completed: got 0 new tasks
6/18/2010 7:50:27 AM SETI@home Message from server: No work sent
6/18/2010 7:50:27 AM SETI@home Message from server: Your app_info.xml file doesn't have a usable version of SETI@home Enhanced.
6/18/2010 7:50:27 AM SETI@home Message from server: (reached daily quota of 241 tasks)


Profile Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7259
Credit: 6,257,944
RAC: 1,656
United States
Message 1006620 - Posted: 20 Jun 2010, 17:11:48 UTC
Last modified: 20 Jun 2010, 17:15:16 UTC

From Matt's 6/16 post

I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outage/emergencies. In my case, I currently have four regular separate sources of income with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose, I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep and I didn't want to start up the scheduler processes without others' input as they were still effectively on the operating table. We've pretty much given up any hope for 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects.


Something for all of us to keep in mind.


Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,216,480
RAC: 183
United States
Message 1006624 - Posted: 20 Jun 2010, 17:21:37 UTC

Thank you Blurf.. and thank you for helping renew my ticked-offedness.

Yeah, we get it. It is not 24/7. Never asked for that. Yeah, we get it.
We can sign up for other projects we may or may not agree with or want to help with. We understand they are underfunded/underpaid.

And we get it that on top of previous server problems, they dumped a bunch of poorly written, untested code on us, basically keeping things tied up in a knot for over a week. How about being up 12/2?? 'Cause it has been ages since I remember seeing all servers up at once.

But.. feel free to quote "so sayeth Matt" again. It really.. helps. Not sure who it helps, but I am sure it does. Or not.

"Bite my shiny metal a**" So sayeth Bender.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,915,770
RAC: 492,914
United States
Message 1006629 - Posted: 20 Jun 2010, 17:44:02 UTC

Well, I, for one, have calmed down.
I have to resign myself to the fact that since I have chosen to crunch Seti, and Seti only, I must accept the downtime of the project gracefully.

And there will be some times when my rigs cannot get enough work on hand to keep them running during extended or sequential project failures.

Such is what I have chosen to do.

Welcome to life in Setiland.

Profile Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 29472
Credit: 8,873,474
RAC: 27,165
United Kingdom
Message 1006642 - Posted: 20 Jun 2010, 18:52:01 UTC

Mark, that is the most marvellous post I think I have ever seen you make. I always knew it was worth believing in you.

Take care now.

Chris S.


Profile Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7259
Credit: 6,257,944
RAC: 1,656
United States
Message 1006643 - Posted: 20 Jun 2010, 18:53:58 UTC - in response to Message 1006642.

The calm is nice to see and appreciated.


Profile Dave Cummings
Volunteer tester
Joined: 16 May 09
Posts: 204
Credit: 912,052
RAC: 70
United Kingdom
Message 1006698 - Posted: 20 Jun 2010, 22:27:44 UTC

glad to see u back

Highlander
Joined: 5 Oct 99
Posts: 143
Credit: 30,749,690
RAC: 5,073
Germany
Message 1006799 - Posted: 21 Jun 2010, 6:30:33 UTC

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

It's no wonder, with over 1 million unvalidated WUs.

Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1133
Credit: 3,081,931
RAC: 3,334
United Kingdom
Message 1006802 - Posted: 21 Jun 2010, 6:42:46 UTC - in response to Message 1006799.

Got this message as well today, with a transient upload error. I'll just sit and wait until it has been fixed.

zoom314
Joined: 30 Nov 03
Posts: 44514
Credit: 35,394,973
RAC: 9,343
Message 1006830 - Posted: 21 Jun 2010, 7:50:52 UTC - in response to Message 1006802.

I turned off network access in BOINC earlier when I heard you all talking about it. I'm sure it'll get fixed on Monday or Tuesday, so we wait.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8218
Credit: 21,771,852
RAC: 13,132
United Kingdom
Message 1006831 - Posted: 21 Jun 2010, 8:35:19 UTC

Had to happen sooner or later.
If the validators are switched off, then the results are not being transferred to the science database, so no space is being freed to make room for further uploads.


Copyright © 2014 University of California