Panic Mode On (33) Server problems

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11141
Credit: 83,777,274
RAC: 46,177
United Kingdom
Message 1005638 - Posted: 18 Jun 2010, 8:49:38 UTC - in response to Message 1005633.  

1. Which is where you (plural, not Andy alone) come in. Complain, comment, don't comply, tell how to do it better. Tell it to the developers themselves; do not expect them to trawl through 40 threads looking for your post. And of course, comments like "dictatorship", "this is ****" and "test it better" are best left behind. On the latter, they're testing it here in a live environment just to see what could go wrong.

That's the reality, but it's bad practice. In the real computing world, whenever a major application or upgrade is launched, the developers should be proactively monitoring the rollout and catching issues as they arise. I still remember (with cold shivers down my spine) the Saturday night I migrated a live telesales database from Microsoft Access to SQL Server. I had to wait until the last call ended at 10pm, then perform the transfer. But I regarded it as a consequent duty to be on-site at 10am the following (Sunday) morning, when the sales lines opened again, to monitor that everything was running smoothly. It was - we didn't lose an order.

4. I actually like that. The biggest problem was always that people expected the claimed credit to be theirs, no matter what. OK, you won't get fun threads anymore that you claimed 17 trillion credits, but let's be honest, the method in which the claimed credit is calculated isn't in use here anymore (time * benchmarks).

Not true. The "claimed credit" shown on this project's website has been derived from the flopcounter for years, and is incredibly stable and reliable - except for the minute percentage of users still using the very earliest v5 clients or before.

ID: 1005638

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 5429
Credit: 66,395,969
RAC: 12,582
Russia
Message 1005639 - Posted: 18 Jun 2010, 8:51:51 UTC - in response to Message 1005633.  
Last modified: 18 Jun 2010, 8:59:10 UTC

Quota
Quota is only counting MB tasks, but applying quota limit to all tasks, therefore I cannot download AP tasks.
Quota not being reset at end of 24 hour period, think this is PDT time but it didn't happen at UTC today either. Therefore my quota, and presumably others also, now into its third day.
(expecting more complaints on this as most people seem to run 2 or 3 day caches)

This is indeed a performance-limiting factor now.

Resetting the quota from close to infinity to some default value (previously 100*NUMBER_OF_CPU_CORES + 100*5*NUMBER_OF_GPUs, AFAIK; the GPU part can differ) when an error is encountered is OK. If it's only a random error like -12, the host will have enough work to continue and to prove to the server that it is a good one. But if it's the first sign of a big host failure, the sooner fetching is inhibited the better. If we could decrease "close to infinity" only by 1 for each failure, the quota mechanism would be ineffective at dealing with a broken host.
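
The arithmetic behind that argument can be sketched as follows. This is only an illustration, not the BOINC scheduler's actual code: the function names, the `10**9` stand-in for "close to infinity", and the example host (4 CPU cores, 1 GPU) are assumptions; the default formula is the one quoted above, which is itself given only "AFAIK".

```python
# Hypothetical sketch of the two error-handling policies discussed above.
# None of these names come from the BOINC source.

NEAR_INFINITE_QUOTA = 10**9  # assumed stand-in for "close to infinity"


def default_quota(n_cpu_cores: int, n_gpus: int) -> int:
    """Default daily quota as quoted in the post: 100 per core + 100*5 per GPU."""
    return 100 * n_cpu_cores + 100 * 5 * n_gpus


def reset_on_error(current: int, default: int) -> int:
    """Policy A: the first error drops the quota to the default value."""
    return min(current, default)


def decrement_on_error(current: int) -> int:
    """Policy B: each error lowers the quota by one task (never below 1)."""
    return max(current - 1, 1)


def errors_until_limited(default: int, start: int = NEAR_INFINITE_QUOTA) -> int:
    """How many errors policy B needs before the quota reaches the default."""
    return max(start - default, 0)
```

For the assumed host, `default_quota(4, 1)` is 900, so policy A limits a failing host after one error, while policy B would need `errors_until_limited(900)` errors, roughly a billion, to get there.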

But currently everything indicates that the new quota was implemented with bugs. My own host still receives the message about reached quota (294 so far), but it has not downloaded anything even close to that number for the past few days. That is, the reset of the downloaded-tasks counter is broken.
And it also looks as if the same quota is still applied to all app versions. I get the quota-reached message on ATI GPU AP work requests too. It's absolutely clear that this host can't have downloaded ~300 AP tasks in the last day under any conditions; actually it downloaded no AP tasks yesterday, no AP tasks today...
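
What is being asked for here, a separate counter per application that actually resets after 24 hours, could look roughly like this. It is a sketch under assumptions, not the server's implementation: `QuotaTracker`, `AppQuota`, the app keys "MB" and "AP", and the limits are all made up for illustration, and the window here restarts at the first request after expiry rather than at a fixed PDT or UTC boundary.

```python
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60


@dataclass
class AppQuota:
    """Per-application state (hypothetical structure)."""
    limit: int                 # tasks allowed per 24-hour window
    sent: int = 0              # tasks sent in the current window
    window_start: float = 0.0  # timestamp (seconds) when the window began


class QuotaTracker:
    """Keeps one independent daily counter per application."""

    def __init__(self, limits: dict):
        self._apps = {name: AppQuota(limit=limit) for name, limit in limits.items()}

    def try_send(self, app: str, now: float) -> bool:
        """Return True and count the task if `app` is still under its quota."""
        quota = self._apps[app]
        # Start a fresh window once 24 hours have passed since the last one began.
        if now - quota.window_start >= DAY_SECONDS:
            quota.sent = 0
            quota.window_start = now
        if quota.sent >= quota.limit:
            return False
        quota.sent += 1
        return True
```

With `QuotaTracker({"MB": 2, "AP": 1})`, exhausting the "MB" quota does not block an "AP" request, and "MB" becomes available again once `now` has advanced by `DAY_SECONDS`.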

ID: 1005639

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45919
Credit: 815,268,545
RAC: 125,065
United States
Message 1005641 - Posted: 18 Jun 2010, 8:58:30 UTC - in response to Message 1005636.  

Oh Boy, do you have some reading ahead of you :-) (and commiserations on the Hayfever)

Ta, but I am not going to read all. There's this handy "Mark all threads read" button at the top. :-)

And hitting the 'ignore' button is gonna fix things, eh?

Nice attitude.

Sorry? Why attack me over this? I don't work here, I run Seti on some of my systems and I try, completely voluntarily, to help people with their BOINC troubles. I am not responsible for the code or its introduction, and because I was installing a completely new system I was, for once, away while new things were introduced around here.

Including this thread there are 35 threads in this forum alone about this problem. I don't have the need to read them all. If that's not good enough for you, then tough!

But be that as it may, I'll stop posting and try to help clear things up a little. Things I picked up, things as I see them from my perspective. Have a good rest of the day and continue in your struggle to anticipate changes.


Cute.........

I could respond in a way that would get me banned........really cutesey.

I have a valid complaint.......and if you cannot acknowledge that......

You might just as well just jump offa the same bridge as your Boinc companions.........

Don't EVEN give me any crap about voicing my thoughts on this matter.

You are in the wrong.

Have a nice day.

Cats.....what more does one need?

Have made friends in this life.
Most were cats.

ID: 1005641

Ageless
Joined: 9 Jun 99
Posts: 13819
Credit: 3,269,733
RAC: 0
Netherlands
Message 1005643 - Posted: 18 Jun 2010, 9:03:25 UTC

All, Mark knows best. He'll fix it.


Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!

ID: 1005643

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 10189
Credit: 30,542,057
RAC: 3,516
United Kingdom
Message 1005646 - Posted: 18 Jun 2010, 9:04:07 UTC

1 & 2. The identification of apps should have been the first step with nothing more done until it was accurate.

Quota,
The quota should be per application, and therefore 1 & 2 apply.

3 & 4. Richard effectively answered that.
5. Credits for AP, because of Eric's modification of the flop count method, are at ~800 cr.

Brodo answered what I was going to say about extra tasks downloaded, i.e. we no longer know how much is asked for.


ID: 1005646

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45919
Credit: 815,268,545
RAC: 125,065
United States
Message 1005650 - Posted: 18 Jun 2010, 9:07:39 UTC - in response to Message 1005643.  

All, Mark knows best. He'll fix it.

That was a simple comment from a simple mind, apparently.

Your making light of both me and the situation appears to make your attitude clear.


Cats.....what more does one need?

Have made friends in this life.
Most were cats.

ID: 1005650

Chris S (Crowdfunding Project Donor)
Volunteer tester
Joined: 19 Nov 00
Posts: 38183
Credit: 21,377,197
RAC: 27,752
United Kingdom
Message 1005663 - Posted: 18 Jun 2010, 9:44:08 UTC

Can we all just calm down and take 5 please. No-one is slighting anyone intentionally, just voicing opinions.


Those are my principles, and if you don't like them ... well, I have others.
Groucho Marx 1895-1977

I also have mine, and if you don't like them ... tough, live with it.
Chris S 2016

Member of UCB Charter Hill Society

ID: 1005663

[AF>france>pas-de-calais]symaski62
Volunteer tester
Joined: 12 Aug 05
Posts: 258
Credit: 100,548
RAC: 0
France
Message 1005664 - Posted: 18 Jun 2010, 10:01:16 UTC

18/06/2010 11:40:31 SETI@home Sending scheduler request: To fetch work.
18/06/2010 11:40:31 SETI@home Requesting new tasks for GPU
18/06/2010 11:40:36 SETI@home Scheduler request completed: got 0 new tasks
18/06/2010 11:40:36 SETI@home Message from server: Project has no jobs available


RED servers ^^


SETI@Home Informational message -9 result_overflow
With a general disability of 80%, I make a great effort for the community and to express myself; thank you for being understanding.

ID: 1005664

Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1005665 - Posted: 18 Jun 2010, 10:33:01 UTC - in response to Message 1005664.  

Oh Dear:

WU waiting to validate 43000 and climbing - according to status page one of the validators is down.

So the trickle of work people are getting from quota going up from valid tasks will be even smaller.
And it's still a few hours till the guys get in.

There are now so many small fires to put out, it starts looking like the forest is up in flames.

Interesting dilemma: will people be angrier if they fix on the fly (and introduce other problems, or we just can't see the fixes quickly enough to appease the community) or if they shut down for another day?

Seems that nerves are so frayed even the most patient of us are having a hard time.


Carola
-------
I'm multilingual - I can misunderstand people in several languages!

ID: 1005665

zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 56763
Credit: 40,747,709
RAC: 5,001
United States
Message 1005742 - Posted: 18 Jun 2010, 15:03:13 UTC

I'm on empty and so far Seti will not send Me any new work, But then I gather It won't send work out unless one has the right app, whatever that is, Yet My useless quota kept going up and My complaints have gone unanswered.

6/18/2010 7:50:26 AM SETI@home Sending scheduler request: To fetch work.
6/18/2010 7:50:26 AM SETI@home Requesting new tasks
6/18/2010 7:50:27 AM SETI@home Scheduler request completed: got 0 new tasks
6/18/2010 7:50:27 AM SETI@home Message from server: No work sent
6/18/2010 7:50:27 AM SETI@home Message from server: Your app_info.xml file doesn't have a usable version of SETI@home Enhanced.
6/18/2010 7:50:27 AM SETI@home Message from server: (reached daily quota of 241 tasks)


Pluto is still a planet

Beep! Beep!

ID: 1005742

Blurf (Project Donor)
Volunteer tester
Joined: 2 Sep 06
Posts: 8817
Credit: 9,639,694
RAC: 2,732
United States
Message 1006620 - Posted: 20 Jun 2010, 17:11:48 UTC
Last modified: 20 Jun 2010, 17:15:16 UTC

From Matt's 6/16 post

I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outage/emergencies. In my case, I currently have four regular separate sources of income with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose, I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep and I didn't want to start up the scheduler processes without others' input as they were still effectively on the operating table. We've pretty much given up any hope for 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects.


Something for all of us to keep in mind.


ID: 1006620

soft^spirit
Joined: 18 May 99
Posts: 6438
Credit: 31,839,687
RAC: 6,569
United States
Message 1006624 - Posted: 20 Jun 2010, 17:21:37 UTC

Thank you Blurf.. and thank you for helping renew my ticked-offedness.

Yeah we get it. it is not 24/7. Never asked for that. Yeah we get it.
We can sign up for other projects we may or may not agree with or want to
help with. We understand they are under funded/paid.

And we get it that on top of previous server problems, they dumped a bunch
of poorly written non-tested code on us, basically keeping things tied up in a knot for over a week. How about being up 12/2?? Cause it has been ages since I remember seeing all servers up at once.

But.. feel free to quote "so sayeth Matt" again. It really.. helps. Not sure who it helps, but I am sure it does. Or not.

"Bite my shiney metal a**" So sayeth Bender.

ID: 1006624

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45919
Credit: 815,268,545
RAC: 125,065
United States
Message 1006629 - Posted: 20 Jun 2010, 17:44:02 UTC

Well, I, for one, have calmed down.
I have to resign myself to the fact that since I have chosen to crunch Seti, and Seti only, I must accept the downtime of the project gracefully.

And there will be some times when my rigs cannot get enough work on hand to keep them running during extended or sequential project failures.

Such is what I have chosen to do.

Welcome to life in Setiland.


Cats.....what more does one need?

Have made friends in this life.
Most were cats.

ID: 1006629

Chris S (Crowdfunding Project Donor)
Volunteer tester
Joined: 19 Nov 00
Posts: 38183
Credit: 21,377,197
RAC: 27,752
United Kingdom
Message 1006642 - Posted: 20 Jun 2010, 18:52:01 UTC

Well, I, for one, have calmed down.
I have to resign myself to the fact that since I have chosen to crunch Seti, and Seti only, I must accept the downtime of the project gracefully.

And there will be some times when my rigs cannot get enough work on hand to keep them running during extended or sequential project failures.

Such is what I have chosen to do.

Welcome to life in Setiland.


Mark, that is the most marvellous post I think I have ever seen you make. I always knew it was worth believing in you.

Take care now.

Chris S.
Those are my principles, and if you don't like them ... well, I have others.
Groucho Marx 1895-1977

I also have mine, and if you don't like them ... tough, live with it.
Chris S 2016

Member of UCB Charter Hill Society

ID: 1006642

Blurf (Project Donor)
Volunteer tester
Joined: 2 Sep 06
Posts: 8817
Credit: 9,639,694
RAC: 2,732
United States
Message 1006643 - Posted: 20 Jun 2010, 18:53:58 UTC - in response to Message 1006642.  

Well, I, for one, have calmed down.
I have to resign myself to the fact that since I have chosen to crunch Seti, and Seti only, I must accept the downtime of the project gracefully.

And there will be some times when my rigs cannot get enough work on hand to keep them running during extended or sequential project failures.

Such is what I have chosen to do.

Welcome to life in Setiland.


Mark, that is the most marvellous post I think I have ever seen you make. I always knew it was worth believing in you.

Take care now.

Chris S.


The calm is nice to see and appreciated.


ID: 1006643

Dave Cummings
Volunteer tester
Joined: 16 May 09
Posts: 207
Credit: 942,580
RAC: 1
United Kingdom
Message 1006698 - Posted: 20 Jun 2010, 22:27:44 UTC

glad to see u back

ID: 1006698

Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 33,135,855
RAC: 3
Germany
Message 1006799 - Posted: 21 Jun 2010, 6:30:33 UTC

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

it's no wonder with over 1 million unvalidated WU




- Performance is not a simple linear function of the number of CPUs you throw at the problem. -

ID: 1006799

[B^S] madmac
Volunteer tester
Send message
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 1006802 - Posted: 21 Jun 2010, 6:42:46 UTC - in response to Message 1006799.  

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

it's no wonder with over 1 million unvalidated WU


Got this message as well today with an upload transient error, will just sit and wait until it has been fixed

ID: 1006802

zoom314
Volunteer tester
Send message
Joined: 30 Nov 03
Posts: 56763
Credit: 40,747,709
RAC: 5,001
United States
Message 1006830 - Posted: 21 Jun 2010, 7:50:52 UTC - in response to Message 1006802.  

funny msg:

21.06.2010 08:19:04 [error] Error reported by file upload server: Server is out of disk space

it's no wonder with over 1 million unvalidated WU


Got this message as well today with an upload transient error, will just sit and wait until it has been fixed

I turned off network access in Boinc earlier when I heard You all talking about It, I'm sure It'll get fixed on Monday or Tuesday, So We wait.
Pluto is still a planet

Beep! Beep!

ID: 1006830

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 10189
Credit: 30,542,057
RAC: 3,516
United Kingdom
Message 1006831 - Posted: 21 Jun 2010, 8:35:19 UTC

Had to happen sooner or later.
If the validators are switched off then the results are not being transferred to the science database. So no space is being emptied to make room for further uploads.
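
As a back-of-the-envelope illustration of that reasoning (all figures and names below are hypothetical, not measured values from the SETI@home servers): if uploads keep arriving while nothing is moved out, the time until the upload disk fills is simply the free space divided by the net inflow.

```python
from typing import Optional


def hours_until_full(capacity_gb: float, used_gb: float,
                     upload_gb_per_hour: float,
                     drain_gb_per_hour: float) -> Optional[float]:
    """Hours until the disk is full, or None if it is not filling up."""
    # Net growth: what volunteers upload minus what validation moves out.
    net_inflow = upload_gb_per_hour - drain_gb_per_hour
    if net_inflow <= 0:
        return None
    free_gb = max(capacity_gb - used_gb, 0.0)
    return free_gb / net_inflow
```

With the validators off the drain term is zero, so for example `hours_until_full(1000, 400, 10, 0)` gives 60 hours; when validation keeps pace with uploads the function returns `None`.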

ID: 1006831


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.