Panic Mode On (33) Server problems

kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1005625 - Posted: 18 Jun 2010, 8:22:24 UTC - in response to Message 1005606.  

Oh Boy, do you have some reading ahead of you :-) (and commiserations on the Hayfever)

Ta, but I am not going to read all. There's this handy "Mark all threads read" button at the top. :-)

And hitting the 'ignore' button is gonna fix things, eh?

Nice attitude.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1005625 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 18996
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1005626 - Posted: 18 Jun 2010, 8:25:27 UTC

According to the "in Progress" page there are now 203 tasks for my quad after it downloaded those 5 CUDA tasks; that includes 4 AP tasks being processed and 6 waiting.
But I still cannot get AP tasks for the CPU. If no more are received today, the CPU will go cold at ~03:00 tomorrow morning. ASAP after that it will be switched off.
ID: 1005626 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1005627 - Posted: 18 Jun 2010, 8:25:48 UTC - in response to Message 1005619.  

BOINC will always ask for so many seconds of work. Even if 1 second of work, you get a task that's more than 1 second long. If asking for 113426 seconds of work and this is slightly more than quota, you'll get it.

The BOINC client used to show how many seconds of work were being requested in the Messages tab. This disappeared somewhere in the early 6.10.xx versions.

Any idea why ?

Brodo
ID: 1005627 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1005628 - Posted: 18 Jun 2010, 8:27:57 UTC - in response to Message 1005627.  

BOINC will always ask for so many seconds of work. Even if 1 second of work, you get a task that's more than 1 second long. If asking for 113426 seconds of work and this is slightly more than quota, you'll get it.

The BOINC client used to show how many seconds of work were being requested in the Messages tab. This disappeared somewhere in the early 6.10.xx versions.

Any idea why ?

Brodo

Because they tried to 'dumb it down' so folks like you and I would have less information to question what is actually happening.

Enough said?

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1005628 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1005629 - Posted: 18 Jun 2010, 8:30:32 UTC - in response to Message 1005619.  

I've looked for John's post and found it here:

The quota code has changed dramatically.

There is a separate quota per application version, and for good computers that consistently return valid work, there is no cap on the quota as it is incremented by one at validation. However, errored or late tasks drop the quota to min(current quota, app specified quota) - 1, and invalid tasks drop the quota by 1 from where it is. In other words if you have a daily quota of 1000 and your machine generates an error, your quota for that app will be 99. If it is currently 50 and you generate an error it will be 49. If it is 100,000 and you have yet another validation success, it will be 100,001.

This way machines that never return an error eventually can get all the tasks they want.

Now, to be honest, I think that John made a typo with the 1000, instead meaning 100. So "In other words if you have a daily quota of 100 and your machine generates an error, your quota for that app will be 99. If it is currently 50 and you generate an error it will be 49."

Or in a more down to Earth example, if you have grown to a quota of 123 and your machine makes an error, its new quota is 122. I have noticed that the quota is only adjusted after validation, so it's not even correct to assume that since you crunched it correctly and uploaded & reported without problems, that it will be a good task and added to your quota.

'fraid not. Sutaru thought John had made a typo, and I had to correct him: now you think he has made a different typo. There is no typo.

I haven't had time to get up to 1,000 at Beta, but host 28361 is up to 631 - in principle, I don't think there's any limit. And when you report an error, it does go down to 99 - or typically 100, because you're usually reporting more than one task at a time, and the good ones instantly reset the base quota. But the 'Consecutive valid tasks' counter does reset to zero: SnakesAndLadders@home, anyone?
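
For anyone who wants to check the arithmetic in John's description, here is a minimal Python sketch of the per-app-version quota rule as quoted above. It is not the actual BOINC server code: the function name `update_quota` and the outcome labels are made up for illustration, and no lower bound on the quota is modelled because the post does not mention one.

```python
def update_quota(current: int, app_quota: int, outcome: str) -> int:
    """Return the new daily quota for one app version after one task outcome.

    current   -- the host's quota for this app version right now
    app_quota -- the project's configured base quota for the app (e.g. 100)
    outcome   -- "valid", "error", "late" or "invalid" (hypothetical labels)
    """
    if outcome == "valid":
        # A validated task raises the quota by one, with no cap.
        return current + 1
    if outcome in ("error", "late"):
        # Errored or late tasks drop to min(current, app quota) - 1.
        return min(current, app_quota) - 1
    if outcome == "invalid":
        # Invalid tasks drop the quota by one from wherever it is.
        return current - 1
    raise ValueError(f"unknown outcome: {outcome!r}")


# The three cases from the quoted post, assuming an app quota of 100:
print(update_quota(1000, 100, "error"))     # 99
print(update_quota(50, 100, "error"))       # 49
print(update_quota(100_000, 100, "valid"))  # 100001
```

With an app-specified quota of 100, a host that has grown to 1000 falls to 99 on an error, which is why the 1000 in the quoted post is consistent as written.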
ID: 1005629 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1005630 - Posted: 18 Jun 2010, 8:33:37 UTC - in response to Message 1005628.  

BOINC will always ask for so many seconds of work. Even if 1 second of work, you get a task that's more than 1 second long. If asking for 113426 seconds of work and this is slightly more than quota, you'll get it.

The BOINC client used to show how many seconds of work were being requested in the Messages tab. This disappeared somewhere in the early 6.10.xx versions.

Any idea why ?

Brodo

Because they tried to 'dumb it down' so folks like you and I would have less information to question what is actually happening.

Enough said?

Absolutely true.

But you can restore it with the [sched_op_debug] logging flag - which isn't too verbose for general use, and very useful.
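
As a rough illustration of turning that flag on: log flags live in the client's cc_config.xml file, inside a log_flags section. The Python sketch below only builds the XML text; the helper name `build_cc_config` is made up, and where the file goes (the BOINC data directory) and how the client picks it up (restart, or re-read the config file) should be checked against your own installation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build the text of a cc_config.xml that enables the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element named after the flag, set to 1 to enable it.
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# Print the XML instead of writing it, so nothing on disk is touched.
print(build_cc_config(["sched_op_debug"]))
```

If you already have a cc_config.xml, merge the sched_op_debug element into the existing log_flags section by hand instead of overwriting the file.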
ID: 1005630 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1005633 - Posted: 18 Jun 2010, 8:37:26 UTC - in response to Message 1005608.  

Answering these two separately.

(For legibility I'll be numbering them)
Identification of applications,
1. badly introduced, initially did not identify correctly.
2. Therefore I have many tasks completed in unidentified app which don't count towards quota.
AP applications still listed as unknown cpu app.

Quota
Quota is only counting MB tasks, but applying quota limit to all tasks, therefore I cannot download AP tasks.
Quota not being reset at end of 24 hour period, think this is PDT time but it didn't happen at UTC today either. Therefore my quota, and presumably others also, now into its third day.
(expecting more complaints on this as most people seem to run 2 or 3 day caches)


1. Which is where you (plural, not Andy alone) come in. Complain, comment, don't comply, tell how to do it better. Tell it to the developers themselves, do not expect them to trawl through 40 threads looking for your post. And of course, comments like "dictatorship", "this is ****" and "test it better" are best left behind. On the latter, they're testing it here in a live environment just to see what could go wrong.

2. Which will eventually be fixed. It'll always be eventually fixed. And then when it's introduced on the next project, they will run into their own problems, again due to incompatibilities with database used, hardware, software, the amount of cosmic rays passing through their office and the phase of the moon causing shifts in the Earth's magnetic field which affects the platters in their servers. :-)

Credits
3. Apparently new method of calculating, I have not checked yet.
4. Claimed credits col in task pages removed.
5. Pending credits page says pendings for AP tasks now 1,294.84cr !!!!!

3. See Credit New for the low-down. Best to be read after 17 cups of coffee. Smoke 'em if you got and do 'em as well.

4. I actually like that. The biggest problem was always that people expected the claimed credit to be theirs, no matter what. OK, you won't get fun threads anymore that you claimed 17 trillion credits, but let's be honest, the method in which the claimed credit is calculated isn't in use here anymore (time * benchmarks).

5. I don't understand? But then I don't run APs. Perhaps that I'll enable it on my new system, but so far all the APs I have ever seen grace any of my systems can be counted on one finger. :-)
ID: 1005633 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1005634 - Posted: 18 Jun 2010, 8:39:20 UTC - in response to Message 1005629.  

SnakesAndLadders@home, anyone?

Enough board games for me, Richard.

I have always highly regarded your input, but I think you have been far too condescending of the recent debacle.

Your background knowledge is always appreciated, but you seem to lack consideration for those like myself that have been whacked by this new code.

I am considering shutting down....I have been dreading hitting the button for a day or two now.

And you know I am not prone to doing so easily.

I just want to be able to contribute to the best my hardware can do.


"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1005634 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1005636 - Posted: 18 Jun 2010, 8:47:38 UTC - in response to Message 1005625.  

Oh Boy, do you have some reading ahead of you :-) (and commiserations on the Hayfever)

Ta, but I am not going to read all. There's this handy "Mark all threads read" button at the top. :-)

And hitting the 'ignore' button is gonna fix things, eh?

Nice attitude.

Sorry? Why attack me over this? I don't work here, I run Seti on some of my systems and I try, completely voluntarily, to help people with their BOINC troubles. I am not responsible for the code or its introduction, while due to me installing a completely new system I for once am away while new things are introduced around here.

Including this thread there are 35 threads in this forum alone about this problem. I don't have the need to read them all. If that's not good enough for you, then tough!

But be as it may, I'll stop posting and try to help clear things a little. Things I picked up, things as I see them from my perspective. Have a good rest of the day and continue in your struggle to anticipate changes.
ID: 1005636 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1005638 - Posted: 18 Jun 2010, 8:49:38 UTC - in response to Message 1005633.  

1. Which is where you (plural, not Andy alone) come in. Complain, comment, don't comply, tell how to do it better. Tell it to the developers themselves, do not expect them to trawl through 40 threads looking for your post. And of course, comments like "dictatorship", "this is ****" and "test it better" are best left behind. On the latter, they're testing it here in a live environment just to see what could go wrong.

That's the reality, but it's bad practice. In the real computing world, whenever a major application or upgrade is launched, the developers should be proactively monitoring the rollout and catching issues as they arise. I still remember (with cold shivers down my spine) the Saturday night I migrated a live telesales database from Microsoft Access to SQL Server. I had to wait until the last call ended at 10pm, then perform the transfer. But I regarded it as a consequent duty to be on-site at 10am the following (Sunday) morning, when the sales lines opened again, to monitor that everything was running smoothly. It was - we didn't lose an order.

4. I actually like that. The biggest problem was always that people expected the claimed credit to be theirs, no matter what. OK, you won't get fun threads anymore that you claimed 17 trillion credits, but let's be honest, the method in which the claimed credit is calculated isn't in use here anymore (time * benchmarks).

Not true. The "claimed credit" shown on this project's website has been derived from the flopcounter for years, and is incredibly stable and reliable - except for the minute percentage of users still using the very earliest v5 clients or before.
ID: 1005638 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1005639 - Posted: 18 Jun 2010, 8:51:51 UTC - in response to Message 1005633.  
Last modified: 18 Jun 2010, 8:59:10 UTC

Quota
Quota is only counting MB tasks, but applying quota limit to all tasks, therefore I cannot download AP tasks.
Quota not being reset at end of 24 hour period, think this is PDT time but it didn't happen at UTC today either. Therefore my quota, and presumably others also, now into its third day.
(expecting more complaints on this as most people seem to run 2 or 3 day caches)

This is indeed a performance-limiting factor now.

Resetting the quota from close to infinity to some default value (before, it was 100*NUMBER_OF_CPU_CORES+100*5*NUMBERS_OF_GPUs AFAIK (the GPU part can differ)) when an error is encountered is OK. If it's only a random error like -12, the host will have enough work to continue and to prove to the server that it's a good one. But if it's the first sign of a big host failure, the sooner fetch is inhibited the better. If we could decrease "close to infinity" only by 1 for each failure, the quota mechanism would be ineffective at dealing with a broken host.

But currently everything suggests that the new quota was implemented with bugs. My own host still receives a message about reached quota (294 so far), but it has not downloaded anything even close to this number over the past few days. That is, the reset of the downloaded-tasks counter is broken.
And it also looks as if the same quota is still applied to all app versions. I get the quota-reached message on ATI GPU AP work requests too. It's absolutely clear that this host can't have downloaded ~300 AP tasks in the last day under any conditions; actually it downloaded no AP tasks yesterday, no AP tasks today...
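
To put a number on the old default mentioned above, here is a small Python sketch of the formula 100*NUMBER_OF_CPU_CORES+100*5*NUMBERS_OF_GPUs. The function and parameter names are made up for illustration, and the GPU multiplier is only the remembered value (the post itself says the GPU part can differ), so treat the result as approximate.

```python
def default_daily_quota(cpu_cores: int, gpus: int,
                        per_cpu: int = 100, gpu_multiplier: int = 5) -> int:
    """Old-style default quota: 100 per CPU core plus 100*5 per GPU.

    per_cpu and gpu_multiplier are assumptions taken from the formula
    quoted in the post, not values confirmed from the server code.
    """
    return per_cpu * cpu_cores + per_cpu * gpu_multiplier * gpus


# A quad-core host with one GPU: 100*4 + 100*5*1
print(default_daily_quota(cpu_cores=4, gpus=1))  # 900
```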
ID: 1005639 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1005641 - Posted: 18 Jun 2010, 8:58:30 UTC - in response to Message 1005636.  

Oh Boy, do you have some reading ahead of you :-) (and commiserations on the Hayfever)

Ta, but I am not going to read all. There's this handy "Mark all threads read" button at the top. :-)

And hitting the 'ignore' button is gonna fix things, eh?

Nice attitude.

Sorry? Why attack me over this? I don't work here, I run Seti on some of my systems and I try, completely voluntarily, to help people with their BOINC troubles. I am not responsible for the code or its introduction, while due to me installing a completely new system I for once am away while new things are introduced around here.

Including this thread there are 35 threads in this forum alone about this problem. I don't have the need to read them all. If that's not good enough for you, then tough!

But be as it may, I'll stop posting and try to help clear things a little. Things I picked up, things as I see them from my perspective. Have a good rest of the day and continue in your struggle to anticipate changes.


Cute.........

I could respond in a way that would get me banned........really cutesey.

I have a valid complaint.......and if you cannot acknowledge that......

You might just as well just jump offa the same bridge as your Boinc companions.........

Don't EVEN give me any crap about voicing my thoughts on this matter.

You are in the wrong.

Have a nice day.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1005641 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1005643 - Posted: 18 Jun 2010, 9:03:25 UTC

All, Mark knows best. He'll fix it.
ID: 1005643 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 18996
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1005646 - Posted: 18 Jun 2010, 9:04:07 UTC

1 & 2. The identification of apps should have been the first step with nothing more done until it was accurate.

Quota,
The quota should be per application, and therefore 1 & 2 apply.

3 & 4. Richard effectively answered that.
5. Credits for AP, because of Eric's modifying flop count method are at ~800cr.

Brodo answered what I was going to say about extra tasks downloaded, i.e. we no longer know how much is asked for.


ID: 1005646 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1005650 - Posted: 18 Jun 2010, 9:07:39 UTC - in response to Message 1005643.  

All, Mark knows best. He'll fix it.

That was a simple comment from a simple mind, apparently.

Your making slight of me and the situation both appears to make your attitude clear.


"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1005650 · Report as offensive
[AF>france>pas-de-calais]symaski62
Volunteer tester
Joined: 12 Aug 05
Posts: 258
Credit: 100,548
RAC: 0
France
Message 1005664 - Posted: 18 Jun 2010, 10:01:16 UTC

18/06/2010 11:40:31 SETI@home Sending scheduler request: To fetch work.
18/06/2010 11:40:31 SETI@home Requesting new tasks for GPU
18/06/2010 11:40:36 SETI@home Scheduler request completed: got 0 new tasks
18/06/2010 11:40:36 SETI@home Message from server: Project has no jobs available


RED servers ^^


SETI@Home Informational message -9 result_overflow
With a general handicap of 80%, it takes a lot of effort to take part in the community and to express myself; thank you for being understanding.
ID: 1005664 · Report as offensive
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1005665 - Posted: 18 Jun 2010, 10:33:01 UTC - in response to Message 1005664.  

Oh Dear:

WU waiting to validate 43000 and climbing - according to status page one of the validators is down.

So the trickle of work people are getting from quota going up from valid tasks will be even smaller.
And it's still a few hours till the guys get in.

There are now so many small fires to put out, it starts looking like the forest is up in flames.

Interesting dilemma: will people be angrier if they fix on the fly (and introduce other problems or we just can't see the fixes quickly enough to appease the community) or if they shut down for another day?

Seems that nerves are so frayed even the most patient of us are having a hard time.
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1005665 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65690
Credit: 55,293,173
RAC: 49
United States
Message 1005742 - Posted: 18 Jun 2010, 15:03:13 UTC

I'm on empty and so far Seti will not send Me any new work, But then I gather It won't send work out unless one has the right app, whatever that is, Yet My useless quota kept going up and My complaints have gone unanswered.

6/18/2010 7:50:26 AM SETI@home Sending scheduler request: To fetch work.
6/18/2010 7:50:26 AM SETI@home Requesting new tasks
6/18/2010 7:50:27 AM SETI@home Scheduler request completed: got 0 new tasks
6/18/2010 7:50:27 AM SETI@home Message from server: No work sent
6/18/2010 7:50:27 AM SETI@home Message from server: Your app_info.xml file doesn't have a usable version of SETI@home Enhanced.
6/18/2010 7:50:27 AM SETI@home Message from server: (reached daily quota of 241 tasks)

The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1005742 · Report as offensive
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 1006620 - Posted: 20 Jun 2010, 17:11:48 UTC
Last modified: 20 Jun 2010, 17:15:16 UTC

From Matt's 6/16 post

I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outage/emergencies. In my case, I currently have four regular separate sources of income with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose, I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep and I didn't want to start up the scheduler processes without others' input as they were still effectively on the operating table. We've pretty much given up any hope for 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects.


Something for all of us to keep in mind.


ID: 1006620 · Report as offensive
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1006624 - Posted: 20 Jun 2010, 17:21:37 UTC

Thank you Blurf.. and thank you for helping renew my ticked-offedness.

Yeah we get it. it is not 24/7. Never asked for that. Yeah we get it.
We can sign up for other projects we may or may not agree with or want to
help with. We understand they are under funded/paid.

And we get it that on top of previous server problems, they dumped a bunch
of poorly written non-tested code on us, basically keeping things tied up in a knot for over a week. How about being up 12/2?? Cause it has been ages since I remember seeing all servers up at once.

But.. feel free to quote "so sayeth Matt" again. It really.. helps. Not sure who it helps, but I am sure it does. Or not.

"Bite my shiney metal a**" So sayeth Bender.
ID: 1006624 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.