Panic Mode On (73) Server problems?


clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 5
United Kingdom
Message 1215264 - Posted: 7 Apr 2012, 20:48:11 UTC - in response to Message 1215241.

Cliff

Try downgrading to 6.10.60 to avoid the 6.12.xx backoffs. Made a massive world of difference this end. Suddenly it's like the olde dayse!


Yup, did that myself a few months ago cos I was totally backed off with the crazy backoffs.

cliff
Joined: 16 Dec 07
Posts: 322
Credit: 2,509,590
RAC: 0
United Kingdom
Message 1215271 - Posted: 7 Apr 2012, 20:56:37 UTC - in response to Message 1215241.
Last modified: 7 Apr 2012, 20:58:16 UTC

Cliff

Try downgrading to 6.10.60 to avoid the 6.12.xx backoffs. Made a massive world of difference this end. Suddenly it's like the olde dayse!


I think I'll wait it out :-) After all, it's v7 next, isn't it? Then it's the Lunatics kit and the fun and games start again :-)

The backoffs, believe it or not, are not the problem so much as the very patchy comms between BOINC and its home turf.

Backoffs are cured by highlighting the entire download queue and then hitting retry whenever one of them stalls. The first backoff makes it easy to gather the lot and hammer the download until all are in.

So far today I've had about 3 WUs stall, no more, but failed updates are another story. If I repeatedly hit update when BOINC times out the first time, then by update hit 3 or 4 or 7 it's in contact and the download proceeds smoothly.
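
For anyone who wants the "keep hitting update until it connects" routine spelled out, it amounts to a plain retry loop. Below is a minimal Python sketch of that idea only. `contact_scheduler` is a made-up stand-in for one update attempt, not a real BOINC call, and the delay values are arbitrary assumptions.

```python
import random
import time


def retry_until_ok(attempt, max_tries=7, base_delay=1.0, sleep=time.sleep):
    """Call attempt() until it returns True or max_tries is used up.

    Returns the number of the try that succeeded, or None if all failed.
    """
    for n in range(1, max_tries + 1):
        if attempt():
            return n
        # Wait a little longer after each failure, plus some jitter,
        # so repeated attempts do not hammer the server in lockstep.
        sleep(base_delay * n + random.uniform(0, 0.5))
    return None


def make_flaky_scheduler(fail_first):
    """Hypothetical stand-in: times out fail_first times, then answers."""
    state = {"calls": 0}

    def contact_scheduler():
        state["calls"] += 1
        return state["calls"] > fail_first

    return contact_scheduler


if __name__ == "__main__":
    # Skip the real waiting for the demo by passing a no-op sleep.
    tries = retry_until_ok(make_flaky_scheduler(3), sleep=lambda s: None)
    print("connected on try", tries)
```

The growing delay is a design choice in the sketch rather than anything the client is known to do here; it just keeps a retry script from piling onto a server that is already struggling.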

If I'm asleep then Ray's SIV will handle the backoffs, more slowly than me, but it gets it done.

Cheers,
____________
Cliff,
Been there, Done that, Still no damm T shirt!

Zapped Sparky
Volunteer moderator
Volunteer tester
Joined: 30 Aug 08
Posts: 5625
Credit: 1,120,350
RAC: 1,623
United Kingdom
Message 1215299 - Posted: 7 Apr 2012, 21:37:19 UTC - in response to Message 1215209.

12 hours since anyone had a panic attack. That can't be right, it just has to be something to panic about. :-)

A lightbulb blew earlier, AND I DON'T HAVE A SPARE! :) Other than that I'm still waiting for my astropulse times to get down to what they actually are. At the moment they're at 197hrs :)
____________
In an alternate universe, it was a ZX81 that asked for clothes, boots and motorcycle.

Client error 418: I'm a teapot

Tropical Goldfish Fish 6: Revenge Of The 'prettig gestoord' Koalas

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2203
Credit: 8,009,002
RAC: 4,265
United States
Message 1215409 - Posted: 8 Apr 2012, 0:31:56 UTC - in response to Message 1215299.
Last modified: 8 Apr 2012, 0:32:13 UTC

A lightbulb blew earlier, AND I DON'T HAVE A SPARE! :) Other than that I'm still waiting for my astropulse times to get down to what they actually are. At the moment they're at 197hrs :)

It'll get there. You need 10 validated APs that did not early-exit and had less than 10% blanking. Mine took a while for that to happen. Right about 44 total APs to get 10 that met the criteria. Of course by then, the DCF mechanism was just about to provide reasonable rates anyway.

Because of the way DCF works, it took ~20 to go from a little over 200 down to ~175, 15 more to drop down to ~125, and five to get down to ~75. I imagine it would have only taken 5-7 more for the ETA to be pretty close, but I hit that magic 10 and server-side knocked the ETA down to 04:41:47. One task finished after that and now they're all showing ~11.5h like it should be.
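
The slow crawl down described above is what you would expect if DCF jumps up at once but comes down only a little per finished task. Here is a rough Python sketch of that behaviour. The update rule (1% steps while the estimate is more than 10x too high, 10% steps after that) is my recollection of the client's logic and should be treated as an assumption, and the 200 h / 11.5 h figures are only borrowed from the post as illustrative inputs. It will not reproduce the exact task counts quoted.

```python
def update_dcf(dcf, ratio):
    """One DCF update after a task finishes.

    ratio = actual runtime / raw (uncorrected) estimate.
    Assumed rule: go up immediately, come down slowly.
    """
    if ratio > dcf:
        return ratio                      # underestimate: jump straight up
    if ratio < 0.1 * dcf:
        return 0.99 * dcf + 0.01 * ratio  # wildly high estimate: 1% step
    return 0.9 * dcf + 0.1 * ratio        # moderately high estimate: 10% step


def eta_history(raw_estimate_h, true_runtime_h, start_dcf, tasks):
    """Displayed ETA (hours) after each of `tasks` completed tasks."""
    dcf = start_dcf
    ratio = true_runtime_h / raw_estimate_h
    history = []
    for _ in range(tasks):
        dcf = update_dcf(dcf, ratio)
        history.append(raw_estimate_h * dcf)
    return history


if __name__ == "__main__":
    etas = eta_history(raw_estimate_h=200.0, true_runtime_h=11.5,
                       start_dcf=1.0, tasks=100)
    for n in (1, 20, 40, 60, 80, 100):
        print(f"after {n:3d} tasks: ETA {etas[n - 1]:6.1f} h")
```

The point is only qualitative: the ETA creeps down for dozens of tasks, then speeds up once the estimate is within a factor of ten of reality.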
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,290,832
RAC: 39,352
Australia
Message 1215416 - Posted: 8 Apr 2012, 0:54:10 UTC - in response to Message 1215409.


Now if they could just get DCF to work for MB on CPU & GPU systems.
____________
Grant
Darwin NT.

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,366,360
RAC: 1,317
United States
Message 1215836 - Posted: 8 Apr 2012, 22:00:54 UTC

For the last hour or so, all of my scheduler requests on all 3 rigs are ending in timeout. Cricket and status page both look OK. Uploads going through normally. Anyone else?
____________

cliff
Joined: 16 Dec 07
Posts: 322
Credit: 2,509,590
RAC: 0
United Kingdom
Message 1215837 - Posted: 8 Apr 2012, 22:06:10 UTC - in response to Message 1215836.
Last modified: 8 Apr 2012, 22:06:42 UTC

Yup, same here. For the past 20 hrs it has taken 3 or more attempts to get sorted.

Cheers,
____________
Cliff,
Been there, Done that, Still no damm T shirt!

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,290,832
RAC: 39,352
Australia
Message 1215850 - Posted: 8 Apr 2012, 22:40:44 UTC - in response to Message 1215836.
Last modified: 8 Apr 2012, 22:46:08 UTC

For the last hour or so, all of my scheduler requests on all 3 rigs are ending in timeout. Cricket and status page both look OK. Uploads going through normally. Anyone else?

Just checked my logs. Been OK (at least on one machine) for the last hour or 2, but for most of the night the majority of Scheduler requests were timing out.


EDIT- just checked my other machine. Still getting the odd Scheduler timeout there. Looks like it's just the luck of the draw. You might get a Scheduler timeout. Even if that doesn't happen you might get a "Project has no tasks available" message. If you're really lucky you might get some work.
____________
Grant
Darwin NT.

Britz
Joined: 1 Apr 12
Posts: 1
Credit: 56,261
RAC: 0
United States
Message 1215910 - Posted: 9 Apr 2012, 0:47:09 UTC

I got about 16 GPU and 1 AP WUs today, and that was around 3:00 EDT. Took a long while for the GPU WUs to get returned. Hope it all gets sorted out soon.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2203
Credit: 8,009,002
RAC: 4,265
United States
Message 1215956 - Posted: 9 Apr 2012, 3:15:06 UTC

My apologies for clogging the pipe, but my 10-day AP-only cache is now full. 3 crunching with 63 "ready to start."

What were the limits again? Wasn't it 50 per [physical] CPU, or did it go back to counting cores? I know for a while it didn't matter if it was 1-16 cores, it was 1 physical CPU and there was a limit.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

N9JFE
Volunteer tester
Joined: 4 Oct 99
Posts: 9312
Credit: 11,909,132
RAC: 14,329
United States
Message 1215978 - Posted: 9 Apr 2012, 3:42:01 UTC

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


N9JFE
Volunteer tester
Joined: 4 Oct 99
Posts: 9312
Credit: 11,909,132
RAC: 14,329
United States
Message 1215992 - Posted: 9 Apr 2012, 4:37:28 UTC - in response to Message 1215978.

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

My tasks in progress continues to drop, and the server has been saying I've reached my limit for about 7 hours now. Is this anything to do with the switch to Lunatics? Boinc is running GPU work on high priority, even though it's not due for a week or more. (If it matters, I set app_info to do 2 GPUs at a time.)

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 59,955,820
RAC: 87,204
Argentina
Message 1216001 - Posted: 9 Apr 2012, 5:03:30 UTC - in response to Message 1215992.

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

My tasks in progress continues to drop, and the server has been saying I've reached my limit for about 7 hours now. Is this anything to do with the switch to Lunatics? Boinc is running GPU work on high priority, even though it's not due for a week or more. (If it matters, I set app_info to do 2 GPUs at a time.)


I guess the limit was reached only on CPU tasks. If the GPUs are in panic mode, BOINC won't ask for more GPU work, and if you are not using the flops tags then the scheduler is still learning what the speed of the new apps is and meanwhile is using a (slower) default value, which is forcing the panic mode...
(or something like that...)
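
To put rough numbers on that: the runtime estimate is basically the task's estimated operation count divided by the speed the app is assumed to run at, so a default speed that is too low inflates every estimate. The Python sketch below is a back-of-envelope check only. All figures are hypothetical, and BOINC's real deadline handling is more involved than this single comparison.

```python
def estimated_hours(fpops_est, flops, dcf=1.0):
    """Estimated runtime in hours: operations / speed, scaled by DCF."""
    return fpops_est / flops * dcf / 3600.0


def would_panic(n_tasks, est_hours_each, hours_to_deadline, concurrent=1):
    """Very rough check: does the queue look too big to finish in time?"""
    total_hours = n_tasks * est_hours_each / concurrent
    return total_hours > hours_to_deadline


if __name__ == "__main__":
    FPOPS_EST = 3.6e13      # hypothetical per-task operation estimate
    DEFAULT_FLOPS = 1e10    # hypothetical (too slow) default speed
    REAL_FLOPS = 1e11       # hypothetical true speed of the optimised app

    for label, flops in (("default", DEFAULT_FLOPS), ("real", REAL_FLOPS)):
        each = estimated_hours(FPOPS_EST, flops)
        panic = would_panic(n_tasks=500, est_hours_each=each,
                            hours_to_deadline=168, concurrent=2)
        print(f"{label:7s} speed: {each:.2f} h per task, panic={panic}")
```

With the too-slow default the same queue looks ten times longer than it is, which is enough to tip it past a one-week deadline in this made-up case.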
____________

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,366,360
RAC: 1,317
United States
Message 1216014 - Posted: 9 Apr 2012, 5:42:09 UTC

Scheduler still seems borked.

I've got over 300 to report, and I'm guessing about 100 or more ghosted, waiting for a good scheduler response or five to start receiving them as resends.

Still getting 99% timeouts.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,290,832
RAC: 39,352
Australia
Message 1216023 - Posted: 9 Apr 2012, 5:59:35 UTC - in response to Message 1216014.


Most of today the Scheduler has been hit & miss, but for about the last hour it's all been miss. Every attempt has timed out.
____________
Grant
Darwin NT.

AndrewM
Volunteer tester
Joined: 5 Jan 08
Posts: 361
Credit: 33,872,609
RAC: 5
Australia
Message 1216031 - Posted: 9 Apr 2012, 6:31:46 UTC

To borrow a line from Wiggo, I'm bouncing off the limits
____________
AndrewM

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,290,832
RAC: 39,352
Australia
Message 1216033 - Posted: 9 Apr 2012, 7:04:07 UTC - in response to Message 1216031.

To borrow a line from Wiggo, I'm bouncing off the limits

So am i, now.
But for several hours there i was getting further & further from the limits with each Scheduler timeout.
____________
Grant
Darwin NT.

rob smith
Volunteer moderator
Joined: 7 Mar 03
Posts: 7667
Credit: 44,713,551
RAC: 75,037
United Kingdom
Message 1216036 - Posted: 9 Apr 2012, 7:31:53 UTC

The schedulers must have had a surfeit of Chocolate Easter Eggs and dozed off in the arm chair...
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 155
Credit: 12,782,685
RAC: 25,055
Netherlands
Message 1216057 - Posted: 9 Apr 2012, 9:15:06 UTC

every holiday the servers seem to take their own vacation. And who says machines don't have feelings.... pffff

on topic : agreed with previous posters, time-out reached on every attempt.
____________

Wiggo
Joined: 24 Jan 00
Posts: 5158
Credit: 82,928,936
RAC: 71,663
Australia
Message 1216060 - Posted: 9 Apr 2012, 9:20:30 UTC - in response to Message 1216033.

To borrow a line from Wiggo, I'm bouncing off the limits

So am i, now.
But for several hours there i was getting further & further from the limits with each Scheduler timeout.

Far too much bouncing going on here now, I'm getting a sore neck.

Cheers.
____________


Copyright © 2014 University of California