Panic Mode On (46) Server problems


-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1094939 - Posted: 8 Apr 2011, 23:13:18 UTC - in response to Message 1094824.


A feature added to BOINC specifically for AP causes the estimate to be multiplied by 1.3 before comparing to the delay_bound = 25*86400 set by the ap_splitter. That multiplier allows for heavily blanked tasks when the application calculates a lot of shaped noise replacement data, which can cause run time to increase by about 30%.

As you deduced, that doesn't reduce the deadline; it simply affects whether the task gets sent or not. Also, the situation is seldom so simple, because there are usually other tasks already on the host.
Joe
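
The check Joe describes can be sketched roughly as follows. This is only an illustration of the arithmetic in the quote, not the actual BOINC scheduler code: the function name and the `queued_seconds` parameter (a stand-in for "other tasks already on the host") are assumptions made for the example.

```python
AP_BLANKING_FACTOR = 1.3          # multiplier quoted in the post
DELAY_BOUND_SECONDS = 25 * 86400  # delay_bound set by ap_splitter (25 days)


def would_send(estimated_seconds: float, queued_seconds: float = 0.0) -> bool:
    """Return True if the padded estimate still fits inside the delay bound.

    queued_seconds is a hypothetical stand-in for work already on the host;
    the real scheduler's accounting is not described in the thread.
    """
    padded = estimated_seconds * AP_BLANKING_FACTOR
    return queued_seconds + padded <= DELAY_BOUND_SECONDS
```

Under these assumptions, a task estimated at 19 days would still be sent (19 × 1.3 = 24.7 days), while one estimated at 20 days would not (26 days). The deadline itself is unchanged in both cases.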


Interesting, thanks for the information!

____________
Traveling through space at ~67,000mph!

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4048
Credit: 32,693,315
RAC: 531
United Kingdom
Message 1095861 - Posted: 10 Apr 2011, 21:29:47 UTC
Last modified: 10 Apr 2011, 21:33:43 UTC

Uploads have dropped to zero and downloads are dropping too; scheduler requests are also failing.

Claggy

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,920,517
RAC: 12,217
United States
Message 1095865 - Posted: 10 Apr 2011, 21:55:31 UTC - in response to Message 1095861.

Whatever it was didn't last long. I just got some new work and uploaded and reported a couple.
____________


PROUD MEMBER OF Team Starfire World BOINC

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4048
Credit: 32,693,315
RAC: 531
United Kingdom
Message 1095867 - Posted: 10 Apr 2011, 21:58:46 UTC - in response to Message 1095865.

My uploads went through and reported too.

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 368
Credit: 35,177,995
RAC: 2,562
United Kingdom
Message 1096722 - Posted: 13 Apr 2011, 6:16:28 UTC

Problems with the replica database; ATM it's 15,047 seconds behind the master.



____________
Kevin


HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3860
Credit: 107,044,750
RAC: 98,622
United States
Message 1096791 - Posted: 13 Apr 2011, 12:38:55 UTC - in response to Message 1096722.
Last modified: 13 Apr 2011, 12:39:19 UTC

Problems with the replica database; ATM it's 15,047 seconds behind the master.



That is rather normal while the replica catches up after maintenance. Six hours later and it is down to about 2500 seconds.
Sometimes you might even observe the time go up for a few hours after everything comes back online. IIRC the replica recovers in about 12-24 hours after everything is brought back up.
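
As a rough worked example of those figures: the sketch below assumes the replica catches up at a constant rate, which, as noted above, is not guaranteed, since the lag can even grow for a while. The function names are made up for the illustration.

```python
def catch_up_rate(lag_start_s: float, lag_end_s: float, elapsed_hours: float) -> float:
    """Average seconds of lag recovered per hour of wall-clock time."""
    return (lag_start_s - lag_end_s) / elapsed_hours


def hours_remaining(lag_now_s: float, rate_s_per_hour: float) -> float:
    """Hours until the lag reaches zero, assuming the rate stays constant."""
    if rate_s_per_hour <= 0:
        return float("inf")  # replica is not gaining on the master
    return lag_now_s / rate_s_per_hour


# Figures from the two posts: 15,047 s behind, then about 2,500 s six hours later.
rate = catch_up_rate(15047, 2500, 6)
remaining = hours_remaining(2500, rate)
print(f"about {rate:.0f} s of lag recovered per hour, about {remaining:.1f} h to go")
```

On that linear assumption the replica was recovering roughly 2,090 seconds of lag per hour and would be caught up a little over an hour later.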
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38322
Credit: 560,201,851
RAC: 652,368
United States
Message 1097047 - Posted: 14 Apr 2011, 6:50:01 UTC

Oh, meow.

Downloads seem to have died.

Uppies and reporting still working.


____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Robert J
Joined: 30 Mar 00
Posts: 108
Credit: 6,041,916
RAC: 4,079
United States
Message 1097058 - Posted: 14 Apr 2011, 8:13:33 UTC
Last modified: 14 Apr 2011, 8:14:23 UTC

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,725,226
RAC: 20,971
United Kingdom
Message 1097064 - Posted: 14 Apr 2011, 9:22:48 UTC - in response to Message 1097058.

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 10,926,639
RAC: 13,456
United Kingdom
Message 1097068 - Posted: 14 Apr 2011, 9:46:51 UTC

Time to call Dyno Rod.
They'll be there soon after the team gets into work.
____________

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1097070 - Posted: 14 Apr 2011, 10:02:51 UTC - in response to Message 1097064.

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'


GRRR... and just as I got my second card back from eVGA. Hopefully the ole' cache lasts one more test!... err, that should be a period...
____________
Traveling through space at ~67,000mph!

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3860
Credit: 107,044,750
RAC: 98,622
United States
Message 1097072 - Posted: 14 Apr 2011, 10:14:59 UTC

One of my machines at home was in the middle of downloading some tasks when everything went splat at about 3:30 UTC. It looks like it cleared up at about 5:00 UTC, as the tasks finished downloading and a few more were requested and downloaded. Then, around 7:00 UTC, downloads stopped working again.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2237
Credit: 8,449,136
RAC: 4,081
United States
Message 1097073 - Posted: 14 Apr 2011, 10:22:04 UTC

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3860
Credit: 107,044,750
RAC: 98,622
United States
Message 1097084 - Posted: 14 Apr 2011, 10:57:52 UTC - in response to Message 1097073.

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2237
Credit: 8,449,136
RAC: 4,081
United States
Message 1097087 - Posted: 14 Apr 2011, 11:39:54 UTC - in response to Message 1097084.

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

True, but that doesn't keep the pending downloads from retrying, filling my log up and making unnecessary connection attempts. Just easier to suspend comms. and wait for things to be fixed.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Richard Haselgrove
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8375
Credit: 46,725,226
RAC: 20,971
United Kingdom
Message 1097088 - Posted: 14 Apr 2011, 11:41:31 UTC - in response to Message 1097084.

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

Any downloads you already have assigned will quieten down by 'project backoff', and keep the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too, but accepting reports).

Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2411
Credit: 351,996
RAC: 0
Message 1097092 - Posted: 14 Apr 2011, 12:04:08 UTC - in response to Message 1097088.

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

Any downloads you already have assigned will quieten down by 'project backoff', and keep the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too, but accepting reports).


Isn't the backoff only 2 hours max in the 6.10 branch? IIRC we went up to a 24-hour max in 6.12.
____________
Carola
-------
I'm multilingual - I can misunderstand people in several languages!

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,725,226
RAC: 20,971
United Kingdom
Message 1097109 - Posted: 14 Apr 2011, 13:07:48 UTC - in response to Message 1097092.

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

Any downloads you already have assigned will quieten down by 'project backoff', and keep the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too, but accepting reports).

Isn't the backoff only 2 hours max in the 6.10 branch? iirc we went up to 24h max on 6.12

Even the basic 'per task' backoff and retry always went up to a possible limit of four hours (though randomised within that range). I don't remember 'project backoff' ever being shorter, but I'll try and find when it was introduced.
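
To illustrate what a randomised, capped retry delay looks like in general, here is a minimal sketch. It is not BOINC's actual backoff code: only the four-hour ceiling comes from the post above, while the doubling rule, the 60-second floor and the function name are assumptions chosen for the example.

```python
import random

MAX_BACKOFF_S = 4 * 3600  # four-hour ceiling mentioned in the post
MIN_BACKOFF_S = 60        # hypothetical floor, not taken from the thread


def next_backoff(failures: int, rng: random.Random) -> float:
    """Pick a random delay whose upper bound doubles per failure, up to the cap."""
    upper = min(MAX_BACKOFF_S, MIN_BACKOFF_S * 2 ** failures)
    return rng.uniform(MIN_BACKOFF_S, upper)


rng = random.Random(0)  # seeded only so the demo is repeatable
delays = [next_backoff(n, rng) for n in range(12)]
```

Randomising within the range keeps a large number of hosts from all retrying at the same moment once the servers come back, which is presumably why the real client does something along these lines.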

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38322
Credit: 560,201,851
RAC: 652,368
United States
Message 1097120 - Posted: 14 Apr 2011, 14:03:02 UTC - in response to Message 1097064.
Last modified: 14 Apr 2011, 14:36:32 UTC

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'

Ooooohhh.
Those be sour grapes.

And I should add that I am only referring to the grapes themselves that are sour...LOL. Not any opinions about said grapes.
Meow.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,725,226
RAC: 20,971
United Kingdom
Message 1097130 - Posted: 14 Apr 2011, 15:18:58 UTC - in response to Message 1097120.

The grapes that are growing on the grapevine, you mean? Yes, they are sour indeed. But don't shoot the messenger, please - only trying to help :-)


Copyright © 2014 University of California