Panic Mode On (46) Server problems


Message boards : Number crunching : Panic Mode On (46) Server problems

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1094939 - Posted: 8 Apr 2011, 23:13:18 UTC - in response to Message 1094824.


A feature added to BOINC specifically for AP causes the estimate to be multiplied by 1.3 before comparing to the delay_bound = 25*86400 set by the ap_splitter. That multiplier allows for heavily blanked tasks when the application calculates a lot of shaped noise replacement data, which can cause run time to increase by about 30%.

As you deduced, that doesn't reduce the deadline; it simply affects whether the task gets sent or not. Also, the situation is seldom so simple, because there are usually other tasks already on the host.
Joe


Interesting, thanks for the information!
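A minimal Python sketch of the send-time check Joe describes, assuming it reduces to comparing 1.3 × the runtime estimate against delay_bound (25 days). The function names and the `<=` comparison are illustrative assumptions, not BOINC's scheduler code, and the sketch ignores tasks already queued on the host, which Joe notes usually complicate the decision.

```python
# Figures from Joe's post; the names and the "<=" comparison are assumptions.
SECONDS_PER_DAY = 86400
AP_DELAY_BOUND_S = 25 * SECONDS_PER_DAY  # delay_bound set by ap_splitter
AP_ESTIMATE_MULTIPLIER = 1.3             # headroom for heavily blanked tasks


def passes_send_check(estimated_runtime_s: float,
                      delay_bound_s: float = AP_DELAY_BOUND_S,
                      multiplier: float = AP_ESTIMATE_MULTIPLIER) -> bool:
    """True if the inflated runtime estimate fits within delay_bound.

    Only decides whether the task is sent; the deadline itself stays at
    delay_bound. Other tasks already on the host are ignored here.
    """
    return estimated_runtime_s * multiplier <= delay_bound_s


def longest_sendable_estimate(delay_bound_s: float = AP_DELAY_BOUND_S,
                              multiplier: float = AP_ESTIMATE_MULTIPLIER) -> float:
    """Largest runtime estimate (seconds) that still passes the check."""
    return delay_bound_s / multiplier


if __name__ == "__main__":
    limit = longest_sendable_estimate()
    print(f"limit: {limit / SECONDS_PER_DAY:.2f} days")
    print("19-day estimate sent:", passes_send_check(19 * SECONDS_PER_DAY))
    print("20-day estimate sent:", passes_send_check(20 * SECONDS_PER_DAY))
```

With these numbers the longest estimate that passes is 25/1.3, roughly 19.2 days: a 19-day estimate is sent and a 20-day one is not, while the deadline stays at 25 days either way.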

____________
Traveling through space at ~67,000mph!

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4207
Credit: 34,461,600
RAC: 20,950
United Kingdom
Message 1095861 - Posted: 10 Apr 2011, 21:29:47 UTC
Last modified: 10 Apr 2011, 21:33:43 UTC

Uploads have dropped to zero, downloads are dropping too, and scheduler requests are also failing.

Claggy

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,209,245
RAC: 7,563
United States
Message 1095865 - Posted: 10 Apr 2011, 21:55:31 UTC - in response to Message 1095861.

Whatever it was didn't last long. I just got some new work and uploaded and reported a couple.
____________


PROUD MEMBER OF Team Starfire World BOINC

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4207
Credit: 34,461,600
RAC: 20,950
United Kingdom
Message 1095867 - Posted: 10 Apr 2011, 21:58:46 UTC - in response to Message 1095865.

My uploads went through and reported too.

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 368
Credit: 35,328,637
RAC: 670
United Kingdom
Message 1096722 - Posted: 13 Apr 2011, 6:16:28 UTC

Problems with the replica database: ATM it's 15,047 seconds behind the master.



____________
Kevin


HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4588
Credit: 121,537,414
RAC: 55,340
United States
Message 1096791 - Posted: 13 Apr 2011, 12:38:55 UTC - in response to Message 1096722.
Last modified: 13 Apr 2011, 12:39:19 UTC

Problems with the replica database: ATM it's 15,047 seconds behind the master.



That is fairly normal while the replica catches up after maintenance. Six hours later, it is down to about 2,500 seconds.
Sometimes you might even see the lag increase for a few hours after everything comes back online. IIRC, the replica recovers within about 12-24 hours of everything being brought back up.
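A back-of-the-envelope sketch of the catch-up rate, using Kevin's 15,047 s reading and HAL's ~2,500 s reading about six hours later. It assumes the lag shrinks linearly, which HAL warns it may not (the lag can grow again for a while, and full recovery usually takes 12-24 hours), so treat the result as an optimistic guess. `catch_up_eta` is a hypothetical helper, not a project tool.

```python
from typing import Optional


def catch_up_eta(lag_before_s: float, lag_after_s: float,
                 elapsed_s: float) -> Optional[float]:
    """Naive linear estimate of the seconds until replica lag reaches zero.

    Uses two lag readings taken `elapsed_s` apart. Returns None if the lag
    did not shrink between the readings (no catch-up to extrapolate).
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Lag seconds cleared per wall-clock second over the interval.
    shrink_rate = (lag_before_s - lag_after_s) / elapsed_s
    if shrink_rate <= 0:
        return None
    return lag_after_s / shrink_rate


if __name__ == "__main__":
    # Readings from the thread: 15,047 s, then about 2,500 s six hours later.
    eta = catch_up_eta(15_047, 2_500, 6 * 3600)
    print(f"roughly {eta / 3600:.1f} more hours at the observed rate")
```

At the observed rate (about 0.58 s of lag cleared per wall-clock second), the remaining 2,500 s would clear in roughly 1.2 hours, which is far quicker than HAL's 12-24 hour rule of thumb; that gap is a reminder not to trust straight-line extrapolation here.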
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Robert J
Joined: 30 Mar 00
Posts: 108
Credit: 7,152,060
RAC: 7,687
United States
Message 1097058 - Posted: 14 Apr 2011, 8:13:33 UTC
Last modified: 14 Apr 2011, 8:14:23 UTC

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8757
Credit: 52,706,358
RAC: 28,632
United Kingdom
Message 1097064 - Posted: 14 Apr 2011, 9:22:48 UTC - in response to Message 1097058.

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 926
Credit: 12,373,423
RAC: 8,308
United Kingdom
Message 1097068 - Posted: 14 Apr 2011, 9:46:51 UTC

Time to call Dyno Rod.
They'll be there soon after the team gets into work.
____________

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1097070 - Posted: 14 Apr 2011, 10:02:51 UTC - in response to Message 1097064.

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'


GRRR... and just as I got my second card back in from eVGA. Hopefully the ole' cache lasts one more test!... err, that should be a period...
____________
Traveling through space at ~67,000mph!

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4588
Credit: 121,537,414
RAC: 55,340
United States
Message 1097072 - Posted: 14 Apr 2011, 10:14:59 UTC

One of my machines at home was in the middle of downloading some tasks when everything went splat at about 3:30 UTC. It looks like things cleared up around 5:00 UTC: the tasks finished downloading, and a few more were requested and downloaded. Then, around 7:00 UTC, downloads stopped again.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2326
Credit: 8,867,607
RAC: 1,010
United States
Message 1097073 - Posted: 14 Apr 2011, 10:22:04 UTC

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 07:36 UTC. Guess I'll suspend network comms. until it is fixed.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4588
Credit: 121,537,414
RAC: 55,340
United States
Message 1097084 - Posted: 14 Apr 2011, 10:57:52 UTC - in response to Message 1097073.

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 07:36 UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT (No New Tasks) to stop the inflow of work if it is going to take a few days to fix.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2326
Credit: 8,867,607
RAC: 1,010
United States
Message 1097087 - Posted: 14 Apr 2011, 11:39:54 UTC - in response to Message 1097084.

As upload/reporting is still working, for now :), you could set NNT (No New Tasks) to stop the inflow of work if it is going to take a few days to fix.

True, but that doesn't stop the pending downloads from retrying, filling up my log and making unnecessary connection attempts. It's just easier to suspend comms. and wait for things to be fixed.
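Both options discussed here (NNT versus suspending network activity) can also be set from the command line with the BOINC `boinccmd` tool. The sketch below only builds the argument lists in Python; the SETI@home master URL is an assumption (use the URL shown in your BOINC Manager), and actually running a command, for example with `subprocess.run(argv, check=True)`, needs a running local client and may need `--passwd`.

```python
import shlex
from typing import List, Optional

# Assumption: the project's master URL as attached in the BOINC Manager.
SETI_URL = "http://setiathome.berkeley.edu/"


def no_new_tasks(project_url: str) -> List[str]:
    """argv for NNT: stops work fetch; uploads and reports continue."""
    return ["boinccmd", "--project", project_url, "nomorework"]


def allow_new_tasks(project_url: str) -> List[str]:
    """argv that undoes NNT."""
    return ["boinccmd", "--project", project_url, "allowmorework"]


def suspend_network(duration_s: Optional[int] = None) -> List[str]:
    """argv that suspends all network activity, optionally for duration_s seconds."""
    argv = ["boinccmd", "--set_network_mode", "never"]
    if duration_s is not None:
        argv.append(str(duration_s))
    return argv


def resume_network() -> List[str]:
    """argv that returns network activity to preference-based ('auto') mode."""
    return ["boinccmd", "--set_network_mode", "auto"]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in (no_new_tasks(SETI_URL), suspend_network(), resume_network()):
        print(shlex.join(argv))
```

NNT (`nomorework`) only stops work fetch, so uploads and reports carry on; network mode `never` stops every transfer, including the download retries Cosmic_Ocean mentions, until it is set back to `auto`.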
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8757
Credit: 52,706,358
RAC: 28,632
United Kingdom
Message 1097088 - Posted: 14 Apr 2011, 11:41:31 UTC - in response to Message 1097084.

As upload/reporting is still working, for now :), you could set NNT (No New Tasks) to stop the inflow of work if it is going to take a few days to fix.

Any downloads you already have assigned will be quietened down by 'project backoff', which keeps the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too but is accepting reports).

Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2411
Credit: 351,996
RAC: 0
Message 1097092 - Posted: 14 Apr 2011, 12:04:08 UTC - in response to Message 1097088.

Any downloads you already have assigned will be quietened down by 'project backoff', which keeps the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too but is accepting reports).


Isn't the backoff only 2 hours max in the 6.10 branch? IIRC we went up to 24 hours max in 6.12.
____________
Carola
-------
I'm multilingual - I can misunderstand people in several languages!

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8757
Credit: 52,706,358
RAC: 28,632
United Kingdom
Message 1097109 - Posted: 14 Apr 2011, 13:07:48 UTC - in response to Message 1097092.

Isn't the backoff only 2 hours max in the 6.10 branch? IIRC we went up to 24 hours max in 6.12.

Even the basic 'per task' backoff and retry always went up to a possible limit of four hours (though randomised within that range). I don't remember 'project backoff' ever being shorter, but I'll try and find when it was introduced.
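Richard's description of the per-task retry (a randomised delay that can grow up to a four-hour limit) resembles the general pattern of capped exponential backoff with jitter. The sketch below shows that generic technique, not BOINC's actual client code: the 60-second base delay and the choice to draw from the upper half of the current window are assumptions, and `cap` can be set to 2 or 24 hours to compare with the limits Miep mentions.

```python
import random

# Assumed values: the 60 s base is a placeholder; the 4-hour cap is the
# per-task retry limit Richard mentions.
BASE_DELAY_S = 60.0
MAX_DELAY_S = 4 * 3600.0


def backoff_delay(attempt: int, rng: random.Random,
                  base: float = BASE_DELAY_S, cap: float = MAX_DELAY_S) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    The window doubles with each failed attempt until it reaches `cap`;
    the actual delay is drawn uniformly from the upper half of the window
    so that many clients do not all retry at the same moment.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Clamp the exponent so very high attempt counts cannot overflow a float.
    window = min(cap, base * 2.0 ** min(attempt - 1, 30))
    return rng.uniform(window / 2, window)


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the demo is repeatable
    for attempt in range(1, 11):
        print(f"retry {attempt:2d}: wait {backoff_delay(attempt, rng) / 60:6.1f} min")
```

With a 60 s base the window reaches the four-hour cap from the ninth retry onwards; after that every delay falls between two and four hours, which is why a stalled download settles into a handful of log entries per day.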

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8757
Credit: 52,706,358
RAC: 28,632
United Kingdom
Message 1097130 - Posted: 14 Apr 2011, 15:18:58 UTC - in response to Message 1097120.

The grapes that are growing on the grapevine, you mean? Yes, they are sour indeed. But don't shoot the messenger, please - only trying to help :-)



Copyright © 2014 University of California