Panic Mode On (56) Server problems?

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1157913 - Posted: 1 Oct 2011, 15:46:29 UTC - in response to Message 1157908.  

Somebody must be in the lab, the scheduling server is now showing as disabled.

Well, it isn't disabled, because I just reported 20 tasks. Did you check the status page for the server status page? ;-)

Seriously, all of those 'status' flags are indicative only. A script tests each server/daemon periodically to see if it's in some sense 'responsive'. The result of the test goes into a disk file somewhere, and that's what we see as being the status for the next 10 or 20 minutes, until the next page update. The daemons also have watchdog scripts which restart them if they stop running.

All of which means that the scheduling server might have glitched for a second and been restarted. That's the most we can deduce from the SSP - a single server down for a single observing cycle isn't enough to conclude that maintenance is underway (and if the staff do shut a server down manually, they usually shut down a whole block of them).
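
As a rough illustration of the mechanism described above (this is not the project's actual script; the file layout, function names and the "Running"/"Disabled" labels are all assumptions), a periodic probe that caches its result on disk might look like the sketch below. The point is that the page only ever shows the cached flag, so a one-second glitch at probe time stays visible until the next cycle.

```python
import json
import time
from pathlib import Path


def record_status(probe, name, status_file, now=None):
    """Run one probe and cache the result on disk (hypothetical file layout)."""
    now = time.time() if now is None else now
    entry = {"name": name, "running": bool(probe()), "checked_at": now}
    Path(status_file).write_text(json.dumps(entry))
    return entry


def displayed_status(status_file, now=None):
    """Return what a status page would show: the cached flag and its age in seconds."""
    now = time.time() if now is None else now
    entry = json.loads(Path(status_file).read_text())
    label = "Running" if entry["running"] else "Disabled"
    return label, now - entry["checked_at"]
```

If the probe happens to fail at time 0, `displayed_status` still reports "Disabled" at time 600 even if the daemon was restarted a second later, which is why a single flag for a single cycle says very little.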
ID: 1157913
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1157916 - Posted: 1 Oct 2011, 15:51:23 UTC - in response to Message 1157913.  

Somebody must be in the lab, the scheduling server is now showing as disabled.

Well, it isn't disabled, because I just reported 20 tasks. Did you check the status page for the server status page? ;-)

Seriously, all of those 'status' flags are indicative only. A script tests each server/daemon periodically to see if it's in some sense 'responsive'. The result of the test goes into a disk file somewhere, and that's what we see as being the status for the next 10 or 20 minutes, until the next page update. The daemons also have watchdog scripts which restart them if they stop running.

All of which means that the scheduling server might have glitched for a second and been restarted. That's the most we can deduce from the SSP - a single server down for a single observing cycle isn't enough to conclude that maintenance is underway (and if the staff do shut a server down manually, they usually shut down a whole block of them).

Plus, today is Saturday, not a normal work day for the S@H gang.

Donald
Infernal Optimist / Submariner, retired
ID: 1157916
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19354
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1157927 - Posted: 1 Oct 2011, 16:15:11 UTC
Last modified: 1 Oct 2011, 16:18:14 UTC

I had success at 15:56:19, but not at 16:01:50, 16:07:27 or 16:13:19.

Think I also detect a nose dive starting on cricket.

[edit] Uploads are OK.
ID: 1157927
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1157932 - Posted: 1 Oct 2011, 16:35:13 UTC - in response to Message 1157927.  

I had success at 15:56:19, but not at 16:01:50, 16:07:27 or 16:13:19.

Think I also detect a nose dive starting on cricket.

[edit] Uploads are OK.

All my fault, I had most of my rigs shut down and had just restarted 2 of them. Downloads failed as soon as the 2nd one booted up and asked for work.

(Just wondering. Is there any way we can blame Misfit for this ? He hasn't been around for a long time but......)

T.A.
ID: 1157932
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1157937 - Posted: 1 Oct 2011, 16:45:33 UTC

Well, I checked the server status page just before I posted that and it had refreshed just one minute before I did. That's why I posted it as showing disabled.


TA, we can always blame Misfit. Actually, I kinda miss him. Wonder how he's doing?


PROUD MEMBER OF Team Starfire World BOINC
ID: 1157937
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1157940 - Posted: 1 Oct 2011, 16:49:34 UTC

Something's definitely wrong. I just uploaded a pile of units and then sent all the results in to S@H. This morning I even got some new WUs.
I am worried that this may not be what is supposed to happen ;)


ID: 1157940
S@NL - John van Gorsel
Volunteer tester
Joined: 5 Jul 99
Posts: 193
Credit: 139,673,078
RAC: 0
Netherlands
Message 1157942 - Posted: 1 Oct 2011, 16:52:12 UTC
Last modified: 1 Oct 2011, 16:52:48 UTC

For some reason my Linux PCs can still report (and get new work) while my Windows PCs all get the "unable to connect to server" or "HTTP error" message. The same thing happened yesterday, when the Linux PCs were still able to get through.

The Cricket graphs clearly show that something happened about an hour ago.


Seti@Netherlands website
ID: 1157942
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1157948 - Posted: 1 Oct 2011, 17:02:25 UTC - in response to Message 1157937.  

TA, we can always blame Misfit. Actually, I kinda miss him. Wonder how he's doing?

Same here, I still occasionally go look at his profile when I want a grin.
ID: 1157948
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1157953 - Posted: 1 Oct 2011, 17:11:20 UTC
Last modified: 1 Oct 2011, 17:18:11 UTC

More worryingly, I just got

01/10/2011 18:03:21 | SETI@home | Sending scheduler request: Requested by user.
01/10/2011 18:03:21 | SETI@home | Reporting 14 completed tasks, requesting new tasks for CPU and NVIDIA GPU
01/10/2011 18:03:21 | SETI@home | [sched_op] CPU work request: 673473.12 seconds; 0.00 CPUs
01/10/2011 18:03:21 | SETI@home | [sched_op] NVIDIA GPU work request: 154670.17 seconds; 0.00 GPUs
01/10/2011 18:04:22 | SETI@home | Scheduler request failed: HTTP internal server error
01/10/2011 18:04:22 | SETI@home | [sched_op] Deferring communication for 1 min 30 sec

The 14 tasks got reported OK, because the 'in progress' count for that host on the website is now 14 fewer than the number of tasks BoincView is counting - I never got the 'ack' for successful reporting of those tasks, so BOINC locally still thinks they're 'ready to report'.

Edit - whatever it was, was only temporary. The tasks reported properly and were acknowledged a couple of attempts later, and I got some new work.
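
For anyone who wants to spot this pattern in their own message log, here is a minimal sketch that assumes the `date time | project | message` layout shown above. The function name is made up, and the "Scheduler request completed" success text is an assumption about the client's wording. It collects the task counts from any "Reporting N completed tasks" line that is followed by a failed scheduler request, i.e. reports whose 'ack' may never have arrived.

```python
import re

REPORTING = re.compile(r"Reporting (\d+) completed tasks")


def unacknowledged_reports(log_lines):
    """Return task counts from reports that were followed by a failed request."""
    pending = None  # task count from the most recent unresolved report
    lost = []
    for line in log_lines:
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not in the assumed "timestamp | project | message" layout
        message = parts[2]
        match = REPORTING.search(message)
        if match:
            pending = int(match.group(1))
        elif message.startswith("Scheduler request failed") and pending is not None:
            lost.append(pending)
            pending = None
        elif message.startswith("Scheduler request completed"):
            pending = None  # assumed success message: the report was acknowledged
    return lost
```

A non-empty result only means the acknowledgement was not seen locally; as in the case above, the server may still have recorded the tasks, and the client will simply report them again on a later attempt.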
ID: 1157953
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1157965 - Posted: 1 Oct 2011, 18:25:55 UTC

Just tried to report work from the old P4 and it's a no go. Says servers may be down. The Mac has two work units that I got sometime today. This i7 still has 4 Einsteins I want to finish before I request new work. Think I will get some?

And downloads from Einstein haven't been very fast the past few days either. I've had one trying since this morning.

I'll join Chris at the pub.

Old James
ID: 1157965
MikeN
Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1157976 - Posted: 1 Oct 2011, 19:04:07 UTC
Last modified: 1 Oct 2011, 19:06:00 UTC

Daren't go to pub. The only way I can get uploads and reports through is to sit here constantly hitting the retry button. Just to add insult to injury, I have got a whole load of 12s megashorties that take 12s to abort and then 12 minutes to upload and another 12 minutes to report. Have to go now, still got WUs to report again and again and again....

EDIT guess what while I was typing this the report went through. Now I have 14 minutes free till the next shortie finishes. How many pints can I drink in 14 minutes?
ID: 1157976
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19354
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1157978 - Posted: 1 Oct 2011, 19:07:31 UTC - in response to Message 1157976.  

Daren't go to pub. The only way I can get uploads and reports through is to sit here constantly hitting the retry button. Just to add insult to injury, I have got a whole load of 12s megashorties that take 12s to abort and then 12 minutes to upload and another 12 minutes to report. Have to go now, still got WUs to report again and again and again....

Well the cricket graph has almost hit rock bottom, so the sensible decision is go down the pub and check when we get back.
ID: 1157978
rob smith Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1157981 - Posted: 1 Oct 2011, 19:14:23 UTC - in response to Message 1157976.  



EDIT guess what while I was typing this the report went through. Now I have 14 minutes free till the next shortie finishes. How many pints can I drink in 14 minutes?


Well it was fun while it lasted.

Pints in 14 minutes? That depends on how many you get lined up first. In some pubs I know you'd be lucky to get one in, never mind drunk; in others you could line up a dozen or more and still have time to order another dozen.
Don't bother punishing the retry button, just retire to the pub; someone has flattened the crickets :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1157981
.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1157991 - Posted: 1 Oct 2011, 19:37:28 UTC

Jeff Cobb has posted an update for our woes in the HE thread
http://setiathome.berkeley.edu/forum_thread.php?id=64652&nowrap=true#1157929
ID: 1157991
Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 1157994 - Posted: 1 Oct 2011, 19:47:30 UTC

Got results gone past expiry now. I'd just like to be able to upload and report them. I have plenty of work on other projects.
Join Team Starfire
BOINC Chat

ID: 1157994
__W__
Joined: 28 Mar 09
Posts: 116
Credit: 5,943,642
RAC: 0
Germany
Message 1158007 - Posted: 1 Oct 2011, 20:49:50 UTC - in response to Message 1157834.  

Someone must have kicked the routers at HE very hard - yiiihhha
Just got 40 WUs and downloaded them in under 2 minutes, in spite of Cricket being nearly maxed out - and pinging the servers is faster than ever before (from my part of the world) :-) .


Only a short time of happiness: pings to .13 and .16 are lost in timeouts :-( while .18 and .20 are as fast as before :-)

__W__

_______________________________________________________________________________
ID: 1158007
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1158009 - Posted: 1 Oct 2011, 21:01:27 UTC - in response to Message 1158007.  

One rig reported on 1 Oct 2011 | 20:55:08 UTC, but only upload, no work.
Also no more work from SETI Beta; I see an enormous spike at Milkyway.


ID: 1158009
Ichobod
Joined: 20 Apr 08
Posts: 6
Credit: 2,948,560
RAC: 0
United States
Message 1158012 - Posted: 1 Oct 2011, 21:09:38 UTC

I've been getting a good amount of CPU work but I don't think either of my computers has received any GPU work since the last outage. Is there not enough CUDA work to go around, or is something else wrong?
ID: 1158012
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1158013 - Posted: 1 Oct 2011, 21:18:03 UTC - in response to Message 1158012.  
Last modified: 1 Oct 2011, 21:19:56 UTC

I've been getting a good amount of CPU work but I don't think either of my computers has received any GPU work since the last outage. Is there not enough CUDA work to go around, or is something else wrong?

BOINC will fill its fastest device first, be it CPU or GPU, then once that cache is full it will fill the other (it seems to be operating the other way round at the moment, though).
You can try reducing your cache setting until BOINC no longer asks for CPU work; once the HE link behaves itself you should get some GPU work.

Claggy
ID: 1158013
MikeN
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1158018 - Posted: 1 Oct 2011, 21:39:41 UTC - in response to Message 1157978.  

Jeff Cobb has posted an update for our woes in the HE thread
http://setiathome.berkeley.edu/forum_thread.php?id=64652&nowrap=true#1157929


The Cricket graph seems to be coming back to life. Whoever Jeff kicked, bribed or sweet-talked to sort out the connection issues seems to have done the trick.
ID: 1158018


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.