Panic Mode On (56) Server problems?


Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,700,177
RAC: 25,553
Australia
Message 1157816 - Posted: 1 Oct 2011, 9:05:09 UTC - in response to Message 1157794.
Last modified: 1 Oct 2011, 9:08:57 UTC

the 'shorties' take slightly less than 5 minutes to complete. The new tasks I am receiving have an estimated computation time of only 58 seconds.

For me the estimates are all over the place. The DCF is moving around between 0.7 & 1.5. As each GPU task completes, the ridiculously long GPU completion estimates slowly drop, making the almost correct CPU estimates drop as well. They get down to about half of the actual completion time, which is when one finally completes & the estimates get bumped up, pushing the GPU task completion times to new heights of ridiculousness.
Hopefully Seti can stay up for the next few days & things will start to settle down.
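
To illustrate why the estimates see-saw like this, here is a toy model in Python. It assumes a single correction factor shared by CPU and GPU tasks that jumps up immediately when a task runs longer than estimated and drifts down slowly otherwise. The function names, the 10% decay rate and the 0.3x/1.2x task ratios are all made up for illustration and are not taken from the BOINC source.

```python
def update_dcf(dcf, raw_estimate, actual, decay=0.1):
    """Return the new shared correction factor after one task finishes.

    raw_estimate: uncorrected runtime estimate in seconds.
    actual: measured runtime in seconds.
    decay: assumed fraction of the gap closed per over-estimated task.
    """
    ratio = actual / raw_estimate
    if ratio > dcf:
        # Task ran longer than predicted: jump straight up.
        return ratio
    # Task ran shorter than predicted: drift down slowly.
    return dcf + decay * (ratio - dcf)


def simulate():
    """Run a hypothetical mix of fast GPU tasks and slower CPU tasks."""
    dcf = 1.0
    history = []
    for kind in ["gpu"] * 5 + ["cpu"] + ["gpu"] * 5 + ["cpu"]:
        # Assumed: GPU tasks finish in 0.3x the raw estimate, CPU in 1.2x.
        ratio = 0.3 if kind == "gpu" else 1.2
        dcf = update_dcf(dcf, 1000.0, 1000.0 * ratio)
        history.append((kind, round(dcf, 3)))
    return history


if __name__ == "__main__":
    for kind, dcf in simulate():
        print(kind, dcf)
```

In this sketch every GPU result nudges the factor down a little, and the next CPU result snaps it back up, which is the same sawtooth pattern described above.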



Although things are still looking a bit broken: it doesn't look as though AP work is going out, and a lot of the requests for work result in none. Sometimes i get 1 or 2 WUs, occasionally i'll get 20+. But mostly it's "Project has no tasks available".
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,700,177
RAC: 25,553
Australia
Message 1157819 - Posted: 1 Oct 2011, 9:52:11 UTC


Now i'm not getting any response from the Scheduler.
____________
Grant
Darwin NT.

BetelgeuseFive
Volunteer tester
Joined: 6 Jul 99
Posts: 63
Credit: 5,835,890
RAC: 4,162
Netherlands
Message 1157828 - Posted: 1 Oct 2011, 10:22:38 UTC - in response to Message 1157819.

I had the same problem around the time you posted your message.
Things seem to be working again. I just received 40 (!) new workunits and they downloaded really fast (less than 1.5 minutes for all 40 of them). No big surprise as the cricket graph isn't maxed out, but still nice to see ...



Now i'm not getting any response from the Scheduler.


____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,700,177
RAC: 25,553
Australia
Message 1157829 - Posted: 1 Oct 2011, 10:25:32 UTC - in response to Message 1157819.

Now i'm not getting any response from the Scheduler.


Now it's back again.
But for a while there it wasn't.

1/10/2011 19:14:59 SETI@home Sending scheduler request: To fetch work.
1/10/2011 19:14:59 SETI@home Reporting 4 completed tasks, requesting new tasks for CPU and GPU
1/10/2011 19:15:22 Project communication failed: attempting access to reference site
1/10/2011 19:15:22 SETI@home Scheduler request failed: Couldn't connect to server
1/10/2011 19:15:25 Internet access OK - project servers may be temporarily down.
1/10/2011 19:16:22 SETI@home Sending scheduler request: To fetch work.
1/10/2011 19:16:22 SETI@home Reporting 4 completed tasks, requesting new tasks for CPU and GPU
1/10/2011 19:16:44 Project communication failed: attempting access to reference site
1/10/2011 19:16:44 SETI@home Scheduler request failed: Couldn't connect to server
1/10/2011 19:16:46 Internet access OK - project servers may be temporarily down.
1/10/2011 19:17:44 SETI@home Sending scheduler request: To fetch work.
1/10/2011 19:17:44 SETI@home Reporting 6 completed tasks, requesting new tasks for CPU and GPU
1/10/2011 19:18:40 SETI@home Scheduler request failed: HTTP internal server error
1/10/2011 19:19:40 SETI@home Sending scheduler request: To fetch work.
1/10/2011 19:19:40 SETI@home Reporting 6 completed tasks, requesting new tasks for CPU and GPU
1/10/2011 19:20:00 SETI@home Computation for task 17ap11ah.22009.16427.6.10.174_0 finished
1/10/2011 19:20:16 Project communication failed: attempting access to reference site
1/10/2011 19:20:16 SETI@home Scheduler request failed: Failure when receiving data from the peer
1/10/2011 19:20:18 Internet access OK - project servers may be temporarily down.
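
For anyone wanting to compare notes, a log like the one above can be tallied by failure reason. This is only a rough sketch: it assumes the failure lines always contain the text "Scheduler request failed: " followed by the reason, as they do in this excerpt, and the function name `tally_failures` is just an illustrative choice.

```python
import re
from collections import Counter

# Matches the failure lines seen in the excerpt and captures the reason.
FAILED = re.compile(r"Scheduler request failed: (?P<reason>.+)$")


def tally_failures(lines):
    """Count scheduler failures by the reason text given in each log line."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line.strip())
        if match:
            counts[match.group("reason")] += 1
    return counts


# A few lines copied from the log above.
SAMPLE = [
    "1/10/2011 19:14:59 SETI@home Sending scheduler request: To fetch work.",
    "1/10/2011 19:15:22 SETI@home Scheduler request failed: Couldn't connect to server",
    "1/10/2011 19:16:44 SETI@home Scheduler request failed: Couldn't connect to server",
    "1/10/2011 19:18:40 SETI@home Scheduler request failed: HTTP internal server error",
    "1/10/2011 19:20:16 SETI@home Scheduler request failed: Failure when receiving data from the peer",
    "1/10/2011 19:20:18 Internet access OK - project servers may be temporarily down.",
]

if __name__ == "__main__":
    for reason, count in tally_failures(SAMPLE).most_common():
        print(count, reason)
```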

Now it's mostly "Project has no tasks available"
____________
Grant
Darwin NT.

Wiggo
Joined: 24 Jan 00
Posts: 7852
Credit: 98,317,903
RAC: 33,777
Australia
Message 1157831 - Posted: 1 Oct 2011, 10:47:10 UTC - in response to Message 1157829.

The main message for my 3 PC's for the last 4-6 hours has been, "This computer has reached a limit on tasks in progress", with the occasional 1-10 tasks being received every 4th or 5th request.

Cheers.

____________

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8734
Credit: 61,636,655
RAC: 48,884
United Kingdom
Message 1157832 - Posted: 1 Oct 2011, 10:52:25 UTC

S@H has been running with a cap on tasks in progress (in other words a limit on the number of tasks you can have on each cruncher) for some time.
Each cruncher is allowed 50 per CPU core, and 400 per GPU.

(My figures might be wrong, I deduced them from the number of tasks on my crunchers.)
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1157833 - Posted: 1 Oct 2011, 10:54:01 UTC

Well, after a fair period of no uploads or downloads it seems that the system has settled and all the gripes can be forgotten until the next time. No expectations, no regrets; let's all do it for the one in a bazillion chance to say we have proven that there is life out there beyond our little blue planet.

Keep on crunching and greetings to all on our little planet called Earth.
____________

__W__
Joined: 28 Mar 09
Posts: 114
Credit: 3,270,411
RAC: 371
Germany
Message 1157834 - Posted: 1 Oct 2011, 11:14:02 UTC

Someone must have kicked the routers at HE very hard - yiiihhha
Just got 40 WUs and downloaded them in under 2 minutes, in spite of cricket being nearly maxed out - and pinging the servers is faster than ever before (from my point of the world) :-) .

__W__
____________
_______________________________________________________________________________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,710,689
RAC: 24,580
United Kingdom
Message 1157836 - Posted: 1 Oct 2011, 11:18:35 UTC - in response to Message 1157831.

The main message for my 3 PC's for the last 4-6 hours has been, "This computer has reached a limit on tasks in progress", with the occasional 1-10 tasks being received every 4th or 5th request.

Cheers.

Each one of your three hosts shows either 449 or 450 tasks in progress. That's the current limit for CPU and GPU tasks combined. Subject to the usual caveats about hitting the feeder when it has suitable tasks available, you'll get a fresh task in exchange for each completed task you return.
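
As a rough sketch of that "one in, one out" behaviour in Python: the 450 figure is the combined limit quoted above, while the function name, its parameters and the optional feeder count are hypothetical and only model the description in this post, not the actual scheduler code.

```python
def tasks_grantable(in_progress, reported, requested, limit=450,
                    feeder_available=None):
    """Estimate how many new tasks a host could receive on one request.

    in_progress: tasks on the host before this request.
    reported: completed tasks returned with this request.
    requested: how many tasks the host is asking for.
    limit: assumed combined CPU + GPU cap on tasks in progress.
    feeder_available: tasks the feeder happens to hold (None = plenty).
    """
    still_in_progress = in_progress - reported
    room = max(0, limit - still_in_progress)
    granted = min(requested, room)
    if feeder_available is not None:
        granted = min(granted, feeder_available)
    return granted


if __name__ == "__main__":
    # Host at the cap that reports 4 finished tasks and asks for 20 more.
    print(tasks_grantable(in_progress=450, reported=4, requested=20))
```

With these assumptions a host sitting at the cap gets back at most as many tasks as it reports, and fewer if the feeder has nothing suitable at that moment.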

Wiggo
Joined: 24 Jan 00
Posts: 7852
Credit: 98,317,903
RAC: 33,777
Australia
Message 1157844 - Posted: 1 Oct 2011, 11:36:17 UTC - in response to Message 1157836.

The main message for my 3 PC's for the last 4-6 hours has been, "This computer has reached a limit on tasks in progress", with the occasional 1-10 tasks being received every 4th or 5th request.

Cheers.

Each one of your three hosts shows either 449 or 450 tasks in progress. That's the current limit for CPU and GPU tasks combined. Subject to the usual caveats about hitting the feeder when it has suitable tasks available, you'll get a fresh task in exchange for each completed task you return.

Yes it's certainly nowhere near my usual cache capacity but then again I also have quite a bit of CPU work from backup projects for a safety buffer (so far it only seems to be CPU work that I run out of, the GPU work has remained SETI only).

Cheers.
____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,210,338
RAC: 6,320
United States
Message 1157908 - Posted: 1 Oct 2011, 15:33:02 UTC

Somebody must be in the lab, the scheduling server is now showing as disabled.
____________


PROUD MEMBER OF Team Starfire World BOINC

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,710,689
RAC: 24,580
United Kingdom
Message 1157913 - Posted: 1 Oct 2011, 15:46:29 UTC - in response to Message 1157908.

Somebody must be in the lab, the scheduling server is now showing as disabled.

Well, it isn't disabled, because I just reported 20 tasks. Did you check the status page for the server status page? ;-)

Seriously, all of those 'status' flags are indicative only. A script tests each server/daemon periodically to see if it's in some sense 'responsive'. The result of the test goes into a disk file somewhere, and that's what we see as being the status for the next 10 or 20 minutes, until the next page update. The daemons also have watchdog scripts which restart them if they stop running.

All of which means that the scheduling server might have glitched for a second and been restarted. That's the most we can deduce from the SSP - a single server down for a single observing cycle isn't enough to conclude that maintenance is underway (and if the staff do shut a server down manually, they usually shut down a whole block of them).
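
A small Python sketch of why a stale snapshot can mislead. It assumes the page simply shows the result of the most recent periodic test; the 10-minute poll interval, the 5-second glitch and all of the names are invented for the example.

```python
from bisect import bisect_right


def displayed_status(poll_times, server_is_up, query_time):
    """Return what the status page would show at query_time.

    poll_times: sorted times (in seconds) at which the test script ran.
    server_is_up: function mapping a time to True/False (hypothetical model).
    """
    i = bisect_right(poll_times, query_time)
    if i == 0:
        return "unknown"  # no poll has happened yet
    last_poll = poll_times[i - 1]
    return "running" if server_is_up(last_poll) else "disabled"


def glitchy_server(t):
    # Hypothetical: down for 5 seconds only, then restarted by a watchdog.
    return not (600 <= t < 605)


POLLS = [0, 600, 1200]  # one test every 10 minutes (assumed interval)

if __name__ == "__main__":
    # Five minutes after the glitch the server is up, but the page disagrees.
    print(glitchy_server(900), displayed_status(POLLS, glitchy_server, 900))
```

In this model the page keeps saying "disabled" until the next poll, even though the server was only unavailable for a few seconds.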

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6326
Credit: 769,082
RAC: 942
United States
Message 1157916 - Posted: 1 Oct 2011, 15:51:23 UTC - in response to Message 1157913.

Somebody must be in the lab, the scheduling server is now showing as disabled.

Well, it isn't disabled, because I just reported 20 tasks. Did you check the status page for the server status page? ;-)

Seriously, all of those 'status' flags are indicative only. A script tests each server/daemon periodically to see if it's in some sense 'responsive'. The result of the test goes into a disk file somewhere, and that's what we see as being the status for the next 10 or 20 minutes, until the next page update. The daemons also have watchdog scripts which restart them if they stop running.

All of which means that the scheduling server might have glitched for a second and been restarted. That's the most we can deduce from the SSP - a single server down for a single observing cycle isn't enough to conclude that maintenance is underway (and if the staff do shut a server down manually, they usually shut down a whole block of them).

Plus, today is Saturday, not a normal work day for the S@H gang.

____________
Donald
Infernal Optimist / Submariner, retired

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8737
Credit: 25,597,098
RAC: 11,513
United Kingdom
Message 1157927 - Posted: 1 Oct 2011, 16:15:11 UTC
Last modified: 1 Oct 2011, 16:18:14 UTC

I had success at 15:56:19, but not at 16:01:50, 16:07:27 or 16:13:19.

Think I also detect a nose dive starting on cricket.

[edit] uploads are ok.

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1758
Credit: 206,461,229
RAC: 14,819
Australia
Message 1157932 - Posted: 1 Oct 2011, 16:35:13 UTC - in response to Message 1157927.

I had success at 15:56:19, but not at 16:01:50, 16:07:27 or 16:13:19.

Think I also detect a nose dive starting on cricket.

[edit] uploads are ok.

All my fault, I had most of my rigs shut down and had just restarted 2 of them. Downloads failed as soon as the 2nd one booted up and asked for work.

(Just wondering. Is there any way we can blame Misfit for this ? He hasn't been around for a long time but......)

T.A.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3677
Credit: 21,180,287
RAC: 7,451
Sweden
Message 1157936 - Posted: 1 Oct 2011, 16:45:04 UTC

All hope is lost. This works so badly and unreliably that I can no longer heat my apartment with the help of SETI.

I sold all my radiators last winter, because SETI worked so well that my computers were enough to keep my rooms heated.

I will now die from lack of WU's, so don't anyone say that SETI isn't important.

Goodbye cruel world.


LOL

____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,210,338
RAC: 6,320
United States
Message 1157937 - Posted: 1 Oct 2011, 16:45:33 UTC

Well, I checked the server status page just before I posted that and it had refreshed just one minute before I did. That's why I posted it as showing disabled.


TA, we can always blame Misfit. Actually, I kinda miss him. Wonder how he's doing?
____________


PROUD MEMBER OF Team Starfire World BOINC

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 926
Credit: 12,374,656
RAC: 7,162
United Kingdom
Message 1157940 - Posted: 1 Oct 2011, 16:49:34 UTC

Something's definitely wrong. I just uploaded a pile of units and then sent all the results in to s&h. This morning I even got some new WUs.
I am worried that this may not be what is supposed to happen ;)

____________

S@NL - John van Gorsel
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 190
Credit: 137,737,158
RAC: 5,833
Netherlands
Message 1157942 - Posted: 1 Oct 2011, 16:52:12 UTC
Last modified: 1 Oct 2011, 16:52:48 UTC

For some reason my Linux pc's can still report (and get new work) while my Windows pc's all get the "unable to connect to server" or "HTTP error" message. Same thing happened yesterday when the Linux pc's were still able to get through.

The Cricket graphs clearly show that something happened about an hour ago.
____________


Seti@Netherlands website


Copyright © 2014 University of California