Panic Mode On (79) Server Problems?


zoom314 (Project donor)
Joined: 30 Nov 03
Posts: 46125
Credit: 36,599,330
RAC: 5,286
Message 1320818 - Posted: 28 Dec 2012, 16:09:05 UTC - in response to Message 1320816.

I got 40 ghost WU's on this host of mine, result of the same issues posted before me:
http://setiathome.berkeley.edu/results.php?hostid=5332132

Maybe Seti could call in these guys... ;)

Me, I have no idea whether I have any ghosts or not; besides, I just woke up.
____________
My Facebook, War Commander, 2015

j tramer
Joined: 6 Oct 03
Posts: 242
Credit: 5,385,663
RAC: 16
Canada
Message 1320819 - Posted: 28 Dec 2012, 16:12:16 UTC

I shut down my second computer.....

Not wanted, not needed....

Soon both computers will not be running SETI.

More people should quit SETI.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,776,243
RAC: 2,135
United States
Message 1320825 - Posted: 28 Dec 2012, 16:30:49 UTC - in response to Message 1320819.

Aye, she's getting a touch cranky again. Flushed my DNS, renewed my IP address and it did get through enough to report and refill. Once. I'll come back in a few hours and see how she's doing.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8466
Credit: 49,000,855
RAC: 73,191
United Kingdom
Message 1320851 - Posted: 28 Dec 2012, 17:34:15 UTC - in response to Message 1320810.

Scheduler contacts are frequently timing out or returning nothing (no headers, no data), and when I manage to get work the tasks are all shorties, at which point downloads are terribly slow. It sounds as if the internal network is overloaded again.

Claggy

Are you creating ghost results on your scheduler timeouts? (Specifically, the timeouts, not the empty replies)

And similarly, are results which you have attempted, but apparently failed, to report appearing as completed on the web task lists?

Yes for the first question:

28/12/2012 15:00:19 SETI@home [sched_op_debug] Starting scheduler request
28/12/2012 15:00:19 SETI@home Sending scheduler request: To fetch work.
28/12/2012 15:00:19 SETI@home Reporting 3 completed tasks, requesting new tasks for GPU
28/12/2012 15:00:19 SETI@home [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
28/12/2012 15:00:19 SETI@home [sched_op_debug] NVIDIA GPU work request: 0.00 seconds; 0.00 GPUs
28/12/2012 15:00:19 SETI@home [sched_op_debug] ATI GPU work request: 82261.43 seconds; 0.00 GPUs
28/12/2012 15:05:28 Project communication failed: attempting access to reference site
28/12/2012 15:05:28 SETI@home Scheduler request failed: Timeout was reached
28/12/2012 15:05:28 SETI@home [sched_op_debug] Deferring communication for 1 min 0 sec
28/12/2012 15:05:28 SETI@home [sched_op_debug] Reason: Scheduler request failed
28/12/2012 15:05:29 Internet access OK - project servers may be temporarily down.
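As a rough illustration of what this log shows, the sketch below measures how long the client waited between starting the request and giving up. It assumes the `DD/MM/YYYY HH:MM:SS` timestamp prefix seen in the excerpt; the function names are hypothetical helpers, not part of BOINC.

```python
from datetime import datetime

# Two lines copied from the log excerpt above (the events of interest).
LOG = """\
28/12/2012 15:00:19 SETI@home [sched_op_debug] Starting scheduler request
28/12/2012 15:05:28 SETI@home Scheduler request failed: Timeout was reached
"""

def parse_timestamp(line):
    """Return the leading 'DD/MM/YYYY HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S")

def request_durations(log_text):
    """Pair each request start with the next failure line; return seconds elapsed."""
    durations = []
    started = None
    for line in log_text.splitlines():
        if "Starting scheduler request" in line:
            started = parse_timestamp(line)
        elif "Scheduler request failed" in line and started is not None:
            durations.append((parse_timestamp(line) - started).total_seconds())
            started = None
    return durations

print(request_durations(LOG))  # [309.0]
```

The client waited 309 seconds before timing out. If the client clock is in UTC (an assumption), the ghost tasks mentioned in this post, timestamped 15:00:33 to 15:00:38 UTC, were created within about 20 seconds of the request starting, so the server side had apparently finished long before the client gave up.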


There are 20 ATI ghosts waiting to be resent at the moment (all timestamped between 28 Dec 2012, 15:00:33 UTC and 15:00:38 UTC): All tasks for computer 5427475

For the second question, I think it is yes too (three tasks were reported, and I've seen one of them listed as reported at 15:00:31 UTC).

Claggy

I was getting exactly the same thing on Albert (Einstein's test server) before Christmas: the scheduler did everything it was supposed to do in less than a second, then sat there for two minutes twiddling its thumbs until Apache killed it with a SIGTERM (you can see useful things like that in the Einstein family server logs). I tried to convince Bernd and Eric (and David) that the two behaviours might be related (and not just by overwork - Albert was very lightly loaded at the time) - but Christmas holidays intervened. Something to pick up on in the New Year. Until then, zzzzzzzzz...

zoom314 (Project donor)
Joined: 30 Nov 03
Posts: 46125
Credit: 36,599,330
RAC: 5,286
Message 1320870 - Posted: 28 Dec 2012, 18:00:30 UTC - in response to Message 1320851.

Just what both projects don't need: senile servers. Someone get out the rocking chairs... ;)

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 1320889 - Posted: 28 Dec 2012, 18:32:59 UTC

While the network was not maxed out, comms were good.
Now that more AP splitters are running, we seem to have hit the same problem as a month ago.
That is what I can see of it.
Increasing the AP splitters slowly seems to me to point to something.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8466
Credit: 49,000,855
RAC: 73,191
United Kingdom
Message 1320891 - Posted: 28 Dec 2012, 18:37:32 UTC

On a test sample of one host, I got the same outcome here as I got at Albert.

1) Try to do a normal update (reporting completed work, and requesting new work):
I saw a server timeout, but the server registered the completed work and created some ghosts.

2) Set NNT before update: I got acknowledgements of the (same) completed work.

3) Unset NNT and update again: I got the (same) ghosts, as "resent lost results".

I don't think it's just network congestion, no matter how severe.
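The three steps above can be read as a server that commits its changes before the reply is sent, so a reply that never arrives leaves the two sides out of sync. The following is a toy model of that reading only, not BOINC's actual scheduler code: `ToyScheduler`, `handle`, and the task names are all hypothetical, and resending lost tasks only when work is requested simply mirrors what was observed in steps 2 and 3.

```python
class ToyScheduler:
    """Toy stand-in for a project scheduler; NOT BOINC's real implementation."""

    def __init__(self):
        self.reported = set()   # results the server has recorded as received
        self.assigned = set()   # tasks the server believes the host holds
        self.next_id = 1

    def handle(self, reports, host_has, want_work, reply_arrives=True):
        # Assumption: the server commits its changes before replying.
        self.reported |= set(reports)
        self.assigned -= set(reports)
        acked = sorted(reports)
        # Tasks assigned to the host that the host does not list count as lost.
        resent = sorted(self.assigned - set(host_has)) if want_work else []
        new = []
        if want_work and not resent:
            new = [f"wu{self.next_id + i}" for i in range(2)]
            self.next_id += 2
            self.assigned |= set(new)
        # A timeout means the host never sees any of the above.
        if not reply_arrives:
            return None
        return {"acked": acked, "resent": resent, "new": new}


server = ToyScheduler()
done = ["r1", "r2", "r3"]
server.assigned |= set(done)

# 1) Normal update, but the reply is lost: work is recorded, ghosts are created.
reply1 = server.handle(done, host_has=done, want_work=True, reply_arrives=False)
ghosts = sorted(server.assigned)

# 2) NNT set: the host reports the same work again and gets acknowledgements.
reply2 = server.handle(done, host_has=done, want_work=False)

# 3) NNT unset: the host lists no new tasks, so the ghosts come back as resends.
reply3 = server.handle([], host_has=[], want_work=True)

print(reply1, ghosts, reply2["acked"], reply3["resent"])
```

In this model the ghosts appear without any congestion at all; the only thing that has to go wrong is the reply failing to reach the host after the server has already committed.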

zoom314 (Project donor)
Joined: 30 Nov 03
Posts: 46125
Credit: 36,599,330
RAC: 5,286
Message 1320896 - Posted: 28 Dec 2012, 18:56:13 UTC - in response to Message 1320891.

On a test sample of one host, I got the same outcome here as I got at Albert.

1) Try to do a normal update (reporting completed work, and requesting new work):
I saw a server timeout, but the server registered the completed work and created some ghosts.

2) Set NNT before update: I got acknowledgements of the (same) completed work.

3) Unset NNT and update again: I got the (same) ghosts, as "resent lost results".

I don't think it's just network congestion, no matter how severe.

Yeah, something is screwed up, but what? The Joker is in the details...

msattler (Project donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 38925
Credit: 579,213,332
RAC: 510,525
United States
Message 1320949 - Posted: 28 Dec 2012, 20:43:30 UTC

Yeah, getting scheduler requests through is so ugly again that most of my better rigs are out of GPU work, thanks to requests that timed out or otherwise could not be completed, and the %#@$#@&##! 100 WU limit.

This limit situation is starting to piss even the good-natured kitties off.
Can't ride out the Tuesday outage or some network/server congestion without running out of work for the GPUs.



____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Wiggo
Joined: 24 Jan 00
Posts: 6793
Credit: 93,166,617
RAC: 75,761
Australia
Message 1320952 - Posted: 28 Dec 2012, 20:53:10 UTC - in response to Message 1320949.

If the guys have changed something in the server closet in the last several hours then they better change it back again.

Cheers.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1222
Credit: 45,652,396
RAC: 117,412
United States
Message 1320954 - Posted: 28 Dec 2012, 20:58:45 UTC

I'm having problems with the uploads hanging. I don't have that many, with the APs going on one card and long MBs on the other, but they are all hanging. The long MBs are about gone, and now the recently downloaded shorties will be running... more hangs.

msattler (Project donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 38925
Credit: 579,213,332
RAC: 510,525
United States
Message 1320956 - Posted: 28 Dec 2012, 20:59:36 UTC
Last modified: 28 Dec 2012, 21:04:42 UTC

And when, after a number of retries, the scheduler responds with some 'resends', most of the downloads are dead in the water.

EDIT...
I would estimate this all went to heck in a handbasket about 4-5 hours ago. When I left for work about 9 hours ago, all rigs had their pitiful 100 WU allotment filled.

msattler (Project donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 38925
Credit: 579,213,332
RAC: 510,525
United States
Message 1320965 - Posted: 28 Dec 2012, 21:41:28 UTC

Seeing the same behavior as when things were really bad a while back....before the limits 'fixed' things.

Host makes scheduler request. My account shows that contact with the scheduler was made. Scheduler does not answer.... Host tries again, still no answer. Eventually after enough retries, the scheduler responds by resending 'lost' tasks. MY rigs did not lose them.

Uploads are rather dicey too.

zoom314 (Project donor)
Joined: 30 Nov 03
Posts: 46125
Credit: 36,599,330
RAC: 5,286
Message 1320969 - Posted: 28 Dec 2012, 21:55:21 UTC - in response to Message 1320965.

Seeing the same behavior as when things were really bad a while back....before the limits 'fixed' things.

Host makes scheduler request. My account shows that contact with the scheduler was made. Scheduler does not answer.... Host tries again, still no answer. Eventually after enough retries, the scheduler responds by resending 'lost' tasks. MY rigs did not lose them.

Uploads are rather dicey too.

Yeah and that's crazy, something's borked...

Wiggo
Joined: 24 Jan 00
Posts: 6793
Credit: 93,166,617
RAC: 75,761
Australia
Message 1320973 - Posted: 28 Dec 2012, 22:01:37 UTC - in response to Message 1320969.

I've just set NNT until this hiccup is over, as I'm not going to babysit downloads and uploads (my backup projects may get to fight for my resources again).

Cheers.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5792
Credit: 58,084,143
RAC: 48,369
Australia
Message 1320974 - Posted: 28 Dec 2012, 22:03:57 UTC - in response to Message 1320965.


Uploads backing up.
After hitting retry several dozen times I was able to get a couple to upload, eventually. Upload speed was that of an old & crippled snail (< 2kB/s).
Upload error message: connect() failed.

Have got a few Scheduler errors, mostly Server returned no data etc. I'd probably have more, but the backed-up uploads have been blocking the work requests. When the request does go through it's taking 1-2min to get a response.


BTW- weren't the WUs with the really long identifier meant to have been fixed? I'm still getting lots of those.
____________
Grant
Darwin NT.

msattler (Project donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 38925
Credit: 579,213,332
RAC: 510,525
United States
Message 1320977 - Posted: 28 Dec 2012, 22:15:42 UTC - in response to Message 1320974.


Uploads backing up.
After hitting retry several dozen times I was able to get a couple to upload, eventually. Upload speed was that of an old & crippled snail (< 2kB/s).
Upload error message: connect() failed.

Have got a few Scheduler errors, mostly Server returned no data etc. I'd probably have more, but the backed-up uploads have been blocking the work requests. When the request does go through it's taking 1-2min to get a response.


BTW- weren't the WUs with the really long identifier meant to have been fixed? I'm still getting lots of those.

Your observations about comms mirror what I am seeing.

I don't think the long IDs were considered a problem per se, but I thought they were a temporary thing as well.


Grant (SSSF)
Joined: 19 Aug 99
Posts: 5792
Credit: 58,084,143
RAC: 48,369
Australia
Message 1321027 - Posted: 28 Dec 2012, 23:08:24 UTC - in response to Message 1320977.


Just to add to the fun, when I do get work it's almost all shorties.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8635
Credit: 23,742,408
RAC: 18,803
United Kingdom
Message 1321038 - Posted: 28 Dec 2012, 23:17:54 UTC - in response to Message 1320974.
Last modified: 28 Dec 2012, 23:18:37 UTC

Uploads backing up.

That looks like the problem here also.

By abusing a few buttons, got enough uploads to happen so that requests could be made.

It took a few requests but eventually got a few GPU tasks and they all came in at >50kbs.

Just to add to the fun, when I do get work it's almost all shorties.


Same here, so that still leaves me with less than an hour's GPU crunching time on hand.

So, as my four-legged friend (the bed) calls, I must either enable Einstein crunching or switch off and try again tomorrow.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5792
Credit: 58,084,143
RAC: 48,369
Australia
Message 1321047 - Posted: 28 Dec 2012, 23:25:09 UTC - in response to Message 1321038.

It took a few requests but eventually got a few GPU tasks and they all came in at >50kbs.

10-20kB/s here at the moment.

Now with uploads 1kB/s is doing well (when it does eventually go through).


Copyright © 2014 University of California