Panic Mode On (79) Server Problems?


zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 51940
Credit: 38,622,802
RAC: 7,874
United States
Message 1320818 - Posted: 28 Dec 2012, 16:09:05 UTC - in response to Message 1320816.

I got 40 ghost WUs on this host of mine, a result of the same issues posted before me:
http://setiathome.berkeley.edu/results.php?hostid=5332132

Maybe Seti could call in these guys... ;)

Me, I have no idea whether I have any ghosts or not; besides, I just woke up.
____________
Spiderman, Pluto is still a planet. Basic Income for all!

j tramer
Joined: 6 Oct 03
Posts: 242
Credit: 5,401,277
RAC: 134
Canada
Message 1320819 - Posted: 28 Dec 2012, 16:12:16 UTC

I shut down my second computer.....

Not wanted, not needed....

Soon both computers will not be running SETI.

More people should quit SETI.

Keith White
Joined: 29 May 99
Posts: 383
Credit: 3,650,157
RAC: 2,299
United States
Message 1320825 - Posted: 28 Dec 2012, 16:30:49 UTC - in response to Message 1320819.

Aye, she's getting a touch cranky again. Flushed my DNS, renewed my IP address and it did get through enough to report and refill. Once. I'll come back in a few hours and see how she's doing.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 9604
Credit: 62,659,837
RAC: 95,875
United Kingdom
Message 1320851 - Posted: 28 Dec 2012, 17:34:15 UTC - in response to Message 1320810.

Scheduler contacts are frequently timing out or returning nothing (no headers, no data), and when I do manage to get work the tasks are all shorties, at which point downloads are terribly slow. It sounds as if the internal network is possibly overloaded again.

Claggy

Are you creating ghost results on your scheduler timeouts? (Specifically, the timeouts, not the empty replies)

And similarly, are results which you have attempted, but apparently failed, to report appearing as completed on the web task lists?

Yes for the first question:

28/12/2012 15:00:19 SETI@home [sched_op_debug] Starting scheduler request
28/12/2012 15:00:19 SETI@home Sending scheduler request: To fetch work.
28/12/2012 15:00:19 SETI@home Reporting 3 completed tasks, requesting new tasks for GPU
28/12/2012 15:00:19 SETI@home [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
28/12/2012 15:00:19 SETI@home [sched_op_debug] NVIDIA GPU work request: 0.00 seconds; 0.00 GPUs
28/12/2012 15:00:19 SETI@home [sched_op_debug] ATI GPU work request: 82261.43 seconds; 0.00 GPUs
28/12/2012 15:05:28 Project communication failed: attempting access to reference site
28/12/2012 15:05:28 SETI@home Scheduler request failed: Timeout was reached
28/12/2012 15:05:28 SETI@home [sched_op_debug] Deferring communication for 1 min 0 sec
28/12/2012 15:05:28 SETI@home [sched_op_debug] Reason: Scheduler request failed
28/12/2012 15:05:29 Internet access OK - project servers may be temporarily down.
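For anyone wanting to check how long the client waited before giving up, here is a rough Python sketch that measures the gap between "Sending scheduler request" and "Scheduler request failed" in an event log like the one above. The function names are made up for illustration, and it assumes every line starts with a DD/MM/YYYY HH:MM:SS timestamp as in this excerpt; other client versions or locales may format the log differently.

```python
from datetime import datetime

# Four lines copied from the event log excerpt above.
LOG = """\
28/12/2012 15:00:19 SETI@home Sending scheduler request: To fetch work.
28/12/2012 15:05:28 SETI@home Scheduler request failed: Timeout was reached
28/12/2012 15:05:28 SETI@home [sched_op_debug] Deferring communication for 1 min 0 sec
28/12/2012 15:05:28 SETI@home [sched_op_debug] Reason: Scheduler request failed
"""


def parse_timestamp(line):
    """Parse the leading 'DD/MM/YYYY HH:MM:SS' stamp (assumed format)."""
    return datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S")


def failed_request_durations(log_text):
    """Return seconds elapsed between each request and its failure."""
    started = None
    durations = []
    for line in log_text.splitlines():
        if "Sending scheduler request" in line:
            started = parse_timestamp(line)
        elif "Scheduler request failed" in line and started is not None:
            durations.append((parse_timestamp(line) - started).total_seconds())
            # Reset so the follow-up "Reason:" line is not counted twice.
            started = None
    return durations


print(failed_request_durations(LOG))
```

On the excerpt above this reports one failed request that waited 309 seconds (15:00:19 to 15:05:28) before the timeout was logged.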


There are 20 ATI ghosts waiting to be resent at the moment (all timestamped between 28 Dec 2012, 15:00:33 UTC and 15:00:38 UTC): All tasks for computer 5427475

For the second question, I think it is yes too (three tasks were reported; one of them I've seen as being reported at 15:00:31 UTC).

Claggy

I was getting exactly the same thing on Albert (Einstein's test server) before Christmas: the scheduler did everything it was supposed to do in less than a second, then sat there for two minutes twiddling its thumbs until Apache killed it with a SIGTERM (you can see useful things like that in the Einstein family server logs). I tried to convince Bernd and Eric (and David) that the two behaviours might be related (and not just by overwork - Albert was very lightly loaded at the time) - but Christmas holidays intervened. Something to pick up on in the New Year. Until then, zzzzzzzzz...

zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 51940
Credit: 38,622,802
RAC: 7,874
United States
Message 1320870 - Posted: 28 Dec 2012, 18:00:30 UTC - in response to Message 1320851.

Just what both projects don't need: senile servers. Someone get out the rocking chairs... ;)
____________
Spiderman, Pluto is still a planet. Basic Income for all!

.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 29,811,207
RAC: 1,019
United Kingdom
Message 1320889 - Posted: 28 Dec 2012, 18:32:59 UTC

While the network was not at max, comms were good.
Now that more AP splitters are running, we seem to have hit the same problem as a month ago.
That is what I can see of it.
Increasing the AP splitters slowly seems to me to point to something.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 9604
Credit: 62,659,837
RAC: 95,875
United Kingdom
Message 1320891 - Posted: 28 Dec 2012, 18:37:32 UTC

On a test sample of one host, I got the same outcome here as I got at Albert.

1) Try to do a normal update (reporting completed work, and requesting new work):
I saw a server timeout, but the server registered the completed work and created some ghosts.

2) Set NNT before update: I got acknowledgements of the (same) completed work.

3) Unset NNT and update again: I got the (same) ghosts, as "resent lost results".

I don't think it's just network congestion, no matter how severe.
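A rough sketch of the three steps above as boinccmd calls, for anyone who wants to repeat the test from a command line. The `update`, `nomorework` and `allowmorework` project operations are standard boinccmd options. The rest is assumption: the project URL is a guess at the master URL the client is attached under, the Python function names are only for illustration, and boinccmd may also need `--host` or `--passwd` depending on the setup. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumption: the client is attached to the project under this master URL.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def ghost_test_steps(project_url):
    """Return (description, command) pairs mirroring steps 1 to 3."""
    base = ["boinccmd", "--project", project_url]
    return [
        ("1) normal update: report work and request new work", base + ["update"]),
        ("2) set NNT ...", base + ["nomorework"]),
        ("   ... then update: report only", base + ["update"]),
        ("3) unset NNT ...", base + ["allowmorework"]),
        ("   ... then update: request work again", base + ["update"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; only call boinccmd when dry_run is False."""
    for description, command in steps:
        print(description)
        print("      " + " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
            # boinccmd only queues the request, so pause until the scheduler
            # reply (or the timeout) has shown up in the event log.
            input("      press Enter once the reply or timeout is logged ")


run_steps(ghost_test_steps(PROJECT_URL))
```

Whether ghosts appear still has to be checked by hand, by comparing the host's task list on the web site with what the client actually holds after each step.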

zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 51940
Credit: 38,622,802
RAC: 7,874
United States
Message 1320896 - Posted: 28 Dec 2012, 18:56:13 UTC - in response to Message 1320891.

On a test sample of one host, I got the same outcome here as I got at Albert.

1) Try to do a normal update (reporting completed work, and requesting new work):
I saw a server timeout, but the server registered the completed work and created some ghosts.

2) Set NNT before update: I got acknowledgements of the (same) completed work.

3) Unset NNT and update again: I got the (same) ghosts, as "resent lost results".

I don't think it's just network congestion, no matter how severe.

Yeah, something is screwed up, but what? The Joker is in the details...
____________
Spiderman, Pluto is still a planet. Basic Income for all!

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 42168
Credit: 701,643,050
RAC: 389,935
United States
Message 1320949 - Posted: 28 Dec 2012, 20:43:30 UTC

Yeah, getting scheduler requests through is so ugly again that most of my better rigs are out of GPU work, thanks to requests that timed out or otherwise could not be completed and the %#@$#@&##! 100 WU limit.

This limit situation is starting to piss even the good-natured kitties off.
Can't ride out the Tuesday outage or some network/server congestion without running out of work for the GPUs.



____________
**************************
PCs, kitties, and kibblestones.
That would be me.


I have met a few friends in my life.
Most were cats.

Wiggo
Joined: 24 Jan 00
Posts: 9449
Credit: 115,930,192
RAC: 72,986
Australia
Message 1320952 - Posted: 28 Dec 2012, 20:53:10 UTC - in response to Message 1320949.

If the guys have changed something in the server closet in the last several hours, then they'd better change it back again.

Cheers.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 2103
Credit: 75,087,548
RAC: 122,821
United States
Message 1320954 - Posted: 28 Dec 2012, 20:58:45 UTC

I'm having problems with the uploads hanging. I don't have that many, with the APs going on one card and long MBs on the other, but they are all hanging. The long MBs are about gone, and now the recently downloaded shorties will be running....more hangs.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 42168
Credit: 701,643,050
RAC: 389,935
United States
Message 1320956 - Posted: 28 Dec 2012, 20:59:36 UTC
Last modified: 28 Dec 2012, 21:04:42 UTC

And when, after a number of retries, the scheduler responds with some 'resends', most of the downloads are dead in the water.

EDIT...
I would estimate this all went to heck in a handbasket about 4-5 hours ago. When I left for work about 9 hours ago, all rigs had their pitiful 100 WU allotment filled.
____________
**************************
PCs, kitties, and kibblestones.
That would be me.


I have met a few friends in my life.
Most were cats.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 42168
Credit: 701,643,050
RAC: 389,935
United States
Message 1320965 - Posted: 28 Dec 2012, 21:41:28 UTC

Seeing the same behavior as when things were really bad a while back....before the limits 'fixed' things.

Host makes scheduler request. My account shows that contact with the scheduler was made. Scheduler does not answer.... Host tries again, still no answer. Eventually after enough retries, the scheduler responds by resending 'lost' tasks. MY rigs did not lose them.

Uploads are rather dicey too.
____________
**************************
PCs, kitties, and kibblestones.
That would be me.


I have met a few friends in my life.
Most were cats.

zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 51940
Credit: 38,622,802
RAC: 7,874
United States
Message 1320969 - Posted: 28 Dec 2012, 21:55:21 UTC - in response to Message 1320965.

Seeing the same behavior as when things were really bad a while back....before the limits 'fixed' things.

Host makes scheduler request. My account shows that contact with the scheduler was made. Scheduler does not answer.... Host tries again, still no answer. Eventually after enough retries, the scheduler responds by resending 'lost' tasks. MY rigs did not lose them.

Uploads are rather dicey too.

Yeah and that's crazy, something's borked...
____________
Spiderman, Pluto is still a planet. Basic Income for all!

Wiggo
Joined: 24 Jan 00
Posts: 9449
Credit: 115,930,192
RAC: 72,986
Australia
Message 1320973 - Posted: 28 Dec 2012, 22:01:37 UTC - in response to Message 1320969.

I've just set NNT until this hiccup is over, as I'm not going to babysit downloads and uploads (my backup projects may get to fight for my resources again).

Cheers.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 6394
Credit: 73,994,572
RAC: 48,453
Australia
Message 1320974 - Posted: 28 Dec 2012, 22:03:57 UTC - in response to Message 1320965.


Uploads backing up.
After hitting retry several dozen times I was able to get a couple to upload, eventually. Upload speed was that of an old & crippled snail (< 2kB/s).
Upload error message: connect() failed.

Have got a few Scheduler errors, mostly "Server returned no data" etc. I'd probably have more, but the backed-up uploads have been blocking the work requests. When the request does go through it's taking 1-2 min to get a response.


BTW- weren't the WUs with the really long identifier meant to have been fixed? I'm still getting lots of those.
____________
Grant
Darwin NT.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 42168
Credit: 701,643,050
RAC: 389,935
United States
Message 1320977 - Posted: 28 Dec 2012, 22:15:42 UTC - in response to Message 1320974.


Uploads backing up.
After hitting retry several dozen times I was able to get a couple to upload, eventually. Upload speed was that of an old & crippled snail (< 2kB/s).
Upload error message: connect() failed.

Have got a few Scheduler errors, mostly "Server returned no data" etc. I'd probably have more, but the backed-up uploads have been blocking the work requests. When the request does go through it's taking 1-2 min to get a response.


BTW- weren't the WUs with the really long identifier meant to have been fixed? I'm still getting lots of those.

Your observations about comms mirror what I am seeing.

I don't think the long IDs were considered a problem per se, but I thought they were a temporary thing as well.

____________
**************************
PCs, kitties, and kibblestones.
That would be me.


I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 6394
Credit: 73,994,572
RAC: 48,453
Australia
Message 1321027 - Posted: 28 Dec 2012, 23:08:24 UTC - in response to Message 1320977.


Just to add to the fun, when I do get work it's almost all shorties.
____________
Grant
Darwin NT.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8936
Credit: 28,629,140
RAC: 33
United Kingdom
Message 1321038 - Posted: 28 Dec 2012, 23:17:54 UTC - in response to Message 1320974.
Last modified: 28 Dec 2012, 23:18:37 UTC

Uploads backing up.

That looks like the problem here also.

By abusing a few buttons, got enough uploads to happen so that requests could be made.

It took a few requests but eventually got a few GPU tasks and they all came in at >50kbs.

Just to add to the fun, when I do get work it's almost all shorties.


Same here, so that still leaves me less than an hour's GPU crunching time on hand.

So as my four-legged friend (the bed) calls, I must either enable Einstein crunching or switch off and try again tomorrow.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 6394
Credit: 73,994,572
RAC: 48,453
Australia
Message 1321047 - Posted: 28 Dec 2012, 23:25:09 UTC - in response to Message 1321038.

It took a few requests but eventually got a few GPU tasks and they all came in at >50kbs.

10-20kB/s here at the moment.

Now with uploads 1kB/s is doing well (when it does eventually go through).
____________
Grant
Darwin NT.


Copyright © 2015 University of California