Panic Mode On (76) Server Problems?



Message boards : Number crunching : Panic Mode On (76) Server Problems?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5864
Credit: 60,560,326
RAC: 47,745
Australia
Message 1278727 - Posted: 1 Sep 2012, 8:13:08 UTC - in response to Message 1278722.


No longer getting Scheduler errors & timeouts. For the last 40 min anyway.
____________
Grant
Darwin NT.

Lee Gresham
Joined: 12 Aug 03
Posts: 131
Credit: 101,993,561
RAC: 32,244
United States
Message 1279407 - Posted: 2 Sep 2012, 22:36:40 UTC - in response to Message 1278595.

> I set NNT yesterday morning when I noticed a truckload of pending downloads. This morning I had 36 tasks to download; even with button pushing, I still have 13 to download. I have Einstein running now. Since it's a holiday weekend, I don't expect the lab to get things up until after the Tuesday outage.

Work download problems are gone on all 4 computers; however, I just caught the GPU on 1 PC running VLARs again. Haven't checked the other PCs yet.
____________
Delta-V

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,564
RAC: 596
United Kingdom
Message 1279473 - Posted: 3 Sep 2012, 2:13:51 UTC - in response to Message 1279407.

> Work download problems are gone on all 4 computers; however, I just caught the GPU on 1 PC running VLARs again. Haven't checked the other PCs yet.

VLARs!!
On a GPU??
Pray tell, how did you manage that?
I have not seen a VLAR in weeks.
I did not know that they still made them.
I WANT VLARS.

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,975,287
RAC: 13,839
United States
Message 1279486 - Posted: 3 Sep 2012, 2:46:37 UTC - in response to Message 1279473.

> > Work download problems are gone on all 4 computers; however, I just caught the GPU on 1 PC running VLARs again. Haven't checked the other PCs yet.
>
> VLARs!!
> On a GPU??
> Pray tell, how did you manage that?
> I have not seen a VLAR in weeks.
> I did not know that they still made them.
> I WANT VLARS.

Impossible! The servers have been fixed so that can never occur.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4141
Credit: 33,584,451
RAC: 26,691
United Kingdom
Message 1279517 - Posted: 3 Sep 2012, 4:36:37 UTC - in response to Message 1279486.

> > > Work download problems are gone on all 4 computers; however, I just caught the GPU on 1 PC running VLARs again. Haven't checked the other PCs yet.
> >
> > VLARs!!
> > On a GPU??
> > Pray tell, how did you manage that?
> > I have not seen a VLAR in weeks.
> > I did not know that they still made them.
> > I WANT VLARS.
>
> Impossible! The servers have been fixed so that can never occur.

He received those tasks 3 weeks ago and has only just got round to starting them.

Claggy

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,975,287
RAC: 13,839
United States
Message 1279554 - Posted: 3 Sep 2012, 7:12:02 UTC - in response to Message 1279517.


> He received those tasks 3 weeks ago and has only just got round to starting them.
>
> Claggy

Better late than never.

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,564
RAC: 596
United Kingdom
Message 1279609 - Posted: 3 Sep 2012, 11:23:40 UTC - in response to Message 1279486.

> > > Work download problems are gone on all 4 computers; however, I just caught the GPU on 1 PC running VLARs again. Haven't checked the other PCs yet.
> >
> > VLARs!!
> > On a GPU??
> > Pray tell, how did you manage that?
> > I have not seen a VLAR in weeks.
> > I did not know that they still made them.
> > I WANT VLARS.
>
> Impossible! The servers have been fixed so that can never occur.

As I see it, the servers have been FUBARed so I/we don't get them :(

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3689
Credit: 48,728,723
RAC: 6,430
United States
Message 1280009 - Posted: 4 Sep 2012, 20:12:24 UTC

And we are back from another weekly backup.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5864
Credit: 60,560,326
RAC: 47,745
Australia
Message 1280352 - Posted: 5 Sep 2012, 18:11:21 UTC - in response to Message 1280009.


Getting lots of Scheduler timeouts again.
____________
Grant
Darwin NT.

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1280381 - Posted: 5 Sep 2012, 19:28:20 UTC - in response to Message 1280352.


> Getting lots of Scheduler timeouts again.

Ditto. Of course, since APs are being split, that explains it.

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,564
RAC: 596
United Kingdom
Message 1280537 - Posted: 6 Sep 2012, 10:50:14 UTC

The timeouts are so bad I am out of work again.
Just can't get through the AP crush.

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1280565 - Posted: 6 Sep 2012, 12:18:02 UTC - in response to Message 1280537.

Anyone know what happened here?

http://setiathome.berkeley.edu/workunit.php?wuid=1060552647

Basically I had some tasks go into 'Timeout no response' within 2 hours of being sent to me. That's a bit odd.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1715
Credit: 205,802,760
RAC: 27,645
Australia
Message 1280572 - Posted: 6 Sep 2012, 12:45:09 UTC - in response to Message 1280565.

> Anyone know what happened here?
>
> http://setiathome.berkeley.edu/workunit.php?wuid=1060552647
>
> Basically I had some tasks go into 'Timeout no response' within 2 hours of being sent to me. That's a bit odd.

Did your computer actually download these units, or did the tasks get lost in limbo due to the server load?

50% of the units I'm getting at the moment are resends.

Someone mentioned in an earlier thread that reports with the client set to NNT get through. It's the reports with a request for new tasks that hang. This indicates an overload of the download servers.

T.A.

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1280577 - Posted: 6 Sep 2012, 13:11:58 UTC - in response to Message 1280572.

> > Anyone know what happened here?
> >
> > http://setiathome.berkeley.edu/workunit.php?wuid=1060552647
> >
> > Basically I had some tasks go into 'Timeout no response' within 2 hours of being sent to me. That's a bit odd.
>
> Did your computer actually download these units, or did the tasks get lost in limbo due to the server load?
>
> 50% of the units I'm getting at the moment are resends.
>
> Someone mentioned in an earlier thread that reports with the client set to NNT get through. It's the reports with a request for new tasks that hang. This indicates an overload of the download servers.
>
> T.A.

No idea, I was asleep. I didn't abort them or anything; I just checked my tasks this morning and saw a bunch of fresh errors.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4302
Credit: 1,072,504
RAC: 1,211
United States
Message 1280587 - Posted: 6 Sep 2012, 14:02:46 UTC - in response to Message 1280565.

> Anyone know what happened here?
>
> http://setiathome.berkeley.edu/workunit.php?wuid=1060552647
>
> Basically I had some tasks go into 'Timeout no response' within 2 hours of being sent to me. That's a bit odd.

The only known cause for early expiry of tasks is when the "Resend lost work" feature finds that, for some reason, it cannot assign some lost work to the host. In this case, your host is only doing MB on CUDA and AP on CPU. I think that at 6 Sep 2012 | 11:45:23 UTC it requested CPU work only, so 20 lost MB tasks that had originally been assigned to CUDA were expired because they couldn't be resent to CPU.
Joe

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1280588 - Posted: 6 Sep 2012, 14:06:56 UTC - in response to Message 1280587.

> > Anyone know what happened here?
> >
> > http://setiathome.berkeley.edu/workunit.php?wuid=1060552647
> >
> > Basically I had some tasks go into 'Timeout no response' within 2 hours of being sent to me. That's a bit odd.
>
> The only known cause for early expiry of tasks is when the "Resend lost work" feature finds that, for some reason, it cannot assign some lost work to the host. In this case, your host is only doing MB on CUDA and AP on CPU. I think that at 6 Sep 2012 | 11:45:23 UTC it requested CPU work only, so 20 lost MB tasks that had originally been assigned to CUDA were expired because they couldn't be resent to CPU.
> Joe

Hrm, it shouldn't have done that, but thanks for the explanation. I was just trying to make sure I didn't do anything wrong.

Thanks Josef and T.A. :)
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,564
RAC: 596
United Kingdom
Message 1280748 - Posted: 6 Sep 2012, 21:07:53 UTC
Last modified: 6 Sep 2012, 21:11:09 UTC

Try this one for size :¬)
Task Self Destruct
sent - 6 Sep 2012 | 10:12:04 UTC
dead - 6 Sep 2012 | 10:20:01 UTC Timed out - no response
Just another impossible deadline.
I think there was a thread about it not long ago.

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 171
Credit: 64,364,024
RAC: 34,192
United States
Message 1280823 - Posted: 7 Sep 2012, 1:27:56 UTC

I've recently gotten a slew of "timed out - no response" errors too, some in as little as 6 minutes. I think that was because they were VLARs getting assigned to the NVIDIA GPUs.

http://setiathome.berkeley.edu/workunit.php?wuid=1059826158

Keith

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1280840 - Posted: 7 Sep 2012, 2:41:25 UTC - in response to Message 1278504.

> Two of my three boxes are OK, with normal D/L and U/L and a bit of a lag reporting. The third is a complete mess: over 2000 waiting D/Ls, and reporting is almost impossible. I've changed MTR from 256 to 100, and now to 50, plus set NNT until I can get about 700 WUs reported.
>
> EDIT: With 50 MTR and NNT, I got all 700 WUs reported. As soon as I removed NNT, BOINC tried to report two additional WUs and request new work... and timed out.
>
> So it looks like any scheduler request with a work fetch is timing out.

No download issues ATM, but the same scheduler problem with reporting. The minute I set BOINC to NNT, the scheduler request goes through almost immediately. Remove NNT and every request times out.

Rolf
Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 1
Switzerland
Message 1280916 - Posted: 7 Sep 2012, 7:50:46 UTC - in response to Message 1280840.

> No download issues ATM, but the same scheduler problem with reporting. The minute I set BOINC to NNT, the scheduler request goes through almost immediately. Remove NNT and every request times out.

Exactly the same here!


Copyright © 2014 University of California