Panic Mode On (79) Server Problems?

.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1320889 - Posted: 28 Dec 2012, 18:32:59 UTC

While the network was not at max, comms were good.
Now that more AP splitters are running, we seem to have hit the same problem as a month ago.
That is what I can see of it.
Increasing the AP splitters slowly seems to me to point to something.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1320891 - Posted: 28 Dec 2012, 18:37:32 UTC

On a test sample of one host, I got the same outcome here as I got at Albert.

1) Try to do a normal update (reporting completed work, and requesting new work):
I saw a server timeout, but the server registered the completed work and created some ghosts.

2) Set NNT (No New Tasks) before update: I got acknowledgements of the (same) completed work.

3) Unset NNT and update again: I got the (same) ghosts, as "resent lost results".

I don't think it's just network congestion, no matter how severe.
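Richard's three steps are consistent with a simple model in which the server commits its side of a scheduler request even when the reply never reaches the host. The Python sketch below is a toy model under stated assumptions, not BOINC's scheduler code: the class and method names are hypothetical, and the rule that lost tasks are only looked for when the host asks for work is assumed purely because it reproduces the NNT result in step 2.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model: tasks the server believes this host currently holds.
    in_progress: set = field(default_factory=set)
    next_id: int = 1

    def handle(self, reported, client_tasks, want_work, batch=2):
        """Process one scheduler request on the server side and build the reply."""
        # Reported results are committed even if the reply is later lost.
        self.in_progress -= set(reported)
        reply = {"acked": sorted(reported), "new": [], "resent": []}
        if not want_work:
            # Assumption: lost tasks are only checked for on work requests (NNT skips it).
            return reply
        # Tasks the server thinks the host has, but the host did not list.
        lost = self.in_progress - set(client_tasks)
        if lost:
            reply["resent"] = sorted(lost)  # what the user sees as "resent lost results"
        else:
            for _ in range(batch):
                name = f"task{self.next_id}"
                self.next_id += 1
                self.in_progress.add(name)  # assigned server-side immediately
                reply["new"].append(name)
        return reply


@dataclass
class Client:
    tasks: set = field(default_factory=set)        # tasks actually on the host
    completed: list = field(default_factory=list)  # results waiting to be reported

    def update(self, server, want_work, reply_lost=False):
        """One scheduler request; returns the reply, or None on a timeout."""
        reply = server.handle(self.completed, self.tasks, want_work)
        if reply_lost:
            # Timeout: server state has changed but host state has not,
            # so any newly assigned tasks become ghosts.
            return None
        self.completed = [t for t in self.completed if t not in reply["acked"]]
        self.tasks |= set(reply["new"]) | set(reply["resent"])
        return reply


server = Server(in_progress={"done1", "done2"})
client = Client(completed=["done1", "done2"])
print(client.update(server, want_work=True, reply_lost=True))  # step 1: timeout
print(client.update(server, want_work=False))                  # step 2: NNT set
print(client.update(server, want_work=True))                   # step 3: NNT unset
```

Run in order, step 1 times out while the server records the reported work and assigns two tasks the host never sees, step 2 only acknowledges the same completed work, and step 3 gets those two tasks back as resends.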
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1320896 - Posted: 28 Dec 2012, 18:56:13 UTC - in response to Message 1320891.  

Yeah, something is screwed up, but what? The Joker is in the details...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1320949 - Posted: 28 Dec 2012, 20:43:30 UTC

Yeah, getting scheduler requests through is so ugly again that most of my better rigs are out of GPU work, thanks to requests that time out or otherwise fail to complete, and the %#@$#@&##! 100 WU limit.

This limit situation is starting to piss even the good natured kitties off.
Can't ride out the Tuesday outage or some network/server congestion without running out of work for the GPUs.



"Freedom is just Chaos, with better lighting." Alan Dean Foster

Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1320952 - Posted: 28 Dec 2012, 20:53:10 UTC - in response to Message 1320949.  

If the guys have changed something in the server closet in the last several hours then they better change it back again.

Cheers.
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1320954 - Posted: 28 Dec 2012, 20:58:45 UTC

I'm having problems with uploads hanging. I don't have that many, with APs running on one card and long MBs on the other, but they are all hanging. The long MBs are about gone, and now the recently downloaded shorties will be running... more hangs.
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1320956 - Posted: 28 Dec 2012, 20:59:36 UTC
Last modified: 28 Dec 2012, 21:04:42 UTC

And when, after a number of retries, the scheduler responds with some 'resends', most of the downloads are dead in the water.

EDIT...
I would estimate this all went to heck in a handbasket about 4-5 hours ago. When I left for work about 9 hours ago, all rigs had their pitiful 100 WU allotment filled.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1320965 - Posted: 28 Dec 2012, 21:41:28 UTC

Seeing the same behavior as when things were really bad a while back....before the limits 'fixed' things.

Host makes scheduler request. My account shows that contact with the scheduler was made. Scheduler does not answer.... Host tries again, still no answer. Eventually after enough retries, the scheduler responds by resending 'lost' tasks. MY rigs did not lose them.

Uploads are rather dicey too.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1320969 - Posted: 28 Dec 2012, 21:55:21 UTC - in response to Message 1320965.  

Yeah and that's crazy, something's borked...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1320973 - Posted: 28 Dec 2012, 22:01:37 UTC - in response to Message 1320969.  

I've just set NNT until this hiccup is over, as I'm not going to babysit downloads/uploads (my backup projects may get to fight for my resources again).

Cheers.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1320974 - Posted: 28 Dec 2012, 22:03:57 UTC - in response to Message 1320965.  


Uploads backing up.
After hitting retry several dozen times I was able to get a couple to upload, eventually. Upload speed was that of an old & crippled snail (< 2kB/s).
Upload error message: connect() failed.

Have got a few Scheduler errors, mostly "Server returned no data" etc. I'd probably have more, but the backed-up uploads have been blocking the work requests. When a request does go through, it's taking 1-2 min to get a response.

BTW, weren't the WUs with the really long identifiers meant to have been fixed? I'm still getting lots of those.
Grant
Darwin NT
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1320977 - Posted: 28 Dec 2012, 22:15:42 UTC - in response to Message 1320974.  


Your observations about comms mirror what I am seeing.

I don't think the long IDs were considered a problem per se, but I thought they were a temporary thing as well.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1321027 - Posted: 28 Dec 2012, 23:08:24 UTC - in response to Message 1320977.  


Just to add to the fun, when I do get work it's almost all shorties.
Grant
Darwin NT
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1321038 - Posted: 28 Dec 2012, 23:17:54 UTC - in response to Message 1320974.  
Last modified: 28 Dec 2012, 23:18:37 UTC

Uploads backing up.

That looks like the problem here also.

By abusing a few buttons, got enough uploads to happen so that requests could be made.

It took a few requests but eventually got a few GPU tasks and they all came in at >50kbs.

Just to add to the fun, when I do get work it's almost all shorties.

Same here, so that still leaves me less than an hour's GPU crunching time on hand.

So, as my four-legged friend (the bed) calls, I must either enable Einstein crunching or switch off and try again tomorrow.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1321047 - Posted: 28 Dec 2012, 23:25:09 UTC - in response to Message 1321038.  

It took a few requests but eventually got a few GPU tasks and they all came in at >50kbs.

10-20kB/s here at the moment.

Now with uploads 1kB/s is doing well (when it does eventually go through).
Grant
Darwin NT
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1321052 - Posted: 28 Dec 2012, 23:32:12 UTC

Yee-haw, a shorty storm with a borked upload server! How many do you think it will hang?

12/28/2012 6:13:56 PM | SETI@home | Started upload of 03se12aa.10570.17249.140733193388039.10.238_2_0
12/28/2012 6:14:19 PM |  | Project communication failed: attempting access to reference site
12/28/2012 6:14:19 PM | SETI@home | Temporarily failed upload of 03se12aa.10570.17249.140733193388039.10.238_2_0: connect() failed
12/28/2012 6:14:19 PM | SETI@home | Backing off 3 min 50 sec on upload of 03se12aa.10570.17249.140733193388039.10.238_2_0
12/28/2012 6:14:21 PM |  | Internet access OK - project servers may be temporarily down.
12/28/2012 6:17:59 PM | SETI@home | Computation for task 07oc12af.16388.6202.140733193388038.10.1_1 finished
12/28/2012 6:17:59 PM | SETI@home | Starting task 07oc12ag.10905.12746.9.10.247_1 using setiathome_enhanced version 609 (cuda23) in slot 3
12/28/2012 6:18:01 PM | SETI@home | Started upload of 07oc12af.16388.6202.140733193388038.10.1_1_0
12/28/2012 6:18:10 PM | SETI@home | Started upload of 03se12aa.10570.17249.140733193388039.10.238_2_0
12/28/2012 6:18:23 PM |  | Project communication failed: attempting access to reference site
12/28/2012 6:18:23 PM | SETI@home | Temporarily failed upload of 07oc12af.16388.6202.140733193388038.10.1_1_0: connect() failed
12/28/2012 6:18:23 PM | SETI@home | Backing off 3 min 34 sec on upload of 07oc12af.16388.6202.140733193388038.10.1_1_0
12/28/2012 6:18:25 PM |  | Internet access OK - project servers may be temporarily down.
12/28/2012 6:21:45 PM |  | Project communication failed: attempting access to reference site
12/28/2012 6:21:45 PM | SETI@home | Temporarily failed upload of 03se12aa.10570.17249.140733193388039.10.238_2_0: connect() failed
12/28/2012 6:21:45 PM | SETI@home | Backing off 4 min 49 sec on upload of 03se12aa.10570.17249.140733193388039.10.238_2_0
12/28/2012 6:21:47 PM |  | Internet access OK - project servers may be temporarily down.
12/28/2012 6:22:09 PM | SETI@home | Computation for task 07oc12ag.10905.12746.9.10.247_1 finished
12/28/2012 6:22:09 PM | SETI@home | Starting task 07oc12ah.10878.22562.6.10.76_1 using setiathome_enhanced version 609 (cuda23) in slot 3
12/28/2012 6:22:11 PM | SETI@home | Started upload of 07oc12ag.10905.12746.9.10.247_1_0
12/28/2012 6:22:49 PM |  | Project communication failed: attempting access to reference site
12/28/2012 6:22:49 PM | SETI@home | Temporarily failed upload of 07oc12ag.10905.12746.9.10.247_1_0: connect() failed
12/28/2012 6:22:49 PM | SETI@home | Backing off 3 min 11 sec on upload of 07oc12ag.10905.12746.9.10.247_1_0
12/28/2012 6:22:50 PM |  | Internet access OK - project servers may be temporarily down.
12/28/2012 6:26:11 PM | SETI@home | Computation for task 07oc12ah.10878.22562.6.10.76_1 finished
12/28/2012 6:26:11 PM | SETI@home | Starting task 01au12ab.23909.24895.10.10.9_2 using setiathome_enhanced version 609 (cuda23) in slot 3
12/28/2012 6:26:13 PM | SETI@home | Started upload of 07oc12ah.10878.22562.6.10.76_1_0
12/28/2012 6:26:26 PM | SETI@home | Computation for task 01au12ab.23909.24895.10.10.9_2 finished
12/28/2012 6:26:26 PM | SETI@home | Starting task 07oc12af.5577.9065.140733193388039.10.69_0 using setiathome_enhanced version 609 (cuda23) in slot 3
12/28/2012 6:26:28 PM | SETI@home | Started upload of 01au12ab.23909.24895.10.10.9_2_0
12/28/2012 6:26:35 PM |  | Project communication failed: attempting access to reference site
12/28/2012 6:26:35 PM | SETI@home | Temporarily failed upload of 07oc12ah.10878.22562.6.10.76_1_0: connect() failed
12/28/2012 6:26:35 PM | SETI@home | Backing off 3 min 46 sec on upload of 07oc12ah.10878.22562.6.10.76_1_0
12/28/2012 6:26:36 PM |  | Internet access OK - project servers may be temporarily down.
12/28/2012 6:26:50 PM |  | Project communication failed: attempting access to reference site
12/28/2012 6:26:50 PM | SETI@home | Temporarily failed upload of 01au12ab.23909.24895.10.10.9_2_0: connect() failed
12/28/2012 6:26:50 PM | SETI@home | Backing off 3 min 19 sec on upload of 01au12ab.23909.24895.10.10.9_2_0
12/28/2012 6:26:52 PM |  | Internet access OK - project servers may be temporarily down.


:-(
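The log shows the client backing off for a few minutes after each connect() failure, with a different, apparently randomised delay each time. A common way to get that behaviour is randomised exponential backoff, and the Python sketch below shows the general technique only. The 60 s floor, 4 h cap, doubling per failure and 50-100% jitter are hypothetical values chosen for illustration, not constants taken from the BOINC client.

```python
import random


def backoff_delay(attempt, min_delay=60.0, max_delay=4 * 3600.0, rng=random):
    """Seconds to wait after the `attempt`-th consecutive failure (attempt >= 1).

    Hypothetical constants: 60 s floor, 4 h cap, doubling per failure.
    """
    # The ceiling doubles with each consecutive failure, up to max_delay.
    ceiling = min(max_delay, min_delay * 2 ** attempt)
    # Pick a random point in [ceiling/2, ceiling] so hosts that failed at
    # the same moment spread their retries out instead of retrying together.
    return rng.uniform(0.5 * ceiling, ceiling)


def fmt(seconds):
    """Minutes-and-seconds format as in the log above, e.g. '3 min 50 sec'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs} sec"


rng = random.Random(2012)  # seeded so the demo is repeatable
for attempt in range(1, 6):
    print(f"failure {attempt}: backing off {fmt(backoff_delay(attempt, rng=rng))}")
```

The jitter is what matters during an outage like this one: if thousands of hosts that failed at the same moment all retried after exactly the same delay, they would hit the upload server together again.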
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1321054 - Posted: 28 Dec 2012, 23:38:51 UTC - in response to Message 1321047.  

That's 'cause you live in the middle of nowhere, or at least can see nowhere from there.

My cousin keeps me informed of the actions of your ISPs and telcos over there; he is not very impressed, having lived in the UK, Boston and southern California.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1321065 - Posted: 29 Dec 2012, 0:49:01 UTC - in response to Message 1321054.  

My cousin keeps me informed of the actions of your ISPs and telcos over there; he is not very impressed, having lived in the UK, Boston and southern California.

We're not overly thrilled with them either, but they do have the population density & total numbers argument on their side.
A landmass the area of mainland USA, with a total population that's not even 3 times that of London (22.6 million v 8.2 million).
Grant
Darwin NT
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321080 - Posted: 29 Dec 2012, 1:28:04 UTC

I have not seen the Seti servers so totally screwed in a long while.
I mean, we have a trifecta going here.
Uploads, downloads, AND scheduler requests all nearly impossible.

Makes me wonder if we have a dang DoS attack on the servers going again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

OTS
Volunteer tester
Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1321082 - Posted: 29 Dec 2012, 1:37:44 UTC - in response to Message 1321080.  

Perhaps just busy. I had an AP upload and two AP downloads finished within the last 10 minutes.
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.