Panic Mode On (80) Server Problems?


Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,897,113
RAC: 2,403
United States
Message 1330969 - Posted: 24 Jan 2013, 23:30:43 UTC - in response to Message 1330824.

It's ironic: no sooner had I posted my message below than all tasks reported.

Hey, glad to see I'm not the only one that happens to.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Lionel
Joined: 25 Mar 00
Posts: 576
Credit: 236,081,051
RAC: 230,978
Australia
Message 1330982 - Posted: 25 Jan 2013, 0:42:56 UTC - in response to Message 1330969.

About 100 shorties just came in on one box, and the other is getting them as well now ... looks like things are about to get turbulent ...
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2291
Credit: 8,815,382
RAC: 4,065
United States
Message 1331024 - Posted: 25 Jan 2013, 3:50:06 UTC

About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that.

Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot.
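
To make the slot ratios above concrete, here is a minimal sketch of dividing a fixed number of feeder slots between applications in proportion to a set of weights. It is not the actual BOINC feeder code; the function name `allocate_slots` and the largest-remainder rounding rule are assumptions chosen purely for illustration.

```python
def allocate_slots(total_slots, weights):
    """Split total_slots among apps in proportion to weights.

    Hypothetical helper, not BOINC code. Uses largest-remainder
    rounding so the per-app counts always sum to total_slots.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Exact (possibly fractional) share for each app.
    exact = {app: total_slots * w / total_weight for app, w in weights.items()}
    # Start from the whole-number part of each share.
    slots = {app: int(share) for app, share in exact.items()}

    # Hand any leftover slots to the apps with the largest fractional parts.
    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(exact, key=lambda a: exact[a] - slots[a], reverse=True)
    for app in by_remainder[:leftover]:
        slots[app] += 1
    return slots


# The original 100-slot feeder at 97/3, and the proposed 190/10 split.
print(allocate_slots(100, {"MB": 97, "AP": 3}))
print(allocate_slots(200, {"MB": 190, "AP": 10}))
```

With only a handful of slots the rounding rule starts to matter: at 97/3 over 10 slots, AP would get none at all in this sketch.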

Or even like it was for a while there early last year.. AP would not be issued out by the scheduler except for during certain 4-hour blocks. You would go 4 hours with absolutely no APs going out at all, and then in the next 4 hours, only MB-resends would go out, but no new ones.. it was just all AP for those 4 hours.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,610,037
RAC: 47,413
Australia
Message 1331066 - Posted: 25 Jan 2013, 6:33:02 UTC - in response to Message 1331024.
Last modified: 25 Jan 2013, 7:12:50 UTC

One system out of GPU work, the other one running out of GPU & CPU work.
Apart from the odd aberration, Scheduler requests just result in "Couldn't connect to server" messages.


EDIT- if only I had gotten home from work & posted about the issues sooner. Inbound network traffic has surged, I'm now downloading work...
The Scheduler would appear to be alive again. It takes at least a couple of minutes, but it's possible to connect.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,610,037
RAC: 47,413
Australia
Message 1331076 - Posted: 25 Jan 2013, 7:36:41 UTC - in response to Message 1331066.


...and now that the Scheduler is working again, the network pipes are fully clogged & downloads have gone from almost 10kB/s to less than 1.
____________
Grant
Darwin NT.

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,941
RAC: 250
Germany
Message 1331092 - Posted: 25 Jan 2013, 8:38:34 UTC - in response to Message 1331024.

About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that.

Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot.

The number of splitters seems to be OK: MB is split at about the same speed as AP, and it won't get better than that. That avoids the situation we had before: a few days of intensive AP splitting with lots of download problems, followed by a few days with bandwidth usage of about 70%. Now, with a more or less constant ratio of available MB and AP WUs, all they need to do is slow down the feeder a bit, so it refills the scheduler queue less often than it does now.
____________
.

Mark Fiske (Project donor)
Joined: 15 Aug 11
Posts: 713
Credit: 7,392,921
RAC: 0
United States
Message 1331149 - Posted: 25 Jan 2013, 15:21:27 UTC

Don't believe I've seen this one before...anybody else?

1/25/2013 7:18:17 AM | SETI@home | Scheduler request completed: got 0 new tasks
1/25/2013 7:18:17 AM | SETI@home | No tasks sent
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for Astropulse v505
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for SETI@home v7
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for AstroPulse v6

Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24551
Credit: 33,891,089
RAC: 24,258
Germany
Message 1331151 - Posted: 25 Jan 2013, 15:24:54 UTC - in response to Message 1331149.

Don't believe I've seen this one before...anybody else?

1/25/2013 7:18:17 AM | SETI@home | Scheduler request completed: got 0 new tasks
1/25/2013 7:18:17 AM | SETI@home | No tasks sent
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for Astropulse v505
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for SETI@home v7
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for AstroPulse v6


Sure this happens.
Usually the feeder is just empty.

____________

Mark Fiske (Project donor)
Joined: 15 Aug 11
Posts: 713
Credit: 7,392,921
RAC: 0
United States
Message 1331157 - Posted: 25 Jan 2013, 15:36:50 UTC - in response to Message 1331151.

Figured it to be something like that, just hadn't seen the specificity before...

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,286,286
RAC: 3,826
United States
Message 1331187 - Posted: 25 Jan 2013, 16:55:15 UTC - in response to Message 1331157.

Seems to me that folks should simply take the hint, suspend processing on SETI and work on other projects until SETI is running moderately healthy once again.

Of course, part of this would be some folks migrating and staying elsewhere, as it is apparent that, given available resources, the project cannot reliably feed and take care of its extensive user community.

I've run SETI since the old days -- this April will be my 13th anniversary with the project. Sometimes it is close to my lead CPU-centric project (I find running GPUs for SETI simply participates in the most problematic aspect of the project). These days, with the 'on again, off again' functionality here, it has dropped below some other worthy projects -- for me, those include Einstein, Rosetta, World Grid, POEM, and others.

So for now, I'm in 'suspend' mode here. Once the I/O issues are partially fixed, I'll shift to No New Work to clear existing work. After that, well we shall see whether SETI can handle the load it has.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,610,037
RAC: 47,413
Australia
Message 1331309 - Posted: 25 Jan 2013, 22:06:52 UTC - in response to Message 1331187.


Looks like the Scheduler working was just a passing fad- inbound network traffic has been up & down all night. It's about 50/50 at the moment whether you'll get a response from a Scheduler request or a "Couldn't connect to server" message.
____________
Grant
Darwin NT.

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,286,286
RAC: 3,826
United States
Message 1331342 - Posted: 25 Jan 2013, 23:20:27 UTC

Well, it is almost the weekend. Next week is another week.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2291
Credit: 8,815,382
RAC: 4,065
United States
Message 1331479 - Posted: 26 Jan 2013, 6:35:19 UTC
Last modified: 26 Jan 2013, 6:36:19 UTC

Whoa. Just looked in my message log:

2013-01-26 01:19:16 SETI@home Started download of ap_25my12aa_B1_P1_00000_20130125_31783.wu
2013-01-26 01:26:25 SETI@home Finished download of ap_25my12aa_B1_P1_00000_20130125_31783.wu


429 seconds for an average of 19.1KB/sec.. and it ran from start to finish without stalling, and not using a proxy.
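
The arithmetic behind those numbers can be checked with a short sketch. The two timestamps and the 19.1 KB/sec figure come from the log lines above; the file size is not stated in the post, so the code only back-computes the size that this rate and duration would imply. The helper names are made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the message log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


def implied_size_kb(rate_kb_per_s, seconds):
    """Transfer size implied by an average rate and a duration."""
    return rate_kb_per_s * seconds


seconds = elapsed_seconds("2013-01-26 01:19:16", "2013-01-26 01:26:25")
size_kb = implied_size_kb(19.1, seconds)  # 19.1 KB/sec is the poster's figure
print(seconds, round(size_kb, 1))
```

The duration works out to the 429 seconds quoted, which at 19.1 KB/sec implies a transfer of roughly 8,194 KB.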
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Vipin Palazhi
Joined: 29 Feb 08
Posts: 249
Credit: 107,300,710
RAC: 74,736
India
Message 1331490 - Posted: 26 Jan 2013, 7:31:00 UTC

I have not been able to report completed tasks from this part of the world for the past couple of hours. All requests are timing out on all my rigs.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2291
Credit: 8,815,382
RAC: 4,065
United States
Message 1331491 - Posted: 26 Jan 2013, 7:54:51 UTC

And actually, upon closer inspection, I have gotten 22 APs in the past ~24 hours, and they have all come through fast, with no errors/time-outs/stalls, and without the use of a proxy.


Scheduler contact on the other hand.. that's a different story.

Dozens of:

2013-01-26 02:45:08 SETI@home Scheduler request failed: Failure when receiving data from the peer
2013-01-26 02:51:31 SETI@home Scheduler request failed: Couldn't connect to server


in the past... ~16 hours. Started acting up around 1630utc according to my logs.
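
For anyone wanting to tally these from their own message log, here is a rough sketch that counts scheduler failures by reason. It assumes log lines in exactly the format shown above (date, time, project name, message); other BOINC client versions format their logs differently, and the regular expression and function name are illustrative only.

```python
import re
from collections import Counter

# Matches lines like the two failures quoted above; the layout is an
# assumption based on those samples, not a documented log format.
FAILURE_RE = re.compile(r"^(\S+ \S+) SETI@home Scheduler request failed: (.+)$")


def count_failures(lines):
    """Count scheduler request failures by reason (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    "2013-01-26 02:45:08 SETI@home Scheduler request failed: Failure when receiving data from the peer",
    "2013-01-26 02:51:31 SETI@home Scheduler request failed: Couldn't connect to server",
    "2013-01-26 01:19:16 SETI@home Started download of ap_25my12aa_B1_P1_00000_20130125_31783.wu",
]

for reason, count in count_failures(sample).items():
    print(count, reason)
```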
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,610,037
RAC: 47,413
Australia
Message 1331507 - Posted: 26 Jan 2013, 9:40:18 UTC - in response to Message 1331491.


Yeah, looks like the Scheduler died again, about 6 hours ago.
The very occasional request results in work, but it's mostly "Couldn't connect to server" messages. Should be out of work in a few more hours.
____________
Grant
Darwin NT.

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 11998
Credit: 14,659,228
RAC: 12,191
United States
Message 1331576 - Posted: 26 Jan 2013, 15:00:42 UTC - in response to Message 1331507.

I dunno, guys. The crickets show no dips in the green since about 2pm PST Friday. The blue line ramped down a bit in the evening, but didn't bottom out. And my active crunchers made contact 3 hours ago for one and 5 minutes ago for the other.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Ron
Volunteer tester
Joined: 24 Aug 99
Posts: 42
Credit: 34,542,964
RAC: 0
United States
Message 1331600 - Posted: 26 Jan 2013, 16:26:03 UTC
Last modified: 26 Jan 2013, 16:33:14 UTC

I got connected:
GTX690

572 SETI@home 1/26/2013 11:20:19 AM Reporting 103 completed tasks, requesting new tasks for CPU and NVIDIA
573 SETI@home 1/26/2013 11:21:08 AM Computation for task 16dc12aa.21994.18574.5.10.20_1 finished
574 SETI@home 1/26/2013 11:21:08 AM Starting task 16dc12aa.21994.18574.5.10.24_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 21
575 SETI@home 1/26/2013 11:21:10 AM Started upload of 16dc12aa.21994.18574.5.10.20_1_0
576 SETI@home 1/26/2013 11:21:14 AM Finished upload of 16dc12aa.21994.18574.5.10.20_1_0
577 SETI@home 1/26/2013 11:21:34 AM Scheduler request completed: got 0 new tasks
578 SETI@home 1/26/2013 11:21:34 AM No tasks sent
579 SETI@home 1/26/2013 11:21:34 AM No tasks are available for Astropulse v505
580 SETI@home 1/26/2013 11:21:34 AM No tasks are available for SETI@home v7
581 SETI@home 1/26/2013 11:21:34 AM No tasks are available for AstroPulse v6
582 SETI@home 1/26/2013 11:21:34 AM Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them

It was not very satisfying!

And 5 minutes later got 75 tasks that downloaded amazingly fast.
____________

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 12,064,572
RAC: 13,693
United Kingdom
Message 1331636 - Posted: 26 Jan 2013, 19:06:11 UTC
Last modified: 26 Jan 2013, 19:12:01 UTC

Cricket has plunged off the cliff.
Bits in and bits out at zero :-(

[edit]...and almost immediately all back up again :-)
____________


Copyright © 2014 University of California