Panic Mode On (80) Server Problems?


Message boards : Number crunching : Panic Mode On (80) Server Problems?

Keith White
Joined: 29 May 99
Posts: 369
Credit: 2,486,678
RAC: 2,379
United States
Message 1330969 - Posted: 24 Jan 2013, 23:30:43 UTC - in response to Message 1330824.

It's ironic: no sooner had I posted my message below than all tasks reported.

Hey, glad to see I'm not the only one that happens to.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Lionel
Joined: 25 Mar 00
Posts: 543
Credit: 198,830,350
RAC: 171,576
Australia
Message 1330982 - Posted: 25 Jan 2013, 0:42:56 UTC - in response to Message 1330969.

About 100 shorties just came in on one box, and the other is getting them as well now ... looks like things are about to get turbulent ...
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2205
Credit: 8,036,481
RAC: 4,372
United States
Message 1331024 - Posted: 25 Jan 2013, 3:50:06 UTC

About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that.

Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot.

Or even like it was for a while there early last year.. AP would not be issued by the scheduler except during certain 4-hour blocks. You would go 4 hours with absolutely no APs going out at all, and then in the next 4 hours, only MB resends would go out, but no new ones.. it was just all AP for those 4 hours.
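
To make the ratio idea concrete, here is a minimal Python sketch of dividing feeder slots between applications by weight. It is not the actual BOINC feeder code: `split_feeder_slots` and the app labels are hypothetical names, and the 200-slot total is only inferred from the 190/10 and 195/5 figures above.

```python
def split_feeder_slots(total_slots, weights):
    """Divide feeder slots among apps in proportion to their weights.

    Hypothetical helper for illustration only. Uses largest-remainder
    rounding so the per-app counts always add up to total_slots.
    """
    weight_sum = sum(weights.values())
    # Exact (possibly fractional) share for each app.
    exact = {app: total_slots * w / weight_sum for app, w in weights.items()}
    # Start from the whole-number part of each share.
    slots = {app: int(share) for app, share in exact.items()}
    leftover = total_slots - sum(slots.values())
    # Hand any leftover slots to the apps with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda app: exact[app] - slots[app], reverse=True)
    for app in by_remainder[:leftover]:
        slots[app] += 1
    return slots


if __name__ == "__main__":
    print(split_feeder_slots(100, {"MB": 97, "AP": 3}))    # the old 97/3 split
    print(split_feeder_slots(200, {"MB": 190, "AP": 10}))  # suggested 190/10
    print(split_feeder_slots(200, {"MB": 195, "AP": 5}))   # suggested 195/5
```

With 100 slots and weights of 97 and 3 this gives back the original 97 MB / 3 AP split; the largest-remainder step only matters when the weights do not divide the slot count evenly.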
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,590,425
RAC: 44,210
Australia
Message 1331066 - Posted: 25 Jan 2013, 6:33:02 UTC - in response to Message 1331024.
Last modified: 25 Jan 2013, 7:12:50 UTC

One system out of GPU work, the other one running out of GPU & CPU work.
Apart from the odd aberration, Scheduler requests just result in "Couldn't connect to server" messages.


EDIT- if only I had gotten home from work & posted about the issues sooner. Inbound network traffic has surged, I'm now downloading work...
The Scheduler would appear to be alive again. It takes at least a couple of minutes, but it's possible to connect.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,590,425
RAC: 44,210
Australia
Message 1331076 - Posted: 25 Jan 2013, 7:36:41 UTC - in response to Message 1331066.


...and now that the Scheduler is working again, the network pipes are fully clogged & downloads have gone from almost 10kB/s to less than 1.
____________
Grant
Darwin NT.

Link
Joined: 18 Sep 03
Posts: 813
Credit: 1,502,174
RAC: 353
Germany
Message 1331092 - Posted: 25 Jan 2013, 8:38:34 UTC - in response to Message 1331024.

About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that.

Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot.

The number of splitters seems to be OK: MB is split at about the same speed as AP, and you won't get it better than that. That avoids the situation we had before: a few days of intensive AP splitting with lots of download problems, and then a few days with bandwidth usage of about 70%. Now, with a more or less constant ratio of available MB and AP WUs, all they need to do is slow down the feeder a bit, so it refills the scheduler queue less often than it does now.
____________
.

Mark Fiske
Joined: 15 Aug 11
Posts: 712
Credit: 7,392,463
RAC: 338
United States
Message 1331149 - Posted: 25 Jan 2013, 15:21:27 UTC

Don't believe I've seen this one before...anybody else?

1/25/2013 7:18:17 AM | SETI@home | Scheduler request completed: got 0 new tasks
1/25/2013 7:18:17 AM | SETI@home | No tasks sent
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for Astropulse v505
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for SETI@home v7
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for AstroPulse v6

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 22449
Credit: 29,470,603
RAC: 25,744
Germany
Message 1331151 - Posted: 25 Jan 2013, 15:24:54 UTC - in response to Message 1331149.

Don't believe I've seen this one before...anybody else?

1/25/2013 7:18:17 AM | SETI@home | Scheduler request completed: got 0 new tasks
1/25/2013 7:18:17 AM | SETI@home | No tasks sent
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for Astropulse v505
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for SETI@home v7
1/25/2013 7:18:17 AM | SETI@home | No tasks are available for AstroPulse v6


Sure this happens.
Usually the feeder is just empty.

____________

Mark Fiske
Joined: 15 Aug 11
Posts: 712
Credit: 7,392,463
RAC: 338
United States
Message 1331157 - Posted: 25 Jan 2013, 15:36:50 UTC - in response to Message 1331151.

Figured it to be something like that, just hadn't seen the specificity before...

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,473,825
RAC: 5,163
United States
Message 1331187 - Posted: 25 Jan 2013, 16:55:15 UTC - in response to Message 1331157.

Seems to me that folks should simply take the hint, suspend processing on SETI and work on other projects until SETI is running moderately healthy once again.

Of course part of this would be some folks migrating and staying elsewhere, as it is apparent that, given available resources, the project cannot reliably feed and take care of its extensive user community.

I've run SETI since the old days -- this April will be my 13th anniversary with the project. Sometimes it is close to my lead CPU-centric project (I find that running GPUs for SETI simply participates in the most problematic aspect of the project). These days, with the 'on again, off again' functionality here, it has dropped below some other worthy projects -- for me, those include Einstein, Rosetta, World Grid, POEM, and others.

So for now, I'm in 'suspend' mode here. Once the I/O issues are partially fixed, I'll shift to No New Work to clear existing work. After that, well we shall see whether SETI can handle the load it has.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,590,425
RAC: 44,210
Australia
Message 1331309 - Posted: 25 Jan 2013, 22:06:52 UTC - in response to Message 1331187.


Looks like the Scheduler working was just a passing fad- inbound network traffic has been up & down all night. It's about 50/50 at the moment whether you'll get a response from a Scheduler request or a "Couldn't connect to server" message.
____________
Grant
Darwin NT.

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,473,825
RAC: 5,163
United States
Message 1331342 - Posted: 25 Jan 2013, 23:20:27 UTC

Well, it is almost the weekend. Next week is another week.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2205
Credit: 8,036,481
RAC: 4,372
United States
Message 1331479 - Posted: 26 Jan 2013, 6:35:19 UTC
Last modified: 26 Jan 2013, 6:36:19 UTC

Whoa. Just looked in my message log:

2013-01-26 01:19:16 SETI@home Started download of ap_25my12aa_B1_P1_00000_20130125_31783.wu
2013-01-26 01:26:25 SETI@home Finished download of ap_25my12aa_B1_P1_00000_20130125_31783.wu


429 seconds for an average of 19.1KB/sec.. and it ran from start to finish without stalling, and not using a proxy.
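
For anyone who wants to check that arithmetic, a small Python sketch: it takes the two timestamps from the log lines above and works out the elapsed time, then backs out the file size implied by the quoted 19.1KB/sec. The helper names are made up for illustration, and the size is only what those two numbers imply, not a measured value.

```python
from datetime import datetime

# Timestamp layout used in the message log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, finish):
    """Whole seconds between two log timestamps (hypothetical helper)."""
    begin = datetime.strptime(start, LOG_TIME_FORMAT)
    end = datetime.strptime(finish, LOG_TIME_FORMAT)
    return int((end - begin).total_seconds())


def implied_size_kb(rate_kb_per_s, seconds):
    """File size implied by an average rate held for a given duration."""
    return rate_kb_per_s * seconds


if __name__ == "__main__":
    seconds = elapsed_seconds("2013-01-26 01:19:16", "2013-01-26 01:26:25")
    print("elapsed seconds:", seconds)
    print("implied size in KB:", round(implied_size_kb(19.1, seconds)))
```

The elapsed time comes out to the 429 seconds quoted above, and the implied size is a little over 8,000 KB.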
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Vipin Palazhi
Joined: 29 Feb 08
Posts: 247
Credit: 95,909,906
RAC: 80,473
India
Message 1331490 - Posted: 26 Jan 2013, 7:31:00 UTC

I have been unable to report completed tasks from this part of the world for the past couple of hours. All requests are timing out on all my rigs.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2205
Credit: 8,036,481
RAC: 4,372
United States
Message 1331491 - Posted: 26 Jan 2013, 7:54:51 UTC

And actually, upon closer inspection, I have gotten 22 APs in the past ~24 hours, and they have all come through fast, with no errors/time-outs/stalls, and without the use of a proxy.


Scheduler contact on the other hand.. that's a different story.

Dozens of:

2013-01-26 02:45:08 SETI@home Scheduler request failed: Failure when receiving data from the peer
2013-01-26 02:51:31 SETI@home Scheduler request failed: Couldn't connect to server


in the past... ~16 hours. Started acting up around 16:30 UTC according to my logs.
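
A quick way to tally these from a saved message log, sketched in Python. The line layout is assumed to match the two samples above (date, time, project name, message), and `count_scheduler_failures` is a hypothetical helper, not part of BOINC.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <project> Scheduler request failed: <reason>"
FAILURE_RE = re.compile(r"^(\S+ \S+) (\S+) Scheduler request failed: (.+)$")


def count_scheduler_failures(lines):
    """Count failed scheduler requests, grouped by the reported reason."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2013-01-26 02:45:08 SETI@home Scheduler request failed: Failure when receiving data from the peer",
        "2013-01-26 02:51:31 SETI@home Scheduler request failed: Couldn't connect to server",
    ]
    for reason, count in count_scheduler_failures(sample).items():
        print(count, reason)
```

Lines that do not match the assumed layout are simply skipped, so the same function can be run over a whole log without filtering it first.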
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,590,425
RAC: 44,210
Australia
Message 1331507 - Posted: 26 Jan 2013, 9:40:18 UTC - in response to Message 1331491.


Yeah, looks like the Scheduler died again, about 6 hours ago.
The very occasional request results in work, but it's mostly "Couldn't connect to server" messages. Should be out of work in a few more hours.
____________
Grant
Darwin NT.

N9JFE
Volunteer tester
Joined: 4 Oct 99
Posts: 9390
Credit: 11,992,009
RAC: 14,518
United States
Message 1331576 - Posted: 26 Jan 2013, 15:00:42 UTC - in response to Message 1331507.

I dunno, guys. The crickets show no dips in the green since about 2pm PST Friday. The blue line ramped down a bit in the evening, but didn't bottom out. And my active crunchers made contact 3 hours ago for one and 5 minutes ago for the other.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37453
Credit: 502,486,621
RAC: 570,337
United States
Message 1331589 - Posted: 26 Jan 2013, 15:44:32 UTC

I can tell you this.....
Comms are screwed.

Look, I have 9 rigs doing nothing but Seti.

Only two of them have managed to make comms requests in the last half hour.

And that means things are seriously mangled, and the sh#t is hitting the fan.
Big cow sh#t.

If I can't get through, nobody else can either.

I don't care anymore. The kitties have all the rigs on auto.....
If they cannot get the Seti kibble they truly want, they no longer get anything.
I hate starving kitties.

I wish for a return to the days when I had a 75,000 WU cache.
I used to be able to continue to crunch for weeks when the servers went down.

The limits have not accomplished much.....current situation proves that.

Give me back my bullets, Eric.
____________
******************
Just a kittyman kinda guy.

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Ron
Volunteer tester
Joined: 24 Aug 99
Posts: 42
Credit: 34,534,730
RAC: 8,075
United States
Message 1331600 - Posted: 26 Jan 2013, 16:26:03 UTC
Last modified: 26 Jan 2013, 16:33:14 UTC

I got connected:
GTX690

572 SETI@home 1/26/2013 11:20:19 AM Reporting 103 completed tasks, requesting new tasks for CPU and NVIDIA
573 SETI@home 1/26/2013 11:21:08 AM Computation for task 16dc12aa.21994.18574.5.10.20_1 finished
574 SETI@home 1/26/2013 11:21:08 AM Starting task 16dc12aa.21994.18574.5.10.24_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 21
575 SETI@home 1/26/2013 11:21:10 AM Started upload of 16dc12aa.21994.18574.5.10.20_1_0
576 SETI@home 1/26/2013 11:21:14 AM Finished upload of 16dc12aa.21994.18574.5.10.20_1_0
577 SETI@home 1/26/2013 11:21:34 AM Scheduler request completed: got 0 new tasks
578 SETI@home 1/26/2013 11:21:34 AM No tasks sent
579 SETI@home 1/26/2013 11:21:34 AM No tasks are available for Astropulse v505
580 SETI@home 1/26/2013 11:21:34 AM No tasks are available for SETI@home v7
581 SETI@home 1/26/2013 11:21:34 AM No tasks are available for AstroPulse v6
582 SETI@home 1/26/2013 11:21:34 AM Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them

It was not very satisfying!

And 5 minutes later got 75 tasks that downloaded amazingly fast.
____________

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 917
Credit: 9,762,185
RAC: 10,958
United Kingdom
Message 1331636 - Posted: 26 Jan 2013, 19:06:11 UTC
Last modified: 26 Jan 2013, 19:12:01 UTC

Cricket has plunged off the cliff.
Bits in and bits out at zero :-(

[edit]...and almost immediately all back up again :-)
____________


Copyright © 2014 University of California