Panic Mode On (80) Server Problems?

Message boards : Number crunching : Panic Mode On (80) Server Problems?

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1326673 - Posted: 11 Jan 2013, 14:31:16 UTC - in response to Message 1326670.  

Something's gone away....
Everything was still going fine about 5 hours ago when I went to bed. This morning I have a bunch of rigs that have dropped back to Einstein because they can't connect to the server to report and get new work.

Whazzup?

Looking over my logs for the night I see spurts of "Scheduler request failed: HTTP gateway timeout", but I often see that when the bandwidth is constantly pegged at its maximum.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1326673 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1326675 - Posted: 11 Jan 2013, 14:36:45 UTC - in response to Message 1326673.  

Something's gone away....
Everything was still going fine about 5 hours ago when I went to bed. This morning I have a bunch of rigs that have dropped back to Einstein because they can't connect to the server to report and get new work.

Whazzup?

Looking over my logs for the night I see spurts of "Scheduler request failed: HTTP gateway timeout", but I often see that when the bandwidth is constantly pegged at its maximum.

Seems to be not working as smoothly as it has been for a while...
My #1 rig took a bunch of update button pushing and about 20-25 attempts to connect to report. Mostly could not connect to server and a couple of partial connects with no response. Finally just now got through and got 100 GPU tasks in one shot.
But something is tying up scheduler comms badly again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1326675 · Report as offensive
Sp@ceNv@der Project Donor
Joined: 10 Jul 05
Posts: 41
Credit: 117,366,167
RAC: 152
Belgium
Message 1326679 - Posted: 11 Jan 2013, 14:51:00 UTC - in response to Message 1326675.  

Same here in Belgium .... can upload finished tasks, but can't report them nor get new work as long as reporting fails ...

here we go again lol :D
To boldly crunch ...
ID: 1326679 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1326681 - Posted: 11 Jan 2013, 14:56:05 UTC - in response to Message 1326679.  
Last modified: 11 Jan 2013, 15:01:48 UTC

Same here in Belgium .... can upload finished tasks, but can't report them nor get new work as long as reporting fails ...

here we go again lol :D

Yup.
You can't fool the kitties. When in a matter of hours 5 out of 9 rigs here all of a sudden have started running Einstein, something's gone away in the server closet again.
That means that they have run through their 100 task GPU cache, which takes at least a few hours on the fast rigs, have not been able to replenish the cache, and the kitties have gone off sniffing for something else to do.

And if you look at the inbound traffic on the Cricket graph, the flow had been very smooth....now looks a bit ragged.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1326681 · Report as offensive
Tom*

Joined: 12 Aug 11
Posts: 127
Credit: 20,769,223
RAC: 9
United States
Message 1326694 - Posted: 11 Jan 2013, 15:35:45 UTC - in response to Message 1326415.  

Not quietly enough :-)

It means that the project has (very quietly) started running so smoothly that you've all got nothing better to panic about :P

LANDO is not running, and there are also errors on the MB channels.

Someone said we started getting a storm of shorties; that, together with the APs, is what usually throws us over the Network Performance Knee.
ID: 1326694 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1326712 - Posted: 11 Jan 2013, 16:26:31 UTC

I just did an update on a baby host that only does about 1 CPU task a day. Report one, request zero, only three tasks listed as 'other results'. It still took 71 seconds to turn it round (successfully) - that's mighty slow.
ID: 1326712 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1326718 - Posted: 11 Jan 2013, 16:37:44 UTC - in response to Message 1326712.  
Last modified: 11 Jan 2013, 16:52:10 UTC

I just did an update on a baby host that only does about 1 CPU task a day. Report one, request zero, only three tasks listed as 'other results'. It still took 71 seconds to turn it round (successfully) - that's mighty slow.

Well, like I said Richard, you can't fool the kitties.
With 9 rigs running, all with GPUs, some fast, some slow, when I can see a trend in 5 of them going off to another project, nobody can tell me something has not changed. Been at this too long not to see the writing on the wall.

It's loosened up just a little bit in the last half an hour, but still not back to where it was last week.


EDIT...
And, LOL...
I must confess it drives me a bit mad when I see my backup project, Einstein, download MBits worth of files with all speed and see reports and work requests go through in less than a second or two. (And I understand why this is, I just wish it could be so for Seti).
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1326718 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1326723 - Posted: 11 Jan 2013, 17:00:10 UTC
Last modified: 11 Jan 2013, 17:02:33 UTC

My logs started showing reporting problems starting around 5am EST. Uploads fine just scheduler reports.

Edit: And always as soon as I post here a request goes through...
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1326723 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1326747 - Posted: 11 Jan 2013, 18:31:01 UTC - in response to Message 1326723.  

My logs started showing reporting problems starting around 5am EST. Uploads fine just scheduler reports.

Edit: And always as soon as I post here a request goes through...


Well, it worked that one time... now back to not reporting unless I NNT.

"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1326747 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1326748 - Posted: 11 Jan 2013, 18:39:39 UTC - in response to Message 1326747.  

My logs started showing reporting problems starting around 5am EST. Uploads fine just scheduler reports.

Edit: And always as soon as I post here a request goes through...


Well, it worked that one time... now back to not reporting unless I NNT.

It's still a train wreck.
I have to get some more sleep sooner or later, so the update button shall have to rest. And then the asinine Boincmanager backoffs will take hold again.
As well as the asinine 100 WU GPU limits.

Whatever.

I have my rigs all pledged to 100% Seti service.
If the Seti servers are not up to that pledge, the kitties go elsewhere.
Not that I am happy about it.

I just don't have 24/7 to sit here and punch the buttons to avoid the dang backoffs.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1326748 · Report as offensive
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1326776 - Posted: 11 Jan 2013, 19:29:48 UTC

Cricket bits out still spiraling ever so slowly downwards. Hope someone can fix it all before the weekend curtain.

ID: 1326776 · Report as offensive
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1326781 - Posted: 11 Jan 2013, 19:32:40 UTC

I can't get a thing even with NNT set for all 3 computers.

Old James
ID: 1326781 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1326789 - Posted: 11 Jan 2013, 19:44:16 UTC - in response to Message 1326776.  

Cricket bits out still spiraling ever so slowly downwards. Hope someone can fix it all before the weekend curtain.

No worries here no more.
The kitties have all the rigs set to full time Seti.
When that does not happen for any reason, Einstein sets in on the zero workshare default mode.

I just don't sweat it anymore. I may bitch about it from time to time, but I realize, as one who has been here for over 12 years, it just all works out.

Or doesn't, as the case may be. I know what I want to happen, and when it doesn't, I go to sleep and dream about what might be.

These things are quite apparent to me, and I wish they would be to most Seti users. It just DON'T work all the time. Most of the users' computers don't either.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1326789 · Report as offensive
nairb

Joined: 18 Mar 03
Posts: 201
Credit: 5,447,501
RAC: 5
United Kingdom
Message 1326849 - Posted: 11 Jan 2013, 22:22:36 UTC

Still cannot report/get any w/u here yet. So I guess it's not all fixed yet?

Nairb
ID: 1326849 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1326854 - Posted: 11 Jan 2013, 22:34:43 UTC - in response to Message 1326849.  


Still borked: inbound network traffic has dropped off considerably. Looks like problems started around 12 hours ago, but it became really bad about 2-3 hours ago.
"Scheduler request failed: Couldn't connect to server" is about the only response I'm getting. Occasionally I get some work, but not often enough to keep what little cache I've got. It's not so slowly dwindling.
Grant
Darwin NT
ID: 1326854 · Report as offensive
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1326877 - Posted: 11 Jan 2013, 23:47:46 UTC

Cricket still showing that "average bits out" is still declining slowly but steadily, so there must be a blockage in the pipe somewhere.
Call for DynaRod!

ID: 1326877 · Report as offensive
Mark Lybeck

Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1326962 - Posted: 12 Jan 2013, 5:14:46 UTC

Problems connecting to scheduler:
12/01/2013 07:11:16 | SETI@home | [sched_op] Fetching master file
12/01/2013 07:11:16 | SETI@home | Fetching scheduler list
12/01/2013 07:11:19 | SETI@home | [sched_op] Got master file; parsing
12/01/2013 07:11:19 | SETI@home | [sched_op] Found 1 scheduler URLs in master file
12/01/2013 07:11:19 | SETI@home | Master file download succeeded
12/01/2013 07:11:24 | SETI@home | [sched_op] Starting scheduler request
12/01/2013 07:11:24 | SETI@home | Sending scheduler request: To report completed tasks.
12/01/2013 07:11:24 | SETI@home | Reporting 28 completed tasks, not requesting new tasks
12/01/2013 07:11:24 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
12/01/2013 07:11:24 | SETI@home | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
12/01/2013 07:11:47 | SETI@home | Scheduler request failed: Couldn't connect to server
12/01/2013 07:11:47 | SETI@home | [sched_op] Deferring communication for 1 min 57 sec
12/01/2013 07:11:47 | SETI@home | [sched_op] Reason: Scheduler request failed

Still the cricket graph shows green.....
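
For anyone who wants to count these failures over a whole night rather than eyeball them, here is a minimal sketch that pulls the failed scheduler requests and the deferral times out of log lines shaped like the excerpt above. It assumes the `date | project | message` layout and the day/month/year timestamps seen in that excerpt; the function names (`parse_line`, `summarize`) are made up for illustration and are not part of BOINC.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above: "dd/mm/yyyy hh:mm:ss | project | message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| (?P<project>[^|]+?) \| (?P<msg>.*)$"
)
# Matches "Deferring communication for 1 min 57 sec" (the minutes part is optional here)
DEFER = re.compile(r"Deferring communication for (?:(\d+) min )?(\d+) sec")


def parse_line(line):
    """Split one log line into (timestamp, project, message), or None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%m/%Y %H:%M:%S")
    return ts, m.group("project"), m.group("msg")


def summarize(lines):
    """Return (failures, deferrals): failure (time, reason) pairs and deferral lengths in seconds."""
    failures = []
    deferrals = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _project, msg = parsed
        if msg.startswith("Scheduler request failed:"):
            failures.append((ts, msg.split(":", 1)[1].strip()))
        d = DEFER.search(msg)
        if d:
            minutes = int(d.group(1)) if d.group(1) else 0
            deferrals.append(minutes * 60 + int(d.group(2)))
    return failures, deferrals


# Three lines copied from the excerpt above
SAMPLE = """12/01/2013 07:11:24 | SETI@home | Reporting 28 completed tasks, not requesting new tasks
12/01/2013 07:11:47 | SETI@home | Scheduler request failed: Couldn't connect to server
12/01/2013 07:11:47 | SETI@home | [sched_op] Deferring communication for 1 min 57 sec"""

failures, deferrals = summarize(SAMPLE.splitlines())
for ts, reason in failures:
    print(ts.isoformat(), reason)
print("deferral seconds:", deferrals)
```

Feeding it a saved copy of the event log would give a list of failure times to line up against the Cricket graph.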
ID: 1326962 · Report as offensive
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1326975 - Posted: 12 Jan 2013, 5:54:16 UTC - in response to Message 1326962.  

Still the cricket graph shows green.....


The green is not the problem. Us crunchers reporting work are represented by the blue line. "Bits out" means out to Eric and pals. The green is "bits in" from Berkeley on their way to the world.
I know this is counter-intuitive but that's the way it works!


ID: 1326975 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1327024 - Posted: 12 Jan 2013, 7:04:07 UTC

My scheduler contacts are slow, but they go through about 90% of the time. The most recent one was over 60 seconds to acknowledge 3 completed tasks.

I'm still not really a fan of the project back-off mechanism, but last night when I had 11 APs that were all trying to download at the same time, I was getting way too sleepy to help them along. I decided to just leave it be and BOINC will make it work. Woke up 7 hours later and they had all finished just fine.. in addition to four more.

But it is the scheduler contacts mainly that could use some tweaking to try to make them a little more reliable and speedy. Or a bigger download pipe. Or maybe an off-site download mirror that has a >= gigabit link and the existing 100mbit pipe can be used solely for sending new data to the mirror. Just some ideas. I know the off-site thing has been mentioned before and the staff isn't too keen on the idea, but unless we can get our 100mbit limit increased to 150 or 200, it's just going to be painful headaches from here on out.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1327024 · Report as offensive
Mark Lybeck

Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1327048 - Posted: 12 Jan 2013, 8:20:32 UTC

Update on No New Tasks increases the retry timer....:(.. I have not been able to report WU for last 12 hours. Not getting any new WU either...
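
To illustrate why a retry timer that grows with each failed contact hurts so much during an outage like this, here is a generic sketch of randomized exponential backoff. This is not BOINC's actual algorithm: the function name `backoff_delay`, the 60 second base and the 4 hour cap are all assumptions picked for illustration.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=None):
    """Delay in seconds before the next retry after `failures` consecutive failures.

    Generic randomized exponential backoff (hypothetical constants, not BOINC's):
    the ceiling doubles with every failure up to `cap`, and the actual delay is
    drawn uniformly from the upper half of that ceiling.
    """
    rng = rng or random.Random()
    exponent = min(failures, 30)  # clamp so the power stays small
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(ceiling / 2, ceiling)


rng = random.Random(0)  # seeded only so repeated runs print the same numbers
for n in range(10):
    print(n, round(backoff_delay(n, rng=rng)), "seconds")
```

With these made-up numbers the wait passes an hour after six failed attempts in a row, which is why a manual update that fails can leave a host further from reporting than before.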
ID: 1327048 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.