Panic Mode On (77) Server Problems?


Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2355
Credit: 8,938,444
RAC: 4,048
United States
Message 1301282 - Posted: 2 Nov 2012, 15:45:44 UTC

Mark, I noticed something last night that I think you're talking about. I reported one completed AP and the website showed it as reported, but BOINC didn't get the memo and instead.. "timeout was reached." The next report cleared it up, but every time you talk to the scheduler, your client_state has to be sent. I imagine for the faster rigs, it is probably in the >1MB range, so I agree.. wasted bandwidth for scheduler time-outs.

Kind of like cramming 16MB of data through the pipe only to have it be 100% blanked.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

cov_route
Project donor
Joined: 13 Sep 12
Posts: 310
Credit: 7,620,346
RAC: 1,230
Canada
Message 1301293 - Posted: 2 Nov 2012, 16:01:11 UTC

Now I have over 3000 phantom WUs, up from about 2k last night. This is about 5x my normal cache size. Should I set NNT? Or just let it do its thing?

zoom314
Project donor
Volunteer tester
Joined: 30 Nov 03
Posts: 47119
Credit: 37,065,652
RAC: 4,044
United States
Message 1301297 - Posted: 2 Nov 2012, 16:08:36 UTC - in response to Message 1301202.

I have been monitoring my rig and have no issues.
Except Mark is partnered with me on one of my inconclusives. I'm sure it is stuck in the upload problem.

I'm having less trouble uploading than reporting, something needs some computer exlax for the reporting...
____________
My Facebook, War Commander, 2015

Link
Joined: 18 Sep 03
Posts: 840
Credit: 1,578,051
RAC: 55
Germany
Message 1301305 - Posted: 2 Nov 2012, 16:24:57 UTC - in response to Message 1301282.

(...) but every time you talk to the scheduler, your client_state has to be sent. I imagine for the faster rigs, it is probably in the >1MB range, so I agree.. wasted bandwidth for scheduler time-outs.

sched_request_setiathome.berkeley.edu.xml is what gets sent to the scheduler, and it should be quite a bit smaller than client_state.xml, since it doesn't contain all the information about all files or about other projects, and carries only very sparse information about the SETI tasks on that machine.
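
For anyone who wants to check the difference on their own host, a minimal Python sketch along these lines compares the two files. The file names are the ones mentioned above; the data directory path in the example is only an assumption and will differ by platform and install.

```python
import os

# File names as mentioned in the post above.
FILES = (
    "client_state.xml",
    "sched_request_setiathome.berkeley.edu.xml",
)


def file_sizes(data_dir):
    """Return {file name: size in bytes} for the files that exist in data_dir."""
    sizes = {}
    for name in FILES:
        path = os.path.join(data_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    # Hypothetical location; point this at your own BOINC data directory.
    data_dir = "/var/lib/boinc-client"
    for name, size in file_sizes(data_dir).items():
        print(f"{name}: {size / 1024:.1f} KiB")
```

Files that are missing are simply skipped, so an empty result just means the directory guess was wrong.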
____________
.

Cherokee150
Joined: 11 Nov 99
Posts: 112
Credit: 25,654,292
RAC: 8,542
United States
Message 1301320 - Posted: 2 Nov 2012, 17:09:22 UTC - in response to Message 1301299.

Mark, I suspect, however, that the kitties are not purring right now. ;)

By the way, I did do one thing to help alleviate the obviously gigantic mess developing between the SETI host and its clients. I suspended network activity on all my machines.

I always keep my caches maxed out to handle these emergencies, so I'm good to go for at least a week. It'll be a while before any of my units time out. When the dust settles and traffic is back to normal, I'll just let my rigs report their stored results one machine at a time. :)

Wibble
Joined: 25 Nov 02
Posts: 4
Credit: 652,003
RAC: 432
United Kingdom
Message 1301354 - Posted: 2 Nov 2012, 18:29:08 UTC - in response to Message 1301112.
Last modified: 2 Nov 2012, 18:30:50 UTC

For maybe the last 20 hours or so, no successful scheduler contact. Always:
Scheduler request failed: Timeout was reached


*new tasks* was enabled.

I set *no new tasks* - and then 178 uploaded tasks were accepted from the scheduler server in a bunch (successful report).


That also worked for me. My last successful scheduler request was on the 30th; I just set 'no new tasks', clicked 'update', and the request/report went through straight away.

I think I'll leave 'no new tasks' set for a while.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5941
Credit: 62,280,894
RAC: 35,989
Australia
Message 1301358 - Posted: 2 Nov 2012, 18:33:37 UTC - in response to Message 1301354.

That also worked for me. My last successful scheduler request was on the 30th; I just set 'no new tasks', clicked 'update', and the request/report went through straight away.

I've tried it with NNT set, and without.
I think the only advantage of NNT set is you can click update as soon as you get the timeout response from the Scheduler without it complaining about it being too soon.


The fact is that overnight neither of my systems was able to get a response from the Scheduler that wasn't a timeout. Completed work to be reported piles up, and my caches get smaller & smaller.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5941
Credit: 62,280,894
RAC: 35,989
Australia
Message 1301392 - Posted: 2 Nov 2012, 20:18:24 UTC - in response to Message 1301358.


I hope they can sort this Scheduler out soon. With all the shorties in the system, and the inability to get work on more than 1 attempt in 20, I'm going to run out of work very quickly when almost everything that does get downloaded will be done in minutes.
____________
Grant
Darwin NT.

zoom314
Project donor
Volunteer tester
Joined: 30 Nov 03
Posts: 47119
Credit: 37,065,652
RAC: 4,044
United States
Message 1301403 - Posted: 2 Nov 2012, 20:40:38 UTC - in response to Message 1301396.


I hope they can sort this Scheduler out soon. With all the shorties in the system, and the inability to get work on more than 1 attempt in 20, I'm going to run out of work very quickly when almost everything that does get downloaded will be done in minutes.

Scheduler cannot do anything without bandwidth to send and receive.
This needs to be addressed before you folks complain about much more.

Over and out.

Roger Wilco...
____________
My Facebook, War Commander, 2015

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5941
Credit: 62,280,894
RAC: 35,989
Australia
Message 1301404 - Posted: 2 Nov 2012, 20:40:44 UTC - in response to Message 1301396.

Scheduler cannot do anything without bandwidth to send and receive.
This needs to be addressed before you folks complain about much more.

That would be so, if the problem is bandwidth related (I've posted a workaround in the Wish list about that, if it is the case).
However, in the past there have been just as many shorties, after even longer outages, and the Scheduler has had no problems responding to requests. The sheer number of these timeouts is something that's only started over the last 4-8 weeks.
____________
Grant
Darwin NT.

David Anderson (not *that* DA)
Project donor
Joined: 5 Dec 09
Posts: 111
Credit: 23,364,739
RAC: 4,665
United States
Message 1301432 - Posted: 2 Nov 2012, 21:50:59 UTC

One wonders if the incredible volume of incoming data (as shown by the Cricket graph) is not a manifestation of the infamous 'buffer bloat', rather than a real need for more bandwidth.

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7180
Credit: 29,017,861
RAC: 32,310
United Kingdom
Message 1301435 - Posted: 2 Nov 2012, 22:11:52 UTC
Last modified: 2 Nov 2012, 22:12:28 UTC

Well, I have given up with SETI@Home and am just running this Intel(R) Celeron(R) CPU 2.53GHz [Family 15 Model 3 Stepping 4](1 processors), bought for £30 at my local computer fair, just to keep an RAC here so I can post.

I have had no problems at all, because I do not demand masses of WUs and it takes time to return the results. As it should be.

I seriously believe that running machines with 4-6-8-12-16 processors and multiple graphics cards is more than the system can cope with! The problem is us, the crunchers.

My 10 CPUs and 4 GPUs are now happily crunching for other projects.
____________


Today is life, the only life we're sure of. Make the most of today.

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 13157
Credit: 7,884,950
RAC: 13,726
United States
Message 1301438 - Posted: 2 Nov 2012, 22:20:33 UTC

Something is strange. I have zero issues with SetiBeta right now, but I can't report on main. As it is on the same communications link, it isn't a link issue. It has to be an issue in the lab. Perhaps a failed disk in an array?

____________

zoom314
Project donor
Volunteer tester
Joined: 30 Nov 03
Posts: 47119
Credit: 37,065,652
RAC: 4,044
United States
Message 1301440 - Posted: 2 Nov 2012, 22:23:41 UTC - in response to Message 1301438.

Something is strange. I have zero issues with SetiBeta right now, but I can't report on main. As it is on the same communications link, it isn't a link issue. It has to be an issue in the lab. Perhaps a failed disk in an array?

Me too, no reports...
____________
My Facebook, War Commander, 2015

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8801
Credit: 53,363,348
RAC: 41,763
United Kingdom
Message 1301443 - Posted: 2 Nov 2012, 22:31:22 UTC - in response to Message 1301432.

One wonders if the incredible volume of incoming data (as shown by the Cricket graph) is not a manifestation of the infamous 'buffer bloat', rather than a real need for more bandwidth.

Just checking that you are aware that the Cricket graphs are seen from the point of view of a mid-way router?

The green pixels are data leaving the lab, coming in to the router, and proceeding on its way out to us, the crunchers. That is, our downloads.

The blue line is data leaving the router, in the direction of the SSL lab on the hill. That is, our uploads.

I don't think there's an 'incredible volume of incoming data'.

chromespringer
Project donor
Joined: 3 Dec 05
Posts: 273
Credit: 24,545,402
RAC: 35,527
United States
Message 1301445 - Posted: 2 Nov 2012, 22:39:24 UTC

Have had 0 tasks since Thurs morning .. haven't been able to report completions or get new work on mach xxxx033 .. have had this response for almost two days now :(

11/2/2012 4:17:19 PM | SETI@home | Reporting 410 completed tasks, requesting new tasks for CPU and ATI
11/2/2012 4:23:01 PM | SETI@home | Scheduler request failed: Timeout was reached
11/2/2012 4:23:05 PM | | Project communication failed: attempting access to reference site
11/2/2012 4:23:06 PM | | Internet access OK - project servers may be temporarily down.
11/2/2012 4:27:33 PM | SETI@home | update requested by user
11/2/2012 4:27:37 PM | SETI@home | Sending scheduler request: Requested by user.
11/2/2012 4:27:37 PM | SETI@home | Reporting 410 completed tasks, requesting new tasks for CPU and ATI
11/2/2012 4:32:57 PM | SETI@home | Scheduler request failed: Timeout was reached
11/2/2012 4:33:00 PM | | Project communication failed: attempting access to reference site
11/2/2012 4:33:01 PM | | Internet access OK - project servers may be temporarily down.
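
To put a number on how long each request hangs before giving up, a small Python sketch like the following can be run over a saved copy of the event log. It assumes the pipe-separated layout and US-style timestamps shown above, and it treats the "Reporting ..." line as the start of a request; both are assumptions drawn from this excerpt only, not from any BOINC specification.

```python
from datetime import datetime

# Four lines copied from the log excerpt above.
LOG = """\
11/2/2012 4:17:19 PM | SETI@home | Reporting 410 completed tasks, requesting new tasks for CPU and ATI
11/2/2012 4:23:01 PM | SETI@home | Scheduler request failed: Timeout was reached
11/2/2012 4:27:37 PM | SETI@home | Reporting 410 completed tasks, requesting new tasks for CPU and ATI
11/2/2012 4:32:57 PM | SETI@home | Scheduler request failed: Timeout was reached
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def timeout_durations(log_text):
    """Return seconds elapsed between each report attempt and its timeout."""
    durations = []
    started = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, _project, message = parse_line(line)
        if message.startswith("Reporting "):
            started = when
        elif "Timeout was reached" in message and started is not None:
            durations.append((when - started).total_seconds())
            started = None
    return durations


if __name__ == "__main__":
    for seconds in timeout_durations(LOG):
        print(f"request timed out after {seconds:.0f} s")
```

On the excerpt above this gives a little over five minutes per attempt, which is time the client spends waiting instead of reporting.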

____________

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3744
Credit: 48,777,915
RAC: 1,076
United States
Message 1301498 - Posted: 3 Nov 2012, 0:35:01 UTC

I managed to report 380+ units, 25 at a time, with NNT set; it just took a while.
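
As a rough illustration of why that takes a while, here is a minimal Python sketch of the arithmetic. The 25-per-request figure is taken from the post above; where that limit comes from is not stated there, and the function names are purely illustrative.

```python
import math


def requests_needed(completed_tasks, per_request):
    """Number of scheduler requests needed to report every completed task."""
    if per_request <= 0:
        raise ValueError("per_request must be positive")
    return math.ceil(completed_tasks / per_request)


def batches(task_ids, per_request):
    """Yield successive slices of at most per_request task ids."""
    for start in range(0, len(task_ids), per_request):
        yield task_ids[start:start + per_request]


if __name__ == "__main__":
    # 380 tasks at 25 per request: 15 full batches plus one batch of 5.
    print(requests_needed(380, 25))
```

With 380 tasks that comes to at least 16 successful scheduler contacts, each of which has to get through without a timeout.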
____________


Copyright © 2014 University of California