Panic Mode On (77) Server Problems?


Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1290795 - Posted: 3 Oct 2012, 15:58:33 UTC - in response to Message 1290786.

Patience folks, patience.

I'm working on a few new bits of hardware that will help with our stuck-download issues. I hope we'll be launching these shortly, once I get the specs approved.

There are several of us 'behind the scenes' types who also crunch and know what issues we're facing. We're trying to address and fix these as soon as possible but due to a lack of manpower in the lab, things are necessarily slow going.

One piece of hardware I'm hoping to get is a load balancer, which will distribute tasks more evenly across our download and upload servers. Currently, as people have noticed, you will ping off a dead or overloaded server, which constantly mucks up the works. In the near future I'm hoping we can implement a very beefy balancer with proper software and hardware, which should alleviate this issue significantly.
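
To illustrate the general idea (a minimal sketch only, not the lab's actual or planned setup): a balancer tracks which back-end servers are passing health checks and hands each new request to the next healthy one in rotation, so clients stop landing on a dead machine. The class and the server names below are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out healthy servers in rotation (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        """Record that a server failed its health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record that a known server is healthy again."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or None if all are down."""
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        return None


# Hypothetical server names; "upload-b" is treated as dead.
balancer = RoundRobinBalancer(["upload-a", "upload-b", "upload-c"])
balancer.mark_down("upload-b")
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

Real balancers usually also weigh servers by current load rather than rotating blindly; plain round-robin is used here only because it is the simplest policy to show.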

If you really want to help us fix these issues, keep reporting your performance and consider donating to the project in whatever capacity you're comfortable with.

In short, patience. We've switched over one server's tasks (mostly) to George and have switched from Apache (which has caused us a load of issues) to Nginx. Add the time spent switching all of our processes over to a new switch, dealing with a crashy server, and the push to V7, and you can see why our small staff is overwhelmed. Our DL/UL issues will be sorted soon.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8819
Credit: 53,532,475
RAC: 46,321
United Kingdom
Message 1290805 - Posted: 3 Oct 2012, 16:20:27 UTC - in response to Message 1290795.

Great work on the download servers - they seem to be running much better today, with nginx.

But what goes down must come back up again, and it seems Bruno is having difficulty keeping up...

03-Oct-2012 16:23:35 [SETI@home] [http] [ID#45620] Info: Trying 208.68.240.16...
03-Oct-2012 16:23:56 [SETI@home] [http] [ID#45620] Info: Timed out
03-Oct-2012 16:23:56 [SETI@home] [http] [ID#45620] Info: Failed connect to setiboincdata.ssl.berkeley.edu:80; No error
03-Oct-2012 16:23:56 [SETI@home] [http] [ID#45620] Info: Closing connection #1
03-Oct-2012 16:23:56 [SETI@home] [http] HTTP error: Couldn't connect to server

03-Oct-2012 16:27:43 [SETI@home] [http] [ID#45635] Info: Trying 208.68.240.16...
03-Oct-2012 16:28:04 [SETI@home] [http] [ID#45635] Info: Timed out
03-Oct-2012 16:28:04 [SETI@home] [http] [ID#45635] Info: Failed connect to setiboincdata.ssl.berkeley.edu:80; No error
03-Oct-2012 16:28:04 [SETI@home] [http] [ID#45635] Info: Closing connection #0
03-Oct-2012 16:28:04 [SETI@home] [http] HTTP error: Couldn't connect to server

03-Oct-2012 16:28:06 [SETI@home] [http] [ID#45636] Info: Trying 208.68.240.16...
03-Oct-2012 16:28:27 [SETI@home] [http] [ID#45636] Info: Timed out
03-Oct-2012 16:28:27 [SETI@home] [http] [ID#45636] Info: Failed connect to setiboincdata.ssl.berkeley.edu:80; No error
03-Oct-2012 16:28:27 [SETI@home] [http] [ID#45636] Info: Closing connection #0
03-Oct-2012 16:28:27 [SETI@home] [http] HTTP error: Couldn't connect to server

03-Oct-2012 16:29:51 [SETI@home] [http] [ID#45644] Info: Trying 208.68.240.16...
03-Oct-2012 16:30:13 [SETI@home] [http] [ID#45644] Info: Timed out
03-Oct-2012 16:30:13 [SETI@home] [http] [ID#45644] Info: Failed connect to setiboincdata.ssl.berkeley.edu:80; No error
03-Oct-2012 16:30:13 [SETI@home] [http] [ID#45644] Info: Closing connection #0
03-Oct-2012 16:30:13 [SETI@home] [http] HTTP error: Couldn't connect to server

03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Info: Trying 208.68.240.16...
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Info: Connected to setiboincdata.ssl.berkeley.edu (208.68.240.16) port 80 (#1)
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Info: Connected to setiboincdata.ssl.berkeley.edu (208.68.240.16) port 80 (#1)
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: POST /sah_cgi/file_upload_handler HTTP/1.1
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: User-Agent: BOINC client (windows_intelx86 7.0.36)
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: Host: setiboincdata.ssl.berkeley.edu
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: Accept: */*
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: Accept-Encoding: deflate, gzip
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: Content-Type: application/x-www-form-urlencoded
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server: Content-Length: 285
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Sent header to server:
03-Oct-2012 16:32:27 [SETI@home] [http] [ID#45650] Info: Recv failure: Connection was reset
03-Oct-2012 16:32:27 [SETI@home] [http] [ID#45650] Info: Closing connection #1
03-Oct-2012 16:32:27 [SETI@home] [http] HTTP error: Failure when receiving data from the peer
03-Oct-2012 16:32:28 [SETI@home] Temporarily failed upload of 25jl12ab.14581.6611.6.10.139_0_0: transient HTTP error

03-Oct-2012 16:32:29 [SETI@home] [http] [ID#45654] Info: Trying 208.68.240.16...
03-Oct-2012 16:32:50 [SETI@home] [http] [ID#45654] Info: Timed out
03-Oct-2012 16:32:50 [SETI@home] [http] [ID#45654] Info: Failed connect to setiboincdata.ssl.berkeley.edu:80; No error
03-Oct-2012 16:32:50 [SETI@home] [http] [ID#45654] Info: Closing connection #0
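
For anyone who wants to tally attempts like these instead of eyeballing them, here is a rough sketch of a parser for this style of `[http]` debug output. The regular expression and the outcome labels are my own assumptions based on the lines shown above, not an official BOINC log format definition.

```python
import re
from collections import Counter

# Matches lines such as:
# 03-Oct-2012 16:23:56 [SETI@home] [http] [ID#45620] Info: Timed out
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] \[http\] \[ID#(?P<id>\d+)\] (?P<msg>.*)$"
)


def summarize(log_text):
    """Classify each HTTP transfer ID by how its attempt ended."""
    outcome = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # lines without an ID (e.g. "HTTP error: ...") are skipped
        transfer_id, msg = match.group("id"), match.group("msg")
        if "Timed out" in msg:
            outcome[transfer_id] = "connect timeout"
        elif "Recv failure" in msg:
            # Overrides an earlier "connected" for the same ID.
            outcome[transfer_id] = "reset after connect"
        elif "Connected to" in msg:
            outcome.setdefault(transfer_id, "connected")
    return Counter(outcome.values())


sample = """\
03-Oct-2012 16:23:35 [SETI@home] [http] [ID#45620] Info: Trying 208.68.240.16...
03-Oct-2012 16:23:56 [SETI@home] [http] [ID#45620] Info: Timed out
03-Oct-2012 16:31:40 [SETI@home] [http] [ID#45650] Info: Connected to setiboincdata.ssl.berkeley.edu (208.68.240.16) port 80 (#1)
03-Oct-2012 16:32:27 [SETI@home] [http] [ID#45650] Info: Recv failure: Connection was reset
"""
summary = summarize(sample)
print(summary)
```

Fed the whole excerpt above rather than the four-line sample, this should count five connect timeouts and one connection that was reset after connecting.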

tbret
Project donor
Volunteer tester
Joined: 28 May 99
Posts: 2906
Credit: 218,687,193
RAC: 13,056
United States
Message 1290814 - Posted: 3 Oct 2012, 16:37:55 UTC - in response to Message 1290795.



Patience folks, patience.



Have you ever been a passenger in a car and noticed an ugly situation developing that you aren't sure the driver has seen?

It's much, much easier to relax if the driver just mutters, "I see 'em."

Thank you for reporting that the lab is aware.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 645
Credit: 147,693,634
RAC: 47,501
United Kingdom
Message 1290863 - Posted: 3 Oct 2012, 18:45:02 UTC - in response to Message 1290805.

Great work on the download servers - they seem to be running much better today, with nginx.

About the time you posted that, I was having great difficulties with uploads, but downloads were very slick (given that the particular machine is currently only getting 11 WUs per request). A few hours later, things are sticking both ways...
____________

Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1290864 - Posted: 3 Oct 2012, 18:45:04 UTC

I seem to get bogged down downloading - and uploading! I get 2,000 shorties, churn through them in quite short order, and spend the next 4 days trying to upload the results, and of course I have to upload & report before I can get anything new. One thing I have noticed, though: when something goes up/down the pipe, it flies - really flies. There isn't even time for a progress bar during the transmission.

____________

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1290888 - Posted: 3 Oct 2012, 19:44:28 UTC - in response to Message 1290864.

The plan right now, pending specs, is building a dedicated upload and download server soon. This one will be specifically slated for nothing but replacing our two remaining old servers. Combine that with a load balancer, the new switch, George and the jbod array, we should be heading in the right direction.

Now if only I had a large stack of money for more bandwidth. One day maybe.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

MikeN
Joined: 24 Jan 11
Posts: 303
Credit: 32,995,727
RAC: 8,889
United Kingdom
Message 1290921 - Posted: 3 Oct 2012, 21:03:28 UTC

Well, uploads have suddenly picked up dramatically. The Cricket graph shows them doubling over the last hour, and mine have gone through. However, the scheduler is now hard to contact and I have started getting large numbers of timed-out, no-response errors (>50 so far). Looks like the guys might have turned off 'resend ghost units' to free up some bandwidth or memory storage.
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 645
Credit: 147,693,634
RAC: 47,501
United Kingdom
Message 1290922 - Posted: 3 Oct 2012, 21:05:48 UTC - in response to Message 1290888.

Now if only I had a large stack of money for more bandwidth. One day maybe.

I had an idle thought -- I don't remember exactly what the difficulty is in getting a 1 Gbps link down to the campus boundary, but I was wondering if there were a parallel unused "dark fibre" to the existing 100 Mbps link that could be channel-bonded to it to give 200 Mbps. "We" (the UK LCG community) made heavy use of such technology with multiple 1 Gbps links in our data centres until a recent Government windfall enabled most of us to upgrade to 10 Gbps links...
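
As a back-of-the-envelope sketch of what bonding would buy, assuming ideal aggregation with no protocol overhead (real bonded links rarely scale perfectly, and a single TCP flow typically stays on one member link): the 10 GB backlog below is a made-up figure for illustration, not a measurement.

```python
def transfer_seconds(size_bytes, link_mbps, links=1):
    """Idealised time to move size_bytes over `links` bonded links."""
    total_bits_per_second = link_mbps * 1_000_000 * links
    return size_bytes * 8 / total_bits_per_second


backlog_bytes = 10 * 10**9  # hypothetical 10 GB backlog

single = transfer_seconds(backlog_bytes, 100)
bonded = transfer_seconds(backlog_bytes, 100, links=2)
print(f"single 100 Mbps link: {single:.0f} s")
print(f"two bonded links:     {bonded:.0f} s")
```

Under those assumptions the bonded pair simply halves the transfer time; anything beyond that depends on how the traffic hashes across the two links.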
____________

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 25190
Credit: 34,786,155
RAC: 20,763
Germany
Message 1290931 - Posted: 3 Oct 2012, 21:37:53 UTC - in response to Message 1290922.
Last modified: 3 Oct 2012, 21:38:30 UTC


Eric once mentioned that the biggest part is political, IIRC.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5946
Credit: 62,416,665
RAC: 39,063
Australia
Message 1290934 - Posted: 3 Oct 2012, 21:43:11 UTC - in response to Message 1290691.

I contacted the PS because, as SETI sponsors, they will have the ability to contact SETI directly to alert them to the current problem. I don't think anyone in the forum has a direct line to SETI.

You don't appear to understand.
The Planetary Society has no relationship whatsoever with SETI@home.
Neither does the SETI Institute.

SETI@home is a project run by the Space Sciences Laboratory at the University of California, Berkeley.
It has no direct affiliations with any other SETI organisations.

____________
Grant
Darwin NT.

Kevin Benfield
Joined: 29 Dec 03
Posts: 39
Credit: 16,793,354
RAC: 9,189
United Kingdom
Message 1290936 - Posted: 3 Oct 2012, 21:48:19 UTC - in response to Message 1289368.

Only getting GPU units, not getting any at all for CPU, been like this for a couple of days.
Is anyone getting CPU units ?
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5946
Credit: 62,416,665
RAC: 39,063
Australia
Message 1290937 - Posted: 3 Oct 2012, 21:51:11 UTC - in response to Message 1290888.
Last modified: 3 Oct 2012, 21:56:37 UTC

The plan right now, pending specs, is building a dedicated upload and download server soon. This one will be specifically slated for nothing but replacing our two remaining old servers. Combine that with a load balancer, the new switch, George and the jbod array, we should be heading in the right direction.


Will this also help with the Scheduler issues?
"Project has no tasks available" & "No tasks sent" have been common responses to work requests for a long time now. But over the last few weeks "Timeout was reached" has become very common, often 4 in 5 responses to work requests.
And now that I've been able to upload all that backlogged work, that is the only response I've been getting on one of my machines as I try to report 75 tasks & get new work. My other machine has been getting some work, but it's mostly "No tasks sent" with the odd "Project has no tasks available".

EDIT - oh, I forgot the "Couldn't connect to server" error that occasionally (but more & more frequently) pops up when trying to report or request new work.
____________
Grant
Darwin NT.

tbret
Project donor
Volunteer tester
Joined: 28 May 99
Posts: 2906
Credit: 218,687,193
RAC: 13,056
United States
Message 1290938 - Posted: 3 Oct 2012, 21:51:46 UTC - in response to Message 1290934.
Last modified: 3 Oct 2012, 21:52:38 UTC



The Planetary Society has no relationship whatsoever with SETI@home.
Neither does the SETI Institute.





You can certainly understand the confusion, then, since The Planetary Society gets top billing:

SETI Sponsor Page

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5946
Credit: 62,416,665
RAC: 39,063
Australia
Message 1290939 - Posted: 3 Oct 2012, 21:51:59 UTC - in response to Message 1290936.

Only getting GPU units, not getting any at all for CPU, been like this for a couple of days.
Is anyone getting CPU units ?

That will happen until the GPU cache is full; then you will start getting work for the CPU again.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5946
Credit: 62,416,665
RAC: 39,063
Australia
Message 1290940 - Posted: 3 Oct 2012, 21:54:52 UTC - in response to Message 1290938.

You can certainly understand the confusion, then, since The Planetary Society gets top billing:

It's easier to say there is no relationship, than to try to point out the difference between being a founding sponsor & actually being involved in the operation of the project.
____________
Grant
Darwin NT.

JohnDK
Project donor
Volunteer tester
Joined: 28 May 00
Posts: 897
Credit: 49,147,182
RAC: 44,090
Denmark
Message 1290947 - Posted: 3 Oct 2012, 22:15:32 UTC

Wouldn't a WU limit help big time right now?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5946
Credit: 62,416,665
RAC: 39,063
Australia
Message 1290953 - Posted: 3 Oct 2012, 22:31:52 UTC - in response to Message 1290947.


Wouldn't a WU limit help big time right now?

Not that much.
____________
Grant
Darwin NT.

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7195
Credit: 29,134,745
RAC: 34,695
United Kingdom
Message 1290954 - Posted: 3 Oct 2012, 22:32:31 UTC - in response to Message 1290947.
Last modified: 3 Oct 2012, 22:33:31 UTC

Wouldn't a WU limit help big time right now?

Probably, but the "big" crunchers wouldn't like it. Personally, for my small part, I have set NNT (No New Tasks) and will wait out the storm. I was hoping to hit 10 million soon, but that can wait.

Edit - Actually looking at the numbers I might make it after all
____________


Today is life, the only life we're sure of. Make the most of today.

dpatter3
Joined: 3 Sep 03
Posts: 8
Credit: 11,901,137
RAC: 3,141
United States
Message 1290955 - Posted: 3 Oct 2012, 22:37:31 UTC

After trying all day to upload 80 WU they suddenly went through in the last 30 minutes. Don't know if the "powers that be" tweaked something but all is well.
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8819
Credit: 53,532,475
RAC: 46,321
United Kingdom
Message 1290960 - Posted: 3 Oct 2012, 22:51:44 UTC - in response to Message 1290934.

I contacted the PS because, as SETI sponsors, they will have the ability to contact SETI directly to alert them to the current problem. I don't think anyone in the forum has a direct line to SETI.

You don't appear to understand.
The Planetary Society has no relationship whatsoever with SETI@home.
Neither does the SETI Institute.

SETI@home is a project run by the Space Sciences Laboratory at the University of California, Berkeley.
It has no direct affiliations with any other SETI organisations.

Mind you, the University of California at Berkeley is host to a lot more generic SETI projects, over and above the specific SETI@home we work on here.

http://seti.berkeley.edu/


Copyright © 2014 University of California