Panic Mode On (42) Server problems
Author | Message |
---|---|
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
-BeNt- Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Okay, we have the answer, now what is the question. We did have an answer?! Ah guess we did with the whole worf going down thing. The question right now is not what the question is, but when the question will be. Soon I suppose. Come on splitters WORK! Traveling through space at ~67,000mph! |
Allie in Vancouver Joined: 16 Mar 07 Posts: 3949 Credit: 1,604,668 RAC: 0 |
I am not terribly worried: I've got 4 AP units that BOINC thinks will take 200 hours (experience tells me 18 - 20 hours is more accurate). As a result, my computer isn't even asking for work. :P Pure mathematics is, in its way, the poetry of logical ideas. Albert Einstein |
-BeNt- Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Between 2 computers I have 328 Cuda Fermi WU's, 46 MB's, and 5 AP's. So I'm good to go for a bit. Traveling through space at ~67,000mph! |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
Okay, we have the answer, now what is the question. Searcher wrote in #41 |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
I hope all those who wanted the server bandwidth to be maxxed out are happy. We're back to the same old log jam and nobody getting anything except "Project Backoff" notices. So different from last week's restart. Hope they still have the resends turned on, otherwise it's ghost heaven again. T.A. |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
I hope all those who wanted the server bandwidth to be maxxed out are happy. It will be interesting to see what ghost reports come around now. Yes, I am happy the bandwidth is maxxed...... Until proven otherwise. I had several rigs without anything to do when I got home from work about 2 hours ago. They have now all received some work..... And yes, the downloads take some retries and are coming through slowly, but they are coming through. Given the countless computers that are requesting work, I think that would be expected. I am still awaiting an answer to the question of whether the limited capabilities of the older server were to blame for creating most of the ghosts, or if it is an artifact of the bandwidth being maxxed out. I am not sure that has really been determined yet. This should be a good test. Meow. "Time is simply the mechanism that keeps everything from happening all at once." |
Blake Bonkofsky Joined: 29 Dec 99 Posts: 617 Credit: 46,383,149 RAC: 0 |
Resends are definitely still on. I've gotten the messages a couple times so far. WU's are coming in faster than I can crunch them, so I'm happy :) |
Frizz Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
Hmmm ... lucky you. All I get is this here. Connection errors and/or no wu avail. messages: ... 16/12/2010 20:05:33 SETI@home Requesting new tasks for GPU 16/12/2010 20:06:20 Project communication failed: attempting access to reference site 16/12/2010 20:06:20 SETI@home Scheduler request failed: Failure when receiving data from the peer 16/12/2010 20:06:21 Internet access OK - project servers may be temporarily down. 16/12/2010 20:07:24 SETI@home Sending scheduler request: To fetch work. 16/12/2010 20:07:24 SETI@home Requesting new tasks for GPU 16/12/2010 20:08:14 SETI@home Scheduler request completed: got 0 new tasks 16/12/2010 20:08:14 SETI@home Message from server: No work sent 16/12/2010 20:08:14 SETI@home Message from server: No work is available for Astropulse v5 work. ... |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
Did you set "If no work for selected applications is available, accept work from other applications?" to "yes"? |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
It will be interesting to see what ghost reports come around now. Two and a half hours to download 20 W/U's is not an efficient use of bandwidth, not to mention the HTTP errors, bad checksums and download speeds of less than 1 KB/s (if they're running at all). Certainly my rigs have "some" units, but none of them have enough to run all processors at once. This is a distinct change from last week's restart, when all rigs soon had enough units to keep them busy. They may not have had a full cache, but units were getting through much faster and downloads were running ahead of the processors. What's the point of having a few units to crunch, then going back to square one when they've finished? There were just as many computers chasing work last week and they were all being fed much more efficiently than is happening now. If this is a test, the system is failing. edit Crunching time = 10 minutes, download time = 15 to 20 minutes. Not good /edit T.A. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
It will be interesting to see what ghost reports come around now. All of which goes to prove that Ned Ludd knew what he was talking about when he said that the download bandwidth needed to be more intelligently managed. Having it maxxed out is not an intelligent solution. Sure, the error correction and retry protocols will get there in the end - but all those retry packets merely add to the congestion, and reduce the space available for real data. We probably got more actual work through the pipe last week at 60% - 80%, than we are doing right now at 93%. |
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
Exactly, Richard. Having the splitters able to work full bore is great, but there needs to be some restriction of the download server ports, both on number and speed, to bring the bandwidth used OFF of peak. It does not need to be far below, but below. Janice |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
We probably got more actual work through the pipe last week at 60% - 80%, than we are doing right now at 93%. Most definitely, after this amount of uptime last week, my rigs had 3 or 4 times more work on board than they do now. Also the new allocations were downloading at record speeds, leaving the pipes clear for others to get their new units. I was not the only person to notice this, there were many comments on the boards at the time as to how freely the work was flowing. T.A. |
W-K 666 Joined: 18 May 99 Posts: 19407 Credit: 40,757,560 RAC: 67 |
We probably got more actual work through the pipe last week at 60% - 80%, than we are doing right now at 93%. Totally agree. I've got an AP task d/loading; it's at 1.67/8.00 MB (20.82%) and it was sent at 11:18:05 UTC. |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
So... Is there such a thing as a bandwidth limiting switch/router? Or must such a thing be done with server settings? "Time is simply the mechanism that keeps everything from happening all at once." |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
Thinking out loud.... Isn't there a server setting limiting the number of concurrent connections that can be made at any given time? Could the bandwidth be throttled with that mechanism? "Time is simply the mechanism that keeps everything from happening all at once." |
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
So... It could be limited in the network, but the place with the most control would be in the servers. The download servers would be the obvious place to do it, since they have the vast majority of the least time-dependent traffic. At the tty or ttyc level (excuse me if this is outdated, but the same principles should apply) you can restrict each connection to X speed, and cap the connections at Y by the number that are made available. When things are congested we often see 2-16 Kbps throughput, IF we do not fail. If each port is "wide open" and there are plenty, the bandwidth can be jam packed, exceeding the available capabilities of the main connection. Try to put 120 Mbps through a 90 Mbps capability, and you have a traffic jam. Everything fights for space, and no one gets much of anywhere. For example, if there were only 1500 connection ports, at 56K each, some people would get "backed off" but those that did get through would get a full 56K throughput. This is the same principle as metering traffic getting onto a congested freeway. The actual number of ports that can be supported and what speeds to set them at would take some trial and error. But "as fast as you can" is counterproductive. You need to leave some room for other communications. I was trying to figure out another way to do it, and it is possible that download servers could be given 3-4 10 Mbps network cards, but this too could cause problems. So the cleanest, cheapest, and most controllable point would be the download servers themselves. Janice |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
When I got home from work this morning my main machine was struggling to download some new work. When I checked the messages these were all re-sends; I had no ghosts 12 hours previously, so these must have ghosted on the first new work request. I have now reduced my cache and will try to survive until the download stampede reduces to a more sane level. I am still clearing Einstein on my CPU so should be able to last a couple of days. It definitely looked a lot better when the splitters were not generating enough work to saturate the download pipe. I know people want work, but if the splitters were slowed down a bit until work started to accumulate without the pipe becoming saturated, downloads would probably be a lot smoother. Kevin |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Since this time we started splitting AP and MB together from a standing start, we can see that AP gets through the tapes quicker than MB. So a slight proportional reduction in the number of AP splitters might help, by keeping a few of those big 8MB WUs out of the pipe. |
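
The port-capping arithmetic suggested in the thread (a fixed number of download connections, each held to a fixed speed) can be checked with a short sketch. This is a toy calculation, not BOINC server code. It assumes "56K" means 56 kbit/s per connection and reads the 90 figure for the main connection as 90 Mbit/s. The function names and the 80% target (the upper end of the 60% - 80% range mentioned above) are illustrative choices.

```python
# Toy budget check for capping download connections (not BOINC server code).
# Assumptions: "56K" = 56 kbit/s per connection, main link = 90 Mbit/s.
# Both are readings of the figures quoted in the thread, not measured values.


def aggregate_mbps(connections: int, per_conn_kbps: float) -> float:
    """Worst-case load in Mbit/s if every allowed connection runs at its cap."""
    return connections * per_conn_kbps / 1000.0


def max_connections(link_mbps: float, per_conn_kbps: float,
                    target_util: float) -> int:
    """Largest connection count whose worst-case load stays within the target."""
    budget_kbps = link_mbps * 1000.0 * target_util
    return int(budget_kbps // per_conn_kbps)


LINK_MBPS = 90.0  # assumed link capacity
load = aggregate_mbps(1500, 56.0)
print(f"1500 x 56 kbit/s = {load:.1f} Mbit/s ({load / LINK_MBPS:.0%} of link)")
print("connection cap for 80% utilisation:",
      max_connections(LINK_MBPS, 56.0, 0.80))
```

Under those assumptions, 1500 connections at 56 kbit/s come to 84 Mbit/s, which fits under the 90 Mbit/s link, and holding the worst case to 80% of the link allows about 1285 connections. Real traffic will not run every connection at its cap, so this is only an upper bound; as the post says, the workable numbers would take some trial and error.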