Panic Mode On (42) Server problems


arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1056506 - Posted: 16 Dec 2010, 4:31:26 UTC

Okay, we have the answer, now what is the question.

ID: 1056506 · Report as offensive
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1056513 - Posted: 16 Dec 2010, 4:56:13 UTC - in response to Message 1056506.  

Okay, we have the answer, now what is the question.


We did have an answer?! Ah guess we did with the whole worf going down thing. The question right now is not what the question is, but when the question will be. Soon I suppose. Come on splitters WORK!
Traveling through space at ~67,000mph!
ID: 1056513 · Report as offensive
Allie in Vancouver
Volunteer tester
Joined: 16 Mar 07
Posts: 3949
Credit: 1,604,668
RAC: 0
Canada
Message 1056514 - Posted: 16 Dec 2010, 4:58:08 UTC

I am not terribly worried: I've got 4 AP units that BOINC thinks will take 200 hours (experience tells me 18 - 20 hours is more accurate).

As a result, my computer isn't even asking for work. :P
Pure mathematics is, in its way, the poetry of logical ideas.

Albert Einstein
ID: 1056514 · Report as offensive
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1056519 - Posted: 16 Dec 2010, 5:08:03 UTC

Between 2 computers I have 328 CUDA Fermi WU's, 46 MB's, and 5 AP's. So I'm good to go for a bit.
Traveling through space at ~67,000mph!
ID: 1056519 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1056524 - Posted: 16 Dec 2010, 5:30:33 UTC - in response to Message 1056513.  

Okay, we have the answer, now what is the question.


We did have an answer?! Ah guess we did with the whole worf going down thing. The question right now is not what the question is, but when the question will be. Soon I suppose. Come on splitters WORK!


Searcher wrote in #41
I am just waiting till the next iteration when all our questions will be answered by Panic Mode on (42) which as we know is the ultimate answer.


ID: 1056524 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1056569 - Posted: 16 Dec 2010, 8:10:41 UTC

I hope all those who wanted the server bandwidth to be maxxed out are happy.

We're back to the same old log jam and nobody getting anything except "Project Backoff" notices. So different from last week's restart.

Hope they still have the resends turned on otherwise it's ghost heaven again.

T.A.
ID: 1056569 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1056572 - Posted: 16 Dec 2010, 8:26:26 UTC - in response to Message 1056569.  

I hope all those who wanted the server bandwidth to be maxxed out are happy.

We're back to the same old log jam and nobody getting anything except "Project Backoff" notices. So different from last week's restart.

Hope they still have the resends turned on otherwise it's ghost heaven again.

T.A.

It will be interesting to see what ghost reports come around now.

Yes, I am happy the bandwidth is maxxed......
Until proven otherwise.

I had several rigs without anything to do when I got home from work about 2 hours ago.
They have now all received some work.....

And yes, the downloads take some retries and are coming through slowly, but they are coming through. Given the countless computers that are requesting work, I think that would be expected.

I am still awaiting an answer to the question of whether the limited capabilities of the older server were to blame for creating most of the ghosts, or if it is an artifact of the bandwidth being maxxed out.

I am not sure that has really been determined yet.

This should be a good test.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1056572 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1056574 - Posted: 16 Dec 2010, 8:34:23 UTC - in response to Message 1056572.  

Resends are definitely still on. I've gotten the messages a couple times so far. WU's are coming in faster than I can crunch them, so I'm happy :)
ID: 1056574 · Report as offensive
Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1056576 - Posted: 16 Dec 2010, 9:08:56 UTC - in response to Message 1056574.  
Last modified: 16 Dec 2010, 9:10:34 UTC

Hmmm ... lucky you.

All I get is connection errors and/or "no WU available" messages:

...
16/12/2010 20:05:33 SETI@home Requesting new tasks for GPU
16/12/2010 20:06:20 Project communication failed: attempting access to reference site
16/12/2010 20:06:20 SETI@home Scheduler request failed: Failure when receiving data from the peer
16/12/2010 20:06:21 Internet access OK - project servers may be temporarily down.
16/12/2010 20:07:24 SETI@home Sending scheduler request: To fetch work.
16/12/2010 20:07:24 SETI@home Requesting new tasks for GPU
16/12/2010 20:08:14 SETI@home Scheduler request completed: got 0 new tasks
16/12/2010 20:08:14 SETI@home Message from server: No work sent
16/12/2010 20:08:14 SETI@home Message from server: No work is available for Astropulse v5
work.
...
ID: 1056576 · Report as offensive
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1056582 - Posted: 16 Dec 2010, 9:43:00 UTC - in response to Message 1056576.  

Did you set "If no work for selected applications is available, accept work from other applications?" to "yes"?
ID: 1056582 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1056583 - Posted: 16 Dec 2010, 9:47:55 UTC - in response to Message 1056572.  
Last modified: 16 Dec 2010, 9:49:34 UTC

It will be interesting to see what ghost reports come around now.

Yes, I am happy the bandwidth is maxxed......
Until proven otherwise.

I had several rigs without anything to do when I got home from work about 2 hours ago.
They have now all received some work.....

And yes, the downloads take some retries and are coming through slowly, but they are coming through. Given the countless computers that are requesting work, I think that would be expected.

I am still awaiting an answer to the question of whether the limited capabilities of the older server were to blame for creating most of the ghosts, or if it is an artifact of the bandwidth being maxxed out.

I am not sure that has really been determined yet.

This should be a good test.

Meow.


Two and a half hours to download 20 W/U's is not an efficient use of bandwidth, not to mention the HTTP errors, bad checksums and download speeds of less than 1 KB/s (if they're running at all). Certainly my rigs have "some" units, but none of them have enough to run all processors at once.

This is a distinct change from last week's restart when all rigs soon had enough units to keep them busy. They may not have had a full cache but units were getting through much faster and downloads were running ahead of the processors.

What's the point of having a few units to crunch, then going back to square one when they've finished? There were just as many computers chasing work last week and they were all being fed much more efficiently than is happening now.

If this is a test, the system is failing.

Edit: crunching time = 10 minutes, download time = 15 to 20 minutes. Not good.

T.A.
ID: 1056583 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1056591 - Posted: 16 Dec 2010, 10:50:34 UTC - in response to Message 1056583.  

Two and a half hours to download 20 W/U's is not an efficient use of bandwidth, not to mention the HTTP errors, bad checksums and download speeds of less than 1 KB/s (if they're running at all). Certainly my rigs have "some" units, but none of them have enough to run all processors at once.

This is a distinct change from last week's restart when all rigs soon had enough units to keep them busy. They may not have had a full cache but units were getting through much faster and downloads were running ahead of the processors.

What's the point of having a few units to crunch, then going back to square one when they've finished? There were just as many computers chasing work last week and they were all being fed much more efficiently than is happening now.

If this is a test, the system is failing.

Edit: crunching time = 10 minutes, download time = 15 to 20 minutes. Not good.

T.A.

All of which goes to prove that Ned Ludd knew what he was talking about when he said that the download bandwidth needed to be more intelligently managed.

Having it maxxed out is not an intelligent solution.

Sure, the error correction and retry protocols will get there in the end - but all those retry packets merely add to the congestion, and reduce the space available for real data.

We probably got more actual work through the pipe last week at 60% - 80%, than we are doing right now at 93%.
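
A rough way to put a number on that comparison, as a sketch only: if the link utilisation figure counts every byte on the wire (retries and failed transfers included), then useful throughput is utilisation times the fraction of traffic that is not wasted. Assuming, purely for illustration, that last week's traffic was almost all useful, the function below (a hypothetical helper, not anything from the project) gives the share of today's traffic that would have to be wasted for 93% to deliver no more real work than the lower figure.

```python
def breakeven_waste(clean_utilisation: float, busy_utilisation: float) -> float:
    """Fraction of the busy link's traffic that must be wasted (retries,
    failed transfers) for its useful throughput to fall to that of a
    cleaner link whose traffic is assumed to be entirely useful."""
    return 1.0 - clean_utilisation / busy_utilisation


# Compare the 60% - 80% range mentioned above against the current 93%.
for clean in (0.60, 0.70, 0.80):
    waste = breakeven_waste(clean, 0.93)
    print(f"{clean:.0%} clean vs 93% busy: break-even waste = {waste:.1%}")
```

Under those assumptions, roughly one byte in seven being a retry would already cancel the gain over an 80% link, and about one in three would cancel it over a 60% link. Whether the real retry share is that high is not something the thread establishes.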
ID: 1056591 · Report as offensive
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1056601 - Posted: 16 Dec 2010, 12:29:21 UTC

Exactly, Richard. Having the splitters able to work full bore is great, but there needs to be some restriction of the download server ports, both in number and speed, to bring the bandwidth used OFF of peak. It does not need to be far below, but below.
Janice
ID: 1056601 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1056602 - Posted: 16 Dec 2010, 12:36:11 UTC - in response to Message 1056591.  

We probably got more actual work through the pipe last week at 60% - 80%, than we are doing right now at 93%.

Most definitely, after this amount of uptime last week, my rigs had 3 or 4 times more work on board than they do now. Also the new allocations were downloading at record speeds, leaving the pipes clear for others to get their new units.

I was not the only person to notice this, there were many comments on the boards at the time as to how freely the work was flowing.

T.A.
ID: 1056602 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19323
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1056623 - Posted: 16 Dec 2010, 15:29:23 UTC - in response to Message 1056602.  

We probably got more actual work through the pipe last week at 60% - 80%, than we are doing right now at 93%.

Most definitely, after this amount of uptime last week, my rigs had 3 or 4 times more work on board than they do now. Also the new allocations were downloading at record speeds, leaving the pipes clear for others to get their new units.

I was not the only person to notice this, there were many comments on the boards at the time as to how freely the work was flowing.

T.A.

Totally agree. I've got an AP task downloading; it's at 1.67/8.00 MB (20.82%), and it was sent at 11:18:05 UTC.
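
For a sense of scale, here is a small sketch of the average rate those figures imply. It assumes the progress reading was taken at the time of posting (15:29:23 UTC) and that 1 MB = 1024 KB; both are assumptions, and time spent in backoff is counted as download time.

```python
from datetime import datetime

# Times from the post; the observation time is an assumption (the post timestamp).
sent = datetime(2010, 12, 16, 11, 18, 5)
observed = datetime(2010, 12, 16, 15, 29, 23)
done_mb, total_mb = 1.67, 8.00

elapsed_s = (observed - sent).total_seconds()
# Average rate since the task was sent, in KB/s.
rate_kb_s = done_mb * 1024 / elapsed_s
# Time to fetch the remainder if the average rate stays the same.
eta_hours = (total_mb - done_mb) / done_mb * elapsed_s / 3600

print(f"elapsed: {elapsed_s:.0f} s")
print(f"average rate: {rate_kb_s:.3f} KB/s")
print(f"remaining at this rate: {eta_hours:.1f} h")
```

That works out to an average on the order of a tenth of a KB/s, with most of a day still to go for a single 8 MB file if nothing improves.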
ID: 1056623 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1056628 - Posted: 16 Dec 2010, 15:37:35 UTC

So...
Is there such a thing as a bandwidth limiting switch/router?
Or must such a thing be done with server settings?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1056628 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1056634 - Posted: 16 Dec 2010, 15:59:44 UTC

Thinking out loud....

Isn't there a server setting limiting the number of concurrent connections that can be made at any given time?

Could the bandwidth be throttled with that mechanism?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1056634 · Report as offensive
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1056635 - Posted: 16 Dec 2010, 16:00:23 UTC - in response to Message 1056628.  

So...
Is there such a thing as a bandwidth limiting switch/router?
Or must such a thing be done with server settings?


It could be limited in the network, but the place with the most control would be in the servers. The download servers would be the obvious place to do it, since they have the vast majority of the least time dependent traffic.

At the tty or ttyc's (excuse me if this is outdated, but the same principles should apply), you can restrict each connection to X speed, and limit the connections to Y by the number that are made available.

When things are congested, we often see 2-16 Kbps throughput, IF we do not fail.
If each port is "wide open" and there are plenty, the bandwidth can be jam packed, exceeding the available capabilities of the main connection. Try to put 120 Mbps through a 90 Mbps capability, and you have a traffic jam. Everything fights for space, and no one gets much of anywhere.

For example if there were only 1500 connection ports, at 56K each, some people would get "backed off" but those that did get through would get a full 56K throughput. This is the same principle of metering traffic getting onto a congested freeway.

The actual number of ports that can be supported and what speeds to set them at would take some trial and error. But "as fast as you can" is counter productive. You need to leave some room for other communications.

I was trying to figure out another way to do it, and it is possible that the download servers could be given three or four 10 Mbps network cards, but this too could cause problems. So the cleanest, cheapest, and most controllable point would be the download servers themselves.
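
The arithmetic behind the example can be sketched as follows. This is only an illustration: it reads "56K" as 56 kbit/s per connection and the main connection as 90 Mbit/s, and the function names and the 80% target are hypothetical, not server settings.

```python
def aggregate_mbps(connections: int, per_conn_kbps: float) -> float:
    """Worst-case total rate if every connection slot runs at its capped speed."""
    return connections * per_conn_kbps / 1000


def max_connections(link_mbps: float, per_conn_kbps: float,
                    target_utilisation: float) -> int:
    """Largest number of capped connections that stays within the chosen
    share of the link (target_utilisation is a fraction, e.g. 0.8)."""
    return int(link_mbps * 1000 * target_utilisation // per_conn_kbps)


total = aggregate_mbps(1500, 56)
print(f"1500 x 56 kbit/s = {total:.0f} Mbit/s "
      f"({total / 90:.0%} of a 90 Mbit/s link)")
print("slots for an 80% target:", max_connections(90, 56, 0.8))
```

With these readings, 1500 slots at 56 kbit/s come to 84 Mbit/s, which fits inside 90 Mbit/s but leaves little headroom; a lower target simply means fewer slots or a lower per-connection cap, which is where the trial and error would come in.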
Janice
ID: 1056635 · Report as offensive
Kevin Olley

Send message
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1056642 - Posted: 16 Dec 2010, 16:09:28 UTC - in response to Message 1056623.  

When I got home from work this morning, my main machine was struggling to download some new work. When I checked the messages, these were all re-sends. I had no ghosts 12 hours earlier, so these must have ghosted on the first new work request.

I have now reduced my cache and will try to survive until the download stampede reduces to a more sane level. I am still clearing Einstein on my CPU, so I should be able to last a couple of days.

It definitely looked a lot better when the splitters were not generating enough work to saturate the download pipe. I know people want work, but if the splitters were slowed down a bit, to the point where work started to accumulate without the pipe becoming saturated, downloads would probably be a lot smoother.

Kevin


ID: 1056642 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1056679 - Posted: 16 Dec 2010, 17:30:55 UTC

Since this time we started splitting AP and MB together from a standing start, we can see that AP gets through the tapes quicker than MB. So a slight proportional reduction in the number of AP splitters might help, by keeping a few of those big 8MB WUs out of the pipe.
ID: 1056679 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.