Panic Mode On (10) Server problems
Author | Message |
---|---|
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
passion-->project-->missing the point or deflecting. Let's have some passionate problem solving based on dispassionate analysis. I can't help with the 6.1 stuff; I'm totally ignorant about the specific details of these versions. Yet the question sounds reasonable. At some level of connections, Bruno as the sole upload server must become the bottleneck. Are we there yet? What would be the problem with putting a second parallel server into service for that purpose? Has this been done before? Or is there any sort of buffering parameter that can be adjusted for increased loads? I'm not a network expert, but the behavior seems a lot like what I experienced using DOS and typing too fast. Is there any progress on changing the top-off cache policy discussed elsewhere? Because of the number of hosts out there, I would think there is a large multiplier available there to resolve some of the bandwidth blockades, if we simply didn't frequently pester the server for 28 seconds of work (times 300K hosts). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
One of the problems with analysing SETI problems is that the problem keeps changing, and the solution to one problem won't solve (and may even cause) another problem. But focussing on the issue of the day: I don't think it's an upload problem, so I don't think duplicating the upload functions of Bruno would help in this case. Evidence? When I was monitoring the traffic graphs this morning, I saw a more-than-doubling of the upload traffic (10Mb to 22.75Mb) exactly as the download traffic came down a mere 2% from its peak. Bruno isn't involved in downloads, and can clearly handle peak upload rates way above the baseline average: so my feeling is that this particular problem has a network (router or WAN) source. Why is the network maxed out? Sometimes it's because Matt is splitting shorties, or we're playing catch-up after an outage: at those times, we as a community are actually able to crunch more than the pipe can supply. It's bound to be maxed out: the only solution would be a fatter pipe. Matt has re-opened negotiations to increase the bandwidth above 100Mb nominal / 96Mb practical - let's wish him the best of luck. At other times, the network is able to handle the average community demand, but can't handle the peak demand - those strange traffic spikes. Obviously, the 'fat pipe' solution would help here too, but it would also help if the flow was more even - squash the spikes and fill the troughs. I don't think there's much we can do at our end to solve that one. The spikes are too frequent, but too irregular, to be able to schedule a 'spike miss' for our download requests (I got caught out myself when today's 7am spike followed much sooner than I was expecting after the 5am spike). It probably would help to avoid the network congestion if BOINC's automatic download retries backed off further and faster if they were balked by network congestion: but I can see that being unpopular, and possibly even causing as many problems as it solves. Ned's variable p-Persistence, imposing a variable degree of back-off according to a project-specified measure of congestion, sounds like the nearest approach so far. I'm also persuaded by Josef's analysis that the spikes occur because the MB splitters do, but the Astropulse splitters don't, pause when the workunit storage is getting full. That accounts satisfactorily for my personal observation that I'm much more likely to be allocated an AP task if I do a work request during a download traffic spike. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
For the last day (or the last week), Cricket is showing an average of about 74 Mbps on the download side, and over 10 Mbps on the upload side. The average size of a Setiathome_Enhanced result on my hosts is just over 26000 bytes, adding 3% for the overhead of uploading with added XML gives 26780 bytes. That's just about 1/14 the size of a S_E WU, so the portion of the upload bandwidth which is being used by uploads would be 74/14 ~= 5.3 Mbps. The other ~5 Mbps may be mostly requests to the Scheduler. Those requests can be small, but adding in the information for reporting completed work, and the information on other work queued on the host, can easily make such a request considerably larger than an uploaded result. If either an upload or a request to the Scheduler fails with an http error, it is tried again a minute or more later. I think I've seen, but cannot be sure because I'm using dial-up, that such errors are far more likely as soon as the download bandwidth is saturated with AP work. If so, successful retries may be a large part of the peak in upload bandwidth which follows an AP burst. Joe |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"For the last day (or the last week), Cricket is showing an average of about 74 Mbps on the download side, and over 10 Mbps on the upload side. The average size of a Setiathome_Enhanced result on my hosts is just over 26000 bytes, adding 3% for the overhead of uploading with added XML gives 26780 bytes. That's just about 1/14 the size of a S_E WU, so the portion of the upload bandwidth which is being used by uploads would be 74/14 ~= 5.3 Mbps." Which is why some sort of mechanism to "cool down" the BOINC client would be useful, especially if there were a way for the BOINC servers to broadcast some kind of "speed" metric. |
Jim Volfan Joined: 22 May 99 Posts: 52 Credit: 24,239,706 RAC: 90 |
The scheduler processes on Anakin are disabled; no work is being reported or sent out. The Cricket graphs have almost flat-lined. I wonder if they were turned off, since they say "disabled" and not "not running"? Anakin is up, and the feeder.i686 process is running normally. Results received in the last hour is at zero, so it has been this way for a little while. I don't expect anything to happen on the Berkeley front for another 8 1/2 hours or so. Be patient folks, it will happen. PS, at least the Results waiting for DB purging is draining... |
Joined: 20 May 99 Posts: 16 Credit: 4,428,996 RAC: 0 |
I hope it won't take all weekend. |
Joined: 10 Jul 03 Posts: 72 Credit: 141,587 RAC: 0 |
For now, Anakin, the scheduler function is still disabled. Hi there, it doesn't have to do with your operating system, because the same happens to me too. I click the "community" header and then, at the bottom, the "Languages" option; even when I choose English, the site comes out half in English and half in German. |
kittyman Joined: 9 Jul 00 Posts: 51524 Credit: 1,018,363,574 RAC: 1,004 |
Ringgggggg Ringgggggggg Ringggggggggggg.... Hello.....have I reached the party to whom I am speaking? Calling Seti Central......uploads still failing...... Please kick once if you can hear me..... Kick twice if you cannot. Kick harder if you cannot read this post...LOL. "Time is simply the mechanism that keeps everything from happening all at once." |
kittyman Joined: 9 Jul 00 Posts: 51524 Credit: 1,018,363,574 RAC: 1,004 |
Hmmmmmmmmmm...no answer yet..... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"Hmmmmmmmmmm...no answer yet....." All you can do is wait until the Cricket graph stops flatlining at 95 megabits.... |
kittyman Joined: 9 Jul 00 Posts: 51524 Credit: 1,018,363,574 RAC: 1,004 |
"Hmmmmmmmmmm...no answer yet....." Where's the luv? Must be some data transfers taking place again.......I wish the Berkeley admins would fatten that pipe up the hill............. |
kittyman Joined: 9 Jul 00 Posts: 51524 Credit: 1,018,363,574 RAC: 1,004 |
Whoop da....... Uploads are working again......slowly....... Kick 'em if you got 'em........ |
W-K 666 Joined: 18 May 99 Posts: 19551 Credit: 40,757,560 RAC: 67 |
Linux machines are having problems actually downloading work: tasks get assigned but never start downloading, and the hosts get HTTP errors. The message that most are receiving is "No work from project", so if you got some you are lucky. From the server status page: Results ready to send 454 29m, as of 1 Nov 2008 12:40:44 UTC. It's been like that for several hours now. |
Thierry Godefroy Joined: 4 Jul 00 Posts: 12 Credit: 1,043,682 RAC: 0 |
Same problem here since yesterday: got a (very) small burst of work this morning, but now I'm back to HTTP errors and have no more work to do for Seti, even though work did get assigned to me. :-( sam 01 nov 2008 23:05:27 CET\|SETI@home\|Started download of 04se08af.30247.890.7.8.148 sam 01 nov 2008 23:05:27 CET\|SETI@home\|Started download of 04se08ag.29801.890.7.8.169 sam 01 nov 2008 23:05:29 CET\|\|Internet access OK - project servers may be temporarily down. sam 01 nov 2008 23:07:28 CET\|\|Project communication failed: attempting access to reference site sam 01 nov 2008 23:07:28 CET\|SETI@home\|Temporarily failed download of 04se08af.30247.890.7.8.148: HTTP error sam 01 nov 2008 23:07:28 CET\|SETI@home\|Backing off 50 min 39 sec on download of 04se08af.30247.890.7.8.148 sam 01 nov 2008 23:07:28 CET\|SETI@home\|Temporarily failed download of 04se08ag.29801.890.7.8.169: HTTP error sam 01 nov 2008 23:07:28 CET\|SETI@home\|Backing off 2 hr 30 min 35 sec on download of 04se08ag.29801.890.7.8.169 sam 01 nov 2008 23:07:28 CET\|SETI@home\|Started download of 04se08ag.29801.890.7.8.173 sam 01 nov 2008 23:07:28 CET\|SETI@home\|Started download of 04se08ac.8443.19995.16.8.174 sam 01 nov 2008 23:07:30 CET\|\|Internet access OK - project servers may be temporarily down. etc, etc... And of course, this always happens on weekends... Murphy's Law. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Shorties! Sir, may I please have some more? |
Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
For Linux users (the only ones who seem to have problems with downloading), a temporary solution is here. I tried it and it worked, but I had to reboot first. More experienced Linux users may know how to refresh the HOSTS file or DNS or whatever...it was easier for me just to reboot. "Same problem here since yesterday: got a (very) small burst of work this morning, but now I'm back with HTTP errors and no more work to do for Seti while work did get assigned to me. :-(" |
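
For anyone who wants to check the bandwidth estimate in Josef W. Segur's post above, here is a minimal sketch of the arithmetic in Python. All figures come from that post. The variable names are made up for illustration, "over 10 Mbps" is taken as exactly 10, and the calculation assumes (as the post seems to) that the download traffic is essentially all Setiathome_Enhanced workunits and that one result is uploaded per workunit downloaded.

```python
# Figures quoted in Josef W. Segur's post; variable names are illustrative only.
download_mbps = 74.0     # average download rate shown by Cricket
upload_mbps = 10.0       # "over 10 Mbps" on the upload side, taken here as 10
result_bytes = 26_000    # average Setiathome_Enhanced result size on his hosts
xml_overhead = 0.03      # 3% added for the XML that accompanies an upload
size_ratio = 14          # an uploaded result is about 1/14 the size of a workunit

# Assumption: one result goes up for every workunit that comes down, so the
# upload traffic due to results is the download traffic divided by the ratio.
uploaded_bytes = result_bytes * (1 + xml_overhead)
result_upload_mbps = download_mbps / size_ratio
other_upload_mbps = upload_mbps - result_upload_mbps

print(f"bytes per uploaded result: {uploaded_bytes:.0f}")
print(f"upload bandwidth used by results: {result_upload_mbps:.1f} Mbps")
print(f"left for scheduler requests and retries: {other_upload_mbps:.1f} Mbps")
```

With these inputs the results share comes to about 5.3 Mbps, leaving a little under 5 Mbps for scheduler requests and retries, which matches the "other ~5 Mbps" in the post. Changing `upload_mbps` to a measured value shifts only the remainder.
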
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.