Panic Mode On (80) Server Problems?
Author | Message |
---|---|
Mike Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
Things are a bit frustrating right now, thinking the project needs to set up a second scheduler on another server and load balance between them. If I remember correctly BOINC does support doing this. Doesn't resolve the bandwidth issue. With each crime and every kindness we birth our future. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Greetings, What cache settings are you using? Sounds as if you're still running Boinc 6 Cache settings. With Boinc 7 it'll wait until it's below the 'Minimum work buffer' ('Maintain enough tasks to keep busy for at least' on the Setiathome computing preferences page) before asking for work again to save on scheduler contacts. Claggy |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I must be one of the extremely lucky ones as I have 50 D/Ls that have been trying to come down the pike over night. I am now force feeding the machine to get them on board. My fastest machine is down to 42 tasks and that won't last through the day unless I suspend processing for a couple of hours and try to connect again. I don't buy computers, I build them!! |
Siran d'Vel'nahr Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
Greetings, Greetings Claggy, Ok, I changed my minimum setting. For whatever reason it was set to 0.1, it's been there for like, forever. :/ I changed it to 5. So, that should change the way BOINC communicates with the server(s). Thanks! :) Keep on BOINCing...! :) CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Greetings, Make sure you also set the '... and up to an additional' setting to a low value, say 0.01, If you have cache setting of 5 + 5 days, Boinc will fill up to 10 days work, then not ask again until it drops below 5 days, Claggy |
Fred E. Send message Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
I must be one of the extremely lucky ones as I have 50 D/Ls that have been trying to come down the pike over night. I am now force feeding the machine to get them on board. I also had some overnight luck and got up to the obsolete limit for gpu and I'm working on the cpu limit. Downloads are slow but are coming through w/o much intervention. Just clear the retries when I want to ask for work. Have had half a dozen successful connects this AM. That's more than all day yesterday. Not using a proxy. Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. |
Siran d'Vel'nahr Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
Greetings, Greetings Claggy, Ok, will do. It is set to 4 days right now, I will lower that. Thanks. :) Keep on BOINCing...! :) CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
rob smith Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380 |
I've been getting loads of resends all day on my newest cruncher. They are taking ages to get through, but the number of tasks visibly downloading or resident is now nearer the number that the website thinks are "resident" on that cruncher, so we might be getting towards the end of a retry storm (of shorties) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
My take on downloads not going and scheduler requests not happening due to project back-off is that up until a few months ago, I was still using the last pre-GPU build of BOINC (6.2.19). For that build, there was no project back-off. Every file transfer has its own back-off counter, and if they were all waiting for a re-try, scheduler would not happen automatically, but as soon as one counter reached zero, scheduler request would go out. Also, the maximum back-off was 3:59:59. I had nearly zero issues with getting work or reporting finished work with that. Since switching to a more modern (but still very outdated) build, it requires a lot of baby-sitting, because one download will stall, and next thing you know, you're in total lock-down for 18 hours unless you start pressing buttons. I never had to babysit 6.2.19. That project back-off scheme, and a back-off interval of more than 4 hours ruined everything. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
rob smith Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380 |
The more modern BOINC versions are very bad at using very long delays. Given that the top crunchers are capable of rattling through tasks at over 1 per minute, these big delays are counterproductive. By the time the delay has worked its way through there are several more tasks to be reported, and more tasks being demanded. Short delays, and clearing out part-downloads in preference to un-started ones, would all help get rid of the backlog, probably reduce the re-try rate and so ease the burden on the servers. As it is, I would say the scheduler is set up in such a way as to generate queues, not prevent them. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
What cache settings are you using? Sounds as if you're still running Boinc 6 Cache settings. I'm using V6 & at the moment it is not set & forget: you have to keep on hitting retry when the Scheduler is borked, or disable & re-enable network access when the Scheduler is working but the network traffic is maxed out. Not spending all day at the computer means I've been running out of work quite regularly over the last couple of weeks. Although it does appear the Scheduler came back to life a few hours ago, so now many of the downloads take 10 min or more (just for MB). Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Things are a bit frustrating right now, thinking the project needs to set up a second scheduler on another server and load balance between them. If I remember correctly BOINC does support doing this. I think that, as bad as the servers have been, is probably the biggest issue. Even when the servers are working, work is damn near impossible to come by. For a while there they were using the campus network for Scheduler requests, and it was great. No problems with data from the peer, no timeouts, no problems connecting to the server & no matter how many tasks you were reporting & how much work you were requesting, 5 seconds was the longest it was taking to get a response. Often the responses came within 3 seconds. Grant Darwin NT |
Tom* Send message Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
Grant, the last time they used the campus network for scheduler requests, this is what happened: The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven. Eric |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Grant ? Grant Darwin NT |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Not entirely sure what that is about, either. I do remember they tried changing the IP and also the ISP the scheduler listened on. The server was still in the closet and was still hooked up to the same internal network as all the other servers, but it had an IP for the campus ISP, and DNS was of course updated to reflect that. My memory is a little fuzzy, but I think that made things worse somehow, but I don't recall just how. There also may have been some kind of issue with remote-login since the scheduler was now on a different subnet, which would require someone to actually go into the lab. If we could possibly get our soft-limit of 100mbit increased to 150, that would probably fix just about everything regarding communications. That won't fix the database having I/O performance issues, or getting fragmented and bloated, so limits may still be required, but maybe the limits could be increased a little, like 50% to start with and see what happens after a week. Then add another 50%, and so on. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
rob smith Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380 |
A quick sum: Number of MB tasks produced per second ~60 (based on an average production rate of 30 WU/s). Amount of MB data to be transferred per second = 60*366 = 22000 KB. Now that's only 22 MB per second, which leaves a fair bit of change from the 100 MB/s pipe. So what is gobbling up the other 78 MB??? My sums ignore overheads; even if these run at 100% of the "real" data, there is something having a fair old feast at the expense of S@H's link between the lab and the outside world.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
A quick sum Check your bits and bytes. 22 MegaBytes (normal unit for file sizes and storage) is a lot more than 100 Megabits (normal unit for communications channels) |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
A quick sum He would also be ignoring AP WUs as well........ "Freedom is just Chaos, with better lighting." Alan Dean Foster |
rob smith Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380 |
In that case why do we get reasonable download rates sometimes when the splitters are going all out, and yet at other times (like now) the performance is very poor? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
In that case why do we get reasonable download rates sometimes when the splitters are going all out, and yet at other times (like now) the performance is very poor? It seems to be usually when the larger AP WUs are added to the download mix that things get rather tied up. I have noticed at times that it appears that AP downloads, although still slow, seem to be less likely to stall or hang, thus tying up the download link longer. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
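
For anyone following the bandwidth sums earlier in the thread (rob smith's estimate and Richard Haselgrove's bits-versus-bytes correction), here is a minimal Python sketch of the arithmetic. It uses only the figures quoted in the posts: roughly 60 MB tasks per second, 366 KB per task, and a 100 Mbit/s link. The function name is illustrative, decimal units (1 MB = 1000 KB) are an assumption, and protocol overheads and AP work units are ignored, just as they were in the original sum.

```python
def required_mbit_per_s(tasks_per_s: float, task_size_kb: float) -> float:
    """Link rate needed, in megabits per second (decimal units assumed)."""
    kilobytes_per_s = tasks_per_s * task_size_kb
    megabytes_per_s = kilobytes_per_s / 1000  # assumption: 1 MB = 1000 KB
    return megabytes_per_s * 8                # 8 bits per byte


# Figures quoted in the thread, not measured values.
TASKS_PER_S = 60
TASK_SIZE_KB = 366
LINK_MBIT_PER_S = 100

demand = required_mbit_per_s(TASKS_PER_S, TASK_SIZE_KB)
print(f"demand: {demand:.1f} Mbit/s on a {LINK_MBIT_PER_S} Mbit/s link")
print(f"ratio: {demand / LINK_MBIT_PER_S:.2f}x")
```

On these quoted figures the demand comes out above the link's nominal rate rather than below it, which is consistent with Richard's point that 22 megabytes per second is more than 100 megabits per second.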
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.