Panic Mode On (80) Server Problems?

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80	Message 1331986 - Posted: 27 Jan 2013, 14:31:30 UTC - in response to Message 1331983. Things are a bit frustrating right now, thinking the project needs to set up a second scheduler on another server and load balance between them. If I remember correctly BOINC does support doing this. Doesn't resolve the bandwidth issue. With each crime and every kindness we birth our future. ID: 1331986 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1331988 - Posted: 27 Jan 2013, 14:36:50 UTC - in response to Message 1331979. Greetings, Ok, here's my thing: The way I understood what BOINC was, back in 04, was that it was an application to 'set-n-forget', right? Right. Well, since I re-started crunching SETI in December or November last year, BOINC has been anything but 'set-n-forget'. Observe: WUs do not download to my PC automagically. Uploads do go automagically. Finished WUs do not report automagically (please refer to first statement in this paragraph). The only way for the finished WUs to be reported and to download new WUs is to manually hit the "Update" button. I will report anywhere from 10 to 50, or more, WUs and download an equal number after hitting the button, scheduler cooperation notwithstanding. This to me is not very 'set-n-forget'. :( What cache settings are you using? Sounds as if you're still running Boinc 6 Cache settings. With Boinc 7 it'll wait until it's below the 'Minimum work buffer' ('Maintain enough tasks to keep busy for at least' on the Setiathome computing preferences page) before asking for work again to save on scheduler contacts. Claggy ID: 1331988 ·

Cliff Harding Volunteer tester Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67	Message 1331996 - Posted: 27 Jan 2013, 14:55:10 UTC I must be one of the extremely lucky ones as I have 50 D/Ls that have been trying to come down the pike over night. I am now force feeding the machine to get them on board. My fastest machine is down to 42 tasks and that won't last through the day unless I suspend processing for a couple of hours and try to connect again. I don't buy computers, I build them!! ID: 1331996 ·

Siran d'Vel'nahr Volunteer tester Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238	Message 1331997 - Posted: 27 Jan 2013, 15:04:36 UTC - in response to Message 1331988. Greetings, Ok, here's my thing: -[ snip ]- This to me is not very 'set-n-forget'. :( What cache settings are you using? Sounds as if you're still running Boinc 6 Cache settings. With Boinc 7 it'll wait until it's below the 'Minimum work buffer' ('Maintain enough tasks to keep busy for at least' on the Setiathome computing preferences page) before asking for work again to save on scheduler contacts. Claggy Greetings Claggy, Ok, I changed my minimum setting. For whatever reason it was set to 0.1, it's been there for like, forever. :/ I changed it to 5. So, that should change the way BOINC communicates with the server(s). Thanks! :) Keep on BOINCing...! :) CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath ID: 1331997 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1331998 - Posted: 27 Jan 2013, 15:14:27 UTC - in response to Message 1331997. Greetings, Ok, here's my thing: -[ snip ]- This to me is not very 'set-n-forget'. :( What cache settings are you using? Sounds as if you're still running Boinc 6 Cache settings. With Boinc 7 it'll wait until it's below the 'Minimum work buffer' ('Maintain enough tasks to keep busy for at least' on the Setiathome computing preferences page) before asking for work again to save on scheduler contacts. Claggy Greetings Claggy, Ok, I changed my minimum setting. For whatever reason it was set to 0.1, it's been there for like, forever. :/ I changed it to 5. So, that should change the way BOINC communicates with the server(s). Thanks! :) Keep on BOINCing...! :) Make sure you also set the '... and up to an additional' setting to a low value, say 0.01. If you have a cache setting of 5 + 5 days, Boinc will fill up to 10 days' work, then not ask again until it drops below 5 days. Claggy ID: 1331998 ·
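
Editor's sketch of the buffer behaviour Claggy describes for Boinc 7: no work request while the cache is at or above the minimum buffer, then a top-up to minimum plus additional once it drops below. This is a simplified model, not BOINC client code; the function name `days_to_request` and its parameters are made up for illustration.

```python
def days_to_request(buffered_days, min_buffer_days, additional_days):
    """Return how many days of work a client would ask for (simplified model).

    Assumption: the client stays quiet until the buffer drops below the
    minimum, then asks for enough to reach minimum + additional.
    """
    if buffered_days >= min_buffer_days:
        # Still above the minimum work buffer: no scheduler contact for work.
        return 0.0
    return (min_buffer_days + additional_days) - buffered_days


# With 5 + 5 days: sitting at 7 days asks for nothing,
# dropping to 4 days asks for enough to get back to 10.
print(days_to_request(7.0, 5.0, 5.0))
print(days_to_request(4.0, 5.0, 5.0))
# With 5 + 0.01 days the top-ups are small and frequent instead.
print(days_to_request(4.9, 5.0, 0.01))
```

With a large 'additional' value the client goes days between requests; with a small one it tops up little and often, which is the reason for the 0.01 suggestion.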

Fred E. Volunteer tester Send message Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0	Message 1331999 - Posted: 27 Jan 2013, 15:18:02 UTC I must be one of the extremely lucky ones as I have 50 D/Ls that have been trying to come down the pike over night. I am now force feeding the machine to get them on board. I also had some overnight luck and got up to the obsolete limit for gpu and I'm working on the cpu limit. Downloads are slow but are coming through w/o much intervention. Just clear the retries when I want to ask for work. Have had half a dozen successful connects this AM. That's more than all day yesterday. Not using a proxy. Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. ID: 1331999 ·

Siran d'Vel'nahr Volunteer tester Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238	Message 1332001 - Posted: 27 Jan 2013, 15:22:26 UTC - in response to Message 1331998. Greetings, Ok, here's my thing: -[ snip ]- This to me is not very 'set-n-forget'. :( -[ snip ]- Claggy Greetings Claggy, Ok, I changed my minimum setting. For whatever reason it was set to 0.1, it's been there for like, forever. :/ I changed it to 5. So, that should change the way BOINC communicates with the server(s). Thanks! :) Keep on BOINCing...! :) Make sure you also set the '... and up to an additional' setting to a low value, say 0.01, If you have cache setting of 5 + 5 days, Boinc will fill up to 10 days work, then not ask again until it drops below 5 days, Claggy Greetings Claggy, Ok, will do. It is set to 4 days right now, I will lower that. Thanks. :) Keep on BOINCing...! :) CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath ID: 1332001 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380	Message 1332012 - Posted: 27 Jan 2013, 17:06:10 UTC I've been getting loads of resends all day on my newest cruncher. They are taking ages to get through, but the number of tasks visibly downloading or resident is now nearer the number that the website thinks are "resident" on that cruncher, so we might be getting towards the end of a retry storm (of shorties) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1332012 ·

Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1332095 - Posted: 27 Jan 2013, 21:05:38 UTC My take on downloads not going and scheduler requests not happening due to project back-off is that up until a few months ago, I was still using the last pre-GPU build of BOINC (6.2.19). For that build, there was no project back-off. Every file transfer has its own back-off counter, and if they were all waiting for a re-try, scheduler would not happen automatically, but as soon as one counter reached zero, scheduler request would go out. Also, the maximum back-off was 3:59:59. I had nearly zero issues with getting work or reporting finished work with that. Since switching to a more modern (but still very outdated) build, it requires a lot of baby-sitting, because one download will stall, and next thing you know, you're in total lock-down for 18 hours unless you start pressing buttons. I never had to babysit 6.2.19. That project back-off scheme, and a back-off interval of more than 4 hours ruined everything. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1332095 ·
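
Editor's sketch of the two back-off schemes as Cosmic_Ocean describes them: per-transfer counters capped at 3:59:59, where the first counter to expire releases the scheduler request, versus a single project-wide back-off that holds everything. Only the 3:59:59 cap and the 18-hour lock-down come from the post; the function names and the way the project-wide case is modelled are assumptions, not taken from the BOINC source.

```python
MAX_TRANSFER_BACKOFF_S = 4 * 3600 - 1  # 3:59:59, the cap quoted for 6.2.19


def wait_per_transfer(transfer_backoffs_s):
    """Old scheme (as described): the request goes out as soon as any
    single transfer's back-off counter reaches zero."""
    if not transfer_backoffs_s:
        return 0
    return min(min(b, MAX_TRANSFER_BACKOFF_S) for b in transfer_backoffs_s)


def wait_project_wide(project_backoff_s, transfer_backoffs_s):
    """Newer scheme (assumed model): one project-level back-off gates all
    contact, regardless of how soon individual transfers would retry."""
    return project_backoff_s


backoffs = [7200, 300, 20000]                   # seconds left per stalled transfer
print(wait_per_transfer(backoffs))              # shortest counter wins
print(wait_project_wide(18 * 3600, backoffs))   # the whole project waits
```

Under the first model the worst case is just under four hours; under the second, one stalled download can hold up everything for as long as the project back-off runs.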

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380	Message 1332098 - Posted: 27 Jan 2013, 21:46:50 UTC The more modern BOINC versions are very bad for using very long delays. Given that the top crunchers are capable of rattling through tasks at over 1 per minute, these big delays are counterproductive. By the time the delay has worked its way through there are several more tasks to be reported, and more tasks being demanded. Short delays, and clearing out part-downloads in preference to un-started ones, would all help get rid of the backlog, probably reduce the re-try rate and so ease the burden on the servers. As it is, I would say the scheduler is set up in such a way as to generate queues, not prevent them. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1332098 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1332099 - Posted: 27 Jan 2013, 21:53:53 UTC - in response to Message 1331988. Last modified: 27 Jan 2013, 21:59:44 UTC What cache settings are you using? Sounds as if you're still running Boinc 6 Cache settings. I'm using V6 & at the moment it is not set & forget: you have to keep on hitting retry when the Scheduler is borked, or disable & re-enable network access when the Scheduler is working but the network traffic is maxed out. Not spending all day at the computer means I've been running out of work quite regularly over the last couple of weeks. Although it does appear the Scheduler came back to life a few hours ago, so now many of the downloads take 10min or more (just for MB). Grant Darwin NT ID: 1332099 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1332101 - Posted: 27 Jan 2013, 21:59:03 UTC - in response to Message 1331986. Things are a bit frustrating right now, thinking the project needs to set up a second scheduler on another server and load balance between them. If I remember correctly BOINC does support doing this. Doesn't resolve the bandwidth issue. I think that, as bad as the servers have been, is probably the biggest issue. Even when the servers are working, work is damn near impossible to come by. For a while there they were using the campus network for Scheduler requests, and it was great. No problems with data from the peer, no timeouts, no problems connecting to the server & no matter how many tasks you were reporting & how much work you were requesting, 5 seconds was the longest it was taking to get a response. Often the responses came within 3 seconds. Grant Darwin NT ID: 1332101 ·

Tom* Send message Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9	Message 1332114 - Posted: 27 Jan 2013, 22:45:45 UTC - in response to Message 1332101. Grant, the last time they used the campus network for scheduler requests, this is what happened: The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven. Eric ID: 1332114 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1332150 - Posted: 28 Jan 2013, 3:38:17 UTC - in response to Message 1332114. Grant, the last time they used the campus network for scheduler requests, this is what happened: The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven. Eric ? Grant Darwin NT ID: 1332150 ·

Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1332166 - Posted: 28 Jan 2013, 6:00:05 UTC Not entirely sure what that is about, either. I do remember they tried changing the IP and also the ISP the scheduler listened on. The server was still in the closet and was still hooked up to the same internal network as all the other servers, but it had an IP for the campus ISP, and DNS was of course updated to reflect that. My memory is a little fuzzy, but I think that made things worse somehow, but I don't recall just how. There also may have been some kind of issue with remote-login since the scheduler was now on a different subnet, which would require someone to actually go into the lab. If we could possibly get our soft-limit of 100mbit increased to 150, that would probably fix just about everything regarding communications. That won't fix the database having I/O performance issues, or getting fragmented and bloated, so limits may still be required, but maybe the limits could be increased a little, like 50% to start with and see what happens after a week. Then add another 50%, and so on. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1332166 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380	Message 1332272 - Posted: 28 Jan 2013, 17:39:08 UTC A quick sum Number of MB tasks produced per second ~60 (based on an average production rate of 30WU/s) Amount of MB data to be transferred per second = 60*366 = 22000KB Now that's only 22MB per second, which leaves a fair bit of change from the 100KBs pipe. So what is gobbling up the other 78MB??? My sums ignore overheads, even if these run at 100% of the "real" data there is something having a fair old feast at the expense of S@H's link between the lab and the outside world.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1332272 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1332275 - Posted: 28 Jan 2013, 17:46:15 UTC - in response to Message 1332272. A quick sum Number of MB tasks produced per second ~60 (based on an average production rate of 30WU/s) Amount of MB data to be transferred per second = 60*366 = 22000KB Now that's only 22MB per second, which leaves a fair bit of change from the 100KBs pipe. So what is gobbling up the other 78MB??? My sums ignore overheads, even if these run at 100% of the "real" data there is something having a fair old feast at the expense of S@H's link between the lab and the outside world.... Check your bits and bytes. 22 MegaBytes (normal unit for file sizes and storage) is a lot more than 100 Megabits (normal unit for communications channels) ID: 1332275 ·
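
Editor's worked version of the bits-and-bytes check Richard points to, using only the figures quoted in the thread (60 tasks per second, 366 KB per task, a 100 Mbit link). Decimal prefixes (1 MB = 1000 KB) are an assumption, and the function name is illustrative.

```python
TASKS_PER_SECOND = 60    # figure quoted in the post above
TASK_SIZE_KB = 366       # kilobytes per MB task, as quoted
LINK_MBIT_PER_S = 100    # link limit mentioned in the thread


def required_mbit_per_s(tasks_per_s, task_kb):
    """Convert a task rate and task size (kilobytes) to megabits per second."""
    kilobytes_per_s = tasks_per_s * task_kb
    kilobits_per_s = kilobytes_per_s * 8   # 8 bits per byte
    return kilobits_per_s / 1000           # decimal prefixes assumed


required = required_mbit_per_s(TASKS_PER_SECOND, TASK_SIZE_KB)
print(f"needed: {required:.2f} Mbit/s, link: {LINK_MBIT_PER_S} Mbit/s")
```

On those figures, roughly 22 megabytes per second is about 176 megabits per second, which is more than the link carries rather than a fraction of it.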

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1332281 - Posted: 28 Jan 2013, 18:03:28 UTC - in response to Message 1332275. A quick sum Number of MB tasks produced per second ~60 (based on an average production rate of 30WU/s) Amount of MB data to be transferred per second = 60*366 = 22000KB Now that's only 22MB per second, which leaves a fair bit of change from the 100KBs pipe. So what is gobbling up the other 78MB??? My sums ignore overheads, even if these run at 100% of the "real" data there is something having a fair old feast at the expense of S@H's link between the lab and the outside world.... Check your bits and bytes. 22 MegaBytes (normal unit for file sizes and storage) is a lot more than 100 Megabits (normal unit for communications channels) He would also be ignoring AP WUs as well........ "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1332281 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380	Message 1332292 - Posted: 28 Jan 2013, 18:43:40 UTC - in response to Message 1332275. In that case why do we get reasonable download rates sometimes when the splitters are going all out, and yet at others (like now) the performance is very poor? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1332292 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1332294 - Posted: 28 Jan 2013, 18:48:08 UTC - in response to Message 1332292. In that case why do we get reasonable download rates sometimes when the splitters are going all out, and yet at others (like now) the performance is very poor? It seems to be usually when the larger AP WUs are added to the download mix that things get rather tied up. I have noticed at times that AP downloads, although still slow, seem to be less likely to stall or hang, thus tying up the download link longer. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1332294 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.