Panic Mode On (63) Server problems?

Author	Message
kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51542 Credit: 1,018,363,574 RAC: 1,004	Message 1180965 - Posted: 27 Dec 2011, 17:49:18 UTC - in response to Message 1180964.
"Same goes for GPU hosting rigs... the faster they are right now, the worse off they are. What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up. That's what really sux for us right now."
You said it, bro... That's why I have been stumping for weeks now for the Admins and Devs to address the BOINC code problems, get them behind us, and get the dang limits lifted.
"Time is simply the mechanism that keeps everything from happening all at once." ID: 1180965 ·

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51542 Credit: 1,018,363,574 RAC: 1,004	Message 1180967 - Posted: 27 Dec 2011, 17:51:38 UTC - in response to Message 1180966.
"Shouldn't we be in the middle of the usual Tuesday outage by now?"
"Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?"
That could be the case, but I am not certain. If they are on hiatus for the whole week, they may let things limp along as they are, other than possibly what could be addressed remotely. Or, possibly, an outage later in the week. Just dunno for sure.
"Time is simply the mechanism that keeps everything from happening all at once." ID: 1180967 ·

Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1180970 - Posted: 27 Dec 2011, 17:55:03 UTC - in response to Message 1180961.
"What is so odd about that? All things being equal, the quad core is going to do 4 times the amount of work that the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain. Same goes for GPU hosting rigs... the faster they are right now, the worse off they are. What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up. That's what really sux for us right now."
I do agree; however, the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work. The quad-core machine would have about a 10% success rate. The slow machine would get its ~50 MBs in fewer than 10 requests, but the quad would have to ask for work 50+ times to get maybe 75.
Something else I'm pondering is whether there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second, if it's even possible? That may alleviate a lot of those "project has no tasks available" messages when server status shows 200,000+ waiting to be assigned.
Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1180970 ·
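The gap between a 95% and a 10% success rate can be put into rough numbers. The sketch below is only an illustration under assumptions that are not in the post: each work request is treated as an independent trial, and the count of five successful replies is a made-up figure chosen for the example. Under those assumptions, the expected number of requests is the number of successes needed divided by the per-request success rate.

```python
def expected_requests(successes_needed: int, success_rate: float) -> float:
    """Expected number of work requests needed to collect a given number
    of successful replies, assuming independent requests (an assumption,
    not something measured on the project servers)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    if successes_needed < 0:
        raise ValueError("successes_needed must be non-negative")
    return successes_needed / success_rate


if __name__ == "__main__":
    # Hypothetical: both hosts need 5 successful replies to fill up.
    single_core = expected_requests(5, 0.95)  # rate quoted for the single core
    quad_core = expected_requests(5, 0.10)    # rate quoted for the quad core
    print(f"single core: about {single_core:.1f} requests")
    print(f"quad core:   about {quad_core:.1f} requests")
```

With these made-up inputs the quad needs roughly ten times as many requests as the single-core host for the same number of successful replies, which is in line with the "50+ times" estimate in the post.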

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51542 Credit: 1,018,363,574 RAC: 1,004	Message 1180971 - Posted: 27 Dec 2011, 17:57:35 UTC - in response to Message 1180970.
"What is so odd about that? All things being equal, the quad core is going to do 4 times the amount of work that the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain. Same goes for GPU hosting rigs... the faster they are right now, the worse off they are. What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up. That's what really sux for us right now."
"I do agree; however, the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work. The quad-core machine would have about a 10% success rate. The slow machine would get its ~50 MBs in fewer than 10 requests, but the quad would have to ask for work 50+ times to get maybe 75. Something else I'm pondering is whether there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second, if it's even possible? That may alleviate a lot of those 'project has no tasks available' messages when server status shows 200,000+ waiting to be assigned."
I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that bandwidth has to come first; then scheduler or other server-based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed.
"Time is simply the mechanism that keeps everything from happening all at once." ID: 1180971 ·

Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1180972 - Posted: 27 Dec 2011, 18:05:24 UTC - in response to Message 1180971.
"I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that bandwidth has to come first; then scheduler or other server-based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed."
That is true. And I've stated a few times that if we can get more bandwidth, it may create a whole new pile of problems all by itself by allowing more successful contacts to the database. It's one of those things where we'll just have to wait and see what happens, and have some contingency plans lined up for some of the possible scenarios. However, the good news is that with all of the enterprise-class networking equipment that is in place, we can get an actual gigabit link but still rate-limit it to 100 Mbit or 150 Mbit, whatever seems to allow the smoothest data transfer while keeping the database from getting DDoS'ed.
Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1180972 ·

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51542 Credit: 1,018,363,574 RAC: 1,004	Message 1180974 - Posted: 27 Dec 2011, 18:09:30 UTC - in response to Message 1180972. Last modified: 27 Dec 2011, 18:09:53 UTC
"I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that bandwidth has to come first; then scheduler or other server-based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed."
"That is true. And I've stated a few times that if we can get more bandwidth, it may create a whole new pile of problems all by itself by allowing more successful contacts to the database. It's one of those things where we'll just have to wait and see what happens, and have some contingency plans lined up for some of the possible scenarios. However, the good news is that with all of the enterprise-class networking equipment that is in place, we can get an actual gigabit link but still rate-limit it to 100 Mbit or 150 Mbit, whatever seems to allow the smoothest data transfer while keeping the database from getting DDoS'ed."
Well, if you peruse the information in the GPUUG fundraising thread, you will see that many hardware upgrades are well on their way to being completed, with more to come. As far as I know, we still do not have a real path in place for upgrading the bandwidth, other than having the project's pleas fall on the deaf ears of the Berk IT admins.
"Time is simply the mechanism that keeps everything from happening all at once." ID: 1180974 ·

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1181033 - Posted: 28 Dec 2011, 0:01:54 UTC - in response to Message 1180966.
"Shouldn't we be in the middle of the usual Tuesday outage by now?"
"Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?"
Hmm, the UC Berkeley Academic Calendar shows Monday, Tuesday, Thursday, and Friday as "Academic and Administrative Holiday".
Joe ID: 1181033 ·

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51542 Credit: 1,018,363,574 RAC: 1,004	Message 1181034 - Posted: 28 Dec 2011, 0:11:53 UTC - in response to Message 1181033.
"Shouldn't we be in the middle of the usual Tuesday outage by now?"
"Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?"
"Hmm, the UC Berkeley Academic Calendar shows Monday, Tuesday, Thursday, and Friday as 'Academic and Administrative Holiday'. Joe"
Ahhh.... So it looks like some of our indentured servants may be in the lab tomorrow for an outage party.
"Time is simply the mechanism that keeps everything from happening all at once." ID: 1181034 ·

Richard1949 Joined: 20 Oct 99 Posts: 18 Credit: 232,635 RAC: 0	Message 1181044 - Posted: 28 Dec 2011, 0:38:23 UTC
"I do agree, however the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work."
---------------------------------------------------
I can't even get anything for my single core machine.
ID: 1181044 ·

Richard1949 Joined: 20 Oct 99 Posts: 18 Credit: 232,635 RAC: 0	Message 1181046 - Posted: 28 Dec 2011, 0:41:18 UTC
"Something else I'm pondering is if there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second if it's even possible? That may alleviate a lot of those 'project has no tasks available' messages when server status shows 200,000+ waiting to be assigned."
----------------------------------------------
I keep getting "not requesting any tasks."
ID: 1181046 ·

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13960 Credit: 208,696,464 RAC: 304	Message 1181111 - Posted: 28 Dec 2011, 8:21:06 UTC
15min to download 1 WU is a bit of a PITA when it takes less than 3min to do 2.
Grant Darwin NT ID: 1181111 ·

Wiggo Joined: 24 Jan 00 Posts: 38208 Credit: 261,360,520 RAC: 489	Message 1181112 - Posted: 28 Dec 2011, 8:56:51 UTC - in response to Message 1181111.
"15min to download 1 WU is a bit of a PITA when it takes less than 3min to do 2."
Personally, I still put the current problems on the connection itself between the USA side of our undersea cable and HE, as using a proxy here quickly clears any backlogs that occur.
Cheers. ID: 1181112 ·

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1181113 - Posted: 28 Dec 2011, 9:05:51 UTC - in response to Message 1181046.
"I keep getting 'not requesting any tasks.'"
And why would that be a server problem when your client doesn't ask for work?
Gruß, Gundolf ID: 1181113 ·

Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1181114 - Posted: 28 Dec 2011, 9:15:31 UTC - in response to Message 1181112. Last modified: 28 Dec 2011, 9:16:15 UTC
"15min to download 1 WU is a bit of a PITA when it takes less than 3min to do 2."
"Personally, I still put the current problems on the connection itself between the USA side of our undersea cable and HE, as using a proxy here quickly clears any backlogs that occur. Cheers."
Yeah, that certainly would appear to be that undersea cable. I noticed in my messages tab last night that between "starting download" and "finished download" for an AP, 19 seconds elapsed (~430KB/sec). Of course it was a B3_P1 WU, so it took 24 seconds to error out once processing started. Go figure.
Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1181114 ·
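The ~430KB/sec figure can be checked with simple arithmetic. The post does not give the file size, so the sketch below assumes an Astropulse workunit of roughly 8 MB; that size is an assumption for illustration, and `transfer_rate_kb_per_s` is a hypothetical helper name.

```python
def transfer_rate_kb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in KB/s for a file of size_mb megabytes
    (1 MB taken as 1024 KB) downloaded in the given number of seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb * 1024 / seconds


if __name__ == "__main__":
    # Assumed ~8 MB AP workunit, 19 seconds between start and finish.
    rate = transfer_rate_kb_per_s(8, 19)
    print(f"about {rate:.0f} KB/s")
```

Under that size assumption the result lands close to the rate quoted in the post.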

Dave Joined: 29 Mar 02 Posts: 778 Credit: 25,001,396 RAC: 0	Message 1181118 - Posted: 28 Dec 2011, 10:41:02 UTC
So are we in an outage? Why is there no work?
ID: 1181118 ·

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22817 Credit: 416,307,556 RAC: 380	Message 1181125 - Posted: 28 Dec 2011, 11:37:13 UTC
No signs of an outage, and I've been getting a fairly steady stream of work.
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1181125 ·

Richard1949 Joined: 20 Oct 99 Posts: 18 Credit: 232,635 RAC: 0	Message 1181127 - Posted: 28 Dec 2011, 11:49:05 UTC - in response to Message 1181113.
"I keep getting 'not requesting any tasks.'"
"And why would that be a server problem when your client doesn't ask for work? Gruß, Gundolf"
I don't know what's going on. I have uninstalled and reinstalled. I have reset the project. I have tried different BOINC versions. Nothing works. All I get when I request tasks is a "not requesting tasks" message. Every other BOINC project works fine. It's only SETI I can't get anything from, even though it shows it has hundreds of thousands of tasks ready to send out.
ID: 1181127 ·

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1181128 - Posted: 28 Dec 2011, 12:00:09 UTC - in response to Message 1181125.
"No signs of an outage, and I've been getting a fairly steady stream of work."
Bouncing off the limits here. Pity it's all sitting in my download queue :-( Dropped back to 1 WU per card, and with max button pushing I'm only getting enough to keep on average 2 out of 3 cards running. Shorties and APs, never a good combination.
Kevin ID: 1181128 ·

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1181129 - Posted: 28 Dec 2011, 12:01:00 UTC - in response to Message 1181127. Last modified: 28 Dec 2011, 12:11:23 UTC
"I keep getting 'not requesting any tasks.'"
"And why would that be a server problem when your client doesn't ask for work? Gruß, Gundolf"
"I don't know what's going on. I have uninstalled and reinstalled. I have reset the project. I have tried different BOINC versions. Nothing works. All I get when I request tasks is a 'not requesting tasks' message. Every other BOINC project works fine. It's only SETI I can't get anything from, even though it shows it has hundreds of thousands of tasks ready to send out."
If your host isn't asking for work from Seti, then either you've got No New Tasks set, you've got the Seti Project Suspended, you've got one or more Seti Tasks Suspended, or your Cache is Full already.
Claggy ID: 1181129 ·

Dave Joined: 29 Mar 02 Posts: 778 Credit: 25,001,396 RAC: 0	Message 1181130 - Posted: 28 Dec 2011, 12:08:12 UTC - in response to Message 1181128.
"Shorties and APs, never a good combination."
We're after the Goldilocks zone.
ID: 1181130 ·

©2025 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.