Panic Mode On (16) Server problems

Author	Message
Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304	Message 901858 - Posted: 31 May 2009, 0:05:58 UTC - in response to Message 901856. I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. The joys of optimising a system. Remove one bottleneck to find or create another. Rinse & repeat till the system is about right, or it's reached its breaking point. Then upgrade the system & start all over again, with different bottlenecks & solutions. Grant Darwin NT ID: 901858 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 901864 - Posted: 31 May 2009, 0:15:13 UTC - in response to Message 901856. Last modified: 31 May 2009, 0:25:10 UTC Even though there's plenty of MB work ready to send all I'm getting are 31/05/2009 8:22:51 SETI@home Message from server: (Project has no jobs available) messages. Matt made a comment the other day that was kind-of interesting. The scheduler has a small queue of 100 work units that are available for assignment. Apparently, space is reserved for each application, so if there are 40 slots reserved for AP (and I don't know, it could be divided equally) then the queue is only 60 Multibeam. ... and if the queue can't be replenished faster than the scheduler assigns work, then we get where we are today: lots of "no jobs available" messages because the feeder can't quite keep up. Indeed. And so far as we can tell, it is "equally" (BOINC server default), despite Josef's sterling efforts to tell the staff a) this isn't optimal, b) it is configurable. There are glimmers of hope that the penny may be beginning to drop in the 2nd paragraph of message 900643. But how long it will be before that understanding is translated into config files, history alone will tell. I often find myself asking "are words like fault applicable in this case?" One side effect (intended or otherwise) is to smooth out downloads. If work units were assigned twice as quickly, you'd see twice as many simultaneous connections to the download server, and bandwidth use would have more peaks (which tend to exceed bandwidth anyway). I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. Who said anything about 'fault'? Searching this thread only throws it up as a component of 'default'. But you may be trying to steer us on to another train of thought. 
One of the problems I perceive with the whole BOINC/project setup is that it is huge and complex. Like the committee of blind people trying to describe an elephant, each participant tends to see only the part nearest to them. Matt Lebofsky is brilliant at managing the hotch-potch of servers, operating systems, databases and network appliances that inhabit his closet. But he doesn't actually understand BOINC very well (no criticism: just acknowledgement of his speciality). In this case - don't forget that a balance-swing towards MB would result in shorter, though admittedly more numerous, downloads. I think that's well within Matt's skillset to manage. Edit - after all these years, I still can't spell Matt's surname properly. And I know what it feels like. ID: 901864 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51515 Credit: 1,018,363,574 RAC: 1,004	Message 901867 - Posted: 31 May 2009, 0:22:34 UTC Don't make the kitties mad................ "Time is simply the mechanism that keeps everything from happening all at once." ID: 901867 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 901879 - Posted: 31 May 2009, 1:36:52 UTC - in response to Message 901868. Last modified: 31 May 2009, 1:46:54 UTC Don't make the kitties mad................ That's not possible Mark, Yer not a Joker, As Mad is not the right word, Angry is. Mad means Crazy and so many people always use the word Mad to say that they're Angry when to those Who know the real meaning It says Crazy. And I don't think Yer Crazy. Someone else that I know of, Maybe, Maybe not. Not meaning to be pedantic, but mad as in crazy is just as legitimate an English definition as mad as in angry is. If not, then there are at least two generations of Sesame Streeters who were erroneously told, "...it's not bad to get mad!". :-) <edit> Back on topic; Yeah, this replica server has been trying hard all day long to go 'castors up' off and on. :-( Alinator ID: 901879 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 901887 - Posted: 31 May 2009, 1:58:18 UTC Some observations on the Feeder weighting: In addition to Matt's comments, I had an email exchange with Eric, his response was also favorable to shifting to more slots for MB. The ap_splitter processes very seldom produce more than 1 result per second. The Feeder is supposed to refill the AP_v5 slots in the shared memory for the Scheduler every 2 seconds. Unless there's a queue of "ready to send" AP_v5 results, guess how many slots can possibly be filled. Ned's thought that getting the number of MB slots back to something approaching the 100 available before AP was released might cause download difficulties is certainly a risk. Yet only delivering about 45 MBits/sec of work when AP_v5 isn't available doesn't make sense to me, the goal ought to be getting the MB side caught up. That shared memory mechanism does have possibilities as a limiter to keep downloads from saturating, but I'd rather see it implemented as a deliberate throttle a project could use when needed. Having over 90% MB should mean that hosts with default preferences get mostly MB work, leaving more of the AP_v5 work available for those who really want it. I don't think turtles eat cat food unless they're really hungry... When 5.05+ comes out of Beta and the optimized AP advantage shrinks, I expect those who focus on credits to be unhappy. Once the server-side credit multiplier adapts, I think credit rates for optimized MB and AP may be about equal. Joe ID: 901887 ·
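A rough sketch of the arithmetic behind Joe's "guess how many slots" remark. It assumes the 1 result per second figure is the combined output of the ap_splitter processes and borrows the 50 reserved AP slots from Ned's example later in the thread; the function name and parameters are hypothetical and are not part of BOINC.

```python
def ap_slots_fillable(free_slots, backlog, split_rate_per_s=1.0, refill_interval_s=2.0):
    """Upper bound on AP slots that can be filled in one Feeder refill cycle.

    free_slots        -- empty AP slots in shared memory (hypothetical input)
    backlog           -- AP results already "ready to send"
    split_rate_per_s  -- assumed combined splitter output, results per second
    refill_interval_s -- Feeder refill period, 2 seconds per the post
    """
    newly_split = int(split_rate_per_s * refill_interval_s)
    return min(free_slots, backlog + newly_split)


# No backlog: only what was split during the interval can go into the slots.
print(ap_slots_fillable(free_slots=50, backlog=0))
# A healthy backlog: every free slot can be refilled.
print(ap_slots_fillable(free_slots=50, backlog=500))
```

Under these assumptions, a drained "ready to send" queue leaves nearly all reserved AP slots empty each cycle, which is the case for giving MB the larger share.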

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 901892 - Posted: 31 May 2009, 2:16:17 UTC - in response to Message 901858. I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. The joys of optimising a system. Remove one bottleneck to find or create another. Rinse & repeat till the system is about right, or it's reached its breaking point. Then upgrade the system & start all over again, with different bottlenecks & solutions. I've been working on a list server that is so incredibly efficient at generating outbound mail that it efficiently flooded the underlying SMTP server. Slowing down the list server actually made messages go out faster. ID: 901892 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51515 Credit: 1,018,363,574 RAC: 1,004	Message 901893 - Posted: 31 May 2009, 2:23:12 UTC - in response to Message 901887. Having over 90% MB should mean that hosts with default preferences get mostly MB work, leaving more of the AP_v5 work available for those who really want it. I don't think turtles eat cat food unless they're really hungry... When 5.05+ comes out of Beta and the optimized AP advantage shrinks, I expect those who focus on credits to be unhappy. Once the server-side credit multiplier adapts, I think credit rates for optimized MB and AP may be about equal. Joe Do not make fun of the kitties......... Whoever the 'turtles' are.........they eat 'cat food'......whenever and wherever they want to.......SIR.......meeeeeeeeowwwwwwwwwwwwwwwwwwwwwwwwwwww. "Time is simply the mechanism that keeps everything from happening all at once." ID: 901893 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 901895 - Posted: 31 May 2009, 2:29:10 UTC - in response to Message 901864. Last modified: 31 May 2009, 2:30:18 UTC I often find myself asking "are words like fault applicable in this case?" One side effect (intended or otherwise) is to smooth out downloads. If work units were assigned twice as quickly, you'd see twice as many simultaneous connections to the download server, and bandwidth use would have more peaks (which tend to exceed bandwidth anyway). I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. Who said anything about 'fault'? Searching this thread only throws it up as a component of 'default'. But you may be trying to steer us on to another train of thought. One of my recurring themes is that we have to look at this whole thing as one system -- starting at the recorder on the telescope, and ending in the science database. The system includes the BOINC software running on each volunteer's PC. The goal should be to maximize throughput from end to end. My (sadly, long) experience with that is that when you have a chain A->B->C->D and you make "B" truly optimal, it tends to suck "A" dry, and flood "C" -- neither of which is optimal when you look at the "ABCD" system and not at each process. I've had thoughts like "why only 100 slots" in the scheduler ready queue, followed by "why not have one queue per application" or "why not just make the queue bigger." .... and then I realize that would mean the scheduler would never turn a system away hungry, and all of those hosts will instantly want to download the assigned files, and now we have a huge spike in bandwidth. The BOINC servers tell the BOINC client "wait 11 seconds before calling again" and if that was 11 minutes, it would slow down all of the clients hungrily demanding work. 
Slow down the connection rate and you reduce the load on the scheduler. Requests complete faster because there is more bandwidth per connection. Slowing work assignment means fewer failed attempts. ... and as long as our clients are not empty, there will be no wasted CPU cycles. I know it is a paradox, but slowing down would very likely make things a lot faster. ID: 901895 ·
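To put a number on the "11 seconds versus 11 minutes" point, here is a minimal sketch of the worst-case request rate the scheduler could see if every host called back as soon as its delay expired. The host count is a made-up figure for illustration, and real clients only contact the server when they need work, so this is an upper bound rather than a measurement.

```python
def peak_request_rate(hosts, delay_seconds):
    """Worst-case scheduler requests per second if every host retries
    the moment its server-imposed delay expires."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return hosts / delay_seconds


HOSTS = 100_000  # hypothetical number of active hosts, not a project statistic

for delay in (11, 11 * 60):
    rate = peak_request_rate(HOSTS, delay)
    print(f"{delay:>4} s delay -> at most {rate:,.0f} requests/s")
```

Whatever the real host count is, stretching the delay from 11 seconds to 11 minutes cuts this ceiling by a factor of 60.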

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 901898 - Posted: 31 May 2009, 2:42:53 UTC - in response to Message 901887. Ned's thought that getting the number of MB slots back to something approaching the 100 available before AP was released might cause download difficulties is certainly a risk. Yet only delivering about 45 MBits/sec of work when AP_v5 isn't available doesn't make sense to me, the goal ought to be getting the MB side caught up. That shared memory mechanism does have possibilities as a limiter to keep downloads from saturating, but I'd rather see it implemented as a deliberate throttle a project could use when needed. It seems to me that the only reason to reserve slots in the queue is to make sure that those who say "only multibeam" will have a good chance of getting only multibeam and those who say "only astropulse" will have a good chance of getting only astropulse. For that to work, you have to always have a little bit of each available. The problem (like in so many other things) comes when predicting the future. I've read elsewhere that right now the Astropulse splitters have processed all of the waiting tapes, and there is nothing more to split. If there are 50 Astropulse "slots" and 0 work, then there are always 50 empty slots. What I'd probably code: try to fill 50 slots with AP, try to fill 50 slots with MB, then fill whatever is still open with whatever you have. Plan "B" would be to keep count of what is being sent (AP or MB) and try to follow the balance -- if you ran out of AP you'd be sending 100% MB and all slots would be MB -- maybe giving the "minority" project a few more slots than strictly needed so it would tend to increase when that work was available. I'm sure there are other factors we haven't thought about (yet). ID: 901898 ·
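The first plan in the post above ("try to fill 50 slots with AP, try to fill 50 slots with MB, then fill whatever is still open") could look something like the following. This is only a sketch of the idea as described, not how the BOINC feeder is actually written; the function name, the list-based work queues and the 50/50 default are all assumptions for illustration.

```python
from collections import deque


def fill_slots(ap_ready, mb_ready, total_slots=100, ap_reserved=50):
    """Two-pass fill of the scheduler's ready queue.

    Pass 1 honours each application's reservation as far as work exists.
    Pass 2 fills any slots still open with whatever work is left over.
    """
    mb_reserved = total_slots - ap_reserved
    ap, mb = deque(ap_ready), deque(mb_ready)
    slots = []

    # Pass 1: reserved share for each application.
    for queue, reserved in ((ap, ap_reserved), (mb, mb_reserved)):
        for _ in range(min(reserved, len(queue))):
            slots.append(queue.popleft())

    # Pass 2: leftovers go to whichever application still has work.
    for queue in (ap, mb):
        while len(slots) < total_slots and queue:
            slots.append(queue.popleft())

    return slots


# With no Astropulse work at all, every slot ends up holding Multibeam.
slots = fill_slots(ap_ready=[], mb_ready=[f"mb_{i}" for i in range(200)])
print(len(slots), sum(s.startswith("mb_") for s in slots))
```

The reservation still guarantees a little of each application when both have work, while an empty AP queue no longer leaves half the slots idle.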

Westsail and Pyxey Volunteer tester Send message Joined: 26 Jul 99 Posts: 338 Credit: 20,544,999 RAC: 0	Message 901900 - Posted: 31 May 2009, 2:46:10 UTC - in response to Message 901895. Last modified: 31 May 2009, 2:47:34 UTC One of my recurring themes is that we have to look at this whole thing as one system -- starting at the recorder on the telescope, and ending in the science database. The system includes the BOINC software running on each volunteer's PC. The goal should be to maximize throughput from end to end. My (sadly, long) experience with that is that when you have a chain A->B->C->D and you make "B" truly optimal, it tends to suck "A" dry, and flood "C" -- neither of which is optimal when you look at the "ABCD" system and not at each process. I've had thoughts like "why only 100 slots" in the scheduler ready queue, followed by "why not have one queue per application" or "why not just make the queue bigger." .... and then I realize that would mean the scheduler would never turn a system away hungry, and all of those hosts will instantly want to download the assigned files, and now we have a huge spike in bandwidth. The BOINC servers tell the BOINC client "wait 11 seconds before calling again" and if that was 11 minutes, it would slow down all of the clients hungrily demanding work. Slow down the connection rate and you reduce the load on the scheduler. Requests complete faster because there is more bandwidth per connection. Slowing work assignment means fewer failed attempts. ... and as long as our clients are not empty, there will be no wasted CPU cycles. I know it is a paradox, but slowing down would very likely make things a lot faster. Seems all very logical. I have a similar problem with a Tesla card. If 2 threads are fed to it with "high" priority, it will effectively drown out the other system processes, causing a decrease in total throughput. "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) 
but rather, 'hmm... that's funny...'" -- Isaac Asimov ID: 901900 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51515 Credit: 1,018,363,574 RAC: 1,004	Message 901901 - Posted: 31 May 2009, 2:51:56 UTC When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ "Time is simply the mechanism that keeps everything from happening all at once." ID: 901901 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 901902 - Posted: 31 May 2009, 3:05:15 UTC - in response to Message 901901. When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned ID: 901902 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51515 Credit: 1,018,363,574 RAC: 1,004	Message 901903 - Posted: 31 May 2009, 3:10:27 UTC - in response to Message 901902. When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's/ Got it??? "Time is simply the mechanism that keeps everything from happening all at once." ID: 901903 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51515 Credit: 1,018,363,574 RAC: 1,004	Message 901904 - Posted: 31 May 2009, 3:12:42 UTC - in response to Message 901903. Last modified: 31 May 2009, 3:14:46 UTC When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's/ Got it??? Sorry Ned.......that was rather rude. But to the point.......I did not mean to be so brash. "Time is simply the mechanism that keeps everything from happening all at once." ID: 901904 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 901913 - Posted: 31 May 2009, 3:32:52 UTC - in response to Message 901904. Last modified: 31 May 2009, 3:33:12 UTC When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's/ Got it??? Sorry Ned.......that was rather rude. But to the point.......I did not mean to be so brash. The project should feed all the kitties, not just the ones who yeowl at the bowl. They can't cater to some "fat cats" and let the rest starve. Got it? ID: 901913 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304	Message 901937 - Posted: 31 May 2009, 4:56:28 UTC - in response to Message 901927. Nope, Equal rations to all. Problem is, some need more while others less. Grant Darwin NT ID: 901937 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 901940 - Posted: 31 May 2009, 5:15:40 UTC - in response to Message 901937. Nope, Equal rations to all. Problem is, some need more while others less. That's why I've been suggesting "empty queue" instead of "full queue" as the proper measure. Best throughput overall does not come from topping up someone's 10 day cache while others run dry. ID: 901940 ·

Misfit Volunteer tester Send message Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0	Message 901946 - Posted: 31 May 2009, 6:00:15 UTC - in response to Message 901913. When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's/ Got it??? Sorry Ned.......that was rather rude. But to the point.......I did not mean to be so brash. The project should feed all the kitties, not just the ones who yeowl at the bowl. They can't cater to some "fat cats" and let the rest starve. Got it? Big Cat has spoken! ;) (Pssst... I have yarn.) me@rescam.org ID: 901946 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51515 Credit: 1,018,363,574 RAC: 1,004	Message 901998 - Posted: 31 May 2009, 10:47:31 UTC - in response to Message 901913. Last modified: 31 May 2009, 10:51:14 UTC When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's/ Got it??? Sorry Ned.......that was rather rude. But to the point.......I did not mean to be so brash. The project should feed all the kitties, not just the ones who yeowl at the bowl. They can't cater to some "fat cats" and let the rest starve. Got it? The kitties that yeowl at the bowl speak for those who can or will not....... I am the voice of kitties far and wide.....and speak for those who do not stand up for themselves. I am not always right, for sure. But at least I stand up and speak my peace. Got it? "Time is simply the mechanism that keeps everything from happening all at once." ID: 901998 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 902095 - Posted: 31 May 2009, 17:10:44 UTC - in response to Message 901998. When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all time, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's/ Got it??? Sorry Ned.......that was rather rude. But to the point.......I did not mean to be so brash. The project should feed all the kitties, not just the ones who yeowl at the bowl. They can't cater to some "fat cats" and let the rest starve. Got it? The kitties that yeowl at the bowl speak for those who can or will not....... I am the voice of kitties far and wide.....and speak for those who do not stand up for themselves. I am not always right, for sure. But at least I stand up and speak my peace. Got it? At the beginning of this thread you said: When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ In other words, if you don't get yours, you will quit. In this post, you said: I am the voice of kitties far and wide.....and speak for those who do not stand up for themselves. I am not always right, for sure. But at least I stand up and speak my peace. Either you are demanding that your cache stay full and to heck with everyone else, or you are in favor of equitable distribution of work. Can't be both. 
When we started a technical discussion of how to best keep everyone working, that was when you threatened to quit. You said you only care about your kitties, no one else's. Maybe I missed something, but it sounds like you don't care how many kitties go hungry as long as your "fat cats" have bowls completely full at all times. Then you say "I speak for those who can't speak for themselves" -- which seems to say that you're for equitable distribution -- kibble for everyone. Please pick one position. In the meantime, the technical discussion is about the best way to allocate work: how to efficiently and quickly keep at least some work in every queue without anyone going without. ID: 902095 ·


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.