Panic Mode On (16) Server problems
Author | Message |
---|---|
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Which is ultimately the point. It's nice when your cache is full, but the only time you should worry (if ever) is when it is empty. If the queue is low, BOINC will get work eventually. Waiting is. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I often find myself asking "are words like fault applicable in this case?" One side effect (intended or otherwise) is to smooth out downloads. If work units were assigned twice as quickly, you'd see twice as many simultaneous connections to the download server, and bandwidth use would have more peaks (which tend to exceed bandwidth anyway). I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. The joys of optimising a system. Remove one bottleneck to find or create another. Rinse & repeat till the system is about right, or it's reached its breaking point. Then upgrade the system & start all over again, with different bottlenecks & solutions. Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14673 Credit: 200,643,578 RAC: 874 |
Who said anything about 'fault'? Searching this thread only throws it up as a component of 'default'. But you may be trying to steer us on to another train of thought. One of the problems I perceive with the whole BOINC/project setup is that it is huge and complex. Like the committee of blind people trying to describe an elephant, each participant tends to see only the part nearest to them. Matt Lebofsky is brilliant at managing the hotch-potch of servers, operating systems, databases and network appliances that inhabit his closet. But he doesn't actually understand BOINC very well (no criticism: just acknowledgement of his speciality). In this case - don't forget that a balance-swing towards MB would result in shorter, though admittedly more numerous, downloads. I think that's well within Matt's skillset to manage. Edit - after all these years, I still can't spell Matt's surname properly. And I know what it feels like. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Don't make the kitties mad................ "Time is simply the mechanism that keeps everything from happening all at once." |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66202 Credit: 55,293,173 RAC: 49 |
Don't make the kitties mad................ That's not possible Mark, Yer not a Joker, As Mad is not the right word, Angry is. Mad means Crazy and so many people always use the word Mad to say that they're Angry when to those Who know the real meaning It says Crazy. And I don't think Yer Crazy. Someone else that I know of, Maybe, Maybe not. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Don't make the kitties mad................ Not meaning to be pedantic, but mad as in crazy is just as legitimate an English definition as mad as in angry is. If not, then there's at least two generations of Sesame Streeters who were erroneously told, "...it's not bad to get mad!". :-) <edit> Back on topic; Yeah, this replica server has been trying hard all day long to go 'castors up' off and on. :-( Alinator |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Some observations on the Feeder weighting: In addition to Matt's comments, I had an email exchange with Eric; his response was also favorable to shifting to more slots for MB. The ap_splitter processes very seldom produce more than 1 result per second. The Feeder is supposed to refill the AP_v5 slots in the shared memory for the Scheduler every 2 seconds. Unless there's a queue of "ready to send" AP_v5 results, guess how many slots can possibly be filled. Ned's thought that getting the number of MB slots back to something approaching the 100 available before AP was released might cause download difficulties is certainly a risk. Yet only delivering about 45 MBits/sec of work when AP_v5 isn't available doesn't make sense to me; the goal ought to be getting the MB side caught up. That shared memory mechanism does have possibilities as a limiter to keep downloads from saturating, but I'd rather see it implemented as a deliberate throttle a project could use when needed. Having over 90% MB should mean that hosts with default preferences get mostly MB work, leaving more of the AP_v5 work available for those who really want it. I don't think turtles eat cat food unless they're really hungry... When 5.05+ comes out of Beta and the optimized AP advantage shrinks, I expect those who focus on credits to be unhappy. Once the server-side credit multiplier adapts, I think credit rates for optimized MB and AP may be about equal. Joe |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc. I've been working on a list server that is so incredibly efficient at generating outbound mail that it efficiently flooded the underlying SMTP server. Slowing down the list server actually made messages go out faster. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Do not make fun of the kitties......... Whoever the 'turtles' are.........they eat 'cat food'......whenever and wherever they want to.......SIR.......meeeeeeeeowwwwwwwwwwwwwwwwwwwwwwwwwwww. "Time is simply the mechanism that keeps everything from happening all at once." |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
One of my recurring themes is that we have to look at this whole thing as one system -- starting at the recorder on the telescope, and ending in the science database. The system includes the BOINC software running on each volunteer's PC. The goal should be to maximize throughput from end to end. My (sadly, long) experience with that is that when you have a chain A->B->C->D and you make "B" truly optimal, it tends to suck "A" dry, and flood "C" -- neither of which is optimal when you look at the "ABCD" system and not at each process. I've had thoughts like "why only 100 slots" in the scheduler ready queue, followed by "why not have one queue per application" or "why not just make the queue bigger." .... and then I realize that would mean the scheduler would never turn a system away hungry, and all of those hosts will instantly want to download the assigned files, and now we have a huge spike in bandwidth. The BOINC servers tell the BOINC client "wait 11 seconds before calling again" and if that was 11 minutes, it would slow down all of the clients hungrily demanding work. Slow down the connection rate and you reduce the load on the scheduler. Requests complete faster because there is more bandwidth per connection. Slowing work assignment means fewer failed attempts. ... and as long as our clients are not empty, there will be no wasted CPU cycles. I know it is a paradox, but slowing down would very likely make things a lot faster. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
It seems to me that the only reason to reserve slots in the queue is to make sure that those who say "only multibeam" will have a good chance of getting only multibeam and those who say "only astropulse" will have a good chance of getting only astropulse. For that to work, you have to always have a little bit of each available. The problem (like in so many other things) comes when predicting the future. I've read elsewhere that right now the Astropulse splitters have processed all of the waiting tapes, and there is nothing more to split. If there are 50 Astropulse "slots" and 0 work, then there are always 50 empty slots. What I'd probably code: try to fill 50 slots with AP, try to fill 50 slots with MB, then fill whatever is still open with whatever you have. Plan "B" would be to keep count of what is being sent (AP or MB) and try to follow the balance -- if you ran out of AP you'd be sending 100% MB and all slots would be MB -- maybe giving the "minority" project a few more slots than strictly needed so it would tend to increase when that work was available. I'm sure there are other factors we haven't thought about (yet). |
Westsail and *Pyxey* Joined: 26 Jul 99 Posts: 338 Credit: 20,544,999 RAC: 0 |
Seems all very logical. I have a similar problem with a Tesla card. If 2 threads are fed to it with "high" priority, it will effectively drown out the other system processes, causing a decrease in total throughput. "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ "Time is simply the mechanism that keeps everything from happening all at once." |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Mark, I know the kitties expect the bowl to be kept completely full at all times, but as long as the bowl is not empty, they'll be able to eat. It's only a problem when the kitties start to starve. Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible. -- Ned |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Ned......don't try to coddle me.......... I only look after my kitties' best interests.........not yours or anybody else's. Got it??? "Time is simply the mechanism that keeps everything from happening all at once." |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Sorry Ned.......that was rather rude. But to the point.......I did not mean to be so brash. "Time is simply the mechanism that keeps everything from happening all at once." |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ The project should feed all the kitties, not just the ones who yeowl at the bowl. They can't cater to some "fat cats" and let the rest starve. Got it? |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66202 Credit: 55,293,173 RAC: 49 |
Don't make the kitties mad................ Are You sure the Server didn't get bit? Maybe the server needs a Rabies shot. ;) Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66202 Credit: 55,293,173 RAC: 49 |
When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............ Nope, Equal rations to all. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
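
For anyone who wants to play with the numbers, Ned's "fill 50 with AP, fill 50 with MB, then fill whatever is still open" idea from earlier in the thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BOINC Feeder code: the function name `fill_slots`, its parameters, and the choice to hand leftover slots to MB before AP are all made up for the example. The 100-slot total and the 50/50 reservation are the figures quoted in the posts.

```python
def fill_slots(ap_ready, mb_ready, total_slots=100, ap_reserved=50):
    """Return (ap_slots, mb_slots) used in one hypothetical Feeder pass.

    ap_ready / mb_ready are the counts of "ready to send" results
    for Astropulse and multibeam at the moment of the pass.
    """
    mb_reserved = total_slots - ap_reserved

    # Step 1: each application fills its reserved share, if it has the work.
    ap = min(ap_ready, ap_reserved)
    mb = min(mb_ready, mb_reserved)

    # Step 2: slots still open go to whoever has work left.
    # Assumption: MB is offered the leftovers first, then AP.
    free = total_slots - ap - mb
    extra_mb = min(mb_ready - mb, free)
    mb += extra_mb
    free -= extra_mb
    ap += min(ap_ready - ap, free)

    return ap, mb


if __name__ == "__main__":
    # No AP work at all: every slot ends up holding MB.
    print(fill_slots(ap_ready=0, mb_ready=500))
    # Joe's case: a splitter making about 1 result/sec and a 2-second
    # refill interval leaves roughly 2 new AP results per pass.
    print(fill_slots(ap_ready=2, mb_ready=500))
    # Plenty of both: the 50/50 reservation holds.
    print(fill_slots(ap_ready=500, mb_ready=500))
```

With a strict 50/50 split and no backfill, the second case would leave about 48 slots sitting empty on each pass, which is the situation Joe describes. Whether backfilling makes the download spikes Ned worries about worse is a separate question this sketch does not try to answer.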
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.