Small Word (Sep 20 2007)
Author | Message |
---|---|
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
The system needs to be intervention free; your description requires some manual case handling, which will certainly fail to meet the goals. "Set and Forget" should be the watchword. The system would not be much more complicated: just another option to tick in preferences, "machine will work offline" yes/no, and an 'Away' button in the client to manually return uncrunched WUs and suspend. WUs will be returned or reallocated anyway if there is no contact well before their expiry. The system would still work on configurable defaults without manual intervention. |
W-K 666 Joined: 18 May 99 Posts: 19398 Credit: 40,757,560 RAC: 67 |
Can anybody really say what this problem with long deadlines is? I have just looked at my small list: only 11 pending units are older than 10 days. With an average RAC of 1200, and guessing the average credit per unit is 40, I crunch about 30 units a day. The oldest pending unit was issued on 14 Aug 2007 0:37:33 UTC, so I probably crunched 32 * 30 = 960 units in the period up to the 16th. 11/960 * 100% = ~1%. I don't think 1% is anything to get your knickers in a twist about. Andy |
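The estimate above can be re-run as a short calculation. This is only a sketch: the 40 credits per unit is the poster's own guess, the 32-day window is the multiplier used in the post, and the function name `pending_share` is made up for illustration.

```python
def pending_share(rac, credit_per_unit, days, pending_units):
    """Return (units crunched in the period, percent of them still pending)."""
    units_per_day = rac / credit_per_unit      # 1200 / 40 = 30 units a day
    crunched = units_per_day * days            # units finished over the window
    return crunched, 100.0 * pending_units / crunched


# Figures taken from the post; the credit per unit is a guess, not a measurement.
crunched, percent = pending_share(rac=1200, credit_per_unit=40, days=32, pending_units=11)
print(f"{crunched:.0f} units crunched, {percent:.2f}% still pending")
```

The share stays close to 1% unless the guessed credit per unit is far off, which is the point the post is making.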
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Well, the heartbeat already exists when the client makes contact, which it has to do regularly on much smaller timescales than WU deadlines! I would do the programming myself if I wasn't so busy being paid to do similar things! Still, I will try and get hold of the source code when I have some free time and see if there's anything I can do with it. As for penalties for going AWOL... I'd rather like the opportunity to throw rotten fruit and vegetables at them... bring back stocks! :-) |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Can anybody really say what this problem with long deadlines is? In your case, not a big problem. I have 16,000 pending, many of them for more than a month, and some of them from users who connected once, downloaded a ton of WUs and have never been heard from since, for whatever reason. They hold up the entire pipeline and delay the science, as results can't be accepted until validated, and can't be validated until everyone in the quorum completes. Meanwhile, the pending results take up resources and space in the database. OK, I know they're in the bank, so to speak, but it is a manifestation of a suboptimal system that can be improved, and it is my passion to make/program beautiful and efficient systems! |
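The quorum effect described above can be sketched in a few lines. This is a simplified model, not BOINC server code: it assumes a missing result is reissued only once its deadline passes and that the replacement host then returns it after one normal turnaround. The function name and all day counts are hypothetical.

```python
def validation_day(return_days, deadline_days, reissue_turnaround_days):
    """Day on which a work unit can be validated.

    return_days has one entry per quorum member: the day its result came
    back, or None if the host vanished. Assumption: a missing result is
    reissued at the deadline and then needs one more turnaround.
    """
    finish = []
    for day in return_days:
        if day is None:
            finish.append(deadline_days + reissue_turnaround_days)
        else:
            finish.append(day)
    # Validation has to wait for the slowest member of the quorum.
    return max(finish)


both_returned = validation_day([1, 3], deadline_days=60, reissue_turnaround_days=2)
one_vanished = validation_day([1, None], deadline_days=60, reissue_turnaround_days=2)
print(both_returned, one_vanished)
```

Under these assumptions a single vanished host pushes validation out to the deadline plus a turnaround, however quickly the other result came back.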
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Well, the heartbeat already exists when the client makes contact, which it has to do regularly on much smaller timescales than WU deadlines! The client does NOT have to make contact more frequently than the timescale of deadlines. Whatever gave you the idea that the client does need to make contact more frequently? BOINC WIKI |
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
You do not take into account multi-project processing. BOINC will calculate when a project gets a turn on the CPU, and if a project goes into EDF then it is possible that results have to wait for a long time. However, I am running 24/7 to meet deadlines, and each result of a project will be returned on time. Why should I abort valid work just because you can't wait for some meaningless credit. |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Of course, it doesn't have to do anything, or crunch and I don't have to get out of bed in the morning either! Technically it could connect once, get up to ten days of work, and disappear. Or, do them and/or remain idle and submit a few seconds before the deadlines expire. What is the use in that? How many people would spend their lives trying to keep a WU alive and valid for as long as possible just for the hell of it? Even offline systems would attempt to connect at least when the cache ran out, and in all likelihood that would be WELL before a deadline. A client connected to the net will contact at least as often as the cache duration is set. It says so in the preferences. "Computer is connected to the Internet about every [ 2 days ] (Leave blank or 0 if always connected. BOINC will try to maintain at least this much work.)" |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Why should I abort valid work just because you can't wait for some meaningless credit. I'm not proposing you abort anything, just spitting out what you can't chew so that others can do the work and get the science done quicker. Credits are secondary. Where's the flaw in that? It is selfish and anally retentive to keep work that one can't process in time. If one can give back work, then EDF would never arise. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
It is selfish and anally retentive to keep work that one can't process in time. Assuming you've got other than intermittent connectivity. When you set "connect every 10 days" you are really saying "I'm only going to have connectivity 3 times a month." ... and much of what you're saying Andy works just fine in an always-connected environment. Before multibeam, deadlines were pretty short. It pretty much guaranteed work would be back in a timely manner. Now, we've got deadlines that are fairly long: BOINC doesn't have trouble carrying a big cache, and it rarely if ever needs EDF, but the disadvantage is the long delay when a system goes missing (for whatever reason). Shorter deadlines would solve this fairly nicely. We just need to get past this silly "EDF is a HUGE CRISIS!" mentality. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
A client connected to the net will contact at least as often as the cache duration is set. It says so in the preferences. Not so. A slow system or one sharing with many projects will only contact the S@H servers when needed. Essentially that may only be when a WU is nearly finished, i.e. the system is within "connect time" of running out of work. My 200MHz system has a "connect time" of slightly over a day, but takes up to a week to do a WU. It will only request work within the last day. Joe |
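The behaviour described here can be put into a toy model. It is a simplification of the post, not the actual BOINC work-fetch logic: it assumes the host contacts the project only to ask for work, and does so once its queued work drops to within the "connect time" of running out. The function name is hypothetical, and the 7-day and 1-day figures round the numbers in the post.

```python
def days_until_work_request(queued_work_days, connect_interval_days):
    """Days that pass before the host next asks the project for work.

    Assumption: a request is made once the remaining queued work is no
    more than the connect interval.
    """
    return max(0.0, queued_work_days - connect_interval_days)


# Slow host: one WU that takes about a week, "connect time" of about a day.
slow_host = days_until_work_request(queued_work_days=7.0, connect_interval_days=1.0)
# Fast host whose queue is already shorter than its connect interval.
fast_host = days_until_work_request(queued_work_days=0.5, connect_interval_days=2.0)
print(slow_host, fast_host)
```

In this model the slow host stays silent for about six days even though its connect interval is one day, which is why the preference text cannot be read as a guaranteed contact frequency.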
ML1 Joined: 25 Nov 01 Posts: 21209 Credit: 7,508,002 RAC: 20 |
I'm not proposing you abort anything, just spitting out what you can't chew But that is precisely what the cache junkies do due to their overly paranoid fear that they may run out of WUs and suffer a reduced RAC. I guess an alternative is to have "EDF" instead junk/return any WUs that are queued up (unworked) so that "EDF" can be avoided. Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
But that is precisely what the cache junkies do due to their overly paranoid fear that they may run out of WUs and suffer a reduced RAC. Well, I'd recommend that everyone have back up projects and keep a 1-2 day cache, and this will help the servers greatly. S@H seems to be running more smoothly now. Here's a little analogy: There's a large table in the kitchen, with a few people sitting around it. On the table there is a very large bowl of cherries, which is kept full by a regular delivery. Cherries come in pairs, and the aim is to compare the size of each cherrystone with its sibling. The people take one cherry from each pair, put them into their own bowls and eat them. Some quickly, some slowly. Some sit at the table, some go away to eat. They spit out the stones, which are returned to the table when possible and go into a pending pot to be eventually reunited with their siblings for analysis. Some people come along, stuff their bowls full and then wander off and throw up on the floor, until the sweeper comes along months later to mop up and put them back into the bowl. Some others stuff their bowls full too, but they don't have a big appetite. They always like to have a full bowl just in case, but don't want to give them back for others to process. Meanwhile the pending pot gets bigger and heavier, results get delayed and throughput suffers. It is also expensive to store, keep track of and backup the status of so many bored cherrystones! OK, it's a simple analogy, and SETI has a potentially infinite supply of 'cherries', there is no firm goal apart from just finding an ET signal, and there is an infinite time to do it in. However, when do the millions of positive results actually get investigated? All this processing is a waste of time if the results just go to a "write only" database! So who cares if the process isn't perfect? I do, because it can be improved upon. |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
My thoughts are that the easiest to implement for those very conscientious participants is the 'dump unstarted WUs' GUI button. However, using "No New Work" earlier would be even more conscientious. Yeah, guess how we can make people more conscientious? Put an upper limit on "Computer is connected to the Internet about every" and "Maintain enough work for an additional" to 7 days. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
So, you're saying that this 754 pin AMD Athlon 3200 is beating the pants off a whole raft of Core II Duos and quad? I appreciate the compliments, but I don't buy it. No, I'm saying since those faster CPUs can crunch more WUs in the same amount of time, they will rack up pending faster because other computers don't have nearly as many processing cores. We are experiencing a quadrupling of CPU power in the last two years alone simply because users are going dual core/quad core (and yes, I'm aware that not everyone is buying a new computer, but it adds up quickly and will continue to do so). Regardless of what their cache setting is, as soon as the WU is finished and the system has reached its "Connect to network every X" setting, it will update with the servers. If what you are saying were correct, then the systems would wait until their cache has been crunched before updating with the servers and thus cause a delay in pending credit. Since this is not the case, it means that the cache setting cannot be the culprit for high pending credit. It is more likely that this is a side effect of the stop-and-go problems with the servers since MultiBeam was rolled out, combined with the really small WUs that people crunch in short periods of time and pound the servers with, and the longer workunits that take up to half a day or more (on some CPUs). Throw in the fact that people have a hard time connecting to SETI@Home when the servers are down (during failures, not Tuesday's outage) and they immediately turn over to their backup project, causing further processing delays. Then finally you have the group that got frustrated and just left without finishing their work (this, I will say, is truly rude, but nothing can be done about it unless they detach from the project). 
Your Athlon 3200 is "beating the pants off" of Core 2s because your CPU is probably near the average of what most people have and doesn't have to wait as long for its pending to validate, while a Core 2 Quad just keeps crunching 4 times as much and has to wait for all the slower CPUs to finish. Those machines should have turnarounds at least 1/4 of this one and RACs at least four times higher, but they don't, and the science and we sit and wait a month for a Core II, duo, or quad to return work. As I explained above, their turnarounds are going to be 4x as much, and so their pending will be 4x as much. True, the science sits and waits in the meantime, but "the signal" is already going to be several thousand years old (and we wouldn't have the time to respond, as you posted earlier in your Mock Scenario), so I don't think it's going to make a big difference if another month or two go by. "When all possibilities are eliminated what remains, however improbable, must be the truth." Love that quote, but I hate how people abuse it. 1) We must first find all possibilities. Since possibilities can literally be endless, it makes it quite difficult to leave anything "remaining". 2) One must also understand that just because you've eliminated all possibilities to your own satisfaction, doesn't mean you truly have. This is why we have scientific review. Proclaiming something must be Truth simply because it makes sense to you doesn't make it so. As long as an intelligent argument can be made for the opposite side (which I feel I've done, as well as others have too), then you truly haven't even finished the first part of the equation that is embedded in that quote. And that's life, too. Couldn't have said it better myself. |
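One way to read the argument about faster hosts: if a returned result waits roughly the same time for its wingman no matter who crunched it, the number of pending results scales with how many results a host returns per day. The sketch below assumes exactly that steady-state relationship (pending is about return rate times average wait). The function name and all numbers are hypothetical, not measurements from the project.

```python
def expected_pending(units_per_day, wingman_wait_days):
    """Rough steady-state count of results waiting for a wingman.

    Assumption: each returned result waits wingman_wait_days on average
    before the rest of its quorum reports.
    """
    return units_per_day * wingman_wait_days


# Hypothetical hosts facing the same average wait for their wingmen.
single_core = expected_pending(units_per_day=10, wingman_wait_days=5)
quad_core = expected_pending(units_per_day=40, wingman_wait_days=5)
print(single_core, quad_core, quad_core / single_core)
```

With the wait held equal, four times the throughput gives four times the pending count, without the cache setting entering the calculation at all.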
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
Why should I abort valid work just because you can't wait for some meaningless credit. Like I said, I return work in time; it just may take all the time allowed by the deadline of the result. If that is no problem to you, I don't understand what you want to change. Some results have deadlines close to 3 months in the future, and I was under the impression that you were not willing to wait that long to receive your credit. From a science point of view there is no need to make any changes. The only thing needed is valid returned work. |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
The system needs to be intervention free; your description requires some manual case handling, which will certainly fail to meet the goals. "Set and Forget" should be the watchword. After perusing the discussion that followed my post..... Yikes.....the project is not going to alter the way it handles business just to placate users that don't want to wait for their credits. Some suggestions are not bad, but the best solution right now is for the 'estimated completion' times to be corrected to something a little closer to what is actually occurring, and I think Eric or Matt will get around to that. I personally think that any options involving user input might as well be left on the table (opt-in, opt-out, checkbox, 'I'm going to lunch right now, but I'll be back in a few months') because a lot of users wouldn't use them anyway, and it would just open another Pandora's box of questions.....(What do you have your 'time I am going to lunch' preference set to?). The system is not broken, and let's not forget that any modifications to BOINC affect not only SETI, but all other projects under the BOINC umbrella as well. "Time is simply the mechanism that keeps everything from happening all at once." |
H Elzinga Joined: 20 Aug 99 Posts: 125 Credit: 8,277,116 RAC: 0 |
Here's a little analogy: For this to reflect SETI/BOINC you should add a few things. First, you would need a cherry tree (the data tapes). Second, you should be able to clone a cherry which grows from that tree (in other words, a WU). When a cherry goes bad it is put in the bin. Cherries taken out of the large bowl should never be returned, but recloned and put into the bowl again. Besides these remarks, I agree with you on some points. The long deadlines are probably one of the major contributors to the problem. I run an old machine with a 10 day cache and a 15 day turnaround time. A period of 3 weeks should be sufficient. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
But that is precisely what the cache junkies do due to their overly paranoid fear that they may run out of WUs and suffer a reduced RAC. Since we're doing analogies..... Each cherry has a labelled date when it goes bad. Bad cherries aren't very tasty, and have to be eaten before they go bad. If the labelled date is 2 months in the future, someone can take a big bowl of cherries, walk away, "lose" them, and we won't know until 2 months when the stones don't show up. If the slowest eater (not me, I love cherries!) takes 2 days to eat a cherry, then all we really need is adequate margin -- a "bad" date 10 days in the future should keep everyone happy. So, do we really need a whole new mechanism requiring cherry-eaters to report what they have in their bowl, or do we simply need more realistic expiration dates on our food? |
Heechee Joined: 29 Sep 99 Posts: 5 Credit: 13,765,984 RAC: 32 |
So, do we really need a whole new mechanism requiring cherry-eaters to report what they have in their bowl, or do we simply need more realistic expiration dates on our food? In all of the debates on this thread I don't think I have seen where the science is the main consideration. The most important thing to consider is that the greatest amount of work possible is done for the project. By having WUs that have long expiration dates it can mean more WUs are completed in a given amount of time because some of the older machines have a chance to complete the calculations. The biggest problem with long expiration dates is that a few users may get upset and quit the project. Because the ones that would quit would probably have the faster machines, this could have a more detrimental effect than losing some of the slower machines due to short expiration dates. IMHO, I think that the project should shorten the expiration date a little, but not by that much. Hopefully this will create a good balance and give the greatest amount of WUs completed in the long run. I also think it would be a mistake to complicate the project with unnecessary extra controls. It works the way it is now; let's keep it as simple as possible. An ant on the move does more than a dozing ox. - Lao Tzu |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
In all of the debates on this thread I don't think I have seen where the science is the main consideration. The most important thing to consider is that the greatest amount of work possible is done for the project. Agreed. By having WUs that have long expiration dates it can mean more WUs are completed in a given amount of time because some of the older machines have a chance to complete the calculations. The biggest problem with long expiration dates is that a few users may get upset and quit the project. Because the ones that would quit would probably have the faster machines, this could have a more detrimental effect than losing some of the slower machines due to short expiration dates. Speaking in very general terms: the project should never cater to specific users out of fear they'll leave. That can be considered a weapon and can be wielded against the project anytime a user dislikes what is going on, which would in turn become politics (which, as I understand, every scientist hates). Besides, I'm certain that the combined crunching power of those users with only one machine attached, who barely pay attention to what's going on and don't get involved in these Credit Debates, will total more than even that of the avid power-crunchers. IMHO, I think that the project should shorten the expiration date a little, but not by that much. Hopefully this will create a good balance and give the greatest amount of WUs completed in the long run. Sounds like a good idea overall. Whatever the outcome, I'm sure the project will find a palatable solution that is most beneficial to the project above all else, as it should. Me? I'm not even worried how much 'pending credit' I have racked up. As far as I'm concerned, as long as the data is valuable to them, I'm glad I helped out. I will be rewarded eventually and I've got tons of patience. |