Message boards :
Number crunching :
Initial tasks in Work Unit just gone down to TWO
Author | Message |
---|---|
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
The initial allocation of TWO tasks to a work unit has just been received: http://setiathome.berkeley.edu/workunit.php?wuid=145930257 So, it's a YES/NO situation after both are completed, then what? If only one more task is issued on a NO result, the average time for completion of work units is going to increase considerably where one of the 2 tasks is a plodder, or is part of an oversized cache. At least with an initial 3 tasks, the likelihood of drawing 2 poor performers is far lower for the other host involved. Keith |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
The initial allocation of TWO tasks to a work unit has just been received: Well, Matt or Eric had hinted that an initial issue of 2 was in the works, and it seems that it may have come to pass. This is a good thing for Seti, because it allows those 'plodders' to contribute valid science, rather than just trailing results for credits. And that is what the DC concept was about in the first place, that even the slow computers could still contribute something valid to the sum total science of the project. The down side (and it is really not a down side in the long term) is that for the faster crunchers like myself, pending credits are likely to rise as we are waiting for results to be returned by the slower computers out there. They will come home sooner or later. It may also cause an increase in 'claimed credit' hits, because your result is now forced to wait for that 'plodder', who is more likely to be running a pre-flops-counting client, rather than being 'saved' by another fast cruncher that gets their result in before the slow host does. But this is really all about what is best for the project, not about granting credits, so I say "Jolly Good". "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
MSattler I use the term "plodders" in a wider sense to include static hosts and those hosts with excessively large caches where many of the tasks hold up the work unit right up to their report deadline. It is admirable if SETI might be working more efficiently, but I have doubts that the YES/NO situation with 2 initial tasks is that good an idea when there are at present so many work units requiring a 3rd success to finish (due to results of tasks not being "sufficiently similar"). Let's just hope all goes well. I do not see that there is (up 'til now) any problem with the plodders in the sense you are mentioning. As soon as the work unit(s) with 2 successes are removed from a plodder's cache, a replacement is made with a fresh task from another work unit. He has a reduced chance of being 1st starter in the work unit, but so what. He will pretty well always be 2nd starter and consequently will almost certainly go straight for validation on successfully completing the task. But, as you say, the effect on the project as a whole is more important. Let's hope the build-up of pending credits does not become unbearable (whether that's for SETI or for us as participants). I am interested to see what happens in the "NO" situation. ONE more or TWO more tasks? Have got that task running at present at 25% and am looking for a success in 3 hours 30 minutes and await the other host's result. Keith |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
MSattler I think the answer to your question lies in the other parameters in the WU issue.... max # of error/total/success results 5, 10, 5 In other words, if there are more than 5 errors the WU is ditched, there will be up to 10 total issues, and no more than 5 results will be acknowledged. So, if the first 2 issues do not validate with each other, up to 10 will be issued until there is a quorum (of 2). And if somehow, enough issues are made and work in progress does not cancel them, up to 5 results will be accepted as valid and issued credit. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
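The limits kittyman quotes (max # of error/total/success results of 5, 10, 5, with a quorum of 2) can be expressed as a small decision function. This is an illustrative sketch only; the function and field names are hypothetical, not the actual BOINC scheduler code.

```python
# Hypothetical sketch of the WU limits described above (quorum 2;
# max error/total/success = 5/10/5). Not the real BOINC scheduler.

def workunit_state(n_errors, n_total_issued, n_valid,
                   min_quorum=2, max_error=5, max_total=10):
    """Decide what happens next to a work unit under the 2/2 scheme."""
    if n_valid >= min_quorum:
        return "validated (quorum of 2 reached)"
    if n_errors > max_error:
        return "error (more than 5 failed results)"
    if n_total_issued >= max_total:
        return "error (10 total results already issued)"
    return "waiting (another task may be issued)"
```

So two matching results validate immediately, while a string of mismatches keeps drawing replacement tasks until the total-issue cap kills the work unit.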
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
MSattler Yes, I had worked that out for myself. Will ONE or TWO new tasks be issued after the first 2 are successes, but are not closely similar? I suspect that, if it's only one, there will be several long delays (i.e. pending tasks) in work unit completion. We shall see. Just another 2 hours 30 minutes and I may see the answer! Keith |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
Will ONE or TWO new tasks be issued after the first 2 are successes, but are not closely similar? I suspect that, if it's only one, there will be several long delays (i.e. pending tasks) in work unit completion. We shall see. Just another 2 hours 30 minutes and I may see the answer! If a wu fails validation due to "no consensus yet", it waits for any results still in progress. If there are no results still in progress, 1 replacement is generated and sent out. If validation fails again once the replacement is in, 1 more is sent out, and this is repeated until the wu is either validated or errored-out. BOINC projects have the option to send any re-issue only to "reliable" computers, i.e. a known "good" computer with a minimum credit, RAC and a short turnaround time, and the re-issue deadline can be made shorter. Not sure if SETI@home uses this... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
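Ingleside's "reliable host" re-issue idea can be sketched as a simple selection rule. The thresholds below (minimum credit, minimum RAC, maximum turnaround) are invented for illustration; they are configurable in BOINC, and the actual values, if SETI@home uses the feature at all, are unknown.

```python
# Sketch of re-issuing a replacement task to a "reliable" host, as
# described above. Threshold values are assumptions, not SETI@home's.

def pick_host_for_reissue(hosts, min_credit=1000, min_rac=100,
                          max_turnaround_days=3.0):
    """Prefer a reliable host for the replacement; fall back to any host."""
    reliable = [h for h in hosts
                if h["credit"] >= min_credit
                and h["rac"] >= min_rac
                and h["avg_turnaround_days"] <= max_turnaround_days]
    pool = reliable or hosts
    # Fastest average turnaround first, so the quorum closes quickly.
    return min(pool, key=lambda h: h["avg_turnaround_days"])
```

The point of the rule is that a re-issued task is already late, so it should go to a host likely to return it quickly rather than back into the general pool.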
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
Will ONE or TWO new tasks be issued after the first 2 are successes, but are not closely similar? I suspect that, if it's only one, there will be several long delays (i.e. pending tasks) in work unit completion. We shall see. Just another 2 hours 30 minutes and I may see the answer! Ingleside The ONE ONLY task issue option is what I have been expecting, and I fear that this will extend the pending list considerably, barring the possible controls you mention. From past performance, I would doubt that SETI has been "fast tracking" any of the work units in the way that you say is possible. Maybe the need for doing so may now be overwhelming. At least, the slowest end of the scale should be avoided in issuing that single extra task. Just half an hour to completion for my task in the "TWO ONLY" work unit, and no sign of completion by my "co-host", who has another task to complete first, I believe. Keith |
W-K 666 Send message Joined: 18 May 99 Posts: 19075 Credit: 40,757,560 RAC: 67 |
A few days ago, in another thread (615586) I looked at the results for my Pent M. I assume most other computers will show similar results. It showed that for all the workunits over 2 days old, less than 4.5% of results had not been returned, and none of those had yet reached its deadline, which was still days or weeks away. Extrapolating this data to 2/2, less than 9% will be slow to return, assuming no other changes, and at most a possible doubling of pending. The advantage of 2/2 is that the data will be crunched quicker; the multibeam work comes from 7 beams, each with separate horizontal and vertical polarisation, compared to a single beam now, so there could be a lot more work. And the slower computers will be contributing to the science, rather than acting as a backup for the other two in case one fails. |
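W-K 666's figures above can be checked with a quick back-of-envelope calculation. This is a sketch; treating the two hosts of a work unit as independent is our assumption, not something stated in the thread.

```python
# Back-of-envelope check of the figures above: ~4.5% late returns per
# result, extrapolated to two results per WU, plus the multibeam
# data-stream count (7 beams x 2 polarisations vs. a single beam).

late_per_result = 0.045
# Chance that at least one of the two results is late, assuming the
# two hosts are independent: 1 - (1 - p)^2, which is just under 9%.
late_per_wu = 1 - (1 - late_per_result) ** 2

beams, polarisations = 7, 2
data_streams = beams * polarisations  # 14 streams vs. 1 today
```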
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
A few days ago, in another thread (615586)I looked at the results for my Pent M. I assume most other computers will show similar results. But it showed that for all the workunits over 2 days old less than 4.5% of results had not been returned, but none had yet reached deadline, by days or weeks. WinterKnight I think your assumption that most other computers have similar results may be somewhat hopeful, and the few that hold up the rest bear a disproportionately large effect. Nevertheless, be that or be that not, the probable most efficient way of running the 2/2 scheme would be to "stream" the hosts into bands of similar RAC, and pair off initial tasks within the new work unit to 2 hosts within the same band. And possibly also issue 3rd and subsequent tasks as "fast-tracking" such as is suggested by Ingleside. This way, any delay in validation will be minimized, which should be a win-win situation for everyone (i.e. SETI, fast crunchers, slow crunchers and all shades between). Keith |
W-K 666 Send message Joined: 18 May 99 Posts: 19075 Credit: 40,757,560 RAC: 67 |
A few days ago, in another thread (615586) I looked at the results for my Pent M. I assume most other computers will show similar results. But it showed that for all the workunits over 2 days old less than 4.5% of results had not been returned, but none had yet reached deadline, by days or weeks. I have briefly looked at other computers and don't think my computer's figures are too far out. On pairing computers, it is probably not RAC that should be used but 'Average turnaround time'. This would overcome the problem that the number of CPUs introduces into RAC. Andy |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
...the probable most efficient way of running the 2/2 scheme would be to "stream" the hosts into bands of similar RAC, and pair off initial tasks within the new work unit to 2 hosts within the same band. I think average turnaround would be a better parameter for matching hosts. A computer with a huge queue but high RAC causes as much delay as a plodder with a short queue. Other projects are pairing by CPU capabilities for Homogeneous Redundancy, so this should be theoretically possible. But it may impose enough additional load on the backend that it wouldn't be practical for this project. Joe |
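The banding idea Keith and Joe describe can be sketched as grouping hosts by average turnaround time and pairing within each band. The band boundaries below are invented for illustration; nothing in the thread specifies them.

```python
from collections import defaultdict

# Sketch of pairing hosts by average turnaround time, as suggested
# above. The band limits (1, 3, 7 days) are hypothetical.

def turnaround_band(avg_turnaround_days, bands=(1.0, 3.0, 7.0)):
    """Map a host's average turnaround (in days) to a band index."""
    for i, limit in enumerate(bands):
        if avg_turnaround_days <= limit:
            return i
    return len(bands)  # the slowest band

def pair_hosts(hosts):
    """Group hosts by band, then pair them off within each band."""
    by_band = defaultdict(list)
    for h in hosts:
        by_band[turnaround_band(h["avg_turnaround_days"])].append(h)
    pairs = []
    for band in sorted(by_band):
        group = by_band[band]
        pairs += [tuple(group[i:i + 2]) for i in range(0, len(group) - 1, 2)]
    return pairs
```

Pairing by turnaround rather than RAC sidesteps Joe's objection: a fast host with a 10-day cache lands in a slow band, where it can only delay another slow host.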
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
...the probable most efficient way of running the 2/2 scheme would be to "stream" the hosts into bands of similar RAC, and pair off initial tasks within the new work unit to 2 hosts within the same band. Joe & WinterKnight I'm glad we are thinking along similar lines. I absolutely agree about the "turnaround time". Would it be possible to put that forward as a practical suggestion? Keith |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
MSattler You are making an erroneous assumption here when it comes to plodders. Once a result is started it will run to completion regardless of whatever else happens. The only exception is the unconditional abort if the host runs over the deadline and a quorum is already formed. Therefore, when trailers are issued by default, any host which takes longer than the average turnaround for any reason runs a high risk of wasting its time running a trailer. As you pointed out, it doesn't matter whether the reason for 'plodding' is a fast host running 20 projects with equal resource share and a 10 day cache, or a slug running SAH only with a 0.01 day cache. A trailer is a trailer is a trailer, and is still just as useless and a waste of power. In the case of the slug, it means that it will in all likelihood be the trailer, regardless of what you try to do to work around it (except for just not running SAH at all, that is). In any event, so what if a result stays pending for a while (even a long while)? It has no long term effect on any 'competitive' performance metric. The bottom line here is that in the BOINC framework one should never expect to have credit granted sooner than the deadline for the WU, possibly longer. Also, IMHO, using HR or Locality Scheduling for 'throughput modification' purposes is not a legitimate use of that functionality, and can create other new problems which don't currently exist in SAH. For example, EAH uses Locality Scheduling, and it created some problems at the end of the S5R1 run when it came time to 'cleanup' straggling work for some of the template frequencies. PAH uses HR, and it was not uncommon during the previous runs to run out of work on some lesser deployed platforms because there wasn't any new work split for it at the time. 
Finally, since the nature of the SAH analysis doesn't require HR, LS, or any other feature from a science viewpoint, grouping and sending work according to how fast machines are or their average turnaround time introduces the possibility that subtle problems between different platforms might go undetected longer. Alinator |
Henk Haneveld Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
I don't think there is any need to put more strain on the servers just because some people don't want to wait before they get their credit. You may have to wait a few more days before you get what is due, but so what? You can't take it to a shop and buy anything with it. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
The reason tasks are sent to more than one computer is to detect errors. |
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
MSattler Just to put you right on that, Alinator. I did not make the assumption you infer. I well knew that once a task has started crunching, it will run to a finish provided that is within the report deadline. My reference was to the redundant tasks, which, on balance, get replaced at the point of redundancy by a task from another work unit. The net effect of this is that the redundancies have virtually no detrimental effect on the availability of tasks to start crunching. I can see that in the 3/2 system any 3rd task started (plodder or otherwise) before the first 2 give a validation will be wasted effort so far as the project is concerned, and this will be avoided in the 2/2 system. It doesn't matter a lot, but I thought I'd just point that out. I'm sure you're right on the rest, but I did not understand much of it with all the abbreviations. I'm still a little hazy on these closely and not so closely similar results and how that will operate in the 2/2 system. Soon it will all be clear, no doubt. Keith |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Apologies on that then. I guess the discussion boils down to what is more important: ensuring to the highest degree possible that every result run by a host actually performs a useful scientific function, or relegating a certain percentage of all work to the 'useless' category by default, just so the WU can be cleared faster. <edit> Undefined Acronyms: EAH = Einstein at Home PAH = Predictor at Home HR = Homogeneous Redundancy LS = Locality Scheduling Alinator |
Sirius B Send message Joined: 26 Dec 00 Posts: 24879 Credit: 3,081,182 RAC: 7 |
I guess the discussion boils down to what is more important. Robert Goddard worked on rocketry in the 1920's. The Germans advanced his work in the 1940's (Wernher von Braun). The Americans worked on rockets from the 1940's to date (captured von Braun). July 1969!!!!!! What a fantastic month! What has this to do with S@H crunching? Easy: everyone continued working until that day in 1969, 50 years after Goddard started his work. Admittedly, the credit league tables look good & it is nice to see your name climb up that table, but the main goal is to get THAT result regardless of the speed of the machines doing the work. I am looking forward to expanding my network with a quad core & also looking forward to the release of Windows Home Server (home & business reasons). However, I will not be building these systems ahead of time just so that I can increase my standing in the league. They will be built as & when required (finances permitting), & once built, I will have S@H installed on both. |
Philadelphia Send message Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0 |
I guess the discussion boils down to what is more important. You were really doing well 'til the next part: I am looking forward to expanding my network with a quad core & also looking forward to the release of Windows Home Server (home & business reasons). The Linux guys are not happy, lol |
Natronomonas Send message Joined: 13 Apr 02 Posts: 176 Credit: 3,367,602 RAC: 0 |
Since you get the credit in the end, so what if one result is delayed a week; if it happened to every WU, you'd still end up with the same R-average-C. This is partly why the RAC concept exists, is it not? If you're doing enough WUs, it all balances out; the people doing fewer WUs may notice a bit more short-term volatility in credit granting, but you're still getting the same credit over a month, year, etc... Crunching SETI@Home as a member of the Whirlpool BOINC Teams |
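Natronomonas's point that delayed credit "all balances out" rests on RAC being a decaying average. The sketch below illustrates the idea with a one-week half-life; BOINC's actual RAC update rule differs in detail, so treat this as an approximation, not the project's formula.

```python
import math

# Simplified sketch of a decaying credit average with a one-week
# half-life, the idea behind RAC. Not BOINC's exact update rule.

HALF_LIFE_DAYS = 7.0

def update_rac(rac, days_since_update, credit_per_day):
    """Decay the old average, then blend in the recent credit rate."""
    decay = math.exp(-days_since_update * math.log(2) / HALF_LIFE_DAYS)
    return rac * decay + credit_per_day * (1 - decay)
```

Because the average decays smoothly, a work unit that pends for an extra week shifts when its credit shows up, not whether it does; over a month or a year the average converges to the host's true rate either way.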
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.