About Deadlines or Database reduction proposals
Author | Message |
---|---|
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The way you get 14 million waiting is if a large number of tasks in the field are Resends, with multiple Hosts per Work Unit, waiting on a slow Host to run the task. A shorter Work Cache would help in that scenario....maybe a day or so. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"If you are getting work with a long deadline, isn't that new work?" No, the deadline at this project is fixed by the AR of the data. The time allowed for processing - from issue of your copy of the WU to your deadline date - is the same for all volunteers, whether they are processing one of the initial pair of tasks, or a later resend. Which reminds me - BOINC project servers have a configuration option for Accelerating retries, which includes sending out the resends with a reduced_delay_bound [aka deadline]. We could use that now. |
W-K 666 Send message Joined: 18 May 99 Posts: 19407 Credit: 40,757,560 RAC: 67 |
Question. Why is the setting for cache size measured in time, but the restriction on cache size is measured in tasks? This IMHO seems to go against all logic: it allows a slow host, whatever its restriction (hardware or time spent crunching), to ask for a ten day cache and get more tasks than it will crunch, but at the same time restricts the top hosts with multiple GPUs, that can crunch a task in ~2 mins on each GPU, to 150 tasks shared across the multiple GPUs. Surely it would be better for the project to restrict the max cache size on time rather than tasks, so that the fast host might actually get the 12 hour cache required to cover the outages, and the slow host will not get the 150 tasks it has no hope of completing before deadline. Another question, which no doubt the software boffins will say is too difficult to implement now, is: why is the data contained on the "Computer Details" screen not used to more effect? It describes the CPU, so could have been used in the recent AMD problem. It could also still be in use to solve the Nvidia driver problem, and not issue tasks to drivers on a "banned list". |
rob smith Send message Joined: 7 Mar 03 Posts: 22539 Credit: 416,307,556 RAC: 380 |
Cache size is the lesser of the time selected by the user and the limits set by the server. edit to add: Your second question - technically quite feasible to do, but probably not done as they were seen as short-term problems, thus not worth changing the server code given the constraints on the project staff (not enough of them). Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Reducing the deadline will result in some number of current slow computers to cease to contribute to SETI (even slow computers process WUs)." Not so. The slowest of systems can do a WU in 2 days. Even if it's not on all day every day, 1 month is plenty of time for it to return work. If it's not, then there really is no point in catering to a system that can contribute so little, when catering to it causes problems such as we're having now with the database. I really don't consider a system that returns 1 WU a month to be contributing, but that's what setting a 28 day deadline will still enable. It will still be able to participate in Seti. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Richard's final comment is one that is well worth remembering in these discussions: But reducing the deadline will (once all the existing outstanding ones have been cleared out, of course). Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Eventually the inactive caches will timeout, and the WUs will be available for resend. It" But the database can't. That's part of the present problem - work is being sent out, but no result returned. So you have to wait for the deadline to be reached for it to be sent to another system to provide a valid result. Oh, that system didn't provide a result either - wait till the deadline expires again & resend it. Or a system provided a result that didn't Validate, so it is sent out again (hopefully this time it doesn't time out & the result returned validates). If not, do it again. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Which reminds me - BOINC project servers have a configuration option for Accelerating retries, which includes sending out the resends with a reduced_delay_bound [aka deadline]. We could use that now." That sounds like a very good suggestion. Grant Darwin NT |
Wiggo Send message Joined: 24 Jan 00 Posts: 36865 Credit: 261,360,520 RAC: 489 |
Now remember those numbers were based at a time when some here were still using or had farms of old Socket 486-Super 7 back then. By Richard Haselgrove And back in 2007 you could still use a pre-SSE instruction CPU. ;-) Also by Richard Haselgrove Now why have they doubled since then? As a second thought, why is CreditNew so out of whack now? Is it a problem caused by those old pre-SSE CPUs no longer being in the system these days, when they were a good part of the factoring in back then? I'm sorry, I'm just starting to see a pattern (though it may be just my imagination), but I thought that I'd just blurt it out anyway. ;-) Cheers. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
No one seems to have an issue with the shortest WU's deadline being 7 days? On one of my systems that type of WU takes the CPU 30min to process. The longest running WU i've seen on the same system took 1hr 45min (most long running WUs are done in 1hr 30 or so). That's 3.5 times longer than the shortest. 3.5 * 7 gives us 24.5 days if the deadline was in proportion to that for the shortest non-noisy WU. Add on another few days as a buffer, and that gives us 28 days as a maximum deadline. In line with what has been proposed. And as i mentioned in another post, i don't consider a system that returns 1 result in a month to really be contributing anything to the project, but that is what a 28 day deadline will mean as a lower limit of what a system has to be able to process to do work for Seti. As long as it can do 1 WU a month, it can take part in the project. Grant Darwin NT |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Finally found it .... A year ago Eric posted a graph of the age of workunits (tasks) residing on the server RAID, giving a good snapshot of the client turnaround times. Even those tasks which the graph shows as hanging around for 60 days wouldn't be there if the client had a smaller cache in the first place. Thanks Richard for those server settings. Now if we could just find someone with some time to try different settings and document the results of each. When shrinking cache size, it might be best to do it in 3.5d steps every day, so that we don't get that HUGE outbound of resends that Richard noted would happen. |
W-K 666 Send message Joined: 18 May 99 Posts: 19407 Credit: 40,757,560 RAC: 67 |
Thanks for that graph. A quick look says a 10 day deadline would probably be workable. But if the powers that be want to be cautious then a 25 day deadline would be sufficient. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Thanks for that graph." Even 5 days would be plenty, but the project does want to make it possible for even low powered infrequently used systems to contribute. And crunchers do have issues beyond their control - floods, fires, storms, and power, network or system issues even without those other events occurring. So 28 days gives people plenty of time to deal with problems & return work before it times out and still get Credit for it. Grant Darwin NT |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Reducing the deadline will result in some number of current slow computers to cease to" . . You are ignoring the fact that it is the database size compared to available memory that is part of the cause of the validation process being so slow, and it is the large number of tasks that are "in progress" that is part of that database bloat. Reducing the amount of WUs sitting for long periods in 'in progress' queues will reduce the size of the database, hopefully to a size that will fit in the memory available, and speed up that validation process. Stephen . . |
rob smith Send message Joined: 7 Mar 03 Posts: 22539 Credit: 416,307,556 RAC: 380 |
You are ignoring the fact that it is the database size compared to available memory that is causing the validation process to be slow This statement is wrong - the validators are more than capable of keeping up with all the data thrown at them - it is the rate of return of work by users that limits the validation rate. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
W-K 666 Send message Joined: 18 May 99 Posts: 19407 Credit: 40,757,560 RAC: 67 |
As I was just having some lunch, I had a look at why there are so many Valid tasks showing in my account. It turns out that for nearly 600 out of the total of 1020, it is over 24 hours since they were validated. I didn't check all 600 but didn't see any in the 10% (2/page) that I did look at. So why haven't they been purged, or is it they are part of the 4 million awaiting Assimilation? If everybody's accounts are the same and they are still awaiting Assimilation, then that number should be below 2.5 million. |
rob smith Send message Joined: 7 Mar 03 Posts: 22539 Credit: 416,307,556 RAC: 380 |
We might have a smoking gun - the assimilator queue should be fairly small, certainly not in the millions. Thought - are the assimilators being throttled at the same time as the splitters? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"We might have a smoking gun - the assimilator queue should be fairly small, certainly not in the millions. Thought - are the assimilators being throttled at the same time as the splitters?" Mind you, the assimilator queue is subject to yet another constraint - the processing speed of the Science database, as distinct from the BOINC database that we've been discussing up till now. Anything which slows down the science database - like, for example, taking a fresh snapshot for processing with Nebula over at Einstein/ATLAS cluster - will likely bork assimilation for a while. And I do see backroom tweaks being applied to Nebula periodically - it's not dead. |
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Validation is working just fine but the server status page is displaying the data with slightly misleading labels. Tasks labeled 'Results returned and awaiting validation' are really all the returned tasks that haven't been assimilated yet. Out of those 14 million results labeled as 'Results returned and awaiting validation', about 5 million are results from hosts whose wingmen are still crunching their task corresponding to the same workunit. Only about two thousand are tasks that are ready to be validated but validators haven't got to them yet, and 9 million are validated tasks that are waiting to be assimilated. The real problem is in assimilation being unable to process the results at the rate they are being returned. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"...Anything which slows down the science database - like, for example, taking a fresh snapshot for processing with Nebula over at Einstein/ATLAS cluster - will likely bork assimilation for a while..." Is anyone else allowed access to the Science database? Hopefully only a very few are allowed access if it borks the system for a while.... |
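
Two bits of arithmetic from the thread can be checked with a short Python sketch: Grant's proportional deadline (30 min shortest WU, 1 hr 45 min longest, 7 day shortest deadline) and W-K 666's point that a 150 task limit is only a few hours of work for a fast multi-GPU host. The function names are hypothetical, the 3.5 day buffer is an assumption standing in for "another few days", and the 4 GPU host is just an illustrative case; none of this is BOINC server code.

```python
from math import ceil


def scaled_deadline_days(shortest_min, longest_min, shortest_deadline_days,
                         buffer_days=0.0):
    """Scale the shortest WU's deadline by the runtime ratio, plus a buffer."""
    ratio = longest_min / shortest_min
    return ratio * shortest_deadline_days + buffer_days


def cache_hours(task_limit, minutes_per_task, gpu_count):
    """Hours of work a task-count limit represents for a host.

    Assumes every GPU crunches one task at a time at the same speed.
    """
    return task_limit * minutes_per_task / gpu_count / 60


def tasks_for_cache(hours, minutes_per_task, gpu_count):
    """Tasks needed to keep all GPUs busy for the given number of hours."""
    return ceil(hours * 60 / minutes_per_task * gpu_count)


if __name__ == "__main__":
    # Grant's figures: 105 min / 30 min = 3.5, and 3.5 * 7 days = 24.5 days.
    print(scaled_deadline_days(30, 105, 7))                   # 24.5
    # Assumed buffer of 3.5 days reaches the proposed 28 day maximum.
    print(scaled_deadline_days(30, 105, 7, buffer_days=3.5))  # 28.0
    # 150 tasks at ~2 min each: 5 hours on one GPU, 1.25 hours on four.
    print(cache_hours(150, 2, 1))                             # 5.0
    print(cache_hours(150, 2, 4))                             # 1.25
    # Tasks a 4 GPU host would need for a 12 hour cache.
    print(tasks_for_cache(12, 2, 4))                          # 1440
```

Under these assumptions a 4 GPU host holds a little over an hour of work at the 150 task limit, and would need roughly ten times that to ride out a 12 hour outage, which is the mismatch a time-based limit would address.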
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.