Task Deadline Discussion

Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1905675 - Posted: 8 Dec 2017, 19:27:12 UTC - in response to Message 1905575.  


As Raistmer has said, if you reduce the deadline you automatically increase the number of resends...
It seems to me that there's only a very tiny grain of truth to that. Again, using my data, out of 901 tasks still hanging around past the 80% mark, only 23 were eventually validated. The other 878 had to be resent no matter what. Those do not increase the number of resends.

Actually they do - both the never-returning hosts, as I said earlier, _and_ those who regularly miss the deadline.
If you trash 1 task per 3 weeks, that's 3 times fewer than if you trash 1 task each week; over a year you will get 3 times more resends from such hosts, plus an increase in the number of such hosts simply due to the increase in processing power required to finish in time.

"The other 878 had to be resent no matter what" - you consider this a one-time event, but actually it should be considered a recurring event!
I'm sorry, Raistmer, but I think your argument is flawed. For one thing, there's a fundamental difference between tasks being resent, which is what happens to all timed-out tasks, and tasks being replaced on the hosts where the originals timed out, which happens only some of the time.

There are several identifiable categories of hosts which miss the deadlines.

1) "Drive-bys": Those users who visit the project because something they've read somewhere makes it seem interesting. They sign up, download a bunch of tasks, then immediately change their minds for some reason and simply drive away, leaving all their downloaded tasks to eventually time out. The timed-out tasks will never be replaced. A reduction in deadline times would simply clear out those WUs and their wingmen's pendings that much sooner.

2) Hosts that have died, or been abruptly replaced, or simply been shut down by established users for any number of reasons: Again, like in group 1, timed out tasks will not be replaced. Shorter deadlines will help.

3) Hosts with "ghosts": Those who have accumulated some number of tasks which still exist in the DB but, for whatever reason, no longer reside on the host and are unknown to the scheduler. These tasks have already been replaced on the host, but they can't be resent to other hosts until they time out, thus tying up DB space and resources unnecessarily. Shorter deadlines would certainly help with these, but so, in some cases, would more responsible users, or periodic temporary reactivation of the "lost" task resend mechanism, or even a web site method for users to trigger either resending or abandonment of such tasks.

4) Hosts who steadily download large numbers of tasks without successfully completing any (such as the one with 6,000+ in the queue that I noted in earlier posts): These hosts are likely generating ghosts, so the daily downloads constitute a continuous, immediate replacement stream, long before the task deadlines are ever reached. Shorter deadlines would have a significant impact, though better quota management and some sort of user notification would be even more important areas to address.

5) Hosts making sporadic contact: There are some with high turnaround times who also seem to be able to download an excessive number of tasks. They still seem to be doing some processing, occasionally returning batches of completed tasks, but they also have large numbers of time-outs. Reduced deadlines may or may not have an impact on those hosts, whose issues are probably better addressed through improved quota management.

6) Hosts with temporary issues: There are certainly some who miss their deadlines for a short while, then resume normal processing. Perhaps the users went on extended vacations or business trips, took a while to replace a failed part in their PC, needed to cut their electric bill for a while, etc. In some cases, shorter deadlines may cause tasks on these hosts to time out once they resume operations, even though those tasks might still be successfully completed. I think these hosts represent a small portion of the user base, but certainly shouldn't be ignored in the discussion.

7) Hosts running multiple projects: Depending on resource shares, and BOINC's ability to manage same, some of these hosts seem to push the deadline limits time and again, sometimes even exceeding them. I don't think there's a clear answer as to whether, or how, shorter deadlines might affect these hosts. I tend to believe that many of them will still achieve "just in time" task completion no matter what the deadlines, simply because of the way the resource shares get managed and BOINC's tendency to resort to high priority execution to just barely squeak by. Certainly a category that might warrant further research.

8) Slow, low-volume, low-power, and/or part-time hosts: "Slow" is a relative term in today's processing environment. While Android devices have typically been held up as poster children for the "Seti@home for even the lowliest cruncher" argument, I didn't see them as an issue in the samples I posted earlier, but perhaps they can be found. And older, slower, inefficient hosts can realistically only be accommodated by the project for so long. Reference my earlier post about my Win98 machine. I think, though, based on my sample analysis at the beginning of this discussion, that these hosts really aren't currently an issue. However, if one of their tasks does time out then, yes, a replacement will be sent, so shortening deadlines could have a negative impact. Same for the part-timers.

I've probably overlooked a category or two but, from my perspective (and based on the examples from my earlier posts), shorter deadlines will result in speedier resends, little, if any, increase in replacement turnover, and, above all, a more streamlined and efficient DB.
ID: 1905675
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1905887 - Posted: 9 Dec 2017, 9:45:51 UTC - in response to Message 1905675.  
Last modified: 9 Dec 2017, 9:56:49 UTC

Well, just a simple example.
Let's consider 2 hosts: one misses its deadline on a regular basis, the other one does its work OK.
Let's arbitrarily set (just for simplicity) the deadlines at 6 days and 2 days (long and short), with a turnaround time for the good host of 1 day.
Long deadline case:

Day 1: bad - 1, good - 1 (the number is the number of records the DB creates to account for each sent WU). DB size: 2
Day 2: bad - 0, good - 1. DB size: 2
Day 3: bad - 0, good - 1. DB size: 2
Day 4: bad - 0, good - 1. DB size: 2
Day 5: bad - 0, good - 1. DB size: 2
Day 6: bad - 0, good - 1. DB size: 2
[Day 7: bad - 1, good - 1 (maybe a resend of the bad host's expired task, but it doesn't matter; each sent WU gets its own record). DB size: 3, because the resend needs a day to be processed, so the DB holds 3 records: the expired one, a new one for the bad host, and the resend for the good host.]

So, the total number of records made in the DB over 6 days is 1+6=7.
The mean DB size for days 2-7 (6 days; day 1 isn't a recurring one, so it's left out of the averaging) is 2 1/6.

Now the short deadline case:
Day 1: bad - 1, good - 1. DB size: 2
Day 2: bad - 0, good - 1. DB size: 2
Day 3: bad - 1, good - 1 (this can be the same task the bad host had on the previous days, but no matter; there's a separate record for the new WU instance). DB size: 3
Note here that the old record for the bad host still remains on this day, because the good host spends a day processing the resent task.
So, on day 3 we actually have 3 simultaneous records: the expired one for the bad host, the newly sent one for the bad host, and the resend for the good host.
Day 4: bad - 0, good - 1 (here the DB shrinks back to its "normal" size of 2 records). DB size: 2
Day 5: bad - 1, good - 1. DB size: 3
Day 6: bad - 0, good - 1. DB size: 2
[Day 7: bad - 1, good - 1. DB size: 3]

So, the total number of records made in the DB over 6 days is 3+6=9.
The mean DB size for days 2-7 (6 days; day 1 isn't a recurring one, so it's left out of the averaging) is 2 1/2 = 2.5.

As one can see, the short deadline case increases both the DB load and the average storage size.
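
Here's the same toy model as a minimal Python sketch (illustrative only, not BOINC server code; it assumes the bad host immediately gets a replacement task, the resend goes to the good host, and an expired record lingers for the good host's 1-day turnaround before being purged):

def simulate(deadline, create_days=6, size_days=range(2, 8)):
    # One "bad" host that times out every task and one "good" host with a
    # 1-day turnaround. Returns the number of records created in the first
    # 6 days and the mean DB size over days 2-7.
    created, sizes, bad_sent = 0, [], None
    for day in range(1, max(size_days) + 1):
        # the bad host's task expires `deadline` days after it was sent
        expiry = bad_sent is not None and day == bad_sent + deadline
        if day == 1 or expiry:
            bad_sent = day                 # bad host gets a (replacement) task
            if day <= create_days:
                created += 1               # new result record for the bad host
        if day <= create_days:
            created += 1                   # good host fetches one task per day
        if day in size_days:
            # on an expiry day the DB briefly holds 3 records: the expired one,
            # the bad host's replacement, and the resend being crunched
            sizes.append(3 if expiry else 2)
    return created, sum(sizes) / len(sizes)

for d in (6, 2):
    total, mean = simulate(d)
    print(f"deadline {d} days: {total} records in 6 days, mean DB size {mean:.3f}")
# deadline 6 days: 7 records in 6 days, mean DB size 2.167
# deadline 2 days: 9 records in 6 days, mean DB size 2.500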

Where's the flaw?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1905887
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1905888 - Posted: 9 Dec 2017, 9:53:51 UTC

Thanks for enumerating the difference between long and short deadlines. I'll stop trying to put together a logical argument now that you have provided a worked example that so clearly shows the issues.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1905888
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1905894 - Posted: 9 Dec 2017, 10:25:24 UTC - in response to Message 1905675.  
Last modified: 9 Dec 2017, 10:25:46 UTC

There are several identifiable categories of hosts which miss the deadlines.


I'm out of time at the moment, so I will consider these categories in detail later.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1905894
BetelgeuseFive
Volunteer tester
Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1905913 - Posted: 9 Dec 2017, 12:56:00 UTC - in response to Message 1905629.  

A bad example to choose - that cruncher is a classic "drive-by". It has only contacted the server on the day of its creation, and has only tasks from that date, so there is virtually no way of stopping such events. What would be more interesting would be an example of a computer that contacts regularly, grabs loads of tasks, and returns very few valid results within the tasks' deadlines, because those are the "real menaces".
Perhaps a couple from my earlier posts would be better.

HostID: 8261239 has an average turnaround of 38.07 days, yet has 95 tasks on board at the moment (down from 107 when I posted 2 days ago). His timeouts have climbed to 32. Appears to be a host which only makes sporadic contact, yet still manages to download a quantity of tasks far in excess of its ability to process them in a timely fashion.

HostID: 6122802 has 6,148 tasks on board, with 361 recently timed out. It probably hasn't actually successfully processed a task in a long time, yet still is allowed to download more than a hundred new tasks every day.

Addressing hosts such as these requires looking at different issues than just task deadlines, but shortening task deadlines would likely at least reduce the number of essentially dead tasks they would be sitting on at any given time.


Both hosts you mention (especially the second) have a very low RAC. Decreasing the maximum number of tasks for such hosts seems like a way to reduce the problems they are causing. What I don't understand is how the second host was able to get so many tasks. Some kind of rescheduling?

Tom
ID: 1905913
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1905968 - Posted: 9 Dec 2017, 18:45:04 UTC - in response to Message 1905887.  

Where's the flaw?
I think there are two, actually. The first is assuming that the "bad" host is always sent a new task to replace each one that times out. I believe that happens only in a limited number of circumstances, which I tried to enumerate in my earlier post. In the case of a "ghost", that replacement is sent long before the deadline, anyway.

The second, I think, is the 24-hour purge cycle for validated WUs. In your example, even a 6-day deadline is an extremely short one, compared to either the existing deadlines or my proposed reduction. Having a purge cycle that represents such a high percentage of the deadline window likely inflates your DB occupancy numbers. Since I found that a deadline of slightly over 48 days was the average in my sample, a 1 day purge cycle would represent about 2.1% of that average. So, if you reduce the 48 days to 6, wouldn't it also be necessary to reduce the 24-hour purge cycle to about 3 hours, and use that for both your "long" and "short" deadline calculations?

To carry that further, my recommendation for a 20% reduction in deadlines would make a 4.8 day "short" deadline more appropriate than the 2 day one in your example, as well.
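
Spelled out as a throwaway script (plain arithmetic, using my rough numbers from above):

deadline_days = 48      # "slightly over 48 days" average from my sample
purge_hours = 24        # current purge cycle for validated WUs
share = purge_hours / (deadline_days * 24.0)
print(f"{share:.1%}")   # ~2.1% of the deadline window
print(6 * 24 * share)   # 3.0 -> a 6-day deadline implies a ~3-hour purge cycle
print(6 * 0.8)          # 4.8 -> the "short" deadline under a 20% reduction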
ID: 1905968
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1905971 - Posted: 9 Dec 2017, 18:59:25 UTC - in response to Message 1905913.  

What I don't understand is how the second host was able to get so many tasks. Some kind of rescheduling?

Tom
It's likely all those tasks are "ghosts", which the servers think were successfully sent to that host but which never completely made it into the BOINC task list on the host. BOINC probably currently shows no tasks in progress. Therefore, BOINC keeps requesting new tasks to fill the bucket, but from BOINC's perspective the bucket remains empty. Since those tasks eventually are recorded as errors when they time out, the quota management system should, theoretically, be able to throttle such a host, but it doesn't. One of the problems that I can see in this case is that, since the host is running stock, there are multiple usable CPU and GPU apps in play, with separate "max tasks per day" limits for each, and the scheduler just keeps trying several of them every day.

As was suggested in an earlier post by Wiggo, one possible cause of "ghosts" like this may be an overly aggressive AV program, possibly combined with some twists added by Win10. If the user isn't monitoring BOINC, he's likely not even aware that anything is wrong.
ID: 1905971
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1905987 - Posted: 9 Dec 2017, 20:20:27 UTC - in response to Message 1905968.  
Last modified: 9 Dec 2017, 20:23:15 UTC

The first is assuming that the "bad" host is always sent a new task to replace each one that times out.

Most of the time. Otherwise it's a 1-task-per-day host with such low overhead that we don't need to speak about it here at all.


The second, I think, is the 24-hour purge cycle for validated WUs. In your example, even a 6-day deadline is an extremely short one, compared to either the existing deadlines or my proposed reduction. Having a purge cycle that represents such a high percentage of the deadline window likely inflates your DB occupancy numbers. Since I found that a deadline of slightly over 48 days was the average in my sample, a 1 day purge cycle would represent about 2.1% of that average. So, if you reduce the 48 days to 6, wouldn't it also be necessary to reduce the 24-hour purge cycle to about 3 hours, and use that for both your "long" and "short" deadline calculations?

No matter what the relative numbers are: with any numbers, the short deadline in my example will consume more DB resources than the long one.
And it's not a purge cycle, it's the computation cycle for the good host; the purge is taken to happen immediately after the good host returns its result for validation.
Adding a non-zero purge only increases the difference between the long and short cases (because the inflated DB lasts longer).

As for "would make a 4.8 day 'short' deadline more appropriate than the 2 day one in your example": the numbers in my example are completely arbitrary, chosen just to show the effect of deadline shortening in numbers. One can redo the proportions with more realistic numbers if one wants.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1905987
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1906005 - Posted: 9 Dec 2017, 21:54:27 UTC - in response to Message 1905675.  
Last modified: 9 Dec 2017, 21:58:00 UTC

I admit that my example doesn't cover all deadline misses.
Let's consider the separate categories:


1) "Drive-bys": Those users who visit the project because something they've read somewhere makes it seem interesting. They sign up, download a bunch of tasks, then immediately change their minds for some reason and simply drive away, leaving all their downloaded tasks to eventually time out. The timed-out tasks will never be replaced. A reduction in deadline times would simply clear out those WUs and their wingmen's pendings that much sooner.


This doesn't fall into "my case".
Deadline reduction will not change the DB load (by load I mean DB record changes per second), because that load depends solely on the quota and on the rate at which drive-away hosts appear. But it will decrease the average DB size (because expired WUs will be eliminated faster).
However, a much more effective improvement here is better quota management (reducing the number of initial WUs available to such hosts). That would affect both components: decrease the DB load and decrease the average DB size. So a deadline change is not the right way to get an improvement here.


2) Hosts that have died, or been abruptly replaced, or simply been shut down by established users for any number of reasons: Again, like in group 1, timed out tasks will not be replaced. Shorter deadlines will help.

Not "my case", and quota management doesn't help here.
Shortening the deadline will not decrease the DB load, but it will decrease the average DB size.


3) Hosts with "ghosts": Those who have accumulated some number of tasks which still exist in the DB but, for whatever reason, no longer reside on the host and are unknown to the scheduler. These tasks have already been replaced on the host, but they can't be resent to other hosts until they time out, thus tying up DB space and resources unnecessarily. Shorter deadlines would certainly help with these, but so, in some cases, would more responsible users, or periodic temporary reactivation of the "lost" task resend mechanism, or even a web site method for users to trigger either resending or abandonment of such tasks.

If the ghosts expire by missing the deadline and the host accumulates ghosts regularly, it's "my case".


4) Hosts who steadily download large numbers of tasks without successfully completing any (such as the one with 6,000+ in the queue that I noted in earlier posts): These hosts are likely generating ghosts, so the daily downloads constitute a continuous, immediate replacement stream, long before the task deadlines are ever reached. Shorter deadlines would have a significant impact, though better quota management and some sort of user notification would be even more important areas to address.

"My case". Deadline shortening will have a big impact, but a negative one – see the example.
Note: if it's "long before the task deadlines are ever reached", then it's out of consideration regarding deadlines. Of course, I consider only ghosts that time out at the deadline; the others are a subject for the quota management thread.


5) Hosts making sporadic contact: There are some with high turnaround times who also seem to be able to download an excessive number of tasks. They still seem to be doing some processing, occasionally returning batches of completed tasks, but they also have large numbers of time-outs. Reduced deadlines may or may not have an impact on those hosts, whose issues are probably better addressed through improved quota management.

If a host misses deadlines in a regular fashion, it's "my case".


6) Hosts with temporary issues: There are certainly some who miss their deadlines for a short while, then resume normal processing. Perhaps the users went on extended vacations or business trips, took a while to replace a failed part in their PC, needed to cut their electric bill for a while, etc. In some cases, shorter deadlines may cause tasks on these hosts to time out once they resume operations, even though those tasks might still be successfully completed. I think these hosts represent a small portion of the user base, but certainly shouldn't be ignored in the discussion.

Not "my case". Deadline shortening will reduce the chances for such hosts to recover, so I expect a negative overall impact.


7) Hosts running multiple projects: Depending on resource shares, and BOINC's ability to manage same, some of these hosts seem to push the deadline limits time and again, sometimes even exceeding them.

If the deadline is missed, it's missed on a regular basis, so it's "my case".


8) Slow, low-volume, low-power, and/or part-time hosts: "Slow" is a relative term in today's processing environment.

It's "my case" too.

So, out of the 8 listed types of deadline miss, a clear positive impact from deadline shortening could be achieved only on type 2. On most of the others the impact will be negative.

And I think the share of type 2 deadline misses can't justify a deadline change.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1906005
ausymark
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1906049 - Posted: 10 Dec 2017, 0:07:25 UTC

I'm going to suggest what I think might be a simple system that may help matters.

Caveat: I have no idea how SETI currently allocates WUs, so please excuse any ignorance on my part.

Firstly some assumptions:

1) How fast a host is, is only an indication of its potential processing ability
2) That host may only be on an hour or two per day, or may be running other BOINC tasks
3) No matter how fast a host is, its real performance is how many WUs it uploads per day
4) Below I talk about an "Issue Total", which is the total number of outstanding WUs that should be issued to a host; i.e. a host should not be processing more WUs than defined by the Issue Total
5) In the proposal below I will use some 'constants' that can be tweaked to suit SETI better - these 'constants' are given so that a worked example can be shown

There are two situations that I am addressing with this, in the hope that it will improve the other issues mentioned throughout this thread.

Situation One: Existing Host

Firstly, there should be a 30-day average of WUs processed per day. This allows for changes in operation and/or new hardware being introduced on the host. To be calculated once per day. Note: only the average needs to be stored, not an array of daily WU counts.

Secondly, the Issue Total of WUs issued to a host should:

* For a host processing at least 1 WU per day: be no greater than three times the 30-day daily average. This allows both a buffer for outages and room for hardware improvements.
The deadline for these tasks should be something like ten days. Because the daily average is used, this is equivalent to allowing the host to take 10 times longer than usual to process its work units.
e.g. if the 30-day daily average is 40, then the total number of WUs issued to the host would be 40 x 3 = 120 WUs.

* For a host processing less than 1 WU per day: issue a total of no more than 3 WUs, with a 60-day deadline.

Situation Two: New Host

The problem here is that there is no 30-day average to draw throughput data from. To fix this I propose a simple solution.

Firstly, set the 30-day average at a figure of 3 and send the first 3 WUs, with a deadline of, say, 30 days.

For the first 48 hours of processing, for every returned WU issue 3 more WUs - this will build up an initial buffer.

Once the 48 hours are over, use that throughput as the new 30-day daily average figure (i.e. total WUs in 48 hours divided by 2 to give the average daily figure). The host is now ready to go onto the main WU distribution system above.

If no WUs are returned in the first 48-hour period, set the 30-day average figure to 2. This still allows the host to ramp up its WU throughput if it was off to a slow start, and it also reduces the number of WUs allocated to slow hosts.

If the 30-day average ever drops below 0.33, then only issue 2 WUs at a time; the Issue Total should be set to no less than 2, with a 60-day deadline.
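
To make the proposal concrete, here's a rough Python sketch of the scheme described above (the function names and structure are mine and purely illustrative, the constants are the tweakable ones given above, and none of this reflects how the SETI scheduler actually works):

def issue_total_and_deadline(avg_per_day):
    # Issue Total and deadline (in days) for an established host.
    if avg_per_day >= 1:
        return int(avg_per_day * 3), 10   # 3x the 30-day daily average
    if avg_per_day < 0.33:
        return 2, 60                      # near-idle host: only 2 WUs
    return 3, 60                          # slow host: no more than 3 WUs

def update_daily_average(old_avg, done_today, window=30):
    # One way to keep a 30-day average as a single stored number
    # (an exponential-style update; no per-day array needed).
    return old_avg + (done_today - old_avg) / window

def bootstrap_average(returned_in_48h):
    # New-host ramp-up: seed the average from the first 48 hours.
    if returned_in_48h == 0:
        return 2                          # slow start: still allow a ramp-up
    return returned_in_48h / 2            # 48-hour total -> daily figure

print(issue_total_and_deadline(40))       # (120, 10), matching the example above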

So does any of the above have merit and is it simple enough to be easily implemented?

Cheers

Mark
ID: 1906049
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1906054 - Posted: 10 Dec 2017, 0:25:24 UTC - in response to Message 1906005.  

Somehow, I'm not getting my point across about the reduction in DB size, and I'm running out of ideas about how to present it better, but I honestly don't see how shortening deadlines could fail to achieve that goal. Perhaps someone else could present the case better, but I'll take one more brief crack at it.

I do see, though, that we have different ideas about what constitutes "DB load". I'm looking at it from the standpoint of the overhead of DB access, whether that's updates, queries, backups, or the like. All of those should have reduced overhead when there are fewer entries, and correspondingly smaller indexes, in the DB. You've referred to DB load as "DB record changes per second", which I see as only part of the equation.

Now, I do agree that better quota management could ultimately prove to have a greater impact than shorter deadlines, but the two are not mutually exclusive. It's just that the current long deadlines appear to be unnecessary and shortening them should require much less effort than attacking the quota management issue, at least if it's approached as a simple across-the-board percentage reduction. If somebody has the willingness, skill, and time to have a go at it, then I'm all for it!

So, out of the 8 listed types of deadline miss, a clear positive impact from deadline shortening could be achieved only on type 2. On most of the others the impact will be negative.
No, in the absence of any changes to quota management, the first 4 of my categories would absolutely benefit from shorter deadlines, and I believe those are the 4 that are responsible for the highest percentage of timeouts. Just looking at that small sample of 20 examples that I posted earlier, at least 12 out of the 20 fall into categories 2 through 4. The sample is too small for a truly valid statistical analysis (hence the absence of any true "drive-bys" in the sample), but I think it should still be at least generally representative. (If anyone else would like to take a crack at extracting a larger sample, the full list of timeouts for my October wingmen is available here; I'm kind of getting burned out on the manual analysis.) In addition, category 7 should also see improvement, but whether shorter deadlines might increase timeouts for that group is uncertain, so I consider that one to be up in the air. To some extent, that may also be the case for category 5.

Let me address what I think is one additional flaw in your argument, although it probably overlaps a bit with what I've said before about "resent" tasks versus "replacement" tasks. You make the case that each time a task has to be resent, it adds to the DB. I would ask...how? The host that receives that task is not receiving it as an extra task, on top of any newly split tasks it receives. The receiving host still only gets a given number of tasks for any given time frame (assuming optimal scheduler performance). They could conceivably be all newly split or all resends but are, of course, most commonly some mix of the two. Redistributing those timed-out tasks does not increase the number of tasks that any given host, or the project as a whole, can process, any more than newly split tasks do. Nor does it increase the DB size.

Now, let me try using that host with 6,000+ tasks to see if I can better explain why shorter deadlines would help with that category of hosts. I believe the same logic applies to the more general "ghost" category and others, as well.

At the moment, that host has 6,248 tasks listed. Of those, 5,780 are shown as "In progress" while 468 are already "Errors" (all timed out as far as I can see). If we can assume that tasks are timing out at about the same 48.85-day average deadline that I identified in my first post in this thread (feel free to come up with a more specific average for this host, if anyone so chooses), then those "In progress" tasks are timing out at the rate of about 118 per day, likely about the same rate at which new ones are being downloaded.

So, what happens if the average deadline is shortened by 20%, to 39.08 days? The host still will be downloading about 118 new tasks per day, but a shorter deadline will only permit it to accumulate about 4,624 tasks before they start timing out. Since every "In progress" task represents a Workunit which averages, IIRC, about 2.1 tasks per WU, that should make for a DB reduction of 1,156 WUs and about 2,427 task entries in the DB for this one host alone. Admittedly, this is an extreme case, but I've run across a number of those over time and the cumulative effect of many more less extreme cases is likely to be even more significant.
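
The same arithmetic as a quick script, for anyone who wants to plug in their own numbers (the 48.85-day average deadline and 2.1 tasks-per-WU figures are from my earlier posts):

in_progress = 5780           # tasks shown "In progress" on that host
avg_deadline = 48.85         # average deadline in days, from my first post
tasks_per_wu = 2.1           # average task entries per workunit (IIRC)

daily_rate = in_progress / avg_deadline    # ~118 tasks timing out per day
short_deadline = avg_deadline * 0.8        # 20% shorter: 39.08 days
backlog = daily_rate * short_deadline      # ~4,624 tasks held before timeout
wu_saved = in_progress - backlog           # ~1,156 fewer WUs in the DB
print(round(daily_rate), round(backlog), round(wu_saved),
      round(wu_saved * tasks_per_wu))      # 118 4624 1156 2428 (~2,427 above)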

Now, would better quota management result in an even bigger benefit to DB size? Yes, yes, yes.....but what would be the relative effort to achieve it versus a simple reduction in deadline length?
ID: 1906054
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1906055 - Posted: 10 Dec 2017, 0:26:45 UTC - in response to Message 1906049.  
Last modified: 10 Dec 2017, 0:56:49 UTC

3) No matter how fast a host is its real performance is how many WU it uploads per day

Make that Validates.
Some hosts upload a lot of results, but they don't validate.

Firstly, there should be a 30-day average of WUs processed per day. This allows for changes in operation and/or new hardware being introduced on the host. To be calculated once per day. Note: only the average needs to be stored, not an array of daily WU counts.

The Average turnaround time for each application actually gives a good indication of how long it takes to return work.
Grant
Darwin NT
ID: 1906055
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1906065 - Posted: 10 Dec 2017, 0:50:22 UTC

OK, I'll revisit an old subject ... turn resend lost tasks back on. I remember it was said that it was causing excess database load, which is why it is off, but I forget what the bottleneck was - hardware or software. Would SSDs in the array help, or is it the overall system that can't handle the throughput?

Same goes for the daily, monthly, yearly stats and other things. They just keep shutting things off as the database size increases. So what is needed to fix the real problems with the huge database?
ID: 1906065
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1906068 - Posted: 10 Dec 2017, 1:05:12 UTC - in response to Message 1906065.  
Last modified: 10 Dec 2017, 1:07:43 UTC

Would SSDs in the array help

Hell yes.
The biggest issue, as I understand it, is a storage I/O (input/output) bottleneck.
When the system comes up after an outage, it takes about 2-3 hours for the splitters to really get going - in the past Eric (or was it Matt?) has said this is due to the time it takes for the disk caches to re-cache the data.

If we could replace the current HDD storage with AFAs (All Flash Arrays), those I/O bottlenecks would (effectively) cease to exist, and the time for the replica to catch up, the splitters to get up to speed, etc. would be reduced to bugger all IMHO.
Of course, over time the next bottleneck would then become the network backbone (if it's not already 10GB/s or better) and the servers accessing the storage. But that's the case with any system - removing one bottleneck will always expose the next...
Grant
Darwin NT
ID: 1906068
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1906069 - Posted: 10 Dec 2017, 1:05:35 UTC
Last modified: 10 Dec 2017, 1:05:53 UTC

Jeff's figures show us that the huge majority of work is returned well within the deadline period (98.9% is returned within 50% of the deadline period), so the deadlines don't need to be as long as they are.

The longer a WU is waiting on a result, the longer it sits in the Master database before the result can be moved to the Science database and the WU removed from the Master database.
The very fact that the project has limited the number of WUs a host can have at any given time, in order to reduce the Master database load, shows that the sooner a WU is processed, validated & assimilated into the Science database, the sooner it can be removed from the Master database - thus reducing the load on the Master database.

Jeff's figures show that the vast majority of the work is returned within 11.2 days (89.8% within 0-5% of the allowed turnaround time), so you could have deadlines based on a given application's average turnaround time, but that wouldn't give any leeway for client issues or Seti server issues.
The fact that the Seti deadlines are so long and other projects' are so short means that many of the systems running multiple projects take a long time to return Seti work, because Seti effectively gives the other projects priority. With a 4-week maximum deadline, even more work would be returned sooner than it is now.
And with the shorter deadlines resulting in work being returned sooner, with little (if any) increase in the need for re-issuing work, the load on the Master database would be further reduced.
Grant
Darwin NT
ID: 1906069
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1906070 - Posted: 10 Dec 2017, 1:13:41 UTC - in response to Message 1906054.  

You make the case that each time a task has to be resent, it adds to the DB. I would ask...how?

It depends on how the database is set up.
If it just changes a value in an existing field, then the only effect on the load would be indexing (if it's an indexed field); there would be no change in the number of records, so no change in the size of the database. If it requires a new record to be added, then that would add to the database size, as well as a re-indexing load (on multiple fields, not just a single one).
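
As a toy illustration of the two cases (Python's sqlite3 with a made-up table and columns; the real BOINC database schema will differ):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, workunit INTEGER,"
           " host INTEGER, state TEXT)")
db.execute("CREATE INDEX idx_state ON result(state)")
db.execute("INSERT INTO result VALUES (1, 100, 8261239, 'timed_out')")

# Case 1: reuse the existing row - the record count is unchanged, and only
# the indexed 'state' column needs re-indexing.
db.execute("UPDATE result SET host = 6122802, state = 'in_progress'"
           " WHERE id = 1")

# Case 2: the resend gets a brand-new row - the table grows, and every
# index on the table has to be updated.
db.execute("INSERT INTO result VALUES (2, 100, 6122802, 'in_progress')")
print(db.execute("SELECT COUNT(*) FROM result").fetchone()[0])   # 2 rows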
Grant
Darwin NT
ID: 1906070
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1906072 - Posted: 10 Dec 2017, 1:23:42 UTC - in response to Message 1906070.  
Last modified: 10 Dec 2017, 1:32:29 UTC

You make the case that each time a task has to be resent, it adds to the DB. I would ask...how?

It depends on how the database is set up.
If it just changes a value in an existing field, then the only effect on the load would be indexing (if it's an indexed field); there would be no change in the number of records, so no change in the size of the database. If it requires a new record to be added, then that would add to the database size, as well as a re-indexing load (on multiple fields, not just a single one).
Well, a resent task really is a new task, so it requires a separate entry, BUT.....it doesn't add to the DB any more than a newly split task does. And if there's a resend ready to go, it just delays the need for another new one to be split.

EDIT: Keep in mind that the actual WU data, the files that we download, are stored outside the DB in those "fanout" folders, so a task resend would, I assume, simply point to an existing file for the WU. Which brings to mind another point. As long as a task is being held hostage by a laggardly host, that WU data file is still taking up disk space.
ID: 1906072
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1906075 - Posted: 10 Dec 2017, 1:32:19 UTC - in response to Message 1906072.  

You make the case that each time a task has to be resent, it adds to the DB. I would ask...how?

It depends on how the database is set up.
If it just changes a value in an existing field, then the only effect on the load would be indexing (if it's an indexed field); there would be no change in the number of records, so no change in the size of the database. If it requires a new record to be added, then that would add to the database size, as well as a re-indexing load (on multiple fields, not just a single one).
Well, a resent task really is a new task, so it requires a separate entry, BUT.....it doesn't add to the DB any more than a newly split task does. And if there's a resend ready to go, it just delays the need for another new one to be split.

True.
Hence I don't see reducing the deadlines resulting in any increase in database load, only a decrease.
That's why they've got the 100-WU server-side limit: to reduce the number of records in the database at any given time.

Even without any fancy allowances for faster/slower hosts, a maximum 4-week deadline would improve things significantly. Systems that can't meet that deadline won't get those WUs, but they can still get WUs with shorter deadlines. And the BOINC manager will be able to juggle the new, shorter Seti deadlines with the deadlines of other projects on systems that run multiple projects, and things will eventually settle down again.
Grant
Darwin NT
ID: 1906075
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1906076 - Posted: 10 Dec 2017, 1:33:08 UTC - in response to Message 1906075.  

See my belated edit to my last post, as well.
ID: 1906076
ausymark
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1906079 - Posted: 10 Dec 2017, 1:36:15 UTC - in response to Message 1906055.  

3) No matter how fast a host is, its real performance is how many WUs it uploads per day

Make that Validates.
Some hosts upload a lot of results, but they don't validate.

>>> Agreed ;)

Firstly, there should be a 30-day average of WUs processed per day. This allows for changes in operation and/or new hardware being introduced on the host. To be calculated once per day. Note: only the average needs to be stored, not an array of daily WU counts.

The Average turnaround time for each application actually gives a good indication of how long it takes to return work.


>>> The issue with using average WU turnaround time is that it is only measured after potentially lots of unneeded WUs have already been allocated to a host, which is especially true for slow hosts. I don't know how/if SETI uses turnaround time at the moment; if it does, how is it used to allocate buffers and determine WU deadlines?

Cheers :)
ID: 1906079