What is up with this "High Priority"?

Author	Message
Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 893666 - Posted: 11 May 2009, 15:55:19 UTC Last modified: 11 May 2009, 15:58:11 UTC I have a 2 CPU host that is processing AP work units on both CPUs. It is running BOINC 24x7, and has network access 24x7 as well. One work unit is 23% done, and estimated to finish in almost 69 hours (2.875 days). The other work unit is 6% done and is estimated to finish in about 83 hours (3.416 days). Both of these work units are due in about 9 days on 20 May, 2009 at 7:27:02 AM (PDT). The first one (23% done) is running "High Priority" while the second one (6% done) is running normally. It seems to me that if BOINC thinks that it may not be able to complete the first one (23% done) before the due date without "High Priority" then it certainly should also be running the second one (6% done) with "High Priority". [edit]By the way, this host is using version 6.6.20[/edit] ID: 893666 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 893679 - Posted: 11 May 2009, 17:04:49 UTC Last modified: 11 May 2009, 17:06:36 UTC This is what happens when you run excessively high cache settings, even if you have a 'rocket' for a host. Just take a peek at your average turnaround times for them. There probably really isn't a deadline problem in your case since you only run SAH, but BOINC just can't assume that. Alinator ID: 893679 ·

Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 893688 - Posted: 11 May 2009, 17:54:43 UTC - in response to Message 893679. This is what happens when you run excessively high cache settings, even if you have a 'rocket' for a host. Just take a peek at your average turnaround times for them. There probably really isn't a deadline problem in your case since you only run SAH, but BOINC just can't assume that. Alinator Thanks, but this is not the case here. I have set my cache at 1 day; however, there may be a bug in the calculation of "One day's work", because my other host, which has 4 CPUs and a GPU, will get hundreds of CUDA tasks at once, until it reaches the 500/day limit, in spite of the 1 day cache. I have since turned off S@H so that my CUDA-capable host will not get any more CUDA tasks until its queue is depleted somewhat. Run only the selected applications: SETI@home Enhanced: no Astropulse: yes Astropulse v5: yes The host in question here is not a "rocket". It has 5 tasks, 3 in the queue and 2 running. As you pointed out, I am running only S@H, so the "High Priority" is not a problem for me. However, there seems to be an inconsistency in the behavior of the code when a task that has less work left to do is a candidate for "High Priority" while another task with the same due date and more work left to do is not. If I were running other projects, then that normal priority task could be preempted by another project and thus might be unfinished at the deadline. Someone might want to look at the code to see if there are bugs in the calculation of "One day's work" and of "High Priority". ID: 893688 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 893699 - Posted: 11 May 2009, 19:30:13 UTC Last modified: 11 May 2009, 19:33:26 UTC OK.... I couldn't see what your cache settings are, so I only had the turnaround times to go on. Yes, there have been some issues with excessive work fetching in some of the newer CC's. That could easily explain the turnaround times currently showing. Also, many reports from other users of the later CUDA compatible CC's indicate that work scheduling on multi-core/multi compute resource hosts leaves a lot to be desired (a deliberate understatement). So I guess that pretty well explains the 18 day turnaround for the CUDA host, based on the other info you just posted. One thing I find curious about the other one though is why it took it 21 days to turn around the two reported AP tasks it has showing. If there was some kind of problem which led to the CC thinking the app was running much slower than it really was, this may have caused a big jump in TDCF, which is now making the CC think it's in deadline trouble when it really isn't and thus forced the jump to HP/EDF. Alinator ID: 893699 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 893708 - Posted: 11 May 2009, 20:34:29 UTC - in response to Message 893666. It seems to me that if BOINC thinks that it may not be able to complete the first one (23% done) before the due date without "High Priority" then it certainly should also be running the second one (6% done) with "High Priority". [edit]By the way, this host is using version 6.6.20[/edit] BOINC looks at all of the work, and it checks to see if it could run over deadlines, taking into account the connection interval, the "% on" time, the duration correction factor, etc. ... and it decides from there what work needs to be done in what order. Usually, if things are really, really odd (if the time to complete doesn't make sense, or if it is rushing when it clearly shouldn't) it means that something like the "% on" or the DCF is badly wrong. The simple fix is to do nothing: as time passes they will correct back to their proper values. ID: 893708 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 893715 - Posted: 11 May 2009, 20:51:02 UTC - in response to Message 893708. Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here. The part which still has me scratching my head though is the reason for the 21 day turnaround for the completed AP tasks on the 6300. That seems kind of strange for a 24/7 SAH only host. ;-) Alinator ID: 893715 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 893721 - Posted: 11 May 2009, 21:08:34 UTC - in response to Message 893715. Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here. The part which still has me scratching my head though is the reason for the 21 day turnaround for the completed AP tasks on the 6300. That seems kind of strange for a 24/7 SAH only host. ;-) Alinator Yet the run time, with the stock app and nothing odd in the stderr_txt, is about 93 hours - much as we would expect. Run 93 hours, elapsed 504 hours - what was the CPU up to for the other 80% of the time? ID: 893721 ·

John McLeod VII Volunteer developer Volunteer tester Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 893769 - Posted: 12 May 2009, 0:16:01 UTC BOINC does a round robin simulation between projects and FIFO within a project. It counts the tasks that are close to running over deadline. It then runs that many starting from the earliest deadline for the project(s) that have tasks in deadline trouble. The computation deadline is earlier than the report deadline. Computation_deadline = report_deadline - (work_buf_min_queue + task_switch_interval). work_buf_min_queue is set through "Computer is connected to the Internet about every", and task_switch_interval is set through "Switch between applications every". The setting "Computer is connected to the Internet about every" is used by those with non-permanent connections to tell the client how frequently the computer will be able to get to the internet. If the task is not completed before the last connection prior to the report deadline, the work will be reported late and could be rejected as worthless. The task switch interval is also subtracted, as that is the minimum guaranteed interval at which BOINC will check to see if tasks need to run. If a task has only a few seconds of time left, the round robin simulator may wait until not much longer than that before the deadline to start it running. BOINC WIKI ID: 893769 ·
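
A rough sketch of the scheme described in this post, reduced to a single project, may help show why only one of two tasks with the same deadline ends up "High Priority". This is not BOINC's actual code. The `Task` class, the function names, and the three queued tasks C, D and E with their figures are hypothetical; only the figures for A and B (about 69 and 83 hours remaining, due in about 9 days, i.e. 216 hours) come from the first post. All times are in hours from now.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_hours: float   # estimated run time still needed
    deadline_hours: float    # hours from now until the report deadline


def computation_deadline(report_deadline, work_buf_min_queue, task_switch_interval):
    # Formula quoted in the post above.
    return report_deadline - (work_buf_min_queue + task_switch_interval)


def count_tasks_in_trouble(tasks, ncpus, work_buf_min_queue=0.0, task_switch_interval=1.0):
    """Run the tasks FIFO on ncpus CPUs and count those that would finish
    after their computation deadline (single-project simplification)."""
    cpu_free_at = [0.0] * ncpus          # min-heap of times at which each CPU frees up
    heapq.heapify(cpu_free_at)
    missed = 0
    for task in tasks:
        start = heapq.heappop(cpu_free_at)
        finish = start + task.remaining_hours
        heapq.heappush(cpu_free_at, finish)
        limit = computation_deadline(task.deadline_hours, work_buf_min_queue,
                                     task_switch_interval)
        if finish > limit:
            missed += 1
    return missed


def high_priority_tasks(tasks, ncpus):
    """Flag as many tasks as are in trouble, starting from the earliest deadline."""
    n = count_tasks_in_trouble(tasks, ncpus)
    by_deadline = sorted(tasks, key=lambda t: t.deadline_hours)  # stable sort
    return [t.name for t in by_deadline[:n]]


tasks = [
    Task("A", 69.0, 216.0),    # running, 23% done
    Task("B", 83.0, 216.0),    # running, 6% done
    Task("C", 100.0, 240.0),   # hypothetical queued task
    Task("D", 100.0, 240.0),   # hypothetical queued task
    Task("E", 100.0, 250.0),   # hypothetical queued task
]
print(high_priority_tasks(tasks, ncpus=2))
```

With these made-up queue figures, only the last queued task (E) misses its computation deadline in the simulation, so the count is one and a single task, the first of those with the earliest deadline, is flagged. Under this reading the flagged task is not necessarily the one that is actually at risk, which would be consistent with what the original post observed.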

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 893813 - Posted: 12 May 2009, 2:01:04 UTC - in response to Message 893715. Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here. The part which still has me scratching my head though is the reason for the 21 day turnaround for the completed AP tasks on the 6300. That seems kind of strange for a 24/7 SAH only host. ;-) Alinator If it's having trouble (virus scans affecting the checkpoint files, or who knows what else), then maybe the time/performance metrics are not wacky, and the real problem needs to be fixed. ID: 893813 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 893815 - Posted: 12 May 2009, 2:02:15 UTC - in response to Message 893721. Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here. The part which still has me scratching my head though is the reason for the 21 day turnaround for the completed AP tasks on the 6300. That seems kind of strange for a 24/7 SAH only host. ;-) Alinator Yet the run time, with the stock app and nothing odd in the stderr_txt, is about 93 hours - much as we would expect. Run 93 hours, elapsed 504 hours - what was the CPU up to for the other 80% of the time? What happens to the time between a successful checkpoint and an awkward shutdown? That time would rather effectively disappear, wouldn't it? ID: 893815 ·

Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 893816 - Posted: 12 May 2009, 2:03:02 UTC To answer some of the questions brought up (in reverse chronological order). These are the stats for the 6300, except as noted. work_buf_min_queue = 0 days. task_switch_interval = 60 min, however this will not happen due to no other projects being run. Run 93 hours, elapsed 504 hours - what was the CPU up to for the other 80% of the time? = No idea, since it is 24x7 on AP and little else. Maybe it is due to the backlog of AP tasks. When one AP task takes 93 hours, and the host can do 2 at a time, if there were a bunch in the queue then it would take days or even weeks for it to get around to starting to process some of the last tasks. At this time the queue is down to 3. There may be a problem with getting the last one in before its deadline, but the rest will be OK, barring a power failure. the "% on" time = just about 100% cache setting = 1 day Task duration correction factor = 0.319843 Task duration correction factor for the Q6600 = 3.899101 (but I know that this is inflated, as it results in an estimated time to completion for AP tasks of around 800 hours, which is clearly about a factor of 10 too high). With TDCF = 0.291835 on the Q6600, the AP tasks are estimated at very close to their actual time. ID: 893816 ·
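
The two Q6600 correction factors quoted above can be compared with a little arithmetic. The sketch below assumes that the completion estimate is simply proportional to the task duration correction factor; that proportionality and the helper name `rescale_estimate` are assumptions made for illustration, while the 800 hour figure and both TDCF values are taken from the post.

```python
def rescale_estimate(estimate_hours, old_dcf, new_dcf):
    """Rescale a completion estimate, assuming it is proportional to the DCF."""
    return estimate_hours * new_dcf / old_dcf


inflated_estimate = 800.0   # hours, as reported for AP tasks on the Q6600
old_dcf = 3.899101          # inflated TDCF reported for the Q6600
new_dcf = 0.291835          # TDCF that gave estimates close to actual run time

corrected = rescale_estimate(inflated_estimate, old_dcf, new_dcf)
inflation = old_dcf / new_dcf
print(f"corrected estimate: {corrected:.1f} h, inflation factor: {inflation:.1f}x")
```

Under that assumption the corrected estimate comes out at roughly 60 hours, and the inflation factor is about 13 rather than 10, which is in line with the "about a factor of 10 too high" remark.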

Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 893825 - Posted: 12 May 2009, 2:23:18 UTC By the way, if you have looked at the tasks in progress for the CUDA host, you will see more than 2,000 tasks. However, many of those will never be returned by that host because they disappeared when I was reinstalling the NVidia control panel. Presently there are fewer than 1100 tasks in the directory c:\Documents and Settings\All Users\Application Data\boinc\projects\setiathome.berkeley.edu ID: 893825 ·

John McLeod VII Volunteer developer Volunteer tester Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 894067 - Posted: 13 May 2009, 1:01:31 UTC - in response to Message 893816. To answer some of the questions brought up (in reverse chronological order). These are the stats for the 6300, except as noted. work_buf_min_queue = 0 days. task_switch_interval = 60 min, however this will not happen due to no other projects being run. Run 93 hours, elapsed 504 hours - what was the CPU up to for the other 80% of the time? = No idea, since it is 24x7 on AP and little else. Maybe it is due to the backlog of AP tasks. When one AP task takes 93 hours, and the host can do 2 at a time, if there were a bunch in the queue then it would take days or even weeks for it to get around to starting to process some of the last tasks. At this time the queue is down to 3. There may be a problem with getting the last one in before its deadline, but the rest will be OK, barring a power failure. the "% on" time = just about 100% cache setting = 1 day Task duration correction factor = 0.319843 Task duration correction factor for the Q6600 = 3.899101 (but I know that this is inflated, as it results in an estimated time to completion for AP tasks of around 800 hours, which is clearly about a factor of 10 too high). With TDCF = 0.291835 on the Q6600, the AP tasks are estimated at very close to their actual time. What are the rest of the time stats? BOINC WIKI ID: 894067 ·

Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 894241 - Posted: 13 May 2009, 17:18:56 UTC - in response to Message 894067. What are the rest of the time stats? <time_stats> <on_frac>0.997709</on_frac> <connected_frac>1.000000</connected_frac> <active_frac>0.999936</active_frac> <last_update>1242234991.490408</last_update> </time_stats> ID: 894241 ·

John McLeod VII Volunteer developer Volunteer tester Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 894429 - Posted: 14 May 2009, 0:30:42 UTC - in response to Message 894241. What are the rest of the time stats? <time_stats> <on_frac>0.997709</on_frac> <connected_frac>1.000000</connected_frac> <active_frac>0.999936</active_frac> <last_update>1242234991.490408</last_update> </time_stats> OK, there should be cpu_efficiency someplace. BOINC WIKI ID: 894429 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 894528 - Posted: 14 May 2009, 7:47:36 UTC - in response to Message 894429. OK, there should be cpu_efficiency someplace. Not since v6.4.2 - scheduling (par. 4), [trac]changeset:16610[/trac]. Do try to keep up, John, please ;-) ID: 894528 ·

Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 894748 - Posted: 14 May 2009, 21:28:53 UTC Interesting ... the "High Priority" AP work unit has finished, and a new one started. The work unit that was formerly running normally is now "High Priority" and the new one is running normally... Go figure... ID: 894748 ·

Steven Meyer Send message Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0	Message 894752 - Posted: 14 May 2009, 21:34:06 UTC With the recent completion of one AP wu, I was able to recalculate the time to finish the wu in my current queue. The last one will be done in about 6.1 days, and it is due at 11pm on the 6th day... Then it will be time for the computer to rest... :^D ID: 894752 ·

Fred W Volunteer tester Send message Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 894769 - Posted: 14 May 2009, 22:11:30 UTC - in response to Message 894752. With the recent completion of one AP wu, I was able to recalculate the time to finish the wu in my current queue. The last one will be done in about 6.1 days, and it is due at 11pm on the 6th day... Then it will be time for the computer to rest... :^D 6 days is an awfully long time. Have you thought of trying the Optimised App? F. ID: 894769 ·

John McLeod VII Volunteer developer Volunteer tester Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 894842 - Posted: 15 May 2009, 3:05:15 UTC - in response to Message 894528. OK, there should be cpu_efficiency someplace. Not since v6.4.2 - scheduling (par. 4), [trac]changeset:16610[/trac]. Do try to keep up, John, please ;-) Somehow I was under the impression that it was separated out in the project. In any case, there SHOULD be a CPU efficiency. I have a scenario where the current code fails dramatically. BOINC WIKI ID: 894842 ·

