What is up with this "High Priority"?
Author | Message |
---|---|
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
I have a 2-CPU host that is processing AP work units on both CPUs. It runs BOINC 24x7 and has network access 24x7 as well. One work unit is 23% done and is estimated to finish in almost 69 hours (2.875 days). The other work unit is 6% done and is estimated to finish in about 83 hours (3.416 days). Both of these work units are due in about 9 days, on 20 May 2009 at 7:27:02 AM (PDT). The first one (23% done) is running "High Priority" while the second one (6% done) is running normally. It seems to me that if BOINC thinks it may not be able to complete the first one (23% done) before the due date without "High Priority", then it certainly should also be running the second one (6% done) with "High Priority". Edit: By the way, this host is using version 6.6.20. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
This is what happens when you run excessively high cache settings, even if you have 'rocket' for a host. Just take a peek at your average turnaround times for them. There probably really isn't a deadline problem in your case since you only run SAH, but BOINC just can't assume that. Alinator |
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
"This is what happens when you run excessively high cache settings, even if you have 'rocket' for a host." Thanks, but this is not the case here. I have set my cache at 1 day; however, there may be a bug in the calculation of "One day's work", because my other host, which has 4 CPUs and a GPU, will get hundreds of CUDA tasks at once, until it reaches the 500/day limit, in spite of the 1 day cache. I have since turned off S@H so that my CUDA-capable host will not get any more CUDA tasks until its queue is depleted somewhat. Run only the selected applications: SETI@home Enhanced: no; Astropulse: yes; Astropulse v5: yes. The host in question here is not a "rocket". It has 5 tasks, 3 in the queue and 2 running. As you pointed out, I am running only S@H, so the "High Priority" is not a problem for me. However, there seems to be an inconsistency in the behavior of the code when a task that has less work left to do is a candidate for "High Priority" while another task with the same due date and more work left to do is not. If I were running other projects, then that normal priority task could be preempted by another project and thus might be unfinished at the deadline. Someone might want to look at the code to see if there are bugs in the calculation of "One day's work" and of "High Priority". |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
OK.... I couldn't see what your cache settings are, so I only had the turnaround times to go on. Yes, there have been some issues with excessive work fetching in some of the newer CC's. That could easily explain the turnaround times currently showing. Also, many reports from other users of the later CUDA-compatible CC's indicate that work scheduling on multi-core/multi-compute-resource hosts leaves a lot to be desired (a deliberate understatement). So I guess that pretty well explains the 18 day turnaround for the CUDA host, based on the other info you just posted. One thing I find curious about the other one, though, is why it took 21 days to turn around the two reported AP tasks it has showing. If there was some kind of problem which led to the CC thinking the app was running much slower than it really was, this may have caused a big jump in TDCF, which is now making the CC think it's in deadline trouble when it really isn't and thus forced the jump to HP/EDF. Alinator |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
BOINC looks at all of the work, and it checks to see if it could run over deadlines, taking into account the connection interval, the "% on" time, the duration correction factor, etc. ... and it decides from there what work needs to be done in what order. Usually, if things are really, really odd (if the time to complete doesn't make sense, or if it is rushing when it clearly shouldn't) it means that something like the "% on" or the DCF is badly wrong. The simple fix is to do nothing: as time passes they will correct back to their proper values. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here. The part which still has me scratching my head, though, is the reason for the 21 day turnaround for the completed AP tasks on the 6300. That seems kind of strange for a 24/7 SAH-only host. ;-) Alinator |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here." Yet the run time, with the stock app and nothing odd in the stderr_txt, is about 93 hours - much as we would expect. Run 93 hours, elapsed 504 hours - what was the CPU up to for the other 80% of the time? |
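The "other 80%" figure can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the thread (about 93 hours of run time, a 21 day turnaround); the variable names are illustrative.

```python
# Figures quoted in the thread: about 93 hours of run time, 21 days of turnaround.
run_hours = 93
elapsed_hours = 21 * 24  # 504 hours

busy_fraction = run_hours / elapsed_hours
idle_fraction = 1 - busy_fraction

print(f"busy: {busy_fraction:.1%}, unaccounted: {idle_fraction:.1%}")
# prints "busy: 18.5%, unaccounted: 81.5%"
```

So roughly four fifths of the elapsed time is not explained by the task's own run time.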
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
BOINC does a round robin simulation between projects and FIFO within a project. It counts the tasks that are close to running over deadline. It then runs that many, starting from the earliest deadline, for the project(s) that have tasks in deadline trouble. The computation deadline is earlier than the report deadline: Computation_deadline = report_deadline - (work_buf_min_queue + task_switch_interval). work_buf_min_queue is set through "Computer is connected to the Internet about every", and task_switch_interval is set through "Switch between applications every". The setting "Computer is connected to the Internet about every" is used by those with non-permanent connections to tell the client how frequently the computer will be able to get to the internet. If the task is not completed before the last connection prior to the report deadline, the work will be reported late and could be rejected as worthless. The task switch interval is also subtracted, as that is the minimum guaranteed interval at which BOINC will check to see if tasks need to run. If a task has only a few seconds of time left, the round robin simulator may wait until not much longer than that before the deadline to start it running. |
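The rule described in this post can be sketched in Python. This is a heavily simplified illustration, not the BOINC client's code: it assumes a single project, a plain FIFO simulation over identical CPUs, and hypothetical names (`computation_deadline`, `high_priority_tasks`) and task values. It does show why the task flagged "High Priority" need not be the one that is actually at risk: the simulation only counts the tasks in trouble, and that many are then picked by earliest deadline.

```python
from datetime import datetime, timedelta


def computation_deadline(report_deadline, work_buf_min_days, task_switch_minutes):
    """Report deadline minus the connect interval and the task switch interval."""
    margin = timedelta(days=work_buf_min_days) + timedelta(minutes=task_switch_minutes)
    return report_deadline - margin


def high_priority_tasks(tasks, now, n_cpus, work_buf_min_days, task_switch_minutes):
    """Count tasks that miss their computation deadline in a FIFO simulation,
    then return that many task names in earliest-deadline order."""
    cpu_free = [now] * n_cpus  # time at which each simulated CPU becomes idle
    missed = 0
    for task in tasks:  # FIFO order within the (single, assumed) project
        cpu = cpu_free.index(min(cpu_free))
        finish = cpu_free[cpu] + timedelta(hours=task["remaining_hours"])
        cpu_free[cpu] = finish
        limit = computation_deadline(
            task["report_deadline"], work_buf_min_days, task_switch_minutes
        )
        if finish > limit:
            missed += 1
    # sorted() is stable, so tasks with equal deadlines keep their FIFO order.
    by_deadline = sorted(tasks, key=lambda t: t["report_deadline"])
    return [t["name"] for t in by_deadline[:missed]]


# Hypothetical queue loosely modelled on the host in this thread.
now = datetime(2009, 5, 11, 8, 0, 0)
due = datetime(2009, 5, 20, 7, 27, 2)
queue = [
    {"name": "A", "remaining_hours": 69, "report_deadline": due},
    {"name": "B", "remaining_hours": 83, "report_deadline": due},
    {"name": "C", "remaining_hours": 90, "report_deadline": due},
    {"name": "D", "remaining_hours": 90, "report_deadline": due},
    {"name": "E", "remaining_hours": 90, "report_deadline": due},
]
flagged = high_priority_tasks(queue, now, n_cpus=2, work_buf_min_days=1, task_switch_minutes=60)
print(flagged)
```

With these made-up numbers only the last queued task (E) misses its computation deadline, yet the one task selected for high priority is A, the first in earliest-deadline order. That matches the behaviour reported at the top of the thread.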
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here." If it's having trouble (virus scans affecting the checkpoint files, or who knows what else), then maybe the time/performance metrics are not wacky, and the real problem needs to be fixed. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Yep, I was thinking along the lines of wacky Time/Performance Metrics or TDCF playing a role here." What happens to the time between a successful checkpoint and an awkward shutdown? That time would rather effectively disappear, wouldn't it? |
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
To answer some of the questions brought up (in reverse chronological order). These are the stats for the 6300, except as noted.
|
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
By the way, if you have looked at the tasks in progress for the CUDA host, you will see more than 2,000 tasks. However, many of those will never be returned by that host because they disappeared when I was reinstalling the NVIDIA control panel. Presently there are fewer than 1100 tasks in the directory c:\Documents and Settings\All Users\Application Data\boinc\projects\setiathome.berkeley.edu |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"To answer some of the questions brought up (in reverse chronological order)." What are the rest of the time stats? |
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
<time_stats> <on_frac>0.997709</on_frac> <connected_frac>1.000000</connected_frac> <active_frac>0.999936</active_frac> <last_update>1242234991.490408</last_update> </time_stats> |
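For anyone who wants to read these values programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML is copied from the post above; multiplying `on_frac` by `active_frac` as a rough "share of wall-clock time available for crunching" is an assumption made for illustration, not a statement of how the client combines them.

```python
import xml.etree.ElementTree as ET

# The <time_stats> block exactly as posted in the thread.
TIME_STATS_XML = """
<time_stats>
    <on_frac>0.997709</on_frac>
    <connected_frac>1.000000</connected_frac>
    <active_frac>0.999936</active_frac>
    <last_update>1242234991.490408</last_update>
</time_stats>
"""


def parse_time_stats(xml_text):
    """Return the child elements of <time_stats> as a dict of floats."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: float(child.text) for child in root}


stats = parse_time_stats(TIME_STATS_XML)

# Assumed, illustrative combination: fraction of time BOINC is both running and
# allowed to compute.
availability = stats["on_frac"] * stats["active_frac"]
print(f"approximate availability: {availability:.4f}")
```

With values this close to 1.0, these fractions on their own would not account for a 21 day turnaround on a task that needs about 93 hours of run time.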
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
OK, there should be cpu_efficiency someplace. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"OK, there should be cpu_efficiency someplace." Not since v6.4.2 - scheduling (par. 4), Trac changeset 16610. Do try to keep up, John, please ;-) |
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
Interesting ... the "High Priority" AP work unit has finished, and a new one started. The work unit that was formerly running normally is now "High Priority" and the new one is running normally... Go figure... |
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0 |
With the recent completion of one AP wu, I was able to recalculate the time to finish the wus in my current queue. The last one will be done in about 6.1 days, and it is due at 11pm on the 6th day... Then it will be time for the computer to rest... :^D |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
"With the recent completion of one AP wu, I was able to recalculate the time to finish the wu in my current queue. The last one will be done in about 6.1 days, and it is due at 11pm on the 6th day... Then it will be time for the computer to rest... :^D" 6 days is an awfully long time. Have you thought of trying the Optimised App? F. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"OK, there should be cpu_efficiency someplace." Somehow I was under the impression that it was separated out in the project. In any case, there SHOULD be a CPU efficiency. I have a scenario where the current code fails dramatically. |