Problem: Computers with far too many tasks to possibly ever finish them by the deadline


Questions and Answers : Wish list : Problem: Computers with far too many tasks to possibly ever finish them by the deadline

Wiyosaya
Joined: 19 May 99
Posts: 39
Credit: 801,970
RAC: 8
United States
Message 696433 - Posted: 1 Jan 2008, 5:04:57 UTC

OK, so I don't see a general problem area, but I think it is time that I pointed this out.

Several of my completed tasks are in status "pending" for credit. That is normal, I understand. However, some of the computers that are doing the verification tasks have far too many tasks to ever possibly complete them all by the assigned deadline.

Take, for instance, the following computers:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4042091
Has 66 tasks and an average turn around time of some 16 days.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=3144766
Has 296 tasks with an average turn around time of 12.89 days

http://setiathome.berkeley.edu/show_host_detail.php?hostid=3139204
Has 196 tasks with an average turn around time of 6.01 days

http://setiathome.berkeley.edu/show_host_detail.php?hostid=2884773
Has 65 tasks with an average turn around time of 22.28 days.

Hopefully, I have made a point with these example computers; I could easily provide several more if necessary.

Here is what I noticed. Unless I limited disk space to 0.5 GB via preferences, one of my computers would download far too many tasks for it to ever complete them all.

Given this, and the four example computers (of which I could easily provide several more), I think this is a bug, and it would be great if someone looked into this.

Thanks.

____________

Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 2975
Credit: 4,957,831
RAC: 1,389
Canada
Message 696445 - Posted: 1 Jan 2008, 6:28:38 UTC
Last modified: 1 Jan 2008, 6:48:39 UTC

The turn around numbers do not necessarily indicate that they can't complete the work on time. They only show that work is held in their queue for a long time. Some Seti WU have a due date as much as 2 months out. Boinc sees to it that all WU are crunched and results returned by the due date. Boinc is designed to prioritize the work when needed, so that WU that are tight to their deadline get crunched first and on time. While a project is in this high priority mode, Boinc will not get new work from that project.
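The deadline-first behaviour described above can be pictured with a small sketch. This is not BOINC's actual scheduler code; it is a minimal earliest-deadline-first check, and the task names, hours, deadlines, and the `crunch_fraction` parameter are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str                 # hypothetical label, not a real WU name
    hours_needed: float       # estimated crunch time still required
    hours_to_deadline: float  # wall-clock time left until the due date


def edf_order(tasks):
    """Return the tasks sorted so the earliest deadline runs first."""
    return sorted(tasks, key=lambda t: t.hours_to_deadline)


def all_meet_deadlines(tasks, crunch_fraction=1.0):
    """Check whether every task finishes on time when run in deadline order.

    crunch_fraction is the assumed share of wall-clock time spent crunching
    (1.0 means the machine crunches around the clock).
    """
    elapsed = 0.0
    for task in edf_order(tasks):
        elapsed += task.hours_needed / crunch_fraction
        if elapsed > task.hours_to_deadline:
            return False
    return True


# Illustrative queue: one WU with a 2 month deadline, two tighter ones.
queue = [
    Task("long_deadline", 35, 58 * 24),
    Task("tight_deadline", 10, 7 * 24),
    Task("medium_deadline", 20, 14 * 24),
]

print([t.name for t in edf_order(queue)])
print(all_meet_deadlines(queue))                        # crunching 24/7
print(all_meet_deadlines(queue, crunch_fraction=0.05))  # crunching 5% of the time
```

With these made-up numbers the queue is fine on a machine that crunches around the clock, and only fails once the time devoted to crunching is cut drastically.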

As you can see from my signature, I run many projects. The projects I crunch for have WU that take everything from a few minutes to thousands of hours to complete. Some have very tight deadlines, while others have extremely long ones. I've never returned work late, even when the projects had grossly underestimated the crunch time needed. In one case a project had underestimated the crunch time by a factor of 10 and the work still managed to sneak in before the due date. Boinc is quite good at handling all kinds of extreme situations. A user would have to heavily manipulate it (i.e. suspend WU to force extra DLs) or drastically reduce the amount of time devoted to crunching to push Boinc into a situation it couldn't handle.
____________
Questions? Answers are in the "Unofficial" BOINC Wiki.

Boinc V7.0.27
Win7 i5 3.33G 4GB, GTX470

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13542
Credit: 29,400,151
RAC: 15,967
United States
Message 696481 - Posted: 1 Jan 2008, 13:22:04 UTC - in response to Message 696433.

Here is what I noticed. Unless I limited disk space to 0.5 GB via preferences, one of my computers would download far too many tasks for it to ever complete them all.


In addition to what Aurora said, I'd like to add that the disk space setting is the last thing looked at when downloading work for a host. Variables that go into downloading a queue of work include: the amount of time the system is running, the amount of time BOINC is allowed to run, and the processor efficiency rating (some architectures are more efficient than others, and so are able to finish work faster than others at the same clock speed). It is for this last reason that simply looking at how big a host's queue is isn't a good enough indicator that it won't finish in time. Its CPU may be more efficient and so able to process more work, which in turn means a larger queue of work downloaded.
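To make the interplay of those variables concrete, here is a rough sketch of how a queue size could follow from them. It is an assumption-laden illustration, not BOINC's real work-fetch logic: the function name `tasks_to_buffer` and all of its parameters are hypothetical, and processor efficiency is folded into `hours_per_task`.

```python
def tasks_to_buffer(buffer_days, on_fraction, active_fraction,
                    hours_per_task, n_cpus=1):
    """Estimate how many tasks fit into a work buffer of buffer_days.

    on_fraction     -- assumed share of the day the computer is powered on
    active_fraction -- assumed share of that time BOINC is allowed to run
    hours_per_task  -- crunch time per WU; a more efficient CPU lowers this
    n_cpus          -- number of CPUs crunching in parallel
    """
    crunch_hours = buffer_days * 24 * on_fraction * active_fraction * n_cpus
    return int(crunch_hours // hours_per_task)


# Same 10 day buffer and 6 hour WUs, different machines.
always_on = tasks_to_buffer(10, 1.0, 1.0, 6)
part_time = tasks_to_buffer(10, 0.25, 1.0, 6)
fast_dual = tasks_to_buffer(10, 1.0, 1.0, 3, n_cpus=2)

print(always_on, part_time, fast_dual)
```

Under these assumptions the same buffer setting yields very different queue lengths, which is why a long queue by itself says little about whether a host will finish on time.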
____________

Wiyosaya
Joined: 19 May 99
Posts: 39
Credit: 801,970
RAC: 8
United States
Message 696573 - Posted: 1 Jan 2008, 21:05:54 UTC
Last modified: 1 Jan 2008, 21:17:22 UTC

Aurora, do you run your computers constantly? Given that you are subscribed to many different BOINC projects, it is difficult to determine if you run them constantly or not based on your average seti WU processing time.

If you do run them constantly, do you know if anyone has run a test case that determines whether BOINC accurately calculates times for computers that are not on constantly? If the case has not been tested, then it may be prudent for someone to run such a test case.

For the machine of mine that I spoke of, it is not on constantly (unlike my SETI Classic days), and deadlines were passing before my machine got started on the distributed WUs. When this happens, it is extra work load for servers, and if it happens frequently, it is an inefficient usage of computing power. (Pardon me for pointing that out; I'm just trying to be helpful.)

Though I cannot point to a specific example (since the stats for completed WUs expire very quickly), I have seen computers exhibiting symptoms similar to the ones I cited, co-processing the same WUs as my machines, let the deadline for the co-WU expire, after which the co-WU was processed quickly by a more "well-behaved" machine. I have seen this more than once.

OzzFan, though disk space may be looked at as one of the lowest priorities for work load, it is probably one of the only parameters that a user can control to determine the number of tasks distributed to their machine. The only other one might be the number of CPUs on a multi-CPU machine.

I'll bump up the disk space again in my preferences, and see how my machines fare. If what I have observed persists, I'll post again to this thread. Perhaps this is no longer an issue.

It occurs to me from the tone of your replies that I may not have correctly emphasized the issue as I see it. The issue as I see it is any particular machine not having enough time to complete all its tasks when multiplying the number of pending tasks * the average turn-around time. For example, multiply the average turn-around time by the pending tasks for the machines I posted. Take machine 4042091, for instance. It has (or had) 66 tasks with an average turn-around time of ~16 days. 66 * 16 = 1056 days. This is far greater than the deadline for any singular WU. If the task load were appropriate for the machine, I would expect to see the total number of days required for the number of pending tasks * the average turn-around time nearly equal to the deadline of the oldest WU.

Also note that it can reasonably be assumed that the machines I cited are either not on all the time or participate in other projects, since the average turn-around time is in days. A machine on all the time that processes SETI all the time would likely have an average turn-around time of fractional days unless its CPU power were low. Most of my machines process WUs in 6 hours or less. If I left mine on all the time, they would have average turn-around times of less than one day. And when I say "well-behaved" computer, those seem to be machines that have an average turn-around time in fractional days.

The common factor in cases where I have seen other machines exhibit this, and in my own machines, seems to be that they are not on all the time, or participate in other projects such that they do not process SETI all the time. Put another way, it seems to only be machines that do not process SETI all the time that exhibit this.

EDIT -

I may be mistaken about the disk space preferences parameter; perhaps it is the preferences parameter "Maintain enough work for an additional". I'm leaving my disk space settings as they are, and changing "Maintain enough work for an additional" to the maximum of 10.
____________

Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 2975
Credit: 4,957,831
RAC: 1,389
Canada
Message 696979 - Posted: 3 Jan 2008, 10:30:38 UTC - in response to Message 696573.
Last modified: 3 Jan 2008, 10:49:18 UTC

Aurora, do you run your computers constantly? Given that you are subscribed to many different BOINC projects, it is difficult to determine if you run them constantly or not based on your average seti WU processing time.

My systems run 24/7/365. Turn around time has no meaning in this context. Seti currently gets a 1.83% share of my resources, which explains why my RAC is only 35 even with 2 systems crunching. My turnaround time normally varies between 2 and 10 days depending on how long the work sits in my cache, how fast it can be crunched and what other work is owed more time. My Duron system just got a 35 hr Seti WU due 24 Feb. It also has a QMC WU that still needs 105 hrs and is now due in 12 days. The turn around on that Seti WU will likely be around 20 to 30 days, perhaps longer depending on what other project is due extra time. In any case Boinc will get to it eventually, just as it is now concentrating on finishing the QMC WU. It so happens that I have other work (about 30 hr.) in my cache with due dates before the QMC. It will all get done on time.

If you do run them constantly, do you know if anyone has run a test case that determines whether BOINC accurately calculates times for computers that are not on constantly? If the case has not been tested, then it may be prudent for someone to run such a test case.

Boinc uses an efficiency number to keep track of the % of time a system is on and what % of that time Boinc is run. Click the ID # of your computers to see these numbers. As long as a system is used in a fairly consistent fashion (i.e. no long, random periods of inactivity), Boinc will have an adjustment factor for how much work it can handle.

For the machine of mine that I spoke of, it is not on constantly (unlike my SETI Classic days), and deadlines were passing before my machine got started on the distributed WUs. When this happens, it is extra work load for servers, and if it happens frequently, it is an inefficient usage of computing power. (Pardon me for pointing that out; I'm just trying to be helpful.)

V5.10.x of Boinc has local preferences you can use to override the web based preferences. This allows you to fine tune the preferences on individual systems that need different values, such as cache size.

Snip...
It occurs to me from the tone of your replies that I may not have correctly emphasized the issue as I see it. The issue as I see it is any particular machine not having enough time to complete all its tasks when multiplying the number of pending tasks * the average turn-around time. For example, multiply the average turn-around time by the pending tasks for the machines I posted. Take machine 4042091, for instance. It has (or had) 66 tasks with an average turn-around time of ~16 days. 66 * 16 = 1056 days. This is far greater than the deadline for any singular WU. If the task load were appropriate for the machine, I would expect to see the total number of days required for the number of pending tasks * the average turn-around time nearly equal to the deadline of the oldest WU.

Also note that it can reasonably be assumed that the machines I cited are either not on all the time or participate in other projects, since the average turn-around time is in days. A machine on all the time that processes SETI all the time would likely have an average turn-around time of fractional days unless its CPU power were low. Most of my machines process WUs in 6 hours or less. If I left mine on all the time, they would have average turn-around times of less than one day. And when I say "well-behaved" computer, those seem to be machines that have an average turn-around time in fractional days.

The common factor in cases where I have seen other machines exhibit this, and in my own machines, seems to be that they are not on all the time, or participate in other projects such that they do not process SETI all the time. Put another way, it seems to only be machines that do not process SETI all the time that exhibit this.

Your calculation is based on a false premise. This is not a cumulative number. 16 days is the average amount of time a WU stays in his cache, which is easily possible with a cache setting of 10-10 and some WU with long due dates. You are assuming that he got all the work at the same time instead of his cache being replenished when he returns work. For argument's sake, assume that the WU are all the same size and length of due date and the cache contains a constant 66 WUs. The turn around time is 16 days = 384 hours. He would have to return one WU on average every (384/66) 5.82 hrs. This is quite doable.
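The difference between the two readings of "average turn around time" can be checked with a few lines of Python. This is only a sketch of the arithmetic in this thread: the function names are made up, the host figures are the ones quoted in the first post, and it assumes, as above, a cache that stays at a constant size.

```python
def naive_total_days(tasks_in_cache, turnaround_days):
    """The cumulative reading: treats turnaround as crunch time per task."""
    return tasks_in_cache * turnaround_days


def hours_between_returns(tasks_in_cache, turnaround_days):
    """The steady-state reading: a constant cache of N tasks, each sitting
    for T days on average, means one task is returned every T / N days."""
    return turnaround_days * 24 / tasks_in_cache


print(naive_total_days(66, 16))                  # the 1056 days from the earlier post
print(round(hours_between_returns(66, 16), 2))   # hours per returned WU

# (tasks, average turnaround in days) as quoted in the first post
hosts = {
    4042091: (66, 16.0),
    3144766: (296, 12.89),
    3139204: (196, 6.01),
    2884773: (65, 22.28),
}
for host_id, (tasks, days) in hosts.items():
    rate = round(hours_between_returns(tasks, days), 2)
    print(host_id, rate, "hours per returned WU")
```

The required return rate says nothing about the number of CPUs in each host, so it is only a first sanity check, not proof that a given machine keeps up.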
In reality, processing time could be from 30 min. to 15 hrs on a fairly average computer. Some WU have a 7 day deadline and others can have a 2 month deadline (the one I just got on my Duron system is due in 58 days). Boinc will process the short deadline ones first if necessary (priority mode) and process the long ones at leisure. He most likely isn't missing deadlines, or his quota would automatically be reduced by one for each missed deadline, and so eventually would his work on hand. If the quota is still at 100/day, he isn't missing deadlines. That is where you first have to look to see if a system has problems returning work, not the average turn around time.


EDIT -

I may be mistaken about the disk space preferences parameter; perhaps it is the preferences parameter "Maintain enough work for an additional". I'm leaving my disk space settings as they are, and changing "Maintain enough work for an additional" to the maximum of 10.

____________


Copyright © 2014 University of California