Aborted Units...Any solutions... |
Message boards : Number crunching : Aborted Units...Any solutions...
| Author | Message |
|---|---|
|
In the past ten days or so I have aborted at least 10 units that were failing to compute correctly. As the units were being worked on, the estimated completion time kept getting longer. Many, many hours of computer time were wasted, which of course doesn't help the US NAVY TEAM in overcoming the Air Force team's lead in total credits. | |
| ID: 453732 · | |
Of the three computers I looked at, two are running BOINC version 5.2.13, and the third is running 5.4.9. The current version is 5.4.11, with a 5.8.x version on the horizon. Have you considered upgrading to the most current production version? MJ | |
| ID: 453761 · | |
Have you tried just exiting the program and then restarting it, as opposed to aborting the units? This has worked for many people in the past. And what do you mean by "failing to compute correctly"? | |
| ID: 454789 · | |
|
"Failure to compute correctly"... I saw a work unit that had completed over 12 hours of computer time and then noticed that the expected completion time was increasing instead of decreasing. At this point I aborted the unit. In one case I can think of, even though the unit had over 10 hours of computer time, the percent of completion was only 0.312 percent. Other than this I really don't know how to correctly present the argument that the unit was not computing correctly. | |
| ID: 454913 · | |
|
PS: Next time I will exit the program and restart it, following your advice, and see what happens. Then perhaps, as someone else suggested, I will get the latest BOINC version on all my machines. | |
| ID: 454914 · | |
|
I failed to mention that today I had to abort another two units, and I believe all these aborted units were on machines that have what you call DUO 2 processors. | |
| ID: 454915 · | |
Looking at some of the results under one of your computers, I see that you're having compute errors on work units that others crunch successfully with a -9 error, which indicates a work unit with too much noise. I don't know if anyone can go further with this bit of info, but maybe it will help. Jeremy | |
| ID: 454922 · | |
There is a quirk of the CPU scheduler that can cause the expected time to completion to increase even though the task is making progress. The project provides an initial estimate of the processing time. The BOINC client uses this estimate and the fraction complete in a weighted average with a calculation based on the time spent and the fraction complete. As the fraction complete grows, BOINC weights the calculation more toward what is really happening rather than what was originally estimated to happen. So, if the original estimate was too low, the time left can increase throughout the entire run of the task. The duration correction factor will also be increased to match this longer-than-estimated result, so that the next result will hopefully start with a somewhat better estimate. Some reasons for the design: some projects do not have accurate estimates of the work to complete; the actual processing time is heavily dependent on the configuration of the host computer; the fraction complete as reported does not match the actual fraction of processing time completed (at least for some projects); some projects update the fraction complete only occasionally, and some not at all; and some projects have tasks that exit much earlier than normal (noisy WUs in S@H). The best idea is to let a couple of these run to completion and see what happens. Some of the S@H results can run for a few days before completion. | |
| ID: 454952 · | |
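The weighted average described above can be sketched in a few lines of Python. This is only an illustration, not the BOINC client's actual code: the function name `blended_time_left`, the linear weighting by fraction complete, and the sample numbers are all assumptions made for the example.

```python
def blended_time_left(initial_estimate_h, elapsed_h, fraction_done):
    """Estimate hours left by blending two projections (illustrative only).

    Assumption: the weight given to observed progress grows linearly with
    the fraction complete. The real client may weight things differently.
    """
    if fraction_done <= 0.0:
        # No progress reported yet: only the project's estimate is available.
        return initial_estimate_h
    if fraction_done >= 1.0:
        return 0.0

    remaining_fraction = 1.0 - fraction_done
    # Projection 1: trust the project's original estimate.
    from_estimate = initial_estimate_h * remaining_fraction
    # Projection 2: extrapolate from time spent so far and progress made.
    from_progress = (elapsed_h / fraction_done) * remaining_fraction

    weight = fraction_done  # hypothetical weighting scheme
    return (1.0 - weight) * from_estimate + weight * from_progress


if __name__ == "__main__":
    # Hypothetical task: the project estimated 2 hours, but it really needs 20.
    true_total_h = 20.0
    for fraction in (0.1, 0.3, 0.5, 0.7, 0.9):
        elapsed = true_total_h * fraction
        left = blended_time_left(2.0, elapsed, fraction)
        print(f"{fraction:.0%} done after {elapsed:4.1f} h: about {left:.2f} h left")
```

With these made-up numbers the displayed time left rises during the first half of the run and only falls afterward, even though the task is progressing steadily the whole time. That is the same pattern as a unit that shows 2 hours to go and is still climbing after 12 hours of computing.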
|
Just finished updating my computers to the latest BOINC version (5.4.11). Will let you all know if this works. | |
| ID: 455620 · | |
|
Downloaded the latest version of BOINC to all my machines. In the past two days I have only aborted one unit that was not computing correctly. I believe the estimated completion time was about 2 hours (I am getting many of those), and after about 12 hours of computing the completion time was still increasing. | |
| ID: 464734 · | |
|
Yank, | |
| ID: 466977 · | |
|
Thanks for the information, John. The next time it happens I will try that procedure. | |
| ID: 467358 · | |
|
I've been away from my computers for a while (Thanksgiving, test activities at work, etc.), so I haven't been looking at my BOINC statistics lately. Once, before I left for the holidays, I noted (on my new dual-core Athlon system) that one work unit was at 12+ hours and the time to complete was increasing. I let it go for another 24 hours and nothing changed except that the time to complete kept increasing. So I aborted it. I thought nothing of it. | |
| ID: 468202 · | |
|
FYI - | |
| ID: 468209 · | |
FYI - A "SETI@Home Informational message -9 result_overflow" result is not aborted by BOINC; S@H just recognized it as too noisy. Why such WUs seem prone to hanging is an unsolved question. The resultid link or even the wuid link is far more useful than the work unit number. Joe | |
| ID: 468312 · | |
|
LOL.... I'd almost go as far as saying infinitely, at least as far as basic troubleshooting on the forums goes. :-) | |
| ID: 468324 · | |
|
I started seeing this problem last year around the beginning of December, when I got my first hyperthreaded CPU. I have 2 hyperthreaded CPUs running SETI now: one is a P4 3.20 Prescott, the other a P4 3.06 Prescott. Every 3 or 4 days one machine or the other hangs like this. I exit BOINC and restart it, check the tasks tab, and the hung WU ends within 30 seconds and uploads. | |
| ID: 468440 · | |
|
Ummm, same here. | |
| ID: 468700 · | |
|
Today I had a unit that had run for 23+ hours with 1+ hours to completion, and that time was increasing. It started with a reported time of 8 hours to completion. I suspended that unit. Sometime later I canceled the suspension and later found that the unit was reported as completed at 8 hours and a few minutes. | |
| ID: 468742 · | |
|
Just aborted three units that were not computing correctly. I first suspended the units and let them resume their computing two times, but that did not correct the problem. If this would help, the units are 05jn0aa.29008.33056.315906.3.220-3 and also 223-2 and also 3.226-2. As in all these cases, the completion time was increasing instead of decreasing. | |
| ID: 469189 · | |
| Copyright © 2013 University of California |