Work Unit problem
Author | Message |
---|---|
H Elzinga Joined: 20 Aug 99 Posts: 125 Credit: 8,277,116 RAC: 0 |
I have been receiving WU with to completion times of 119 hours+. However as the WU is crunched the to completion time drops dramatically. It ends up being only 5 or 6 hours to process. I also did this fix and was "rewarded" with a huge increase in computing time. What I noticed was that one slow unit (that's all I had until now) could raise the time instantly. Correctly processed units seem to have only a minor influence on scaling the time down. Is this a design fault, or a feature whose logic I fail to see? |
CougarKy Joined: 20 Aug 01 Posts: 5 Credit: 4,076,741 RAC: 1 |
I have been receiving WU with to completion times of 119 hours+. However as the WU is crunched the to completion time drops dramatically. It ends up being only 5 or 6 hours to process. Thank you for the help. |
Jim-R. Joined: 7 Feb 06 Posts: 1494 Credit: 194,148 RAC: 0 |
I have been receiving WU with to completion times of 119 hours+. However as the WU is crunched the to completion time drops dramatically. It ends up being only 5 or 6 hours to process. It is a feature. The reason being that the estimated time to completion is supposed to be slightly on the high side so that you don't download a bunch of work that you can't finish before the deadline. This has actually happened. We have had "runs" of various angle ranges which take different times to complete. When we have a run of very short running time angle ranges, the Duration Correction Factor (DCF) will slowly drop. With a long run of these, it can get "used" to the low crunch times and start downloading more work to compensate. Then a "run" of very long running time work may come across. BOINC thinks that these will run approximately the same as the others so it downloads a bunch of them. This results in your computer going into "Earliest Deadline First" (panic) mode ignoring everything else just to get these work units crunched. If the DCF were to decrease immediately upon completing one of these very short running time units, it would immediately download more work and possibly end up in EDF (Earliest deadline first) mode. So it's designed to recover quickly from a low value by jumping immediately to the value of a longer running unit, but decrease slowly from the longer times to shorter ones. Jim Some people plan their life out and look back at the wealth they've had. Others live life day by day and look back at the wealth of experiences and enjoyment they've had. |
H Elzinga Joined: 20 Aug 99 Posts: 125 Credit: 8,277,116 RAC: 0 |
I have been receiving WU with to completion times of 119 hours+. However as the WU is crunched the to completion time drops dramatically. It ends up being only 5 or 6 hours to process. I see. The client, completely unaware of the error, assumes this is the first of a set of similar (long) units. |
Jim-R. Joined: 7 Feb 06 Posts: 1494 Credit: 194,148 RAC: 0 |
Exactly, so it will take a while to get the estimated time back down to normal. That's why editing the client_state.xml file was suggested. Jim Some people plan their life out and look back at the wealth they've had. Others live life day by day and look back at the wealth of experiences and enjoyment they've had. |
HTH Joined: 8 Jul 00 Posts: 691 Credit: 909,237 RAC: 0 |
WU: 147512707. 0.26 cobblestones? Is this correct? The third guy didn't get credit at all. What's wrong? It is the WU that crunched very very slowly. Manned mission to Mars in 2019 Petition <-- Sign this, please. |
bounty.hunter Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0 |
WU: 147512707. The third guy aborted the WU manually. |
mdpagel Joined: 18 Sep 99 Posts: 53 Credit: 2,619,543 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=147603991 This was the first of 3 units that were taking 24 hours to process without actually completing. My typical runtime on a unit is 1.5 hrs. I had actually chalked it up to signing up for E@h and getting the executables somehow mangled in the memory of BOINC, so I detached from E@h and deleted my S@h executable, which of course screwed up the execution of other WUs. In any event, only one user claims to have processed that unit, and he is making claims for other WUs on the order of 90 cobblestones. He's using a client for Darwin. Is there any chance that the main Windows app has a bug that the Darwin one doesn't? |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=147603991 That Mac did manage to get to Pulse overflow before BOINC killed the task for Maximum CPU time exceeded. I think that's mainly because the BOINC benchmarks for those quad systems are higher, but not by nearly as much as their capability to crunch SETI work. That pushes the maximum time limit relatively further out. The other possibility is that the compiler for those Mac builds may be producing more efficient code for the triplet finding loop. The high claims are of course because the stock Mac builds have the 3.81 multiplier from Beta. Joe |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Is this a 'bad' WU too? It had been running for ~1.5 hours and was at ~15% (not stopped!), with ~2.5 hours remaining. (Normally my PC needs ~1.5 hours for this AR.) New Rev. 2.4 from Crunch3r. http://setiathome.berkeley.edu/workunit.php?wuid=149328091 |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
It was created 19 Aug 2007 9:38:04 UTC, long after the splitter problem was cured. Another (wuid 149328099) from the same splitter group processed normally, so the thresholds are almost certainly correct. Looking in the WU or result would of course provide the best evidence, if you saved information before aborting. Joe |
[B^S] madmac Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
I too have got another one 04mr07ab.10282.4980.3.4.87_2 and I know it is a -9 one going 2hrs and only 0.06 again. Will leave it to 16:00 BST and then abort it sorry to the other person waiting on this. That one ran for just over 3 hours on a P4 2.4GHz - a bit slower than yours. If you could bear to run it for just a little bit longer, you could kill it for good - seems a shame not to put it out of its misery, now you've already spent so much time on it. Edit - I should have commented on it being a 'past deadline' re-issue. D**n. We could be seeing a lot of these - all hands to the boards! |
top1214 Joined: 18 Oct 06 Posts: 1 Credit: 44,898 RAC: 0 |
I suspended my latest bad task (04mr07ab.7106.5798.10.4.216_3) as soon as I got it. SETI isn't sending me any more work though. Is that normal for task suspension? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
I suspended my latest bad task (04mr07ab.7106.5798.10.4.216_3) as soon as I got it. SETI isn't sending me any more work though. Is that normal for task suspension? Yes it is, and I see you've had the WU for over a week - pity you didn't ask earlier. If you feel up to performing Joe Segur's "technically adventurous" surgery described in this post, you could get it to run very quickly to completion - since it's already been processed by someone else, that would get rid of it for good. But you'll have to be quick: the deadline expires in under an hour, and after that it'll be put in the queue for issuing to someone else. If you don't feel that adventurous, just abort it now - it's so close to deadline that it'll hardly make any difference. |
[B^S] madmac Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
For those of you technically adventurous, here's another option to handle the WUs with negative triplet threshold. I am not sure if this is a good idea. If a work unit is discarded due to too many errors, this might notify the administrators that something needed to be done about the problem WU. Once Eric finally fixes the splitter (what the admins did looks like a band-aid on code that was not their specialty so their patch may have broken the splitter in a way that they might not have seen), he will know that this work unit errored out, and have it resplit with the corrected splitter. If you modify the work unit, this flag might not be generated. |
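
Jim-R.'s explanation of the Duration Correction Factor (DCF) in this thread, that it jumps up immediately after one long-running unit but comes down only slowly after short ones, can be illustrated with a small sketch. This is not the BOINC client's actual code: the function names, the `decay` constant of 0.1, and the example numbers are assumptions chosen only to show the asymmetric behaviour described above.

```python
def update_dcf(dcf, estimated_hours, actual_hours, decay=0.1):
    """Return a new correction factor after one task finishes.

    Hypothetical rule, for illustration only:
    - if the task ran longer than the corrected estimate implied,
      jump straight up to the observed ratio;
    - otherwise move only a fraction `decay` of the way down.
    `estimated_hours` is the raw (uncorrected) estimate.
    """
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    ratio = actual_hours / estimated_hours
    if ratio > dcf:
        return ratio                      # fast recovery upwards
    return dcf + decay * (ratio - dcf)    # slow drift downwards


def simulate(start_dcf, tasks):
    """Apply update_dcf over (estimated, actual) pairs; return each DCF."""
    history = []
    dcf = start_dcf
    for estimated, actual in tasks:
        dcf = update_dcf(dcf, estimated, actual)
        history.append(dcf)
    return history


if __name__ == "__main__":
    # One very slow unit followed by three normal ones (made-up numbers).
    tasks = [(5.0, 100.0)] + [(5.0, 5.0)] * 3
    for step, value in enumerate(simulate(1.0, tasks), start=1):
        print(f"after task {step}: DCF = {value:.3f}")
```

Under these assumptions a single slow unit raises the factor at once, while each normal unit afterwards lowers it only a little, which matches what H Elzinga observed and why a manual edit of client_state.xml was suggested as a shortcut.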
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.