Long Work Units
Author | Message |
---|---|
Kevster Joined: 11 Jan 01 Posts: 33 Credit: 1,548,476 RAC: 0 |
99.9% of the work units on my computer take just over 4 hours. Once in a while I get strange work units that take many times longer. If I stop BOINC Manager and restart, the work unit will often finish, sometimes not. Other times things will not make sense, like after 10 hours a work unit is 10% complete with only 16 hours left. You do the math, it just doesn't work out. I check my CPU status, and my computer isn't working on anything else. I basically use my computer for email, so it's not like it's busy trying to solve the answer to world peace now and then. What is going on? |
Keith T Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Different WUs have different Angle Ranges. http://www.boinc-wiki.info/True_Angle_Range http://www.boinc-wiki.info/SETI%40Home_FAQ:_The_SETI%40Home_Project#Why_is_there_so_much_variability_in_work_unit_completion_time_with_version_3.x.3F |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I've gotten a few of these, and have one now. It seems that the last one(s) finished in about 10x the time it was projected to take. And my wingman finished it in a 'standard' time. So I lost about 9 wu's equivalent by letting it finish. This was not an AR issue. So what is suggested? Do we abort the errant wu's when we spot them, or let them loop almost forever and waste the compute time? Anyone?? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"I've gotten a few of these, and have one now. It seems that the last one(s) finished in about 10x the time it was projected to take. And my wingman finished it in a 'standard' time. So I lost about 9 wu's equivalent by letting it finish. This was not an AR issue." Do we get a clue? A WU ID, perhaps, or a Task ID? Even a host ID? Your cache size and batch-processing mode make hunting for a needle in the proverbial a tad difficult. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Kevster's task 796655083 restarted again, and again, and again, from the beginning. Setting 'keep applications in memory when suspended' can help avoid this - though since the RAC on his other two projects is 0, that's probably not the problem here. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Could be one of them: http://setiathome.berkeley.edu/result.php?resultid=795910157 . It is long, but maybe not 'too' long. The one currently running is 29mr07af.22054.23794.15.7.156_1; it is 3h into a projected 6h run that normally takes 1.5-2h. The egregious one has been cleared by the system. It matched its partner, but had way too many hours. I think it completed on or after 4/4/08. Sorry, I can't quote more. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"Could be one of them: http://setiathome.berkeley.edu/result.php?resultid=795910157 . It is long, but maybe not 'too' long." A Q6700 @ 2.66 GHz takes less than half the time of a P4 @ 3.00 GHz? Doesn't seem to be terribly much wrong with that. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
"Could be one of them: http://setiathome.berkeley.edu/result.php?resultid=795910157 . It is long, but maybe not 'too' long." Like I said, it could have been one of them. The bad boy which has disappeared from the on-line query was about 8-10x longer than its wingman. If I catch a fresh one, like the one running now, I'll try to post the info here. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
"Could be one of them: http://setiathome.berkeley.edu/result.php?resultid=795910157 . It is long, but maybe not 'too' long." Oh, yes, I listed this one because it took a long time and I got half the credit I would have expected. The ones issued to me at the same time completed in 6000 s and received 70 cs's. This one took 12K s and got 60 cs's. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"Could be one of them: http://setiathome.berkeley.edu/result.php?resultid=795910157 . It is long, but maybe not 'too' long." Like Keith T said, different ARs perform differently. It's not linear. |
Joined: 4 Feb 05 Posts: 35 Credit: 31,021,410 RAC: 0 |
I've had the same problem for about a year on my laptop (and ONLY on my laptop). Sometimes a WU gets stuck in processing, the progress indicator stops, and the '% to completion' goes up instead of down. I suspected Crunch3r's optimized app and changed to Simon's Chicken app, but then it happened again…and again… I feel that this particular problem has nothing to do with AR, and because it doesn’t happen on all computers, I think it is somehow software related. Recently there was the 'headless' batch (13feb08ac.8515.4162.3.7.xxx) but that was just a batch of dud WUs. Whether it’s Windows, the optimized app, or something different, I don’t know, but if somebody could think of a cure I would certainly be grateful. I haven’t noticed if the problem occurs in random WUs or if the problem is associated with certain batches. I only had a few in the past month, but in January/February I had several every week. Right now I can find only one bad WU in my work list (but I cancelled it when I saw it on the computer): 762186011. The laptop is a Lenovo/IBM T60 Thinkpad. /Mark |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
http://setiathome.berkeley.edu/workunit.php?wuid=242589114 This took 55K sec to earn what the wingman earned in 7K sec, using essentially the same CPU (C2Q). I suspect there is some errant code somewhere, perhaps corners cut in order to obtain performance? http://setiathome.berkeley.edu/workunit.php?wuid=242590925 This one took 12K sec to earn what I usually earn in 7K sec. I am really beginning to have disdain for this credit granting system. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"WU 242589114 this took 55K sec to earn what the wingman earned in 7Ks, using essentially the same cpu. (C2Q) I suspect there is some errant code somewhere, perhaps corners cut in order to obtain performance?" Crunch3r 2.4V 64-bit on Vista SP1 - quite a lot of variables to look at there. But no sign of restarting. "WU 242590925 This one took 12K sec to earn what I usually earn in 7Ks. I am really beginning to have disdain for this credit granting system." Likewise. I saw a rant from someone on one of the boards who claimed that Vista SP1 had knocked back his RAC, but no facts: no examples, no WU links, no 'before and after' comparisons. So I ignored it. But I did put SP1 on my own Vista (32-bit) box on Sunday, and that is one that I monitor and log quite closely. If anything shows up in the way of extended crunch times, I'll notice it and post about it. But so far, no problems at all. PS You saw what I did with your links? Edit - just re-vac'd the chart, and I've got one at 11,407 seconds against a usual range of 6K - 8K. But it was one of the rare AR=0.242808 ones, claiming 106 credits, so I don't think we can blame Vista for that (and anyway, it was a few days ago, and it's been purged now). Everything else is within the same scatter as before. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Here is another one that ran anomalously. And this one ran to normal completion but failed to validate, which is very very rare for this box. I think that seti and/or boinc has a hard time playing with other tasks in the Vista 64b SP1 playground. It seems that the workunits have trouble, leading to long runtimes and once in a while an invalid result, when my backup runs or when I install software (like MS updates). Both of these actions seem to lock out other activities on the computer for short periods of time, and boinc seems to choke. The difficulty can be seen in the boinc log. See this thread for further information. |
Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
Recently I did see two 20,000-second workunits, which only gave as much as 10,000-second ones for my PD950s. It looked like at least one restarted late in cycle but not the other. There were only two so I'm not gonna worry. Don't know what kind of error would have caused this. If I see more I will try to find out why. With error-free units crunchtimes are supposed to vary with the credits awarded but that proportion holds up poorly because it's difficult to impossible to accommodate the program to all processors and projects. Fortunately RAC doesn't vary too much because of the variety of workunits over a period of a week or so. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
And here is another one. My quad finished it in 34K sec and received 31 credits, while the wingman finished it in 5K sec. There is a loose cannon in the code and it is hurting the project. I really wish someone competent would investigate. In this example, I could have completed 6 other wu's in the time wasted. Isn't anyone (but me) PO'd? |
Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
I saw at least three more long units today, three times longer than other typical units, all on one of my two PD950s. One unit is 28mr07al...59 and another is 27mr08al...219. One was a reset, the other wasn't. The third one was a "shortie" that required the typical time for a longer unit. What's goin' on, anyway? If something's wrong with that machine all of its units should be long. Not so! |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
And another one, where the wingman spent 11K sec on a C2D and I spent 34K sec on a C2Q. (Should have matched better if something wasn't wrong, right?) No restarts listed for my machine, by the way: <core_client_version>5.10.45</core_client_version> <![CDATA[ <stderr_txt> Optimized SETI@Home Enhanced application Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra Version: Windows SSSE3 64-bit based on S@H V5.15 'Noo? No - Ni!' Revision: R-2.4V|xT|FFT:IPP_SSSE3|Ben-Joe CPUID: Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz Speed: 4 x 2660 MHz Cache: L1=64K L2=4096K Features: MMX SSE SSE2 SSE3 x86_64 Work Unit Info WU Credit multi. is: 2.85 WU True angle range: 0.436959 Spikes Pulses Triplets Gaussians Flops 0 0 2 1 15754773052931 </stderr_txt> ]]> I do notice that when this happens, boinc manager recomputes all the expected times and reports some wild expected run-times for the wu's ready to start. These estimates slowly return to 'normal' as more wu's are computed, however. |
Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
"And another one, where the wingman spent 11K sec on a C2D and I spent 34K sec on a C2Q. (Should have matched better if something wasn't wrong, right?) No restarts listed for my machine, by the way:" That's your DCF (duration correction factor) being adjusted based on expected vs actual run time. Edit: correction, estimated vs actual, not expected vs actual. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Yes, I agree it is the DCF. But the wild estimates (4-8x too large) are due to these errant long work units I get. Probably the DCF is being computed correctly. |
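
For anyone who wants to check the arithmetic discussed in this thread, here is a minimal Python sketch. The first function does the "you do the math" check from the opening post: if a task is 10% done after 10 hours, a straight-line extrapolation leaves about 90 hours, not 16. The second function is only a simplified, assumed model of a duration correction factor, not BOINC's actual code: it assumes the factor jumps up at once when a task overruns its estimate and drifts back down by a fixed fraction afterwards. Both function names and the `decay` value are hypothetical, and the 7,000 s and 34,000 s figures are borrowed from the posts above purely as illustration.

```python
def linear_remaining_hours(elapsed_hours, fraction_done):
    """Remaining hours if progress continued at the average rate so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = elapsed_hours / fraction_done
    return projected_total - elapsed_hours


def update_dcf(dcf, estimated_seconds, actual_seconds, decay=0.1):
    """Hypothetical DCF update (an assumption, not BOINC's real rule).

    estimated_seconds is the uncorrected estimate for the task.
    An overrun raises the factor immediately; otherwise the factor
    moves a fraction `decay` of the way toward the observed ratio.
    """
    ratio = actual_seconds / estimated_seconds
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


if __name__ == "__main__":
    # Opening post: 10 hours elapsed, 10% complete.
    print(f"{linear_remaining_hours(10, 0.10):.0f} hours left by extrapolation")

    # One errant 34,000 s task among tasks estimated at 7,000 s each.
    dcf = 1.0
    for actual in [7000, 34000, 7000, 7000, 7000]:
        dcf = update_dcf(dcf, 7000, actual)
        print(f"actual {actual:>6} s -> factor {dcf:.2f}, "
              f"next estimate {7000 * dcf:.0f} s")
```

Under these assumptions a single long task inflates every pending estimate several times over, and the estimates only creep back toward normal as ordinary tasks complete, which is consistent with what PhonAcq describes seeing in BOINC Manager.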
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.