Message boards :
Number crunching :
Work Unit problem
Author | Message |
---|---|
kittyman Send message Joined: 9 Jul 00 Posts: 51484 Credit: 1,018,363,574 RAC: 1,004 |
There may be more WUs affected by this problem. I have one crunching on my x64 quad that has been at it for almost 3-1/2 hours and is only at .083% complete. And this is a different series....recently downloaded....05mr07aa.12591.24612.13.4.241...so it appears this is still perhaps a splitter problem. Where would I find that info? I just looked, and the WU finally finished with a -9 overflow after over 3-1/2 hours of crunching and asking for .61 worth of credit for the trouble. "Time is simply the mechanism that keeps everything from happening all at once." |
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it. Any thoughts as to what we're to do with the WUs we've suspended after they stopped responding at ~0.0nnn%? If we simply abort them, someone else will get them, and if we let them sit there, BOINC will never request more work. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
Where would I find that info? I just looked, and the WU finally finished with a -9 overflow after over 3-1/2 hours of crunching and asking for .61 worth of credit for the trouble. If the WU has finished and reported, it's too late now: but for future reference (and in case any other drive-by readers are interested) - All the WU data is contained in the big (367KB) files in the ..\\BOINC\\projects\\setiathome.berkeley.edu folder. It's plain text: you can open it with Notepad, Wordpad or any other plain-text viewer. Wordpad displays it more clearly than Notepad. Be very careful not to make or save any changes to the file, but it's perfectly safe to have a peek. The files start with an xml section - you'll be familiar with the style from the app_info.xml files. You'll see a <workunit_header>, and then more information about the WU than you ever really wanted to know. The bit that seems to have gotten itself messed up this time is towards the bottom of the header, halfway down a section called <analysis_cfg>. The tag in one of my suspended WUs reads <triplet_thresh>-2.06835318</triplet_thresh> - Joe reckons the number is only meaningful if it's positive, which is why I was asking what yours was. Anyway, Matt's on the case now, so I think we can relax a bit. |
Blurf Send message Joined: 2 Sep 06 Posts: 8964 Credit: 12,678,685 RAC: 0 |
In the future people can PM me...I've called the lab several times and tend to be more available than Pappa |
SATAN Send message Joined: 27 Aug 06 Posts: 835 Credit: 2,129,006 RAC: 0 |
I've just aborted 10 that wouldn't even download, and have just got another 20 doing exactly the same thing. Something doesn't add up: they say there are plenty of units to download, but they very rarely get here. I'm sure the guys will sort it out in the end. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it. If you don't have too many, perhaps a Resume, get work, Suspend sequence would be practical. You'd only have to leave them active until the Scheduler had assigned the work to download. Of course we hope the project will resolve the situation before the 8.68 day deadline, otherwise those WUs will be sent to someone else anyhow. The only other way to be kind to other users is to let the WUs run and hope they eventually overflow on Pulses. Those participants who don't watch crunching are effectively doing that, it's the only way to get a matching result. Joe |
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it. Makes sense. Thanks again. |
Jesse Viviano Send message Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
Here's another one exhibiting the same behavior. I am just waiting for the WU to overflow so that others don't get stuck on it. |
Rongar Send message Joined: 4 Aug 99 Posts: 13 Credit: 149,653 RAC: 0 |
Hi, to prevent the WUs from being sent out again, shouldn't we collect all the WU IDs to get them removed from the database manually, or can this be fixed by running a script? Best regards Michael |
Bob Nadler Send message Joined: 3 Sep 99 Posts: 7 Credit: 726,368 RAC: 0 |
Hi, Hi Everyone, I have just recently returned to the seti@home project. I also have one of these troublesome work units - 04mr07ab.14840.4980.6.4.243 . http://setiathome.berkeley.edu/workunit.php?wuid=147539328 Here is the <triplet_thresh>-0.764051318</triplet_thresh> I post here in case this info can be used as a data point. Good luck! Bob |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
I think the final word in this thread should go to Matt, in the latest Technical News. Thanks to MadMac for starting the thread in the first place, and getting us all thinking. |
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
I think the final word in this thread should go to Matt, in the latest Technical News. Thanks to MadMac for starting the thread in the first place, and getting us all thinking. And thanks to Joe Segur for getting involved, quickly locating that negative triplet threshold, and getting a dialog started with Matt. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Yep - Joe was quite helpful. - Matt I think the final word in this thread should go to Matt, in the latest Technical News. Thanks to MadMac for starting the thread in the first place, and getting us all thinking. -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Hi, It probably could be done with a script, assuming someone had time to write one. Let's at least list the ones we know about and try to get them cancelled. Even that may be more effort than Matt or Jeff have time for, but we can ask. Cancelling might mean never getting the fractional credit for those which have been allowed to complete, speak up if you have objections. Because there are 256 WUs with identical thresholds in each group, only the first 3 fields of the WU name are needed. Here's mine plus those already mentioned in the thread: 04mr07ab.10282.4980 04mr07ab.14840.4980 04mr07ab.32128.5798 05mr07aa.12591.24612 05mr07aa.15859.24612 05mr07ab.7301.368637 Joe |
Tklop Send message Joined: 11 May 03 Posts: 175 Credit: 613,952 RAC: 0 |
Hi, Hello, fellow crunchers... Here's my solitary example (so far) 04mr07ab.7106.5389 Have we reached a consensus for action needed? I thought Matt recommended letting them run, until the overflow check causes them to stop. Does anyone know when we might expect that to happen? So far, this one has been crunching 16+ hours, with only .012 complete. I don't mind just letting it run--provided it finishes before its deadline, and I sure don't want to just dump it on some other user... The computer trying to crunch it (if it even matters) is a Pentium M 1.86GHz, with 1GB RAM, running Windows XP Pro 5.1.2600, SP2, Build 2600... Anyways, Keep on crunching, all... SETI@Home Forever! ___Tklop (Step-Founder, U.S. Air Force team) |
W-K 666 Send message Joined: 18 May 99 Posts: 19417 Credit: 40,757,560 RAC: 67 |
Does anybody know if these batches could be cancelled at Berkeley? I have just had another one on re-issue, copy _4. Just caught it before I put nose to grindstone. It had done 20 mins for 0.018%, so I aborted it. Andy |
kittyman Send message Joined: 9 Jul 00 Posts: 51484 Credit: 1,018,363,574 RAC: 1,004 |
Hi, Until the problem is fixed and a solution implemented, either just let it run to completion (it will probably -9 and exit), or suspend that WU for now. The only problem with suspending it is that another user commented that he thought BOINC will not attempt to get any new work while a WU is suspended. If you have additional work in your cache, that is not a problem. If you do not, I guess I would abort it to try to get new work. Except that downloads seem to be problematic at the moment as well. Hang in there. "Time is simply the mechanism that keeps everything from happening all at once." |
Tklop Send message Joined: 11 May 03 Posts: 175 Credit: 613,952 RAC: 0 |
Hi, Thanks, msattler! I believe I shall let it go for another half day or so, before suspending it... If it -9's and exits, then all the better! Upon reviewing the work unit's history, it looks like that's what happened to another user already -- see here: http://setiathome.berkeley.edu/workunit.php?wuid=147604236 So, for now, patience it is! Keep on crunching, all... SETI@Home Forever! ___Tklop (Step-Founder, U.S. Air Force team) |
Rongar Send message Joined: 4 Aug 99 Posts: 13 Credit: 149,653 RAC: 0 |
Hi, my WU: 05mr07aa.12591.24612.13 I think I have to suspend it. Since my 'puter is not powered 24/7 it seems not to reach the next milestone and falls back to 0.012%. And it seems to have few signals (found so far 1 Pulse within 0.012% and 2h crunching time.) Best regards Michael |
kittyman Send message Joined: 9 Jul 00 Posts: 51484 Credit: 1,018,363,574 RAC: 1,004 |
Hi, Excellent choice, my friend, excellent choice! The kitties and I have been watching my RAC get slaughtered lately, but we will hang with it. We are in it for the science, although the science experiment seems to have gone a bit awry lately. A few test tubes shattered and such. Remember science class? Mr Wizard would have understood all of this. It'll get sorted out. "Time is simply the mechanism that keeps everything from happening all at once." |
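
For anyone who would rather not open each WU file by hand, here is a rough Python sketch of the two checks described in the thread: pulling the <triplet_thresh> value out of a WU file's XML header (Richard's post), and trimming WU names to their first three fields so one entry covers a whole group of 256 (Joe's post). The function names and the command-line usage are made up for illustration, and the "suspect if not positive" rule is only Joe's working assumption from the thread, not anything official from the project. The script only reads the files and never writes to them.

```python
import re
import sys
from typing import Iterable, List, Optional

# Matches the tag quoted in the thread, e.g.
# <triplet_thresh>-2.06835318</triplet_thresh>
_TRIPLET_RE = re.compile(
    r"<triplet_thresh>\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*</triplet_thresh>"
)


def find_triplet_thresh(text: str) -> Optional[float]:
    """Return the first triplet_thresh value in the text, or None if absent."""
    match = _TRIPLET_RE.search(text)
    return float(match.group(1)) if match else None


def looks_broken(thresh: float) -> bool:
    """Assumption from the thread: only a positive threshold is meaningful."""
    return thresh <= 0


def group_prefix(wu_name: str) -> str:
    """Keep the first three dot-separated fields of a WU name."""
    return ".".join(wu_name.split(".")[:3])


def affected_groups(wu_names: Iterable[str]) -> List[str]:
    """Collapse full WU names to a sorted list of unique group prefixes."""
    return sorted({group_prefix(name) for name in wu_names})


if __name__ == "__main__":
    # Hypothetical usage: pass one or more WU file paths on the command line.
    for path in sys.argv[1:]:
        # Files are opened read-only; nothing is ever written back.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            thresh = find_triplet_thresh(handle.read())
        if thresh is None:
            print(f"{path}: no <triplet_thresh> tag found")
        else:
            status = "suspect" if looks_broken(thresh) else "ok"
            print(f"{path}: triplet_thresh={thresh} ({status})")
```

Feeding `affected_groups` the full names posted above (for example 05mr07aa.12591.24612.13.4.241) gives the short form Joe listed (05mr07aa.12591.24612), with duplicates from the same group merged.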