Work Unit problem
Author | Message |
---|---|
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 | I MAY have an explanation for BOINC not downloading new units if suspended ones are there. This had been used at LHC@home to fill up one's own cache and receive more of the rare units without increasing the cache. MAY BE that due to this a function was added to BOINC not to do this. Don't know if LHC@home had anything to do with it, but a good guess. :) With v5.8.xx and later, suspending a Task in a project has the same effect as setting the project to "no new work", since the project isn't allowed to ask for more work, regardless of whether the computer is idle or not. Well, it's not exactly the same as setting "no new work", since there is a chance you'll get a server-abort for the Task you've suspended... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
Tklop Joined: 11 May 03 Posts: 175 Credit: 613,952 RAC: 0 | I don't know whether or not this is helpful, but I did find several of these 04mr07 work units on my machines, but I only have one that's stuck... So, perhaps only some of these work units are squirrelly... I read below that some have determined some specific characteristic having to do with -9 or something, and as a relative novice, I have no idea what that's all about, but perhaps that feature only resides with my one stuck WU... All I'm saying is that suspending all WUs that start with 04mr07 may be premature... Some of them might actually work! For myself, I think I will follow the "Leave It Alone" advice... Anyway... Keep on crunching, all... SETI@Home Forever! ___Tklop (Step-Founder, U.S. Air Force team) |
W-K 666 Joined: 18 May 99 Posts: 19398 Credit: 40,757,560 RAC: 67 | I don't know whether or not this is helpful, but I did find several of these 04mr07 work units on my machines, but I only have one that's stuck... A -9 overflow unit is normally regarded as noisy; usually it recorded man-made interference. The overflow description is because the output file is restricted to a maximum of 30 pulses etc. Andy |
Tklop Joined: 11 May 03 Posts: 175 Credit: 613,952 RAC: 0 | A -9 overflow unit is normally regarded as noisy; usually it recorded man-made interference. The overflow description is because the output file is restricted to a maximum of 30 pulses etc. Thanks, my friend! I appreciate the explanation... Keep on crunching, all... SETI@Home Forever! ___Tklop (Step-Founder, U.S. Air Force team) |
W-K 666 Joined: 18 May 99 Posts: 19398 Credit: 40,757,560 RAC: 67 | Just had eight 04mr07aa -9 overflows, but none stuck. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | Just had eight 04mr07aa -9 overflows, but none stuck. From what Joe was saying, it seems plausible that the current very high rate of -9 units isn't just bad luck with a heavy dose of RFI, isn't a problem with blanking the radar, but is the result of setting the app sensitivity too high via the splitter parameters. In which case, Matt & Co are fighting the wrong fire. If they could fix the cause of the current high demand for WUs, caches would start to fill up again, the work request rate would die back, the servers (and the staff!) would cool down, and they'd be able to sort out the remaining server configuration issues in (relative) peace. Instead, they're trying to increase the server speed, and probably throwing away perfectly good data in the process. The beer is leaking out of the barrels, and all they're doing is digging a bigger drain. I wish there was a better (trusted, moderated) feedback channel to the project developers/operational staff. I tried posting a quiet note in Technical News, but it got rapidly swamped in the Credit Wars - IMHO, very bad netiquette to post that sort of stuff in a news forum. A similar thing happened with the app_info.xml bug in May. We, the denizens of the fora, had pretty much sussed out what the problem was and which bit of the server was broken by about midday Friday UTC. It was clear that the project staff were still in the dark about the true situation on Monday afternoon, CA time, some 80 hours later (I can provide sources for that assertion). If the staff could have a quick way of picking up on hard, technical facts gleaned by us crunchers, without having to wade through all the dross, I'm sure it would be helpful to them. As it is, I bet every single thread on these message boards errors out with a -9 when the staff try to read them. I wonder if there could be an inverse of the Technical News board? Posts could only be made, or could be moved into it, by a 'Technical Moderator' - one of a group of people with sufficient technical / hardware / networking / programming experience (there are enough of them active on the boards) to recognise a significant point and allow it to float to the surface. Just a thought. Comments? |
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 | Richard, how about someone (you perhaps) just copying and pasting worthy posts into a digest and mailing them to the admins every so often? |
Mad Hatter Joined: 21 Sep 99 Posts: 2 Credit: 2,667,721 RAC: 0 | I let one crunch for about 2 days on my PC; it never got to 0.01%, so I finally aborted it. Anywho, here is the work unit: 04mr07ab.7106.4571.10.4.176 Madhatter |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 | There may be more WUs affected by this problem. I have one crunching on my x64 quad that has been at it for almost 3-1/2 hours and is only at 0.083% complete. And this is a different series... recently downloaded... 05mr07aa.12591.24612.13.4.241... so it appears this is still perhaps a splitter problem. Normal completion times for MB seem to be running about 1hr 20min on this rig, so it's obvious there is a major problem with this WU. I will let it run for today, but I am going to abort it tonight if I see it has wasted another 14 hours on it and is still only a few percent complete. "Time is simply the mechanism that keeps everything from happening all at once." |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | There may be more WUs affected by this problem. I have one crunching on my x64 quad that has been at it for almost 3-1/2 hours and is only at 0.083% complete. And this is a different series... recently downloaded... 05mr07aa.12591.24612.13.4.241... so it appears this is still perhaps a splitter problem. Created 16 Aug 2007 11:23:35 UTC. So the splitters are running, and still creating ... gibberish. [Mark, could you check/post the <triplet_thresh>x.xxxxxxxx</triplet_thresh> for that WU? Thanks.] Anyone - everyone - please, how can we get through to the project staff to tackle this problem at source, not just keep feeding the monster? |
bounty.hunter Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0 | Anyone - everyone - please, how can we get through to the project staff to tackle this problem at source, not just keep feeding the monster? Possibly a PM to Pappa... I think he is in phone touch with the lab... |
speedimic Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 | From what Joe was saying, it seems plausible that the current very high rate of -9 units isn't just bad luck with a heavy dose of RFI, isn't a problem with blanking the radar, but is the result of setting the app sensitivity too high via the splitter parameters. Very good idea! But who will sort out those 'Technical Moderators', telling those who have knowledge from those who only think they have...? Who will do it? They might need lots of spare time to read all the PMs. ;) Maybe those who already have the 'Volunteer Developer' tag?? mic. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | Anyone - everyone - please, how can we get through to the project staff to tackle this problem at source, not just keep feeding the monster? Just tried to do that, and got a new (to me) message: You are not allowed to send privates messages so often. Please wait some time before sending more messages. I've sent ONE private message today, and that was over 6 hours ago! Anyone know what the limit is? Edit - got a message away using the Beta board. Now I suppose I've used up my quota for the day there as well! |
bounty.hunter Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0 | Just tried to do that, and got a new (to me) message: Richard, I just sent a PM to Pappa and followed it up with a test PM to you. I didn't get the strange message that you got. Might be a momentary blip on the db? |
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 | And a truly informative and polite PM it was. Sorry, you used up your quota on me (if one exists). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | Just tried to do that, and got a new (to me) message: Most odd. Message received - thank you - and it allowed me to reply. I thought of one possible explanation: I sent both my message to Astro, and my message to Pappa, to myself as a second recipient, so I would have an archive copy to refer to. I wondered whether there was a limit on the number of messages you can send to the same person. So I just tried copying myself in on a second reply to bounty.hunter - and that one was accepted as well. So I'm stuck. Anyway, let's hope Pappa isn't away on vacation, and has the phone number for someone other than Eric (who is). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | Even odder. I've just had a reply to the message that the system wouldn't accept! I copied in a second recipient that I knew had telephone access to the lab (only Eric, unfortunately), and he's just replied. No direct voice contact with Matt, but apparently the message has been forwarded. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 | Richard, Sent an email to Pappa on an account he might check more often. Claggy. |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 | FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 | FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it. Great, thanks for letting us know. Just wanted to be sure you'd got the negative <triplet_thresh>, which Joe seemed to think was more unusual. |
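
For anyone wanting to check their own cached work units for the negative `<triplet_thresh>` discussed above, here is a minimal Python sketch. Only the tag name comes from this thread; the function names, the sample fragments, and the assumption that the tag can be found with a plain text search of the work unit file are all illustrative, not taken from the SETI@home or BOINC code.

```python
import re

# Tag name taken from the thread; the rest of this sketch is an assumption.
TRIPLET_RE = re.compile(r"<triplet_thresh>\s*([-+0-9.eE]+)\s*</triplet_thresh>")

def triplet_thresh(wu_text):
    """Return the triplet threshold found in a work unit's text, or None."""
    match = TRIPLET_RE.search(wu_text)
    return float(match.group(1)) if match else None

def looks_suspect(wu_text):
    """Flag a work unit only when a threshold is present and negative."""
    value = triplet_thresh(wu_text)
    return value is not None and value < 0.0

# Hypothetical header fragments, not real data from the affected WUs.
good = "<triplet_thresh>9.5</triplet_thresh>"
bad = "<triplet_thresh>-3.25</triplet_thresh>"
print(looks_suspect(good), looks_suspect(bad))
```

A work unit with no `<triplet_thresh>` tag at all is left unflagged here, since the thread only singles out negative values as unusual.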
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.