Work Unit problem

Message boards : Number crunching : Work Unit problem
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 620024 - Posted: 16 Aug 2007, 2:40:59 UTC - in response to Message 619894.  

I MAY have an explanation for BOINC not downloading new units while suspended ones are present. This trick had been used at LHC@home to fill up one's own cache and receive more of the rare units without increasing the cache size. It MAY BE that, because of this, a function was added to BOINC to prevent it.

But I am only guessing here!

Don't know if LHC@home had anything to do with it, but a good guess. :)

With v5.8.xx and later, suspending a Task in a project has the same effect as setting the project to "no new work", since the project isn't allowed to ask for more work, regardless of whether the computer is idle or not.

Well, it's not exactly the same as setting "no new work", since there is a chance you'll get a server-abort for the Task you've suspended...
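
To make that rule concrete, here is a minimal sketch of the check as I've described it. It is not taken from the BOINC source; the class and function names are hypothetical and only model the behaviour above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    suspended: bool = False


@dataclass
class Project:
    name: str
    no_new_work: bool = False
    tasks: list = field(default_factory=list)


def may_request_work(project: Project) -> bool:
    """Hypothetical model of the v5.8.xx+ rule, not BOINC's real code."""
    if project.no_new_work:
        return False
    # A single suspended task blocks work requests for the whole project,
    # whether or not the computer is idle.
    return not any(task.suspended for task in project.tasks)
```

So a project with even one suspended Task gets `False` here, the same answer as a project set to "no new work".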

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 620024 · Report as offensive
Profile Tklop
Joined: 11 May 03
Posts: 175
Credit: 613,952
RAC: 0
United States
Message 620055 - Posted: 16 Aug 2007, 4:00:23 UTC
Last modified: 16 Aug 2007, 4:04:42 UTC

I don't know whether or not this is helpful, but I did find several of these 04mr07 work units on my machines, but I only have one that's stuck...

So, perhaps only some of these work units are squirrelly...

I read below, that some have determined some specific characteristic having to do with -9 or something--and as a relative novice, I have no idea what that's all about, but perhaps that feature only resides with my one stuck WU...

All I'm saying is that suspending all WUs that start with 04mr07 may be premature... Some of them might actually work!

For myself, I think I will follow the "Leave It Alone" advice...

Anyway...

Keep on crunching, all...
SETI@Home Forever!


___Tklop (Step-Founder, U.S. Air Force team)
ID: 620055 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19401
Credit: 40,757,560
RAC: 67
United Kingdom
Message 620060 - Posted: 16 Aug 2007, 4:07:55 UTC - in response to Message 620055.  

I don't know whether or not this is helpful, but I did find several of these 04mr07 work units on my machines, but I only have one that's stuck...

So, perhaps only some of these work units are squirrelly...

I read below, that some have determined some specific characteristic having to do with -9 or something--and as a relative novice, I have no idea what that's all about, but perhaps that feature only resides with my one stuck WU...

All I'm saying is that suspending all WUs that start with 04mr07 may be premature... Some of them might actually work!

For myself, I think I will follow the "Leave It Alone" advice...

Anyway...

A -9 overflow unit is normally regarded as noisy; usually it has recorded man-made interference. It is called an overflow because the output file is restricted to a maximum of 30 pulses etc.
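
As a rough sketch of what the overflow means (this is not the real application code; the names are hypothetical, and I'm assuming the cap of 30 applies to the total number of signals found):

```python
MAX_SIGNALS = 30  # cap mentioned above; assumed to cover all signal types together


def analyse(chunks, find_signals):
    """Collect signals chunk by chunk and stop early once the cap is hit.

    Returns (signals, overflowed). 'find_signals' is a hypothetical
    detector that returns a list of signals for one chunk of data.
    """
    found = []
    for chunk in chunks:
        found.extend(find_signals(chunk))
        if len(found) >= MAX_SIGNALS:
            # Noisy data: give up early and report an overflow.
            return found[:MAX_SIGNALS], True
    return found, False
```

A noisy unit fills the list almost at once and exits early, which is why these units finish so quickly compared with a normal one.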

Andy
ID: 620060 · Report as offensive
Profile Tklop
Joined: 11 May 03
Posts: 175
Credit: 613,952
RAC: 0
United States
Message 620063 - Posted: 16 Aug 2007, 4:11:13 UTC
Last modified: 16 Aug 2007, 4:11:37 UTC

A -9 overflow unit is normally regarded as noisy; usually it has recorded man-made interference. It is called an overflow because the output file is restricted to a maximum of 30 pulses etc.


Thanks, my friend! I appreciate the explanation...
Keep on crunching, all...
SETI@Home Forever!


___Tklop (Step-Founder, U.S. Air Force team)
ID: 620063 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19401
Credit: 40,757,560
RAC: 67
United Kingdom
Message 620157 - Posted: 16 Aug 2007, 8:45:15 UTC

Just had eight 04mr07aa -9 overflows, but none stuck.
ID: 620157 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 620172 - Posted: 16 Aug 2007, 9:36:50 UTC - in response to Message 620157.  

Just had eight 04mr07aa -9 overflows, but none stuck.

From what Joe was saying, it seems plausible that the current very high rate of -9 units isn't just bad luck with a heavy dose of RFI, and isn't a problem with blanking the radar, but is the result of setting the app sensitivity too high via the splitter parameters.

In which case, Matt & Co are fighting the wrong fire.

If they could fix the cause of the current high demand for WUs, caches would start to fill up again, the work request rate would die back, the servers (and the staff!) would cool down, and they'd be able to sort out the remaining server configuration issues in (relative) peace. Instead, they're trying to increase the server speed, and probably throwing away perfectly good data in the process.

The beer is leaking out of the barrels, and all they're doing is digging a bigger drain.

I wish there was a better (trusted, moderated) feedback channel to the project developers/operational staff. I tried posting a quiet note in Technical News, but it got rapidly swamped in the Credit Wars - IMHO, very bad netiquette to post that sort of stuff in a news forum.

A similar thing happened with the app_info.xml bug in May. We, the denizens of the fora, had pretty much sussed out what the problem was and which bit of the server was broken by about midday Friday UTC. It was clear that the project staff were still in the dark about the true situation on Monday afternoon, CA time, some 80 hours later (I can provide sources for that assertion). If the staff could have a quick way of picking up on hard, technical facts gleaned by us crunchers, without having to wade through all the dross, I'm sure it would be helpful to them. As it is, I bet every single thread on these message boards errors out with a -9 when the staff try to read them.

I wonder if there could be an inverse of the Technical News board? Posts could only be made, or could be moved into it, by a 'Technical Moderator' - one of a group of people with sufficient technical / hardware / networking / programming experience (there are enough of them active on the boards) to recognise a significant point and allow it to float to the surface.

Just a thought. Comments?

ID: 620172 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 620175 - Posted: 16 Aug 2007, 9:43:40 UTC
Last modified: 16 Aug 2007, 9:44:09 UTC

Richard,

How about someone (you perhaps) just copying and pasting worthy posts into a digest and mailing it to the admins every so often?
ID: 620175 · Report as offensive
Mad Hatter

Joined: 21 Sep 99
Posts: 2
Credit: 2,667,721
RAC: 0
United States
Message 620228 - Posted: 16 Aug 2007, 12:48:59 UTC

I let one crunch for about 2 days on my PC; it never got to 0.01%, so I finally aborted it.

Anywho, here is the work unit: 04mr07ab.7106.4571.10.4.176

Madhatter
ID: 620228 · Report as offensive
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 620277 - Posted: 16 Aug 2007, 14:59:30 UTC

There may be more WUs affected by this problem. I have one crunching on my x64 quad that has been at it for almost 3-1/2 hours and is only at .083% complete. And this is a different series....recently downloaded....05mr07aa.12591.24612.13.4.241...so it appears this is still perhaps a splitter problem.
Normal completion times for MB seem to be running about 1hr 20min on this rig, so it's obvious there is a major problem with this WU.
I will let it run for today, but I am going to abort it tonight if I see it has wasted another 14 hours and is still only a few percent complete.
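
For what it's worth, here is the back-of-envelope arithmetic as a small sketch. It assumes progress stays linear, which may well not hold for a broken WU; the function name is just for illustration.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Estimate total run time, assuming progress continues at the same rate."""
    if percent_done <= 0:
        raise ValueError("no progress yet, cannot project a finish time")
    return elapsed_hours * 100.0 / percent_done


# Figures from this post: about 3.5 hours elapsed, 0.083% complete.
total = projected_total_hours(3.5, 0.083)
print(f"projected total: {total:.0f} hours, about {total / 24:.0f} days")
```

At that rate the WU would need roughly 4,200 hours, against the usual 1hr 20min.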

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 620277 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 620279 - Posted: 16 Aug 2007, 15:04:51 UTC - in response to Message 620277.  

There may be more WUs affected by this problem. I have one crunching on my x64 quad that has been at it for almost 3-1/2 hours and is only at .083% complete. And this is a different series....recently downloaded....05mr07aa.12591.24612.13.4.241...so it appears this is still perhaps a splitter problem.
Normal completion times for MB seem to be running about 1hr 20min on this rig, so it's obvious there is a major problem with this WU.
I will let it run for today, but I am going to abort it tonight if I see it has wasted another 14 hours and is still only a few percent complete.

Created 16 Aug 2007 11:23:35 UTC

So the splitters are running, and still creating ... gibberish.

[Mark, could you check/post the

<triplet_thresh>x.xxxxxxxx</triplet_thresh>

for that WU? Thanks.]
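
If it helps anyone check their own units, here is a minimal sketch for pulling that value out. It assumes the WU file is readable as text and contains the tag exactly as written above; the function names are hypothetical, and treating a zero or negative value as the thing worth reporting is my assumption, not a documented rule.

```python
import re

# Matches the tag shown above and captures a signed decimal number.
TRIPLET_RE = re.compile(
    r"<triplet_thresh>\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*</triplet_thresh>"
)


def triplet_thresh(header_text: str):
    """Return the triplet threshold as a float, or None if the tag is absent."""
    match = TRIPLET_RE.search(header_text)
    return float(match.group(1)) if match else None


def looks_suspect(header_text: str) -> bool:
    """Assumption: a zero or negative threshold is worth reporting."""
    value = triplet_thresh(header_text)
    return value is not None and value <= 0
```

Read the WU file into a string (for example with `open(path, errors="replace").read()`) and pass it to `looks_suspect`.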

Anyone - everyone - please, how can we get through to the project staff to tackle this problem at source, not just keep feeding the monster?
ID: 620279 · Report as offensive
Profile bounty.hunter
Volunteer tester
Joined: 22 Mar 04
Posts: 442
Credit: 459,063
RAC: 0
India
Message 620289 - Posted: 16 Aug 2007, 15:20:02 UTC - in response to Message 620279.  

Anyone - everyone - please, how can we get through to the project staff to tackle this problem at source, not just keep feeding the monster?


Possibly a PM to Pappa......I think he is in phone touch with the lab....
ID: 620289 · Report as offensive
Profile speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 620303 - Posted: 16 Aug 2007, 15:46:04 UTC

From what Joe was saying, it seems plausible that the current very high rate of -9 units isn't just bad luck with a heavy dose of RFI, and isn't a problem with blanking the radar, but is the result of setting the app sensitivity too high via the splitter parameters.

In which case, Matt & Co are fighting the wrong fire.

If they could fix the cause of the current high demand for WUs, caches would start to fill up again, the work request rate would die back, the servers (and the staff!) would cool down, and they'd be able to sort out the remaining server configuration issues in (relative) peace. Instead, they're trying to increase the server speed, and probably throwing away perfectly good data in the process.

The beer is leaking out of the barrels, and all they're doing is digging a bigger drain.

I wish there was a better (trusted, moderated) feedback channel to the project developers/operational staff. I tried posting a quiet note in Technical News, but it got rapidly swamped in the Credit Wars - IMHO, very bad netiquette to post that sort of stuff in a news forum.

A similar thing happened with the app_info.xml bug in May. We, the denizens of the fora, had pretty much sussed out what the problem was and which bit of the server was broken by about midday Friday UTC. It was clear that the project staff were still in the dark about the true situation on Monday afternoon, CA time, some 80 hours later (I can provide sources for that assertion). If the staff could have a quick way of picking up on hard, technical facts gleaned by us crunchers, without having to wade through all the dross, I'm sure it would be helpful to them. As it is, I bet every single thread on these message boards errors out with a -9 when the staff try to read them.

I wonder if there could be an inverse of the Technical News board? Posts could only be made, or could be moved into it, by a 'Technical Moderator' - one of a group of people with sufficient technical / hardware / networking / programming experience (there are enough of them active on the boards) to recognise a significant point and allow it to float to the surface.

Just a thought. Comments?



Very good idea!

But who will choose those 'Technical Moderators', and sort those who have the knowledge from those who only think they have...?
Who will do it? They might need lots of spare time to read all the PMs. ;)

Maybe those who already have the 'Volunteer Developer' tag??




mic.


ID: 620303 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 620304 - Posted: 16 Aug 2007, 15:46:34 UTC - in response to Message 620289.  
Last modified: 16 Aug 2007, 16:08:59 UTC

Anyone - everyone - please, how can we get through to the project staff to tackle this problem at source, not just keep feeding the monster?


Possibly a PM to Pappa......I think he is in phone touch with the lab....

Just tried to do that, and got a new (to me) message:
You are not allowed to send privates messages so often. Please wait some time before sending more messages.

I've sent ONE private message today, and that was over 6 hours ago! Anyone know what the limit is?

Edit - got a message away using the Beta board. Now I suppose I've used up my quota for the day there as well!
ID: 620304 · Report as offensive
Profile bounty.hunter
Volunteer tester
Joined: 22 Mar 04
Posts: 442
Credit: 459,063
RAC: 0
India
Message 620317 - Posted: 16 Aug 2007, 16:01:55 UTC - in response to Message 620304.  

Just tried to do that, and got a new (to me) message:
You are not allowed to send privates messages so often. Please wait some time before sending more messages.

I've sent ONE private message today, and that was over 6 hours ago! Anyone know what the limit is?

Richard, I just sent a PM to Pappa and followed it up with a test PM to you. I didn't get the strange message that you got. Might be a momentary blip on the db ?
ID: 620317 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 620321 - Posted: 16 Aug 2007, 16:06:51 UTC
Last modified: 16 Aug 2007, 16:08:22 UTC

And a truly informative and polite PM it was. Sorry, you used up your quota on me (if one exists)
ID: 620321 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 620326 - Posted: 16 Aug 2007, 16:17:57 UTC - in response to Message 620317.  

Just tried to do that, and got a new (to me) message:
You are not allowed to send privates messages so often. Please wait some time before sending more messages.

I've sent ONE private message today, and that was over 6 hours ago! Anyone know what the limit is?

Richard, I just sent a PM to Pappa and followed it up with a test PM to you. I didn't get the strange message that you got. Might be a momentary blip on the db ?

Most odd. Message received - thank you - and it allowed me to reply.

I thought of one possible explanation: I sent both my message to Astro, and my message to Pappa, to myself as a second recipient, so I would have an archive copy to refer to. I wondered whether there was a limit on the number of messages you can send to the same person. So I just tried copying myself in on a second reply to bounty.hunter - and that one was accepted as well. So I'm stuck. Anyway, let's hope Pappa isn't away on vacation, and has the phone number for someone other than Eric (who is).
ID: 620326 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 620329 - Posted: 16 Aug 2007, 16:24:02 UTC

Even odder. I've just had a reply to the message that the system wouldn't accept! I copied in a second recipient that I knew had telephone access to the lab (only Eric, unfortunately), and he's just replied. No direct voice contact with Matt, but apparently the message has been forwarded.
ID: 620329 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 620330 - Posted: 16 Aug 2007, 16:28:21 UTC

Richard,

Sent an email to Pappa on an account he might check more often.

Claggy.
ID: 620330 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 620332 - Posted: 16 Aug 2007, 16:33:16 UTC

FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 620332 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 620333 - Posted: 16 Aug 2007, 16:37:36 UTC - in response to Message 620332.  

FYI - we were observing the triplet overflow behavior as soon as these particular files were being split days ago. Usually these are caused by heavy areas of RFI and we work beyond them on our own. Some fires you just let burn, you know? Anyway, we're on it.

- Matt

Great, thanks for letting us know.

Just wanted to be sure you'd got the negative <triplet_thresh>, which Joe seemed to think was more unusual.
ID: 620333 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.