Computation Error - Bad Workunit Header
Author | Message |
---|---|
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
|
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
LOL... Perpetual motion is not a concern when it comes to WUs on SAH, so records would be moot. The maximum number of errors is set by a project-side parameter. In SAH's case the value is 5, so when the sixth error arrives back at the project the WU is canceled. The one you made reference to is down to its last 'chance'. Alinator |
KWSN Ekky Ekky Ekky Send message Joined: 25 May 99 Posts: 944 Credit: 52,956,491 RAC: 67 |
Of course. Silly me, forgot that basic fact. However, does it speed getting rid of these bad units if we abort them or is it better to let them run? After all, with these particular ones, a few seconds is all it takes for them to end. LOL... |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Of course. Silly me, forgot that basic fact. Well you just hit the rub of this particular batch. ;-) As you point out, the defect is such that the task fails in seconds in most cases, so just letting BOINC handle them on its own is definitely an option. However, there have been a few reports of them hanging up when they try to exit on multicore machines in some situations. This makes for a reasonable argument to abort them manually, but the trade-off is the time you have to spend ferreting them out of your cache and then canning them. Another factor in manually aborting them is that, if you run a larger cache, it helps get rid of them somewhat sooner from the overall project viewpoint. So those are the options in a nutshell and the choice is up to the user. Alinator |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65771 Credit: 55,293,173 RAC: 49 |
Of course. Silly me, forgot that basic fact. And so I'm of the opinion that We off these defective WUs, The sooner the better, As I just canned 11 on 3 PCs in less time than It took Me to read the thread or to make this post and a few others. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Well, I'm sure most of us NC regulars are on the hunt and canning them as soon as they show up. The reality is the vast majority of them are going to go through the loop and die of natural causes on their own. If it wasn't for the nagging multicore exit bug, it wouldn't make for doodly squat difference one way or the other in the grand scheme of things in this particular case. A scenario where they all ran for a major portion of their estimated time before crapping out would be a different kettle of woodchucks. However, Matt and Eric would have probably decided to take the risk of DB corruption and summarily have canceled them in that case. Alinator |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
A scenario where they all ran for a major portion of their estimated time before crapping out would be a different kettle of woodchucks. However, Matt and Eric would have probably decided to take the risk of DB corruption and summarily have canceled them in that case. Though it took weeks to persuade Matt'n'Eric to cancel the Splitsville Evercrunch Specials - though to be fair, Eric was on holiday when they struck, and fearsomely busy when he got back. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
LOL... Yep, and in Eric's case that's a pretty major factor to account for! :-) IIRC, in Matt's case splitter issues like that aren't really his bailiwick, and he was up to his belt buckle in alligators (and only had a broken baseball bat) with other issues at the time. ;-) But you're right... it was an 'exciting' couple of weeks for a lot of folks. :-D Alinator |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65771 Credit: 55,293,173 RAC: 49 |
In the meantime It just gives us something else to do. :D Oh well, I'm off lookin for My Elmer hat, Now where did It go? ;) The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Morris Send message Joined: 11 Sep 01 Posts: 57 Credit: 9,077,302 RAC: 29 |
Maybe this is slightly off-topic, but anyway .... I've found that one of my UNattended computers got at least one (this, this and also this one) of the wicked WU 13fe08ac, and is most probably idling for that reason ... The question is, is there anything i can do WITHOUT having access to the remote computer ? Most probably no ... i will have to wait till easter ... :( M. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmmm.... I'd have to say you are correct and this host is just twiddling its thumbs right now. I'm going to assume you have BOINC installed as a service, so can't you just give someone a call and have them reboot the machine? Just turning it off and on should do the trick. In any event, when you do have access to it the next time, you may want to set it up so you can get remote access to at least BOINC if possible. Being able to VNC into it would be even better still. Alinator |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
... In which situations are the multicore boxes hanging? On my Quad everything is running well, without user activity.. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Presumably this is a re-occurrence of the multiple tasks exiting almost simultaneously issue we saw early on in the MB rollout. It seems to be far less prevalent this time around, but then it wasn't like it was ubiquitous then either. Alinator |
Morris Send message Joined: 11 Sep 01 Posts: 57 Credit: 9,077,302 RAC: 29 |
Hmmmm.... Not exactly true, Alinator, on that puter i'm not admin, so i couldn't install boinc as a service... Maybe i can call some1 at the office and let them reboot the machine. I would like to manage it with VNC, i've been messing around a bit with RealVNC lately, and i'm having fun with that, but unluckily company safety policy does not allow direct access from outside, so firewall, hidden IP address, and all that crap... Thanks anyway for the info M. [added]Some time ago I configured boinc for remote management, and it is working fine as long as i am connected to the company lan, but this is not the case, darn .... [/added] |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmmm... In that case a reboot won't help, since you probably don't want to give out your user account password to someone else. :-( Alinator |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
However, there have been a few reports of them hanging up when they try to exit on multicore machines in some situations. Not only multicore machines. I got two of them and neither one exited properly. The first one wasted 20 hours of crunch time before I killed it. The second one not that much, since I played with the queue so I could see how it behaves. I did attach a debugger to the first one and found out that it was stuck in a more or less infinite loop in the BOINC API. I think it was the write_init_data function (sorry, I didn't write it down). If someone tells me what BOINC API version was used to compile 2.4V, I could take another look at it with the sources and try to find out why it was stuck in that loop. -Juha |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65771 Credit: 55,293,173 RAC: 49 |
However, there have been a few reports of them hanging up when they try to exit on multicore machines in some situations. 2.4V is a Seti app; Boinc has a different version number, somewhere between about 4.45 and 6.10 or so. Crunch3r might be able to help If It's Seti related, or somebody here for Boinc, Probably. No, I can't help, Just trying to clarify something. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Morris Send message Joined: 11 Sep 01 Posts: 57 Credit: 9,077,302 RAC: 29 |
No prob for password, i can easily give it, no secrets on the puter.. the only thing i dont want to happen is that my boss answers the call and i ask him to restart the company computer crunching Boinc 24/7 :D I need my salary, ya know ... :) M. |
[KWSN]John Galt 007 Send message Joined: 9 Nov 99 Posts: 2444 Credit: 25,086,197 RAC: 0 |
As it has been said many times: Run SETI@home only on computers that you own, or for which you have obtained the owner's permission. Some companies and schools have policies that prohibit using their computers for projects such as SETI@home. ...taken from the SETI@Home info page. Clk2HlpSetiCty:::PayIt4ward |
Morris Send message Joined: 11 Sep 01 Posts: 57 Credit: 9,077,302 RAC: 29 |
Yes, i've read the notice many times, but my company does not expressly prohibit that .... Don't worry John, in a certain way i OWN that 'puter, meaning it is under my responsibility .... instead of being there idle, while i'm abroad, that 'puter is helping the cruncher's cause. It normally runs just when it is idling, no harm in that, right ? Better than playing spider, or surfing the web the whole day ... M. |
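
As a rough illustration of the error limit Alinator describes earlier in the thread (a limit of 5, with the WU canceled when the sixth error comes back), here is a minimal Python sketch. The names `MAX_ERROR_RESULTS`, `is_canceled` and `errors_until_canceled` are made up for this example and are not taken from the BOINC server code; only the value 5 and the "sixth error cancels" behaviour come from the post.

```python
# Hypothetical illustration only; this is not BOINC server code.
# The limit of 5 is the value quoted in the thread for SETI@home.
MAX_ERROR_RESULTS = 5


def is_canceled(error_count: int, max_errors: int = MAX_ERROR_RESULTS) -> bool:
    """Return True once more error results have come back than the limit allows."""
    return error_count > max_errors


def errors_until_canceled(error_count: int, max_errors: int = MAX_ERROR_RESULTS) -> int:
    """Return how many further error results it takes before the WU is canceled."""
    return max(0, max_errors + 1 - error_count)


if __name__ == "__main__":
    # Walk a WU through successive error results and show when it gets canceled.
    for errors in range(0, 8):
        print(errors, is_canceled(errors), errors_until_canceled(errors))
```

Under these assumptions, a WU that has already returned five errors is on its last 'chance': one more error result pushes it over the limit.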