Computation Error

Profile James Butler
Volunteer tester
Joined: 26 Jul 11
Posts: 139
Credit: 6,266
RAC: 0
United Kingdom
Message 1132659 - Posted: 27 Jul 2011, 19:15:06 UTC

Unfortunately, I don't know where to start this thread, so I'm hoping this is the right place. I am new to BOINC and the projects. So far things have been going great, but NFS@Home has given the status "Computation Error". Is there anything I can do to fix this, or have I done something wrong?
ID: 1132659 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1132669 - Posted: 27 Jul 2011, 19:45:57 UTC - in response to Message 1132659.  
Last modified: 27 Jul 2011, 19:48:46 UTC

Eh, this is SETI@home. None of your SETI tasks have errored, and it is too soon to be able to see your user IDs at the other projects you're running. A link to your NFS@Home host or user ID might help:

All tasks for computer 6115874

Claggy
ID: 1132669 · Report as offensive
Profile James Butler
Volunteer tester
Joined: 26 Jul 11
Posts: 139
Credit: 6,266
RAC: 0
United Kingdom
Message 1132674 - Posted: 27 Jul 2011, 20:02:37 UTC - in response to Message 1132659.  
Last modified: 27 Jul 2011, 20:02:50 UTC

User ID 9224 at NFS@Home
ID: 1132674 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1132676 - Posted: 27 Jul 2011, 20:04:22 UTC - in response to Message 1132659.  
Last modified: 27 Jul 2011, 20:07:05 UTC

And there are already several threads about errors while computing on the NFS@Home message boards.

Regards,
Gundolf
[edit]And you haven't reported the erroneous task yet, so we can't see the stderr output.[/edit]
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 1132676 · Report as offensive
Profile James Butler
Volunteer tester
Joined: 26 Jul 11
Posts: 139
Credit: 6,266
RAC: 0
United Kingdom
Message 1132677 - Posted: 27 Jul 2011, 20:05:28 UTC - in response to Message 1132676.  

Thanks
ID: 1132677 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1132680 - Posted: 27 Jul 2011, 20:16:00 UTC - in response to Message 1132674.  
Last modified: 27 Jul 2011, 20:18:06 UTC

user id 9224 at NFS
These are your tasks at NFS@Home; none are shown to have errored (yet):

All tasks for computer 18693

Claggy
ID: 1132680 · Report as offensive
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1132743 - Posted: 27 Jul 2011, 22:56:32 UTC

Well, he may not have any errors in this project, but I do: seven of them, and potentially several dozen more. It's not hard to see the reason why. They errored because the WUs took too long, which is not surprising given how short the estimated run time was in BOINC Manager. From what I can see, the ones that errored at around 1500 secs should've run for roughly twice that length of time to complete, but the estimated time in BOINC Manager for those tasks is an astonishingly quick 2m 31s or 2m 32s! Better still is the one that ran for about 420 secs. Shorties usually run for about 800 secs, but in BOINC Manager the time was estimated at an eye-wateringly rapid 42 secs!

I know GPUs are fast, but how did the estimated times get so badly wrong? When crunching gets that fast, we'll probably all have to take up knitting or something like that. The thing that slightly annoys me is that I now have an 'error history', and it's nothing to do with my end of things. Oh well, I'll finish all the non-GPU WUs and then do a detach, to purge all the GPU WUs.
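
A rough sketch of why the tasks would hit the limit at about ten times the estimate, for anyone following along. It assumes the client derives both the estimated run time and the maximum run time by dividing the workunit's `<rsc_fpops_est>` and `<rsc_fpops_bound>` by the speed reported for the app (the APR, taken here as GFLOPS), and that the bound is ten times the estimate, which is only inferred from the 2m 31s versus roughly 1500 secs figures above. The workunit size is made up for illustration.

```python
def runtime_estimates(rsc_fpops_est, rsc_fpops_bound, apr_gflops):
    """Return (estimated_seconds, limit_seconds) for one task.

    Assumption: both times are the workunit's operation counts divided
    by the app speed, with APR expressed in GFLOPS.
    """
    flops = apr_gflops * 1e9
    return rsc_fpops_est / flops, rsc_fpops_bound / flops


# Hypothetical workunit: the size is invented, and the 10x ratio between
# bound and estimate is an assumption inferred from the times in the post.
fpops_est = 1.7e14
fpops_bound = 10 * fpops_est

# Compare an APR ten times lower with the APR of 1133 quoted in the thread.
for apr in (113.3, 1133.0):
    est, limit = runtime_estimates(fpops_est, fpops_bound, apr)
    print(f"APR {apr:7.1f} GFLOPS: estimate {est:7.1f} s, limit {limit:7.1f} s")
```

Under these assumptions, a tenfold inflated APR shrinks the estimate and the limit by the same factor, so a task that genuinely needs a few thousand seconds is aborted long before it can finish.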



Don't take life too seriously, as you'll never come out of it alive!
ID: 1132743 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1132873 - Posted: 28 Jul 2011, 7:01:15 UTC
Last modified: 28 Jul 2011, 7:01:59 UTC

Look at your host details, Iona.
Your APR is 1133, which is way too high.

You can try Fred's rescheduler to avoid -177 errors.


With each crime and every kindness we birth our future.
ID: 1132873 · Report as offensive
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1133057 - Posted: 28 Jul 2011, 17:57:16 UTC - in response to Message 1132873.  

Indeed, it is high, but it wasn't that high before, not that I have any idea how to change that, or even whether it is something I can change. By my own reckoning, the GPU ran WUs in the order of 4 or 5 times faster than the CPU, so the APR is about 10 times higher than it should be. I think I'll have to give the rescheduler a miss, as potentially adding another 65 or so CPU tasks does have a downside! Maybe I'll just give up using the GPU....



Don't take life too seriously, as you'll never come out of it alive!
ID: 1133057 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 1133083 - Posted: 28 Jul 2011, 18:56:51 UTC - in response to Message 1133057.  

I think I'll have to give the rescheduler a miss, as potentially adding another 65 or so CPU tasks does have a downside! Maybe I'll just give up using the GPU....



As I understand it, one of the features of the rescheduler tool is to avoid the -177s without swapping any WUs between CPU and GPU. I think this is what was being suggested.

F.
ID: 1133083 · Report as offensive
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1133097 - Posted: 28 Jul 2011, 20:15:58 UTC - in response to Message 1133083.  

Thanks, Mike and Fred W. I've decided to detach and do things the hard way, with a CPU, from now on. Don't get me wrong, the actual ATI apps ran very well; it was BOINC that messed things up, as far as I'm concerned. A pity. In any case, as good as the Gelid cooler is at cooling the GPU, it is really bad for 'adding' heat to the whole system, as it does not exhaust that heat from the GPU outside the case! I've even clocked the card down by some 75 - 100 MHz on the GPU core and GPU memory, yet the heat from the card would still turn the room the PC is in into a sauna after a few hours, and that was even running with the sides off the case and aiming a desk fan at the insides! I've been looking at getting an E8400 or similar for a while now, purely because this E6550 just can't move things around fast enough for the 4890 to really do its stuff in games, so it's still experience. It's also one of the reasons that I never let the other PC run GPU tasks on its 4870.






Don't take life too seriously, as you'll never come out of it alive!
ID: 1133097 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1133103 - Posted: 28 Jul 2011, 20:41:39 UTC - in response to Message 1133097.  
Last modified: 28 Jul 2011, 20:59:03 UTC

Detaching won't help: your host will still have the same host ID, and the server will still have those high APR values for the ATI MB app.
The only way to lower the APR values is to complete work without running out of time,
either by manually adding a zero to each <rsc_fpops_bound> entry for each ATI MB entry, or by using Fred's tool to do it for you.
(That is, if you decide to run the ATI MB app in the future, like next winter.)
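
For illustration, "adding a zero" to a bound amounts to multiplying it by ten. The sketch below does that on a text snippet shaped like a `client_state.xml` workunit entry. The function name `scale_bound`, the sample workunit and its values are hypothetical, and the default tag `rsc_fpops_bound` is an assumption (it is the field BOINC clients are understood to derive the elapsed-time limit from); pass a different tag name if another bound is meant. It is not Fred's tool and it does not restrict itself to ATI MB entries.

```python
import re


def scale_bound(xml_text, tag="rsc_fpops_bound", factor=10.0):
    """Multiply every <tag>value</tag> entry found in xml_text by factor."""
    pattern = re.compile(
        r"(<%s>)([^<]+)(</%s>)" % (re.escape(tag), re.escape(tag))
    )

    def repl(match):
        # Parse the stored number, scale it, and write it back in the
        # fixed-point style the sample entry uses.
        new_value = float(match.group(2)) * factor
        return f"{match.group(1)}{new_value:.6f}{match.group(3)}"

    return pattern.sub(repl, xml_text)


# Invented sample entry; a real file holds one such block per workunit.
sample = """<workunit>
    <name>hypothetical_wu_0</name>
    <rsc_fpops_est>170000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>1700000000000000.000000</rsc_fpops_bound>
</workunit>"""

patched = scale_bound(sample)
print(patched)
```

Any edit like this to the real file should be made with BOINC shut down and a backup copy kept, since the client rewrites `client_state.xml` while it runs.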

There is a changeset that might do this, but it is new; I don't know to what extent the values are reset, or whether it has been applied here yet.

Claggy
ID: 1133103 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1133111 - Posted: 28 Jul 2011, 20:59:12 UTC - in response to Message 1133103.  

Detaching won't help: your host will still have the same host ID, and the server will still have those high APR values for the ATI MB app.
The only way to lower the APR values is to complete work without running out of time,
either by manually adding a zero to each <rsc_fpops_bound> entry for each ATI MB entry, or by using Fred's tool to do it for you.

There is a changeset that might do this, but it is new; I don't know to what extent the values are reset, or whether it has been applied here yet.

Claggy

That changeset is 'admin web' - i.e. the administrative control interface that project administrators see on their servers. Not something us mere mortals have access to.

The easiest way to reset those metrics is to manually force a host renumbering. Finish and report any work in progress, and shut down BOINC.

Open client_state.xml for editing with notepad or similar.
Find <master_url>http://setiathome.berkeley.edu/ (our project)
Within that project, find <rpc_seqno>, and REDUCE the number.

Start BOINC again, and allow it to fetch work. The server should detect that something fishy has been going on, assign a new HostID, and send you some work with default settings. You can safely merge the new computer with the old record with the same name - credit, RAC etc. will be copied forward, but APR and the other CreditNew values won't be copied. Bug or feature? I'll let you decide.
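
The edit in the steps above could be scripted roughly as follows. This is a sketch, not an official procedure: `reduce_rpc_seqno` is a hypothetical helper, it assumes each project sits in a `<project>...</project>` block of `client_state.xml` containing `<master_url>` and `<rpc_seqno>` elements, and the sample text is invented. It works on a string, so reading and writing the real file (with BOINC shut down, and after taking a backup) is left to the reader.

```python
import re


def reduce_rpc_seqno(xml_text, master_url, new_seqno=1):
    """Lower <rpc_seqno> in the <project> block whose <master_url> matches.

    The value is only ever reduced, never raised.
    """
    def patch_project(match):
        block = match.group(0)
        if f"<master_url>{master_url}</master_url>" not in block:
            return block  # a different project; leave it untouched

        def lower(m):
            current = int(m.group(1))
            return f"<rpc_seqno>{min(current, new_seqno)}</rpc_seqno>"

        return re.sub(r"<rpc_seqno>(\d+)</rpc_seqno>", lower, block)

    return re.sub(r"<project>.*?</project>", patch_project, xml_text,
                  flags=re.DOTALL)


# Invented sample with two attached projects.
sample = """<client_state>
<project>
    <master_url>http://example.org/other_project/</master_url>
    <rpc_seqno>40</rpc_seqno>
</project>
<project>
    <master_url>http://setiathome.berkeley.edu/</master_url>
    <rpc_seqno>512</rpc_seqno>
</project>
</client_state>"""

patched = reduce_rpc_seqno(sample, "http://setiathome.berkeley.edu/")
print(patched)
```

Only the matching project's counter changes; other projects in the file keep their own sequence numbers.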
ID: 1133111 · Report as offensive
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1133422 - Posted: 29 Jul 2011, 10:57:00 UTC - in response to Message 1133103.  

Thanks for the info, Richard and Claggy. As you may have noticed, I've already done the detach and am just going to run with the CPU on MB and AP WUs, with the GPU only running AP....for some reason, the AP timings were about right. Would I be correct in thinking that a sufficiently high percentage of 'overflow' results that were validated as being 'good', could have helped to raise the APR to the ludicrous level that it is at, for ATI GPU MB, given the relatively small number of GPU tasks completed? Pure conjecture on my part, but logical.

As to the heat problem, I am now making some crude shrouding/baffles for the 4890, to 'encourage' the exhausted heat from the card to go towards the rear extractor fan (which has been 'beefed up' to a Scythe unit that moves 50% more air than the old unit). If that doesn't do the trick to my liking, I'll make some holes in the side of the case and strategically fit a pair of 120mm Scythe fans to extract the warmed air. Once again, my thanks to everyone for their valuable assistance.

Down, but not, out.....






Don't take life too seriously, as you'll never come out of it alive!
ID: 1133422 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.