-177 (0xffffffffffffff4f) Faults
Author | Message |
---|---|
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Overnight this computer downloaded 15 version 603 MB work units. All errored out with the same fault code of -177 (0xffffffffffffff4f). I have now removed version 603 MB from my app_info so that no more will be requested. I would be happy to add the 603 info back into the app_info if anyone has any troubleshooting they want to do. Of the 15 errored-out work units, 10 failed at a run time of 0.00. 3 errored with a run time of exactly 1,533.89, 1 at exactly 1,534.45 and 1 at exactly 1,534.22 seconds. This all concerns this computer only. I am convinced that this is NOT a problem on my end. 3 errors at exactly the same time and 10 errors at exactly the same time? Not a hardware problem here. And again, this computer was successfully crunching 603 work units before the server changes earlier this week. Again, if anyone at Berkeley is interested, I'm still here. I have copies of the errors that report to Microsoft available if interested. [edit] More info here from yesterday. Boinc....Boinc....Boinc....Boinc.... |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"Of the 15 errored out work units, 10 failed at a run time of 0.00. 3 errored with a run time of exactly 1,533.89. 1 errored out at exactly 1,534.45 and 1 errored out at exactly 1,534.22 seconds." Pretty strange to get "Maximum elapsed time exceeded" in such a short run time. Perhaps you should download a new copy of the 6.03 exe. FWIW, did you check your app_info.xml for <flops> statements, and what is your DCF on that host? Regards, Gundolf Computers aren't everything in life. (Just kidding) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
This is a completely new install from yesterday evening. No flops are in the app_info file. Yes, I copied a new exe into the work directory but the errors are still coming. Crunches AP just fine. [edit] DCF is now 0.9598. Boinc....Boinc....Boinc....Boinc.... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Not the application. This is eerily reminiscent of a sequence which took place at SETI Beta during testing, reported to the boinc_alpha buglist. "Richard Haselgrove" wrote: I got a block of SETI Beta tasks yesterday which all errored out with "David Anderson" wrote: I fixed the problem. As Richard suggested, "Richard Haselgrove" wrote: David got this sorted off-list yesterday - according to changeset 21671, there was a problem with resent tasks on anonymous platform. "David Anderson" wrote: possibly fixed now. "Richard Haselgrove" wrote: Yes, tasks issued since around 19:30 UTC today have had plausible (if to my eye slightly low) fpops_est values So, Beta testing can put a bandaid over some problems. But it sounds as if either (a) the bandaid hasn't been copied to the main project, or (b) yet a third variant of the problem has surfaced. But I'm pretty sure it's a server problem - check those <rsc_fpops_bound> values to be certain. I've got to set off for a 3-hour cross-country drive now, and I don't have the time or enough detail to report it now. But if people can check and post their findings while I'm en route, I'll check in once I've arrived and found a computer to fire up. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I can't find any <rsc_fpops_bound> with a 603 app since I don't have any at the moment. Will reenable and report as soon as I can catch a 603. Boinc....Boinc....Boinc....Boinc.... |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Just found this computer had the same fault on June 15 on nearly 40 work units. This seems to be a serious problem, among many problems after the server upgrade. [edit] Here is the info from the client_state for some of the 603 work units waiting to crunch. <rsc_fpops_bound>61671868056682.000000</rsc_fpops_bound> <rsc_fpops_bound>211269334891737.310000</rsc_fpops_bound> <rsc_fpops_bound>61671868056682.000000</rsc_fpops_bound> <rsc_fpops_bound>211445572728336.590000</rsc_fpops_bound> <rsc_fpops_bound>211445572728336.590000</rsc_fpops_bound> <rsc_fpops_bound>211445572728336.590000</rsc_fpops_bound> <rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound> <rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound> <rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound> <rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound> <rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound> I can send the entire client_state.xml file if anyone is interested. Boinc....Boinc....Boinc....Boinc.... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Here is the info from the client_state for some of the 603 work units waiting to crunch." Those ones are OK (by eye - not got a calculator out to check). The "smoking gun" would be to find one which displayed a zero, or ridiculously low, "To completion" time in Task Manager before it even started, then check the <rsc_fpops_bound> for that exact task. "I can send the entire client_state.xml file if anyone is interested." No thanks, I'm on a rustic computer, on the end of a rustic telephone line, in the depths of the country. Back to full equipment on Monday, if it's still a problem then. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Have a good weekend Richard. Thanks for your help. Boinc....Boinc....Boinc....Boinc.... |
Dave Lewis Joined: 12 Apr 99 Posts: 34 Credit: 53,432,603 RAC: 108 |
My computer also generated a ton of these errors, but they involve v6.08 cuda WUs. I haven't checked for a week or so and I was really surprised to find this. I had to detach in early May because of a failing hard drive, and when I set my system up again with stock software everything looked to be working okay until I noticed the errors just now. Any suggestions appreciated. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"Any suggestions appreciated." See Richard Haselgrove's posts in this same thread. Regards, Gundolf |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
I'm now on my K6-2+ / 524 MHz computer. It has 4 tasks and uses an opt app. Last download: 17-Jun-2010 14:44:02 [SETI@home] Finished download of 18mr10aa.31979.20517.15.10.129 I see that rsc_fpops_bound is exactly 10 times the rsc_fpops_est: <rsc_fpops_est>159697111958278.000000</rsc_fpops_est> <rsc_fpops_bound>1596971119582780.000000</rsc_fpops_bound> <rsc_fpops_est>160566093629472.000000</rsc_fpops_est> <rsc_fpops_bound>1605660936294720.000000</rsc_fpops_bound> <rsc_fpops_est>163980079177073.000000</rsc_fpops_est> <rsc_fpops_bound>1639800791770730.000000</rsc_fpops_bound> <rsc_fpops_est>161006148967213.000000</rsc_fpops_est> <rsc_fpops_bound>1610061489672130.000000</rsc_fpops_bound> <app_info> <app> <name>setiathome_enhanced</name> </app> <file_info> <name>KWSN_2.4V_MMX_MB.exe</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>528</version_num> <file_ref> <file_name>KWSN_2.4V_MMX_MB.exe</file_name> <main_program/> </file_ref> </app_version> </app_info> - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... That's true: the splitter uses that ratio, and BOINC's attempt to do per-application estimates server-side adjusts both by the same amount. However, in the core client the Duration Correction Factor is applied only to estimates, not the bound. With almost all hosts having DCF around 0.2, the effective bound was about 50 times the estimate. With the server-side adjustment effectively trying to drive DCF toward 1.0, the allowance should reduce to around 10. If BOINC's adjustments were reasonably accurate that would be adequate, but at least for now they are not that accurate. Because the bound is merely intended to keep a corrupted task or application from running forever, increasing it to give your host time to do the work won't harm the project in any way. More info in my post in "Some changes made to this recent BOINC update BUT." Joe |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Just got one of these myself. Task 1638222570 if anyone wants to look at the details. DCF was reading 7.8. I've reset it back to 1. I've still got the <flops> values in the app_info file for this box. What's the current thinking on this: are flops values in or out? And yes, the protection is working. Max tasks for Anon Platform, nvidia GPU has been reset back to 99 from 208. :-P Brodo |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I received 11 more errors today, this time on cuda work, just seconds before they would have ended normally. Example here. There must be many other crunchers in this predicament. Boinc....Boinc....Boinc....Boinc.... |
hiamps Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0 |
Yep, same here. Official Abuser of Boinc Buttons... And no good credit hound! |
RottenMutt Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
ditto running AK_v8b_win_x64_SSE41 |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
"ditto" You still have MB (non-cuda) work??? I have not received one all weekend. Nothing except cuda and AP work all weekend. And not one VLAR that could be rescheduled to the CPUs. Looks like they cherry-picked the tapes (disks) for the weekend to minimize problems. Boinc....Boinc....Boinc....Boinc.... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Finally catching up after my weekend away. Looks like my Fermi had about 30 of these: All tasks for computer 4292666 - all being stopped in their tracks at around 5 minutes. I had flops correction in place at the time, but no effective VLAR catcher: now I've swapped that over - no flops entry in app_info, and ReSchedule installed. All I need now are some tasks to try it out - stuffed full of Beta at the moment. |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
wish I could get beta In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Area 51 Joined: 31 Jan 04 Posts: 965 Credit: 42,193,520 RAC: 0 |
|
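
The arithmetic in Josef W. Segur's post above (the bound is 10 times the estimate, and the client applies the Duration Correction Factor to the estimate only) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `read_pairs` and `effective_allowance` are made up for this example, the sample values are two of the est/bound pairs BilBg posted, and the formula allowance = (bound / est) / DCF is taken from the description in the thread, not from the BOINC source.

```python
import re

# Two est/bound pairs copied from the client_state values quoted in the thread.
FRAGMENT = """
<rsc_fpops_est>159697111958278.000000</rsc_fpops_est>
<rsc_fpops_bound>1596971119582780.000000</rsc_fpops_bound>
<rsc_fpops_est>160566093629472.000000</rsc_fpops_est>
<rsc_fpops_bound>1605660936294720.000000</rsc_fpops_bound>
"""


def read_pairs(text):
    """Return (est, bound) tuples in the order they appear in the text.

    Hypothetical helper: assumes every <rsc_fpops_est> is followed by
    its matching <rsc_fpops_bound>, as in the posted fragment.
    """
    est = re.findall(r"<rsc_fpops_est>([\d.]+)</rsc_fpops_est>", text)
    bound = re.findall(r"<rsc_fpops_bound>([\d.]+)</rsc_fpops_bound>", text)
    return [(float(e), float(b)) for e, b in zip(est, bound)]


def effective_allowance(est, bound, dcf):
    """Bound expressed as a multiple of the DCF-corrected estimate.

    Assumption from the thread: DCF scales the estimate but not the
    bound, so the allowance is (bound / est) / dcf.
    """
    return (bound / est) / dcf


for est, bound in read_pairs(FRAGMENT):
    print("bound/est ratio:", bound / est)
    for dcf in (0.2, 1.0):
        print("  DCF", dcf, "-> allowance", effective_allowance(est, bound, dcf))
```

With a DCF near 0.2 the allowance comes out around 50 times the corrected estimate, and with a DCF near 1.0 it drops to around 10, which is the shrinking safety margin described in the post.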