-177 (0xffffffffffffff4f) Faults

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1005703 - Posted: 18 Jun 2010, 13:49:16 UTC Last modified: 18 Jun 2010, 13:55:03 UTC
Overnight this computer downloaded 15 version 603 MB work units. All of them errored out with the same fault code, -177 (0xffffffffffffff4f). I have now removed version 603 MB from my app_info so that no more will be requested. I would be happy to add the 603 info back into the app_info if anyone wants to do any troubleshooting.
Of the 15 errored work units, 10 failed at a run time of 0.00, 3 at exactly 1,533.89 seconds, 1 at exactly 1,534.45 seconds and 1 at exactly 1,534.22 seconds. This all concerns this computer only.
I am convinced that this is NOT a problem on my end. 3 errors at exactly the same time and 10 errors at exactly the same time? Not a hardware problem here. And again, this computer was successfully crunching 603 work units before the server changes earlier this week.
If anyone at Berkeley is interested, I'm still here. I have copies of the errors that report to Microsoft available if anyone is interested.
[edit] More info here from yesterday.
Boinc....Boinc....Boinc....Boinc....
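A side note on the code in the thread title: the hexadecimal form looks like nothing more than -177 written as a 64-bit two's-complement value, so both numbers name the same fault. A minimal Python sketch of that conversion (the helper names to_hex64 and from_hex64 are made up for illustration, not part of BOINC):

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # 64 one-bits


def to_hex64(value: int) -> str:
    """Render a signed integer as a 64-bit two's-complement hex string."""
    return hex(value & MASK_64)


def from_hex64(text: str) -> int:
    """Interpret a hex string as a signed 64-bit two's-complement integer."""
    raw = int(text, 16)
    # Values with the top bit set represent negative numbers.
    return raw - (1 << 64) if raw >= (1 << 63) else raw


print(to_hex64(-177))
print(from_hex64("0xffffffffffffff4f"))
```

Either direction can be used to match the decimal code shown in one place against the hex code shown in another.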

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1005709 - Posted: 18 Jun 2010, 14:06:42 UTC - in response to Message 1005703.
"Geek@Play" wrote: Of the 15 errored out work units, 10 failed at a run time of 0.00. 3 errored with a run time of exactly 1,533.89. 1 errored out at exactly 1,534.45 and 1 errored out at exactly 1,534.22 seconds.
Pretty strange to get "Maximum elapsed time exceeded" after such a short run time. Perhaps you should download a new copy of the 6.03 exe. FWIW, did you check your app_info.xml for <flops> statements, and what is your DCF on that host?
Regards, Gundolf
Computers aren't everything in life. (Just kidding)
SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1005718 - Posted: 18 Jun 2010, 14:17:18 UTC Last modified: 18 Jun 2010, 14:18:30 UTC
This is a completely new install from yesterday evening. There are no flops entries in the app_info file. Yes, I copied a new exe into the work directory, but the errors are still coming. It crunches AP just fine.
[edit] DCF is now 0.9598.
Boinc....Boinc....Boinc....Boinc....

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 1005722 - Posted: 18 Jun 2010, 14:19:54 UTC
Not the application. This is eerily reminiscent of a sequence which took place at SETI Beta during testing, reported to the boinc_alpha buglist.
"Richard Haselgrove" wrote: I got a block of SETI Beta tasks yesterday which all errored out with
01-Jun-2010 00:31:11 [SETI@home Beta Test] Aborting task 18dc09aa.22310.12337.5.13.19_1: exceeded elapsed time limit 0.000000
(details followed)
"David Anderson" wrote: I fixed the problem. As Richard suggested, it affected only jobs that were being resent to a client using anonymous platform. -- David
"Richard Haselgrove" wrote: David got this sorted off-list yesterday - according to changeset 21671, there was a problem with resent tasks on anonymous platform. Except - today, I'm getting the same problem on tasks which are not resent (but are anonymous platform - host 23491). Tasks show zero time 'To completion' in BOINC Manager, and have the same
<rsc_fpops_est>0.000000</rsc_fpops_est>
<rsc_fpops_bound>0.000000</rsc_fpops_bound>
"David Anderson" wrote: possibly fixed now. -- David
"Richard Haselgrove" wrote: Yes, tasks issued since around 19:30 UTC today have had plausible (if to my eye slightly low) fpops_est values.
So, Beta testing can put a bandaid over some problems. But it sounds as if either (a) the bandaid hasn't been copied to the main project, or (b) yet a third variant of the problem has surfaced. But I'm pretty sure it's a server problem - check those <rsc_fpops_bound> values to be certain.
I've got to set off for a 3-hour cross-country drive now, and I don't have the time or enough detail to report it now. But if people can check and post their findings while I'm en route, I'll check in once I've arrived and found a computer to fire up.
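For anyone who would rather not eyeball the file, the check Richard asks for could be scripted roughly as below. This is only a sketch under assumptions: it supposes that client_state.xml parses as well-formed XML and that each task appears as a <workunit> element with <name>, <rsc_fpops_est> and <rsc_fpops_bound> children, as in the fragments quoted above. The function name, the threshold and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_suspect_workunits(xml_text, min_fpops=1.0):
    """Return (name, est, bound) for workunits whose estimate or bound is ~zero."""
    root = ET.fromstring(xml_text)
    suspects = []
    for wu in root.iter("workunit"):
        name = wu.findtext("name", default="(unnamed)")
        est = float(wu.findtext("rsc_fpops_est", default="0"))
        bound = float(wu.findtext("rsc_fpops_bound", default="0"))
        # A zero bound would give a zero elapsed-time limit, so flag it.
        if est < min_fpops or bound < min_fpops:
            suspects.append((name, est, bound))
    return suspects


# Made-up sample: one plausible workunit and one with the zero values
# quoted in the boinc_alpha exchange.
SAMPLE = """<client_state>
  <workunit>
    <name>example_ok</name>
    <rsc_fpops_est>100000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>1000000000000000.000000</rsc_fpops_bound>
  </workunit>
  <workunit>
    <name>example_zero</name>
    <rsc_fpops_est>0.000000</rsc_fpops_est>
    <rsc_fpops_bound>0.000000</rsc_fpops_bound>
  </workunit>
</client_state>"""

for name, est, bound in find_suspect_workunits(SAMPLE):
    print(name, est, bound)
```

To run it against a real host, the contents of client_state.xml would be read into a string and passed in place of SAMPLE; working on a copy of the file while BOINC is stopped is the safer choice.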

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1005728 - Posted: 18 Jun 2010, 14:25:25 UTC - in response to Message 1005722.
I can't find any <rsc_fpops_bound> with a 603 app since I don't have any at the moment. Will re-enable and report as soon as I can catch a 603.
Boinc....Boinc....Boinc....Boinc....

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1005743 - Posted: 18 Jun 2010, 15:07:45 UTC Last modified: 18 Jun 2010, 15:21:18 UTC
Just found that this computer had the same fault on June 15 on nearly 40 work units. This seems to be a serious problem, among many problems after the server upgrade.
[edit] Here is the info from the client_state for some of the 603 work units waiting to crunch.
<rsc_fpops_bound>61671868056682.000000</rsc_fpops_bound>
<rsc_fpops_bound>211269334891737.310000</rsc_fpops_bound>
<rsc_fpops_bound>61671868056682.000000</rsc_fpops_bound>
<rsc_fpops_bound>211445572728336.590000</rsc_fpops_bound>
<rsc_fpops_bound>211445572728336.590000</rsc_fpops_bound>
<rsc_fpops_bound>211445572728336.590000</rsc_fpops_bound>
<rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound>
<rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound>
<rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound>
<rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound>
<rsc_fpops_bound>211405094592406.970000</rsc_fpops_bound>
I can send the entire client_state.xml file if anyone is interested.
Boinc....Boinc....Boinc....Boinc....

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 1005797 - Posted: 18 Jun 2010, 17:29:30 UTC - in response to Message 1005743.
"Geek@Play" wrote: Here is the info from the client_state for some of the 603 work units waiting to crunch.
Those ones are OK (by eye - I haven't got a calculator out to check). The "smoking gun" would be to find one which displayed a zero, or ridiculously low, "To completion" time in Task Manager before it even started, then check the <rsc_fpops_bound> for that exact task.
"Geek@Play" wrote: I can send the entire client_state.xml file if anyone is interested.
No thanks, I'm on a rustic computer, on the end of a rustic telephone line, in the depths of the country. Back to full equipment on Monday, if it's still a problem then.

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1005799 - Posted: 18 Jun 2010, 17:31:02 UTC - in response to Message 1005797.
Have a good weekend, Richard. Thanks for your help.
Boinc....Boinc....Boinc....Boinc....

Dave Lewis Joined: 12 Apr 99 Posts: 34 Credit: 53,432,603 RAC: 108	Message 1005808 - Posted: 18 Jun 2010, 17:53:06 UTC
My computer also generated a ton of these errors, but they involve v6.08 cuda WUs. I hadn't checked for a week or so and I was really surprised to find this. I had to detach in early May because of a failing hard drive, and when I set my system up again with stock software everything looked to be working okay until I noticed the errors just now. Any suggestions appreciated.

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1005812 - Posted: 18 Jun 2010, 18:00:19 UTC - in response to Message 1005808.
"Dave Lewis" wrote: Any suggestions appreciated.
See Richard Haselgrove's posts in this same thread.
Regards, Gundolf

BilBg Volunteer tester Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1005842 - Posted: 18 Jun 2010, 19:29:34 UTC - in response to Message 1005743. Last modified: 18 Jun 2010, 19:39:50 UTC
I'm now on my K6-2+ / 524 MHz computer. It has 4 tasks and uses an opt app.
Last download:
17-Jun-2010 14:44:02 [SETI@home] Finished download of 18mr10aa.31979.20517.15.10.129
I see that rsc_fpops_bound is exactly 10 times the rsc_fpops_est:
<rsc_fpops_est>159697111958278.000000</rsc_fpops_est>
<rsc_fpops_bound>1596971119582780.000000</rsc_fpops_bound>
<rsc_fpops_est>160566093629472.000000</rsc_fpops_est>
<rsc_fpops_bound>1605660936294720.000000</rsc_fpops_bound>
<rsc_fpops_est>163980079177073.000000</rsc_fpops_est>
<rsc_fpops_bound>1639800791770730.000000</rsc_fpops_bound>
<rsc_fpops_est>161006148967213.000000</rsc_fpops_est>
<rsc_fpops_bound>1610061489672130.000000</rsc_fpops_bound>
<app_info>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>KWSN_2.4V_MMX_MB.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>528</version_num>
        <file_ref>
            <file_name>KWSN_2.4V_MMX_MB.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
- ALF - "Find out what you don't do well ..... then don't do it!" :)
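The "exactly 10 times" observation is easy to confirm mechanically. A small Python sketch using the four estimate/bound pairs posted above (the variable and function names are illustrative only):

```python
# (rsc_fpops_est, rsc_fpops_bound) pairs copied from the post above,
# with the trailing ".000000" dropped so they can be held as exact integers.
PAIRS = [
    (159697111958278, 1596971119582780),
    (160566093629472, 1605660936294720),
    (163980079177073, 1639800791770730),
    (161006148967213, 1610061489672130),
]


def bound_is_ten_times_est(pairs):
    """True if every bound equals exactly 10 times its estimate."""
    return all(bound == 10 * est for est, bound in pairs)


print(bound_is_ten_times_est(PAIRS))
```

Integer arithmetic is used on purpose so that "exactly" really means exactly, with no floating-point rounding involved.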

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1005859 - Posted: 18 Jun 2010, 20:16:36 UTC - in response to Message 1005842.
"BilBg" wrote: ... I see that rsc_fpops_bound is exactly 10 times the rsc_fpops_est ...
That's true: the splitter uses that ratio, and BOINC's attempt to do per-application estimates server-side adjusts both by the same amount. However, in the core client the Duration Correction Factor is applied only to estimates, not the bound. With almost all hosts having DCF around 0.2, the effective bound was about 50 times the estimate. With the server-side adjustment effectively trying to drive DCF toward 1.0, the allowance should reduce to around 10. If BOINC's adjustments were reasonably accurate that would be adequate, but at least for now they are not that accurate.
Because the bound is merely intended to keep a corrupted task or application from running forever, increasing it to give your host time to do the work won't harm the project in any way. More info in my post in "Some changes made to this recent BOINC update BUT".
Joe
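Joe's arithmetic can be written out as a short sketch. It assumes only what the post states: the bound is 10 times the raw estimate, and the client multiplies the estimate (but not the bound) by DCF, so the bound expressed as a multiple of the DCF-corrected estimate is the ratio divided by DCF. The function name is hypothetical and nothing here is taken from BOINC source code.

```python
def effective_allowance(bound_to_est_ratio: float, dcf: float) -> float:
    """Bound as a multiple of the DCF-corrected estimate.

    corrected_estimate = raw_estimate * dcf
    bound              = raw_estimate * bound_to_est_ratio
    so bound / corrected_estimate = bound_to_est_ratio / dcf
    """
    if dcf <= 0:
        raise ValueError("DCF must be positive")
    return bound_to_est_ratio / dcf


# DCF values mentioned in this thread: about 0.2 before the server change,
# 0.9598 reported on the affected host, and the 1.0 target.
for dcf in (0.2, 0.9598, 1.0):
    print(f"DCF {dcf}: bound is about {effective_allowance(10, dcf):.1f}x the estimate")
```

The smaller the multiple, the less headroom a task has before the client aborts it, which fits Joe's point that an allowance of around 10 is only adequate when the estimates themselves are accurate.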

Terror Australis Volunteer tester Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44	Message 1006098 - Posted: 19 Jun 2010, 8:13:20 UTC
Just got one of these myself. Task 1638222570, if anyone wants to look at the details. DCF was reading 7.8; I've reset it back to 1. I've still got the <flops> values in the app_info file for this box. What's the current thinking on this: are flops values in or out?
And yes, the protection is working: Max tasks for Anon Platform, nvidia GPU has been reset back to 99 from 208. :-P
Brodo

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1006770 - Posted: 21 Jun 2010, 3:19:55 UTC
I received 11 more errors today, this time on cuda work, just seconds before they would have ended normally. Example here. There must be many other crunchers in this predicament.
Boinc....Boinc....Boinc....Boinc....

hiamps Volunteer tester Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0	Message 1006777 - Posted: 21 Jun 2010, 3:51:51 UTC - in response to Message 1006770.
Yep, same here.
Official Abuser of Boinc Buttons... And no good credit hound!

RottenMutt Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0	Message 1006789 - Posted: 21 Jun 2010, 4:49:19 UTC - in response to Message 1006777.
ditto running AK_v8b_win_x64_SSE41

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1006794 - Posted: 21 Jun 2010, 5:36:56 UTC - in response to Message 1006789.
"RottenMutt" wrote: ditto running AK_v8b_win_x64_SSE41
You still have MB (NON cuda) work??? I have not received one all weekend. Nothing except cuda and AP work all weekend, and not one VLAR that could be rescheduled to the CPUs. Looks like they cherry-picked the tapes (disks) for the weekend to minimize problems.
Boinc....Boinc....Boinc....Boinc....

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 1007034 - Posted: 21 Jun 2010, 19:57:43 UTC
Finally catching up after my weekend away. Looks like my Fermi had about 30 of these: All tasks for computer 4292666 - all being stopped in their tracks at around 5 minutes.
I had flops correction in place at the time, but no effective VLAR catcher: now I've swapped that over - no flops entry in app_info, and ReSchedule installed. All I need now are some tasks to try it out - stuffed full of Beta at the moment.

skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 1007037 - Posted: 21 Jun 2010, 20:02:52 UTC - in response to Message 1007034.
wish I could get beta
In a rich man's house there is no place to spit but his face. Diogenes Of Sinope

Area 51 Joined: 31 Jan 04 Posts: 965 Credit: 42,193,520 RAC: 0	Message 1008843 - Posted: 27 Jun 2010, 6:15:03 UTC Last modified: 27 Jun 2010, 6:15:15 UTC
What is the current thinking on these errors? I have accumulated 118 overnight! I had comms disabled, so I could re-process them.... Any thoughts?
