Message boards : Number crunching : Really strange problem
Author | Message |
---|---|
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Like many of us crunchers, I have 5 computers running on a LAN at home. I use Boinc View and Boinc Manager to log into the remote computers and check up on Boinc. One particular box suddenly started having problems today. If I use Boinc Manager to log into Boinc on that box, it displays about 8 or 10 work units in the work list and then Boinc Manager locks up. If I log into that box over a remote desktop connection and run Boinc Manager locally, it locks up at the same point. Task Manager on that box shows 4 work units being crunched, and they continue to crunch normally. Boinc View reads the box once, then fails to read it at the next poll. The work list in Boinc View stalls at the same point. Anybody seen this before? Boinc....Boinc....Boinc....Boinc.... |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Same problem this morning. I can't figure out a way to get into Boinc and tell it to do anything. I can't set no new work, reset, or anything else. I can only start and stop the service, so all I can do is let it run. Boinc....Boinc....Boinc....Boinc.... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
I suggest you use boinccmd (or boinc_cmd if your system pre-dates the name change) to set nomorework on every project on that host, and report anything completed once you see Task Manager drop to idle on all cores. Then stop the service and examine the entrails in client_state.xml, or just use boinccmd again to reset all projects. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Thanks Richard. I get an error message about an unrecognized command. I tried "boinccmd nomorework". Anyone know the exact syntax to stop work requests? Never mind, I figured it out. At least it ran the command without an error message. Hopefully it will not request more work and will run down the cache. Thanks Richard. It is computer number 3966329, in case anyone is interested. Boinc....Boinc....Boinc....Boinc.... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
That's why I made the boinccmd a clickable link into the Wiki, where the syntax is defined in full! It'll be something like "boinccmd --project http://setiathome.berkeley.edu nomorework". One more thought: if you do succeed in stopping new work and flushing the queue(s) down to zero, then after reporting (see the Wiki again!), whatever is still shown as 'in progress' on the project task list(s) may give you a clue as to which project(s) need resetting. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Thanks Richard. At least it continued to run normally overnight, as it would when running as a service. Now I can let the cache run down, and when it empties I can reset it or something. With a 7-day cache I didn't want to reset it now and have several hundred people upset with me. Still an interesting situation, though, to lose control of Boinc. Boinc....Boinc....Boinc....Boinc.... |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
That is very strange. Two things I always ask: 1) Has anything changed on your workstation or the box in question? 2) Have you rebooted the box in question since this event occurred? Other than that, seems like you're on the right track to fixing it. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Sounds like greeblies in your tcp/ip stack. I find beer helps. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
The computer in question runs without keyboard and monitor. It sits in the corner of the dining room creating much needed warmth at this time of year. Yes, I logged into the computer with remote desktop and rebooted after I discovered this problem. It is still running normally, except I am blind with respect to Boinc. Boinc....Boinc....Boinc....Boinc.... |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Which version of BOINC and, for that matter, which version of BOINC View? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
|
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I use Boinc version 5.10.28, which is then overwritten with Crunch3r's Boinc 6.1.0.32 V5. I removed Boinc and reinstalled it, but the results were the same. I believe that some corruption of the Client_State file happened after yesterday's scheduled outage. If this is true, it won't do any good to revert to the Client_State_Previous file, because by now the corruption would be there also. The box is crunching. I will just let it go until the cache is done or it finally runs into the corruption in the state file. I tried removing 4 work units from the projects folder that I thought might be related. Upon restarting the service, the missing work was downloaded again and the problem was not solved. I just don't know what to do now except let it run. I certainly do not want to abort all this work just yet. Boinc....Boinc....Boinc....Boinc.... |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Can you check if there is any information on this in the stderrgui.txt and stdoutgui.txt files? |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
From stdoutgui: [12/09/08 21:03:17] TRACE [2948]: RPC_CLIENT::init boinc_socket returned 516 [12/09/08 21:03:17] TRACE [2948]: RPC_CLIENT::init connect returned -1 [12/09/08 21:03:17] TRACE [2948]: RPC_CLIENT::init attempting connect [12/09/08 21:03:18] TRACE [2948]: RPC_CLIENT::init_poll sock = 516 [12/09/08 21:03:18] TRACE [2948]: RPC_CLIENT::init_poll connected to port 31416 [12/09/08 21:03:18] TRACE [2948]: CAN'T FIND PROJECT http://setiathome.berkeley.edu/ [12/09/08 22:06:49] TRACE [2848]: RPC_CLIENT::init boinc_socket returned 516 [12/09/08 22:06:49] TRACE [2848]: RPC_CLIENT::init connect returned -1 [12/09/08 22:06:49] TRACE [2848]: RPC_CLIENT::init attempting connect [12/09/08 22:06:49] TRACE [2848]: RPC_CLIENT::init_poll sock = 516 [12/09/08 22:06:49] TRACE [2848]: RPC_CLIENT::init_poll sock = 516 [12/09/08 22:06:49] TRACE [2848]: RPC_CLIENT::init_poll connected to port 31416 [12/09/08 22:06:49] TRACE [2848]: CAN'T FIND PROJECT http://setiathome.berkeley.edu/ From stderrgui (this seems to occur every time I try to run Boinc Manager): BOINC Windows Runtime Debugger Version 5.10.28 Dump Timestamp : 12/09/08 20:44:18 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004A0DE4 read attempt to address 0x00000064 Engaging BOINC Windows Runtime Debugger... ******************** (What follows is very long. If anyone really wants to see the file, send me a PM with an email address and I can send it along.) Boinc....Boinc....Boinc....Boinc.... |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
PM sent. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Files Sent Boinc....Boinc....Boinc....Boinc.... |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Just spent a couple of hours emailing with Ageless. Thanks to his patience and wonderful troubleshooting, I have located and replaced the offending file in the Boinc directory. Microsoft.VC80.CRT.manifest.dll was at fault for the entire mess. I copied the file from another computer over the offending file; the file sizes were the same before and after. Ageless spends countless hours here troubleshooting and helping folks, and I for one do not have the words to thank him enough for his help. I did not want to dump all the cached work and start over, and with his help I didn't have to. Thank you, Ageless. Boinc....Boinc....Boinc....Boinc.... |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
You're welcome. I'll now go sit here with the window open, waiting for the blush to recede. :-) |
Sirius B Joined: 26 Dec 00 Posts: 24911 Credit: 3,081,182 RAC: 7 |
This is what I like about the N/C board, terrific help whenever needed. Well done Ageless. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
You're welcome. . . . well Sir - looks like You're going to be on top: 'Kudos system on the BOINC forums' - Well done [Kudo's to You] BOINC Wiki . . . Science Status Page . . . |
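
For anyone landing here with the same symptoms, the boinccmd steps Richard describes (set nomorework on every project, report finished work, reset later if needed) could be scripted roughly as below. This is only a sketch under assumptions: the helper names `build_boinccmd` and `run_for_all` are made up for illustration, the project list is a placeholder, and it assumes `boinccmd` (or `boinc_cmd` on older installs) is on the PATH and accepts `--host`, `--passwd` and `--project URL <operation>` as documented in the BOINC Wiki. Check the Wiki for the syntax your client version supports.

```python
"""Sketch: build boinccmd invocations for the steps suggested in this thread."""
import subprocess
from typing import Iterable, List, Optional

# Project operations mentioned in the thread, passed after "--project URL".
ALLOWED_OPS = ("nomorework", "allowmorework", "update", "reset")


def build_boinccmd(project_url: str, op: str,
                   host: Optional[str] = None,
                   password: Optional[str] = None,
                   exe: str = "boinccmd") -> List[str]:
    """Return the argument list for one project operation; nothing is executed."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported project operation: {op!r}")
    argv = [exe]
    if host:
        # Remote client; assumes GUI RPC access is allowed from this machine.
        argv += ["--host", host]
    if password:
        argv += ["--passwd", password]
    argv += ["--project", project_url, op]
    return argv


def run_for_all(project_urls: Iterable[str], op: str, **kwargs) -> None:
    """Run the same operation against every project URL (hypothetical helper)."""
    for url in project_urls:
        subprocess.run(build_boinccmd(url, op, **kwargs), check=True)


if __name__ == "__main__":
    # Placeholder list: use the project URLs the host is actually attached to.
    projects = ["http://setiathome.berkeley.edu"]
    for url in projects:
        # Only print the command here; call run_for_all() to execute for real.
        print(" ".join(build_boinccmd(url, "nomorework")))
```

Run as-is, the script only prints the command line so the syntax can be checked first. Pass `exe="boinc_cmd"` for installs that pre-date the name change. Once the queue has drained, the same helper can be used with `"update"` to report finished work and, as a last resort, `"reset"`.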