Help with strange computer problem.
Author | Message |
---|---|
NewtonianRefractor Send message Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
I have a strange computer problem: sometimes all the work units running concurrently get wasted, and every WU started after that gets wasted too. After this error I also cannot start some programs, while others start fine. The computer was 48-hour Prime95 stable, so I need help tracking this down. It happened around Christmas while I was crunching for ClimatePrediction, and it happened again now while I was crunching for WorldCommunityGrid. Around Christmas I was out of town for 2 weeks, and when I got back the computer was still running in this strange state. A reboot fixed the problem. This time I was able to record a video right after the problem occurred, as I was using the computer when it happened. Here is a link to the video: http://www.youtube.com/watch?v=otj8_ldd8HY It is a little long, 5 minutes, but I tried to show the strange behavior of the computer right after the problem occurred. Any input would be appreciated. |
Robert Ribbeck Send message Joined: 7 Jun 02 Posts: 644 Credit: 5,283,174 RAC: 0 |
Do you have the latest Windows updates? What antivirus and anti-spyware programs are you using? |
Mike Bader Send message Joined: 18 May 99 Posts: 231 Credit: 20,366,214 RAC: 33 |
Sounds like a memory or virtual memory problem. What is using the most CPU? Try not using the IE beta. Video drivers? What version of BOINC? Are you using your GPU for BOINC? Try closing all other programs. Try loading Firefox in safe mode. That is FF safe mode, not Windows safe mode. What version of FF? Mike Bader BOINC V7.16.5 http://setiathome.berkeley.edu/team_join_form.php?id=5 - Join Our International Team |
Helli_retiered Send message Joined: 15 Dec 99 Posts: 707 Credit: 108,785,585 RAC: 0 |
LOL Really? That is the longest time I have ever read of someone running Prime95. :D Helli A loooong time ago: First Credits after SETI@home Restart |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
So your RAM is good. How about running a checkdisk to see whether the HDD is failing? In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Yeah, I normally only run a 12-24 hour test, but I make sure I time the longest stretch during sleep and work, lol. I've seen people run it a week at a time. I don't really see why, though. Now, the server I built had the same outcome of trashing work units out of nowhere. I took out my video cards and re-seated them, and that seemed to fix it. Maybe you could try that? Traveling through space at ~67,000mph! |
NewtonianRefractor Send message Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
My computer is fully up to date, as I use Secunia PSI for patch control. My drivers are also up to date; I checked on the manufacturers' websites. For BOINC I am running version 6.10.58. For my main drive I am running 2 x 1TB Western Digital drives in RAID 0. I just ran checkdisk with no errors. I have a HIS Radeon 5770, which I used for Collatz Conjecture for about a week but don't anymore. Since my computer was Prime95 stable, do you guys think this is a hardware issue not caught by Prime95, or is this some kind of software/driver issue? How would I go about testing this? At this point the problem occurs just infrequently enough to be hard to diagnose, yet often enough to be a concern. I am open to any suggestions. |
Dimly Lit Lightbulb 😀 Send message Joined: 30 Aug 08 Posts: 15399 Credit: 7,423,413 RAC: 1 |
I had a similar problem with programs crashing on start and others not responding, and as Mike Bader suggested, it was a virtual memory problem. Windows Sidebar was using over TWO gig of virtual memory. Every couple of days I restart Sidebar; so far no issues. |
NewtonianRefractor Send message Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
Dimly Lit Lightbulb wrote: "I had a similar problem with programs crashing on start and others not responding, and as Mike Bader suggested, it was a virtual memory problem. Windows Sidebar was using over TWO gig of virtual memory. Every couple of days I restart Sidebar; so far no issues." I don't think that's it. I have 4 GB of RAM and 6 GB of pagefile (the Windows-recommended amount). On second thought, here is a screenshot of Process Explorer with all of the processes running right now. Does that look correct? |
Robert Ribbeck Send message Joined: 7 Jun 02 Posts: 644 Credit: 5,283,174 RAC: 0 |
I see no antivirus or anti-spyware active. Try scanning for malware. It is, after all, Windows. |
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
I see 3 processes at ~33% each: WCGrid and 2 others (Process Explorer and M.S. AntiMalware). I expected to see 4 at ~25% or 2 at ~50%, but 3? Or is 1 core not visible to the system? IMO that is impossible, or something is really messed up, like your CPU! EDIT: I see you are using a (C2D?) T2500; is that the troubled host? Host 5716726? <core_client_version>6.10.58</core_client_version> <![CDATA[ <stderr_txt> Windows optimized S@H Enhanced application by Alex Kan Version info: SSE3x (AMD/Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE3x Win32 Build 76 , Ported by : Jason G, Raistmer, JDWhale CPUID: Genuine Intel(R) CPU T2500 @ 2.00GHz Speed: 2 x 1995 MHz Cache: L1=64K L2=2048K Features: MMX SSE SSE2 SSE3 |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Is this your machine? AuthenticAMD - AMD Athlon(tm) II X3 445 Processor. Check out this utility: Control Panel -- Action Center -- under Maintenance, click on "View Reliability History". For the days when you experienced this problem, see if there are any Windows events or errors that show what happened. Also, I strongly recommend that you suspend all BOINC computations and then exit BOINC completely. Leave it off until you get this sorted out. Does this crazy situation fix itself when you reboot? Does it take time to get strange, or is there a trigger that makes it strange? It seems like someone or something kicked your computer and the video card, memory, or something else is loose inside the case. If all the hardware is OK, then your problems are probably software corruption. |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
I had the same issue when I decided to build my new server machine. For about 3 days it trashed about 1 in 4 WUs put through CUDA. Since then I've run for over a week with no errored units. I even tried putting a second power supply in the tower just for the two video cards; running malware and anti-virus scans wasn't needed since it was a fresh install. I even checked the power settings, etc. Nothing helped; it just stopped erroring the units. I'm beginning to wonder whether it was really my issue to begin with, or just some malformed WUs or something to that effect. Maybe they were trashed during the download; they all came on the 4th of January? Traveling through space at ~67,000mph! |
Dimly Lit Lightbulb 😀 Send message Joined: 30 Aug 08 Posts: 15399 Credit: 7,423,413 RAC: 1 |
NewtonianRefractor wrote: "I don't think that's it. I have 4 GB of RAM and 6 GB of pagefile (the Windows-recommended amount)." You're right, I don't think that is it, and everything there looks OK. |
NewtonianRefractor Send message Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
DJStarfox wrote: "Is this your machine?" Yes, that is the computer. The situation fixes itself completely after a restart. It happens suddenly: at some moment I started to hear the hard drives seek as World Community Grid started to load 3 new work units into memory at the same time (as the previous ones had crashed). That was the moment the corruption in IE started to happen. I was not using Firefox at the time, but when I tried to load it, it crashed on start. Over the weekend I will try to reinstall the operating system and clean and reseat the video card and the RAM. (I need to do a dust cleaning anyway.) |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I finally sat down and watched your video. This looks like a bad motherboard: it affects your graphics, your HDD, and programs' ability to load into RAM. I would try either updating the BIOS on that board or installing your parts in another system. At first I thought maybe you had a corrupted HDD, but that wouldn't cause the crazy graphics. Also, how much RAM do you have? Some WCG WUs require 1 GB or more each. If you don't have enough RAM, you'll start hearing your HDD being used as virtual RAM and making a lot of noise. Try running MyDefrag. See if it comes up with any corrupted segments or if it fails to start altogether. Try this with FF: uninstall, restart the PC, then reinstall FF. By restarting you make the computer install on a different part of the disk; uninstalling and reinstalling without a restart just paints it over the same directories and disk area. If the HDD is bad, restarting and reinstalling may help. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
The graphic anomalies definitely point to a video card. With all the other issues you are having, though, it does sound like a motherboard issue. However, have you tried putting a different video card in the machine? Have you tried starting BOINC CPU-only to see what it does? A single bad part in a machine can cause some really weird situations. The programs not wanting to open at all really points to a motherboard issue, especially when paired with the artifacts you are seeing. Traveling through space at ~67,000mph! |
NewtonianRefractor Send message Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
-BeNt- wrote: "The graphic anomalies definitely point to a video card. With all the other issues you are having, though, it does sound like a motherboard issue. However, have you tried putting a different video card in the machine? Have you tried starting BOINC CPU-only to see what it does? A single bad part in a machine can cause some really weird situations. The programs not wanting to open at all really points to a motherboard issue, especially when paired with the artifacts you are seeing." This was my first computer build, so I do not have any spare parts at all. If it is the motherboard, that sucks, because this happens so infrequently that, quite frankly, it will be more of a hassle for me to get it replaced. |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Repair it yourself; it's relatively cheap and easy to do. Just take your time putting in the CPU and seating the heatsink. The rest is pretty much plug and play. If all else fails, go to Walmart and buy a cheap-o video card to test with, then take it back in the next day or two and tell them it doesn't work with your computer. Not exactly the least sleazy way to do it, but it works. Traveling through space at ~67,000mph! |
NewtonianRefractor Send message Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
What I meant to say was that this kind of error has happened 3 times since I put the computer together. It happened at the beginning of November; I dismissed it as related to aggressive overclocking and turned the CPU frequency down from 3.7 GHz to 3.6 GHz. Then it happened again over Christmas vacation, around December 22. I was only able to deal with it when I got back around January 2nd. I decided to stop BOINC and do a 48-hour Prime95 test to see if the error would reproduce. The computer ran the Prime95 test just fine, with no error. Then the error happened again while I was running BOINC on January 11th. With an occurrence of about once a month on average, I don't know how to track this down. Every time the error occurred I had to reboot, and then everything went back to normal. This prevents me from leaving the computer unattended for long periods of time, and it also prevents me from running ClimatePrediction.net, because some of those work units take 3 weeks to complete. The error in December trashed work units that had about 20 days of computation done on them. |
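The three failure dates in the last post can be turned into a rough estimate of how likely a 48-hour Prime95 run was to catch the fault in the first place. The sketch below is only illustrative: it assumes the failures arrive at random at a constant rate (a Poisson process) and do not depend on which program is running, and the year and the exact November day are guesses, since the thread only says "beginning of November", "around December 22", and "January 11th". The function names are hypothetical.

```python
import math
from datetime import date

# Failure dates taken from the thread. The year and the exact November day
# are ASSUMPTIONS; the posts do not state them.
FAILURES = [date(2010, 11, 1), date(2010, 12, 22), date(2011, 1, 11)]


def mean_interval_days(failures):
    """Average number of days between consecutive failures."""
    ordered = sorted(failures)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def chance_of_catching(test_days, interval_days):
    """Probability of at least one failure during a test of `test_days`,
    assuming failures follow a Poisson process with the given mean interval."""
    return 1.0 - math.exp(-test_days / interval_days)


interval = mean_interval_days(FAILURES)
p = chance_of_catching(2.0, interval)
print(f"mean interval between failures: {interval:.1f} days")
print(f"chance a 48-hour test sees the fault: {p:.1%}")
```

With these assumed dates the mean interval comes out at 35.5 days, and the chance that a single 48-hour test catches the fault is only about 5%. So a clean Prime95 run of that length does not rule out hardware. If the fault only appears under a BOINC-specific load, the constant-rate assumption does not hold and the estimate says nothing either way.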