Help with strange computer problem.

Message boards : Number crunching : Help with strange computer problem.

NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 1065846 - Posted: 12 Jan 2011, 7:53:57 UTC
Last modified: 12 Jan 2011, 8:32:01 UTC

I have a strange computer problem: sometimes all the work units running concurrently are wasted, and every WU started after that is wasted too.

I also cannot start some programs after this error, while others start fine.

The computer was 48-hour Prime95 stable, so I need help tracking this down.

It happened around Christmas while I was crunching for ClimatePrediction, and it happened again just now while I was crunching for WorldCommunityGrid.

Around Christmas I was out of town for 2 weeks, and when I got back the computer was still running in this strange state. A reboot fixed the problem.

This time I was able to record a video of the problem right after it occurred, as I was using the computer when it happened.

Here is a link to the video.

http://www.youtube.com/watch?v=otj8_ldd8HY

It is a little long (5 minutes), but I tried to show the strange behavior of the computer right after the problem occurred.

Any input would be appreciated.
Robert Ribbeck
Joined: 7 Jun 02
Posts: 644
Credit: 5,283,174
RAC: 0
United States
Message 1065864 - Posted: 12 Jan 2011, 11:39:47 UTC - in response to Message 1065846.  

Do you have the latest Windows updates?
What antivirus and antispyware programs are you using?
Profile Mike Bader Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 231
Credit: 20,366,214
RAC: 33
Message 1065873 - Posted: 12 Jan 2011, 12:33:18 UTC - in response to Message 1065864.  

Sounds like a memory or virtual memory problem.
What is using the most CPU?
Try not using IE beta.
Video drivers?
What version of BOINC?
Are you using your GPU for BOINC?
Try closing all other programs.
Try loading Firefox in safe mode
(Firefox safe mode, not Windows safe mode).
What version of FF?

Mike Bader
BOINC V7.16.5
http://setiathome.berkeley.edu/team_join_form.php?id=5 - Join Our International Team
Profile Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1065881 - Posted: 12 Jan 2011, 13:35:29 UTC - in response to Message 1065846.  


...
The computer was 48-hour Prime95 ....


LOL

Really? That's the longest Prime95 run I've ever read about. :D

Helli
A loooong time ago: First Credits after SETI@home Restart
Profile skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1065897 - Posted: 12 Jan 2011, 15:11:54 UTC - in response to Message 1065881.  

So your RAM is good. How about running a checkdisk to see whether the HDD is failing?


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1065898 - Posted: 12 Jan 2011, 15:14:04 UTC - in response to Message 1065881.  


...
The computer was 48-hour Prime95 ....


LOL

Really? That's the longest Prime95 run I've ever read about. :D

Helli


Yeah, I normally only run a 12-24 hour test, but I make sure the longest stretch falls while I'm asleep or at work, lol. I've seen people run it for a week at a time, though I don't really see why.

The server I built had the same problem of trashing work units out of nowhere. I took out my video cards and reseated them, and that seemed to fix it. Maybe you could try that?
Traveling through space at ~67,000mph!
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 1066008 - Posted: 12 Jan 2011, 21:16:07 UTC

My computer is fully up to date, as I use Secunia PSI for patch management. My drivers are also up to date; I checked the manufacturers' websites.

For BOINC, I am running version 6.10.58.

For my main drive I am running 2 x 1 TB Western Digital drives in RAID 0. I just ran checkdisk, and it found no errors.

I have a HIS Radeon 5770, which I used for Collatz Conjecture for about a week but don't anymore.

Since my computer was Prime95 stable, do you guys think this is a hardware issue not caught by Prime95, or some kind of software/driver issue?

How would I go about testing this?

At this point the problem occurs infrequently enough to be hard to diagnose, but often enough to be a concern.

I am open to any suggestions.
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1066013 - Posted: 12 Jan 2011, 21:21:15 UTC

I had a similar problem with programs crashing on start and others not responding, and, as Mike Bader suggested, it was a virtual memory problem: Windows Sidebar was using over TWO gigs of virtual memory. I restart Sidebar every couple of days, and so far there have been no issues.
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 1066030 - Posted: 12 Jan 2011, 22:38:09 UTC - in response to Message 1066013.  
Last modified: 12 Jan 2011, 22:47:04 UTC

I had a similar problem with programs crashing on start and others not responding, and, as Mike Bader suggested, it was a virtual memory problem: Windows Sidebar was using over TWO gigs of virtual memory. I restart Sidebar every couple of days, and so far there have been no issues.


I don't think that's it. I have 4 GB of RAM and a 6 GB pagefile (the Windows-recommended amount).

On second thought, here is a screenshot of Process Explorer with all of the processes running right now. Does that look correct?
Robert Ribbeck
Joined: 7 Jun 02
Posts: 644
Credit: 5,283,174
RAC: 0
United States
Message 1066196 - Posted: 13 Jan 2011, 14:22:45 UTC - in response to Message 1066030.  

I see no antivirus or antispyware running.
Try scanning for malware.
It is, after all, Windows.
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1066203 - Posted: 13 Jan 2011, 14:59:16 UTC - in response to Message 1066196.  
Last modified: 13 Jan 2011, 15:43:00 UTC

I see 3 WCGrid processes at ~33% each, plus 2 others (Process Explorer and MS AntiMalware).
I expected to see 4 at ~25%, or 2 at ~50%, but 3?
Or is 1 core not visible to the system?? IMO that's impossible, unless something is really messed up, like your CPU!
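For reference, the arithmetic behind this expectation: Windows tools such as Process Explorer report a process's CPU usage as a percentage of all cores combined, so one single-threaded task keeping one core fully busy shows roughly 100%/N on an N-core CPU. A minimal sketch (the function name is hypothetical, chosen for illustration):

```python
def per_task_share(cores: int) -> float:
    """Percent of total CPU shown for one single-threaded task that saturates
    one core, assuming usage is reported as a share of all cores combined."""
    return 100.0 / cores

# The core counts discussed in this post.
for cores in (2, 3, 4):
    print(f"{cores} cores: ~{per_task_share(cores):.0f}% per busy single-threaded task")
```

On this reading, three busy tasks at ~33% each would match a 3-core CPU.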

EDIT
I see you are using a (C2D?) T2500. Is that the troubled host?
Host 5716726?

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSE3x (AMD/Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE3x Win32 Build 76 , Ported by : Jason G, Raistmer, JDWhale

CPUID: Genuine Intel(R) CPU T2500 @ 2.00GHz
Speed: 2 x 1995 MHz
Cache: L1=64K L2=2048K
Features: MMX SSE SSE2 SSE3

DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1066217 - Posted: 13 Jan 2011, 15:44:11 UTC

Is this your machine?
AuthenticAMD - AMD Athlon(tm) II X3 445 Processor

Check out this utility:
Control Panel -- Action Center -- Under Maintenance, click on "View Reliability History"

For the days when you experienced this problem, see if there are any Windows events or errors that show what happened.
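A minimal sketch of that filtering step, assuming the events have already been exported (for example from Event Viewer) and loaded as simple (timestamp, level, source) records. The record layout, the sample entries, the source names and the dates below are all hypothetical placeholders, not data from this machine:

```python
from datetime import date, datetime

# Hypothetical record layout: (timestamp, level, source).
# A real export will have its own columns and formats.
Event = tuple[datetime, str, str]

def events_on_days(events: list[Event], days: set[date],
                   levels: frozenset[str] = frozenset({"Error", "Critical"})) -> list[Event]:
    """Keep only events at the given severity levels that fall on one of the given days."""
    return [e for e in events if e[0].date() in days and e[1] in levels]

# Placeholder records for illustration only.
sample: list[Event] = [
    (datetime(2011, 1, 11, 7, 40), "Error", "ExampleSourceA"),
    (datetime(2011, 1, 11, 7, 41), "Information", "ExampleSourceB"),
    (datetime(2011, 1, 5, 12, 0), "Error", "ExampleSourceC"),
]

# List the errors that fall on the (hypothetical) incident day.
for ts, level, source in events_on_days(sample, {date(2011, 1, 11)}):
    print(ts, level, source)
```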

Also, I strongly recommend that you suspend all BOINC computations and then exit BOINC completely. Leave it off until you get this sorted out. Does this crazy situation fix itself when you reboot? Does it take time to get strange, or is there a trigger that makes it strange? It seems like someone or something kicked your computer and the video card, the memory, or something else is loose inside the case. If all the hardware is OK, then your problem is probably software corruption.
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1066278 - Posted: 13 Jan 2011, 19:38:19 UTC
Last modified: 13 Jan 2011, 19:39:44 UTC

I had the same issue when I built my new server machine. For about 3 days it trashed roughly 1 in 4 WUs put through CUDA. Since then I've run for over a week with no errored units. I even tried putting a second power supply in the tower just for the two video cards; malware and antivirus scans weren't needed since it was a fresh install. I also checked the power settings, etc. Nothing helped; it just stopped erroring the units. I'm beginning to wonder whether it was my issue to begin with, and not some malformed WUs or something to that effect. Maybe they were trashed during the download? They all came in on the 4th of January.
Traveling through space at ~67,000mph!
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1066337 - Posted: 13 Jan 2011, 22:41:05 UTC - in response to Message 1066030.  

I don't think that's it. I have 4 GB of RAM and a 6 GB pagefile (the Windows-recommended amount).

On second thought, here is a screenshot of Process Explorer with all of the processes running right now. Does that look correct?

You're right, I don't think that's it, and everything there looks OK.
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 1066350 - Posted: 13 Jan 2011, 23:12:09 UTC - in response to Message 1066217.  

Is this your machine?
AuthenticAMD - AMD Athlon(tm) II X3 445 Processor

Check out this utility:
Control Panel -- Action Center -- Under Maintenance, click on "View Reliability History"

For the days when you experienced this problem, see if there are any Windows events or errors that show what happened.

Also, I strongly recommend that you suspend all BOINC computations and then exit BOINC completely. Leave it off until you get this sorted out. Does this crazy situation fix itself when you reboot? Does it take time to get strange, or is there a trigger that makes it strange? It seems like someone or something kicked your computer and the video card, the memory, or something else is loose inside the case. If all the hardware is OK, then your problem is probably software corruption.


Yes, that is the computer. The situation fixes itself completely after a restart. It happens suddenly: at some moment I heard the hard drives start to seek as WorldCommunityGrid began loading 3 new work units into memory at the same time (as the previous ones crashed). That is the moment the corruption in IE started. I was not using Firefox at the time, but when I tried to load it, it crashed on start.

Over the weekend I will try to reinstall the operating system and clean and reseat the video card and the RAM (I need to do a dust cleaning anyway).
Profile skildude
Avatar

Send message
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1066408 - Posted: 14 Jan 2011, 1:51:08 UTC - in response to Message 1066350.  

I finally sat down and watched your video. This looks like a bad motherboard. It affects your graphics, your HDD, and programs' ability to load into RAM. I would try either updating the BIOS on that board or installing your parts in another system. At first I thought you might have a corrupted HDD, but that wouldn't cause the crazy graphics.

Also, how much RAM do you have? Some WCG WUs require 1 GB or more each. If you don't have enough RAM, you'll start hearing your HDD being used as virtual memory and making a lot of noise.
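A rough sketch of the arithmetic behind this point, assuming three work units run at once (one per core on a 3-core CPU), each needs about 1 GB (the "1 GB or more" figure above, taken at its low end), and the OS plus other programs take a hypothetical 1.5 GB; the 4 GB of RAM is the figure the original poster gave. If the total exceeds physical RAM, Windows has to page to disk:

```python
def ram_shortfall_gb(ram_gb: float, concurrent_wus: int,
                     gb_per_wu: float, other_gb: float) -> float:
    """GB by which the estimated working set exceeds physical RAM (0 if it fits).
    Ignores file caching and shared memory, so this is a rough estimate only."""
    needed = concurrent_wus * gb_per_wu + other_gb
    return max(0.0, needed - ram_gb)

# 4 GB RAM; 3 WUs at ~1 GB each; 1.5 GB for everything else is a guess.
print(ram_shortfall_gb(ram_gb=4, concurrent_wus=3, gb_per_wu=1.0, other_gb=1.5))
```

Under these guesses the estimate comes out about half a gigabyte over, which would be enough to make the disk start working as virtual memory.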

Try running MyDefrag. See if it comes up with any corrupted segments or if it fails to start altogether.

Try this with FF: uninstall it, restart the PC, then reinstall FF. By restarting, you make the computer install it on a different part of the disk; uninstalling and reinstalling without a restart just paints it over the same directories and disk area. If the HDD is bad, restarting and reinstalling may help.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1066421 - Posted: 14 Jan 2011, 2:28:57 UTC

The graphical anomalies definitely point to the video card. With all the other issues you are having, though, it does sound like a motherboard issue. Have you tried putting a different video card in the machine? Have you tried running BOINC CPU-only to see what it does? A single bad part in a machine can cause some really weird things to happen. Programs refusing to open at all really points to a motherboard issue, especially when paired with the artifacts you are seeing.
Traveling through space at ~67,000mph!
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 1066477 - Posted: 14 Jan 2011, 4:24:52 UTC - in response to Message 1066421.  

The graphical anomalies definitely point to the video card. With all the other issues you are having, though, it does sound like a motherboard issue. Have you tried putting a different video card in the machine? Have you tried running BOINC CPU-only to see what it does? A single bad part in a machine can cause some really weird things to happen. Programs refusing to open at all really points to a motherboard issue, especially when paired with the artifacts you are seeing.


This was my first computer build, so I do not have any spare parts at all.

If it is the motherboard, that sucks, because this happens so infrequently that, quite frankly, getting it replaced will be a real hassle for me.
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1066495 - Posted: 14 Jan 2011, 6:08:07 UTC - in response to Message 1066477.  


This was my first computer build, so I do not have any spare parts at all.

If it is the motherboard, that sucks, because this happens so infrequently that, quite frankly, getting it replaced will be a real hassle for me.


Repair it yourself; it's relatively cheap and easy to do. Just take your time putting in the CPU and seating the heatsink. The rest is pretty much plug and play. If all else fails, go to Walmart and buy yourself a cheap-o video card to test with, then take it back a day or two later and tell them it doesn't work with your computer. Not exactly the least sleazy way to do it, but it works.
Traveling through space at ~67,000mph!
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 1066516 - Posted: 14 Jan 2011, 8:50:22 UTC - in response to Message 1066495.  
Last modified: 14 Jan 2011, 8:51:34 UTC


This was my first computer build, so I do not have any spare parts at all.

If it is the motherboard, that sucks, because this happens so infrequently that, quite frankly, getting it replaced will be a real hassle for me.


Repair it yourself; it's relatively cheap and easy to do. Just take your time putting in the CPU and seating the heatsink. The rest is pretty much plug and play. If all else fails, go to Walmart and buy yourself a cheap-o video card to test with, then take it back a day or two later and tell them it doesn't work with your computer. Not exactly the least sleazy way to do it, but it works.


What I meant was that this kind of error has happened 3 times since I put the computer together. It happened at the beginning of November; I dismissed it as a result of aggressive overclocking and turned the CPU frequency down from 3.7 GHz to 3.6 GHz. Then it happened again over Christmas vacation, around December 22. I was only able to deal with it when I got back around January 2nd. I decided to stop BOINC and run a 48-hour Prime95 test to see whether the error would reproduce. The computer ran the Prime95 test just fine, with no errors.

Then the error happened again while I was running BOINC on January 11th.

With it occurring about once a month on average, I don't know how to track this down. But every time the error occurred, I had to reboot, and everything went back to normal.

This prevents me from leaving the computer unattended for long periods of time, and it also prevents me from running ClimatePrediction.net because some of those workunits take 3 weeks to complete. The error in December trashed workunits that had about 20 days of computation done on them.
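A back-of-the-envelope sketch of why a roughly monthly failure rules out long ClimatePrediction.net tasks. It assumes failures arrive independently at a constant average rate of about one per 30 days (a Poisson process), which is only an approximation of "about 1 a month"; under that assumption the chance that a task running for t days is hit by at least one failure is 1 - exp(-t/30):

```python
import math

def p_at_least_one_failure(task_days: float,
                           mean_days_between_failures: float = 30.0) -> float:
    """Probability of >= 1 failure during a task, assuming Poisson-distributed failures."""
    return 1.0 - math.exp(-task_days / mean_days_between_failures)

# Short WCG-style tasks versus a ~3-week ClimatePrediction.net task.
for days in (1, 7, 21):
    print(f"{days:>2}-day task: {p_at_least_one_failure(days):.0%} chance of being hit")
```

For a 21-day task this works out to roughly even odds of losing the work, while a 1-day task is rarely affected.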
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.