Experiencing random stops, system doesn't respond to anything.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1031016 - Posted: 4 Sep 2010, 23:45:08 UTC

I have my QX9650 rig with one 470 running without a case.
It doesn't even have a sound card; it's a pure cruncher.

Today I wanted to look at the tasks being run, but got no response. The system appears to be up and running, but I don't see anything on the monitor.
I'm running BOINC 6.10.58 (64-bit) on Win XP64 Pro (SP2). The odd thing is that there are no fault messages or Blue Screens of Death either.
I get the feeling that maybe the +12V has too little juice: it expects 24 A, the PSU has 4x 12V rails at 17 A each, and I have 1 cable connected, as was advised.
I'm going to try 2x 12V: 2 x 17 A = 34 A max.
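
The rail arithmetic above can be written out as a quick check. This is only a minimal sketch, assuming each connected cable is fed by its own 17 A rail and that the 24 A figure is the stated +12V requirement; the function name `rail_headroom` is made up for illustration. Many multi-rail PSUs also have a combined +12V limit that is lower than the sum of the rails, which this does not model.

```python
def rail_headroom(required_amps, rail_amps, rails_connected):
    """Return spare current (A) on the connected +12V rails.

    A negative result means the connected rails are rated below the load.
    """
    available = rail_amps * rails_connected
    return available - required_amps


# Figures from the post: 24 A expected, each rail rated at 17 A.
one_cable = rail_headroom(24, 17, 1)   # 17 A available
two_cables = rail_headroom(24, 17, 2)  # 34 A available

print(f"1 rail:  {one_cable:+} A headroom")
print(f"2 rails: {two_cables:+} A headroom")
```

With one cable the rating falls short of the expected load, while two cables leave some margin, which matches the plan to connect a second rail.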

Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1031038 - Posted: 5 Sep 2010, 1:19:20 UTC - in response to Message 1031016.  

Is it by chance in hibernation? Did you try power-cycling the machine?
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1031127 - Posted: 5 Sep 2010, 11:15:53 UTC - in response to Message 1031038.  

Hi, no, it isn't hibernation; that's an unusual setting for my hosts.
They run 24/7.
Since this host is case-less, the sound card had no good connection to ground, and a pop-up came up asking what device I had plugged in (auto-detection of speakers, mic, etc.).
I don't know for sure if this was the reason, but it didn't happen again.
It could also have been BOINC (6.10.58, 64-bit) losing its connection to localhost, due to the large number of MB tasks and Einstein tasks (1400 together).
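
One way to test the localhost theory the next time the Manager stops responding is a plain TCP connection check. This is a minimal sketch, assuming the BOINC client is listening for GUI RPC connections on its default TCP port 31416 on 127.0.0.1 (adjust if this host is configured differently); the helper name `port_is_open` is made up for illustration. It only shows that something accepts connections on that port, not that the client is healthy.

```python
import socket


def port_is_open(host="127.0.0.1", port=31416, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "NOT reachable"
    print(f"GUI RPC port on localhost is {state}")
```

If the port is reachable while the Manager shows nothing, the problem is more likely on the display or Manager side than in the client itself.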

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1031148 - Posted: 5 Sep 2010, 13:23:45 UTC - in response to Message 1031127.  

Are you running an antivirus on it? Maybe you could set the AV to ignore the BOINC folder. Mine was causing WUs to hang on my GPU and finally error out with a -1 error.


PROUD MEMBER OF Team Starfire World BOINC
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1031155 - Posted: 5 Sep 2010, 14:01:29 UTC - in response to Message 1031148.  

I do have the AV set to skip the BOINC data folder.
I came across this as a cause last year, so it is
something to bear in mind.

Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 1031572 - Posted: 7 Sep 2010, 9:34:50 UTC - in response to Message 1031155.  
Last modified: 7 Sep 2010, 9:55:22 UTC

I've had a few recurring random lockups myself over the last couple of weeks, varying from a few minutes to half a day.

I haven't 100% tracked down the causes, but these were the fixes.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5516356
Randomly freezing after months of working OK, but with the screen still displaying. Tried various things: swapped GPUs, tried various combinations of the memory, upgraded the video driver, but it still did it. Went into the BIOS and fixed the memory speed to match the RAM's spec, and it hasn't locked up since. I think the Auto overclock was pushing the memory too far.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4841051
Randomly freezing after months of working OK, but with the screen still displaying. Swapped GPUs (GTX470), upgraded video drivers, and still the same. Checked, and the fan speed on the GPUs had reverted to AUTO; I normally have it set to 100%. Set it to 100% and it hasn't locked up since.


John.

[Edit] Forgot a third one that happened yesterday. One machine repeatedly locked up (clock stopped) but then unfroze after a few minutes - keyboard and mouse not active so couldn't find out what was happening. Rebooted and been OK since.
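
For lockups like this third one, where the machine unfreezes by itself and there is no way to look at it while it is stuck, a heartbeat log can at least show when and for how long it stalled. The following is only a sketch under assumptions: some small script has been appending a timestamp in seconds every 10 s, and the function name `find_stalls` and the sample values are hypothetical.

```python
def find_stalls(timestamps, interval=10.0, tolerance=3.0):
    """Return (last_beat, next_beat, missing_seconds) for each long gap.

    A gap counts as a stall when it exceeds interval * tolerance.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval * tolerance:
            stalls.append((prev, cur, gap - interval))
    return stalls


# Hypothetical heartbeat log: one stamp every 10 s, with a stall
# of a few minutes in the middle.
beats = [0.0, 10.0, 20.0, 260.0, 270.0]
print(find_stalls(beats))
```

Lining the reported stall times up against task start times or fan and temperature logs may help narrow down which of the causes in this thread applies.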

I suspect the gremlins are at work ...........
GPU Users Group



hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1031586 - Posted: 7 Sep 2010, 12:45:35 UTC

It's probably not your case, but since I started running 470 SLI I've had occasional lockups too. I solved them by increasing the QPI PLL voltage (it's X58-specific), raising the PCI-e slot voltage from the stock 1.5 V to 1.55 V, and setting the PCI speed to 102 MHz. Maybe only one of these did the job; I'll find out later which one exactly.
Overclocking the GPU core beyond its abilities can lead to the same behavior too. I tested mine up to 850 MHz (they are water cooled) and experienced this a great many times :) It's different from the behavior of other chips I've had, which mostly declock when I overdo it.
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1031590 - Posted: 7 Sep 2010, 13:23:52 UTC - in response to Message 1031586.  

These kinds of 'faults' are hard to pin down, especially when running optimized SETI apps and CUDA, also optimized.
And there are no error messages, either.
But optimized SETI apps and many other projects do put quite some stress on your computer; it's a bit like running benchmarks all the time.
By comparison, 'normal' programs like Word (processing), Excel, or even CAD/CAM or Photoshop only peak at 100%, and more likely stay much lower.
The latest games are probably the best comparison for the behavior of your host(s), using 2 or 3 monitors, force-feedback joysticks, steering wheels, pedals, webcams, etc.



kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1031638 - Posted: 7 Sep 2010, 16:28:37 UTC

The 12V supply question you mentioned in your first post is why all of my new rigs are using PSUs with a single 12V rail.
That way you don't have to worry about which connectors you use for which devices to avoid overloading one of the rails on a multiple-12V-rail supply.
Many motherboards also require a second PSU connection in addition to the primary power connector, usually a 4- or 6-pin one, and they will give you problems if that is not hooked up as well.
"Freedom is just Chaos, with better lighting." Alan Dean Foster




 