CUDA switching off my PC

MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 942567 - Posted: 24 Oct 2009, 16:11:58 UTC

(repost from BOINC.com)

Ever since I began to use CUDA, my system has suffered from sudden switch-offs and reboots. Occasionally I'll get a bluescreen for a second or so before it instantly shuts down. Cutting down CPU usage has little effect. This ONLY ever occurs when a CUDA task is running, and I only crunch SETI. Switch off CUDA and I never get a problem.

---
24/10/2009 12:47:26 Processor: 2 GenuineIntel Intel(R) Pentium(R) D CPU 3.40GHz [x86 Family 15 Model 6 Stepping 4]
24/10/2009 12:47:26 Processor features: fpu tsc pae nx sse sse2 mmx
24/10/2009 12:47:26 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
24/10/2009 12:47:26 Memory: 3.00 GB physical, 4.84 GB virtual
24/10/2009 12:47:26 Disk: 232.88 GB total, 142.94 GB free
24/10/2009 12:47:26 Local time is UTC +1 hours
24/10/2009 12:47:26 CUDA device: GeForce GTS 250 (driver version 19107, compute capability 1.1, 1024MB, est. 84GFLOPS)
---

The event log states that it's a BOINC app error, and there is some evidence that it's a 'memory related problem'.

----
The description for Event ID ( 1 ) in Source ( BOINC ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: BOINC error: 183, Another instance of BOINC is running.
----
Erm. There isn't.
A search of this forum hasn't found anything quite the same so can anyone help?

------
System:
BOINC 6.6.36
XP.
Aging Pentium D 3.4 GHz overclocked to 3655 MHz @ 35 degrees nominal during normal crunching. Custom air cooler.

4 gigs of RAM, but it's 32-bit so it only sees part of the extra memory.

Coolermaster 700W silent pro PSU running

Nvidia GTS250 card, driver 6.14.11.9107.

I overclock this with VTune for gaming without problems. The CUDA crashes occur even when the card is run in safe mode.

I monitor using Asus's own AiBooster, which gives me separate readings for CPU and System. It can be a little flaky, but the readings correlate with the Probe2 readings, so on balance I trust them, though I'm open to anyone with better knowledge! I have an AC Freezer Pro 7 CPU cooler.

According to Vtune, during crunching the 250 stabilises at 50-51 deg C with the fan fixed at 50%. It gets much higher with the dynamic fan setting, but I hate heat. I have watched the system several times as it 'pops' and the temp doesn't appear to spike.





space is big
ID: 942567
bloodrain
Volunteer tester
Joined: 8 Dec 08
Posts: 231
Credit: 28,112,547
RAC: 1
Antarctica
Message 942714 - Posted: 25 Oct 2009, 5:37:43 UTC - in response to Message 942567.  

Could be bad RAM on the card.
ID: 942714
MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 942735 - Posted: 25 Oct 2009, 11:21:12 UTC - in response to Message 942714.  

Any tools you know of to analyse it?
space is big
ID: 942735
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 942807 - Posted: 25 Oct 2009, 20:44:09 UTC - in response to Message 942735.  

Any tools you know of to analyse it?


[http://setiathome.berkeley.edu/forum_thread.php?id=54956]

ID: 942807
MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 945116 - Posted: 4 Nov 2009, 22:16:42 UTC

OK, well a slight delay on this issue while Windoze crashes and burns.

Reinstalled, nice clean system. Boinc/SETI back in, no problems.

2-3 days of occasional overclocked gaming, no problems.

Palit GTS250
set defaults:
745/1934/1000

Run CUDA, 10 mins later 'pop'.

So, downloaded and ran MemtestG80:
>MemtestG80 768 100

Temp stabilised at 57 deg, already above what Seti CUDA gives out.
No errors in 28 iterations...

Then 'pop' - PC switches off. No bluescreen, no temp spike. ZERO errors on memtest console.

Feh.

Given the system just trips itself off could this be an issue with the card drawing too much current? How could I monitor that?

To recap I'm running a Coolermaster 700W silent pro PSU, with 2 cables supplying power to the card via both sockets.




space is big
ID: 945116
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 945125 - Posted: 4 Nov 2009, 23:00:03 UTC - in response to Message 945116.  

Given the system just trips itself off could this be an issue with the card drawing too much current? How could I monitor that?

To recap I'm running a Coolermaster 700W silent pro PSU, with 2 cables supplying power to the card via both sockets.


The Coolermaster should be man enough for the job. I would try running GPU-Z with the MemtestG80 (I use 500 iterations for my GTX295 - tend to get a failure between iteration 300 and 400). With GPU-Z you can write the sensors to a log file (including current and temp) and then go back and investigate if/when the system reboots.
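
A minimal sketch of the "go back and investigate" step, assuming GPU-Z wrote a comma-separated sensor log with a header row. The column names used here (`Date`, `GPU Temperature [C]`, `GPU Load [%]`) and the sample rows are made up for illustration; check the header of your own log, since the sensors listed depend on the card and the GPU-Z version.

```python
import csv
import io


def read_sensor_log(text):
    """Parse a comma-separated sensor log into a list of dicts.

    Header names and cell values are stripped of padding spaces, and
    unnamed columns (e.g. from a trailing comma) are dropped.
    """
    reader = csv.reader(io.StringIO(text))
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, data = rows[0], rows[1:]
    return [
        {name: value for name, value in zip(header, row) if name}
        for row in data
    ]


def column_peak(samples, column):
    """Return the highest numeric value seen in one column."""
    return max(float(s[column]) for s in samples if s.get(column))


def last_samples(samples, count=3):
    """Return the final few samples, i.e. the readings just before the log stops."""
    return samples[-count:]


# Hypothetical log excerpt; real column names come from your own GPU-Z log.
SAMPLE_LOG = """\
Date , GPU Temperature [C] , GPU Load [%] ,
2009-11-04 23:10:01 , 55.0 , 99 ,
2009-11-04 23:10:02 , 56.0 , 99 ,
2009-11-04 23:10:03 , 57.0 , 98 ,
2009-11-04 23:10:04 , 57.0 , 99 ,
"""

if __name__ == "__main__":
    samples = read_sensor_log(SAMPLE_LOG)
    print("Peak temperature:", column_peak(samples, "GPU Temperature [C]"))
    for sample in last_samples(samples):
        print(sample)
```

If the last few rows before the shutdown show no spike in temperature or any other logged sensor, that points away from overheating and towards power delivery.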

F.
ID: 945125
MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 945132 - Posted: 4 Nov 2009, 23:22:47 UTC

Cheers Fred I'll try this next.
space is big
ID: 945132
commhealy1999
Joined: 15 May 99
Posts: 2
Credit: 5,538,891
RAC: 0
United States
Message 945157 - Posted: 5 Nov 2009, 0:51:13 UTC - in response to Message 945132.  

This really sounds like a power supply issue. I would go pick up a new power supply and swap it in. If it fixes the problem, great; if not, return the PSU and start over. Wattage is not everything in a power supply, and sometimes they just die. BOINC will definitely find the bad ones.
ID: 945157
Coleslaw
Volunteer tester
Joined: 20 Mar 07
Posts: 16
Credit: 4,336,830
RAC: 45
United States
Message 945195 - Posted: 5 Nov 2009, 5:09:35 UTC

My 8400GS did this until the 190 or 191 series drivers came out for it. I still have the occasional driver failure, and the only project that runs regularly without this on my card is Collatz. I would just keep an eye out for better drivers. Also, as mentioned previously, it may be a power issue.
ID: 945195
Loter
Joined: 18 Jul 01
Posts: 23
Credit: 368,078
RAC: 0
Moldova
Message 945206 - Posted: 5 Nov 2009, 6:06:58 UTC

Power supply or heat problem. Try monitoring the CPU and GPU temperatures during a BOINC run. If the temps are okay, then check the power.
ID: 945206
Andrew Skretvedt
Joined: 3 Apr 99
Posts: 1
Credit: 583,585
RAC: 0
United States
Message 946565 - Posted: 11 Nov 2009, 18:56:37 UTC

I'll second the opinion of others on this thread and venture that this is (to me) certainly a PSU issue.

Whenever a PC does an unexpected "instant-off", almost certainly what's happened is that some transient condition caused the ATX "power-good" signal to be removed from the motherboard connection. If this signal goes missing, an ATX supply is designed to switch off instantly in an effort to avoid damage to the PC. (Very cheap PSUs, like the Deer brand, are missing some of this logic even though it's required by the ATX spec, generally with disastrous results for the PC.) The PSU can also remove power for its own reasons, for example if its internal voltage regulation goes out of spec due to load imbalances on its various voltage rails. I think this is your case: either the mobo or the PSU decided the power delivered no longer met spec, and called for a damage-preventive cut.

-----

The ultimate solution is a more robust power supply and/or a careful reworking of the power distribution to the various components of the PC requiring power connections, so as to better ensure the load limits of each of the voltage rails making up the PSU's design are respected.

Mere reboots can be power-related (if a struggling supply causes logic components to become unstable, flipping bits in transit, etc.) or can be some other issue entirely like faulty RAM or faults with some other logic component in the picture, or of course driver or other software faults.

Complicating the matter is your apparently good gaming experience and overclocking success; overclocked gaming certainly puts pressure everywhere in a system.

Power delivery in a modern, hefty GPU and multicore CPU equipped PC is a surprisingly complex challenge these days. It's getting better, but there are still so many ways to assemble a PC which appears to fit with your desired specs, but underneath it all there is some combination of tiny details which make the reality very different.

My sense is that BOINC+cuda is placing a characteristically different load on your PC from what you normally get w/o the cuda portion, or while playing a demanding game.

Cuda (from what little I know) makes the heaviest demands of a videocard possible, by maxing the GPU, the onboard RAM, and data pipelines for indefinitely sustained periods. Gaming is also of course heavy, but not in a sustained way...scenery changes, less complex models come and go, you could stroll into a quiet portion of the game, etc.

Current draw from the card will be at a maximum under a cuda app, the massive data throughput will cause your mobo chipset to be drawing near max, and of course your CPUs may also be loaded down and busy with normal work or additional BOINC apps.

Your PSU's headline wattage rating may seem to indicate this is OK, but sneaky gotchas may make the wattage actually available to the PC substantially lower.

PSUs differ in the configuration of their internal "rails", each of which can deliver a specified maximum amount of current at a specific voltage. The end-connector leads are wired up to various of these rails to deliver power to the PC components. You cannot pull more current than a specific rail or combination of them can deliver. AND it's usually the case that the rails share a perverse interdependency, in that a 12v rail, say, cannot deliver its maximum current unless a given amount of loading is also present on a combination of the 5v and 3.3v rails. Specific limits vary widely and are unique to the PSU's design.

What power connectors are connected to which internal PSU rails and what the specific loading requirements are is usually very poorly documented, or not at all, requiring a lot of guesswork if you have a power-hungry system.

When working with a modern power-hungry video card requiring more than one PCIe supplemental power connector, best results are usually had by plugging in leads from the PSU which are each tapped into different power rails. If your PSU lacks sufficient PCIe supplemental power plugs, be careful with molex-to-PCIe power adapters. You'll want to connect these adapters to the leads from your PSU in such a way that power draw is as balanced across the available 12v rails as possible (most better PSUs have two 12v rails, and some have more).
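
As a rough illustration of the balancing idea above, here is a minimal Python sketch that adds up the current drawn on each 12v rail and flags any rail that goes over its limit. Every number and name in it (the rail limits, the device draws, `check_rails`) is hypothetical; real limits come from the PSU label or datasheet, and real draws have to be measured or estimated.

```python
def check_rails(rail_limits_amps, loads):
    """Sum the load on each rail and compare it with that rail's limit.

    rail_limits_amps: dict mapping rail name -> maximum current in amps.
    loads: list of (device, rail name, current in amps) tuples.
    Returns a dict mapping rail name -> (total amps, within_limit flag).
    """
    totals = {rail: 0.0 for rail in rail_limits_amps}
    for device, rail, amps in loads:
        if rail not in totals:
            raise ValueError(f"{device} is wired to unknown rail {rail!r}")
        totals[rail] += amps
    return {
        rail: (total, total <= rail_limits_amps[rail])
        for rail, total in totals.items()
    }


# Hypothetical PSU with two 12 V rails rated at 18 A each.
RAILS = {"12V1": 18.0, "12V2": 18.0}

# Hypothetical wiring: both PCIe plugs and the CPU on the same rail.
UNBALANCED = [
    ("CPU", "12V1", 9.0),
    ("GPU PCIe plug A", "12V1", 6.0),
    ("GPU PCIe plug B", "12V1", 6.0),
    ("drives and fans", "12V2", 3.0),
]

# Same devices, with one PCIe plug moved to the second rail.
BALANCED = [
    ("CPU", "12V1", 9.0),
    ("GPU PCIe plug A", "12V1", 6.0),
    ("GPU PCIe plug B", "12V2", 6.0),
    ("drives and fans", "12V2", 3.0),
]

if __name__ == "__main__":
    for label, wiring in (("unbalanced", UNBALANCED), ("balanced", BALANCED)):
        print(label)
        for rail, (total, ok) in check_rails(RAILS, wiring).items():
            status = "ok" if ok else "OVER LIMIT"
            print(f"  {rail}: {total:.1f} A of {RAILS[rail]:.1f} A -> {status}")
```

This ignores the interdependency between the 12v, 5v and 3.3v rails, so a pass here is a necessary check rather than a guarantee.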
ID: 946565
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 946572 - Posted: 11 Nov 2009, 19:49:24 UTC - in response to Message 946565.  

My old mobo died a grisly death in much the same fashion. There isn't any real way to check the motherboard. This is something where you'll get to replace parts until you've got enough to just about make another PC before you find what the problem is.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 946572
Gerald King
Joined: 3 Apr 99
Posts: 36
Credit: 50,054,611
RAC: 133
United States
Message 959934 - Posted: 1 Jan 2010, 6:22:31 UTC - in response to Message 946572.  
Last modified: 1 Jan 2010, 6:23:48 UTC

This is interesting in that my system is powering itself off as well since I started running CUDA. I have a higher end system in that I'm an intense online gamer, 20+ hours a week. MOBO and PS are about a year old, video is less than that. I've never had a problem in the past with power offs, I have been running SETI for almost as long as they have been public.

I installed Windows 7 64 bit on Wednesday, 30 Dec 09, on a brand new hard drive. I have 8 gigs of mem, a 512MB video card and a quad core processor. I was thinking it had something to do with Windows 7 and a power save feature. I haven't seen it shut down; I left it running overnight and when I got up for work it was off. I turned it back on, went to work and when I came home it was off. I'll keep an eye on it and see if I can pin anything down.

Still trying to decide if I like CUDA or not.
I reject your reality and substitute my own

boincstats.com
ID: 959934
Gerald King
Joined: 3 Apr 99
Posts: 36
Credit: 50,054,611
RAC: 133
United States
Message 959940 - Posted: 1 Jan 2010, 6:43:35 UTC - in response to Message 959934.  

Ya, I could edit my other post but there might be some useful information in it. However, Windows 7 does have a power save feature that Sleeps your system after X minutes. This is cool, especially for someone who likes to leave their system on /cough me /cough. What can I say, I love to run SETI.
I reject your reality and substitute my own

boincstats.com
ID: 959940
MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 961534 - Posted: 7 Jan 2010, 14:30:55 UTC

It's still happening, but now not just in CUDA. In UT3 it's regularly popping after 10-20 mins of play. It did this a little in Fallout 3, but UT3 must hammer the card more in certain ways.

So, I think it's either a problem with the card or with the power supply to the card.

Trouble is the card passes memtest etc, so if I try to send it back for a replacement I'm not sure I'll get anywhere. I will be contacting Palit to see if they will send a replacement out. I have an old 7200 that will do in the interim but won't run CUDA or the latest games well.

space is big
ID: 961534
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 961555 - Posted: 7 Jan 2010, 16:27:33 UTC - in response to Message 961534.  

Perhaps it's a PSU problem. An underpowered or overheating PSU will shut you down.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 961555
bloodrain
Volunteer tester
Joined: 8 Dec 08
Posts: 231
Credit: 28,112,547
RAC: 1
Antarctica
Message 961766 - Posted: 8 Jan 2010, 3:12:11 UTC - in response to Message 961555.  

Clean the fans etc. in your case. One other thing: see if the video card is overheating.
ID: 961766
MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 962311 - Posted: 9 Jan 2010, 18:51:02 UTC

PSU is a Coolermaster 700W job with 2 dedicated PCI-e rails going into the card. That already cost me 100 quid and I know other systems run fine on it. They do tend to be Core 2 though, and mine's a D 3.6. Maybe I should try 800W?

It's certainly not overheating AFAIAC - Rivatuner gives a steady 48 deg under load up to switchoff (though to the touch the backplate feels hotter). Setting the fan to 70% gets the temp down but it still pops.
space is big
ID: 962311
MikeL
Joined: 20 May 99
Posts: 21
Credit: 16,598,594
RAC: 6
United Kingdom
Message 962319 - Posted: 9 Jan 2010, 19:12:03 UTC

I'm going to take Andrew's advice next, swop the config of the power rails around a bit and see if that improves matters. Just don't like the idea of having spent 100 quid or more on a PSU that's not up to powering 1 x GTS250!
space is big
ID: 962319
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 967833 - Posted: 3 Feb 2010, 12:59:48 UTC - in response to Message 962319.  

Hi, I'm running CUDA {SETI} (9800GTX+ & QX9650; 2 GB DDR2) and CAL (ATI HD5770 & Q6600; 2 GB DDR2) {Collatz C}. The first runs Win XP64, the latter Win XP x86 (Pro).
Both have the same mobo (ASUS P5E), case and PSU, a CoolerMaster 450 Watt.
The first rig has been running for a year, almost non-stop; they draw 250-320* Watt each.

*The rig with an HD 5770 crunching Collatz C.
I did have a P4D dual core (Smithfield) system @ 2.8GHz, whose PSU went up in smoke.
Before that, it had been crunching for 1.5 years, NO CUDA. The first symptoms were reboots and a bad video display.


Since it's not easy to measure the actual current in one rail, you could maybe use a series resistor (0.01-0.1 Ohm; 10-25 Watt) and measure the voltage drop across it in (milli)Volts. Then you can do the math, but this series resistor will give a drop in voltage if it's too big*.

*If 0.5 V is measured over a 0.01 Ohm resistor, then I = U/R = 0.5/0.01 = 50 A, and the resistor dissipates P = U x I = 25 Watt. You already lose 0.5 V in this measurement.
Doing this is NOT recommended unless you know exactly what you're doing!
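
The arithmetic above, as a small Python sketch. The function name `shunt_reading` and the 12 V rail figure are assumptions added for illustration; the 0.01 Ohm and 0.5 V values are the ones from the example.

```python
def shunt_reading(volts_across_shunt, shunt_ohms, rail_volts=12.0):
    """Work out current, shunt dissipation and remaining rail voltage.

    Uses Ohm's law (I = U / R) and P = U * I for the series resistor.
    rail_volts is the nominal rail voltage before the shunt (assumed 12 V).
    """
    if shunt_ohms <= 0:
        raise ValueError("shunt resistance must be positive")
    current_amps = volts_across_shunt / shunt_ohms
    shunt_watts = volts_across_shunt * current_amps
    volts_at_load = rail_volts - volts_across_shunt
    return current_amps, shunt_watts, volts_at_load


if __name__ == "__main__":
    amps, watts, remaining = shunt_reading(0.5, 0.01)
    print(f"Current: {amps:.1f} A")
    print(f"Shunt dissipation: {watts:.1f} W")
    print(f"Voltage left for the load: {remaining:.2f} V")
```

A smaller resistor wastes less voltage and power, but the drop to be measured shrinks with it, which is the trade-off behind the 0.01-0.1 Ohm range.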



ID: 967833


 