Huge memory usage on GTX 295 and lots of crashed units

Message boards : Number crunching : Huge memory usage on GTX 295 and lots of crashed units

Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992747 - Posted: 29 Apr 2010, 18:03:42 UTC

I'm having a big problem with this GTX 295:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5253344
No matter what OS or driver I use, or how far I underclock the memory or the GPU, the card uses about 514 MB of memory on each GPU and produces a lot of errors. About twice a day it also shows artifacts and hangs the entire machine.
I tried to RMA it, but it seems that as long as the card works in games (which it does), they won't replace it :(
Does anyone have a clue what is happening?
ID: 992747
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6663
Credit: 121,090,076
RAC: 0
United States
Message 992748 - Posted: 29 Apr 2010, 18:10:54 UTC - in response to Message 992747.  
Last modified: 29 Apr 2010, 18:42:37 UTC

I'm having a big problem with this GTX 295:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5253344
No matter what OS or driver I use, or how far I underclock the memory or the GPU, the card uses about 514 MB of memory on each GPU and produces a lot of errors. About twice a day it also shows artifacts and hangs the entire machine.
I tried to RMA it, but it seems that as long as the card works in games (which it does), they won't replace it :(
Does anyone have a clue what is happening?


Nope, but mine does the exact same thing. I'm replacing it with two GTX 480's.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 992748
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992749 - Posted: 29 Apr 2010, 18:13:14 UTC

It seems to produce several kinds of errors. I have no clue what they mean either :(
ID: 992749
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 992751 - Posted: 29 Apr 2010, 18:18:34 UTC

I just got off the phone with Gigabyte about the two 295's I RMA'd due to overheating and blackscreens, along with the same errors mentioned above. They sent me an email saying the cards were unrepairable, but that there was no replacement stock available.

They suggested in the email that I should call their customer service and tell them what model of card I would like to replace the two 295's. Long story short...they wanted me to take some refurbed ATI cards. I (not so politely) declined their offer, so they are going to refund my original purchase price.

I know, probably not 100% on topic, but had to let off some steam.....
ID: 992751
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 992753 - Posted: 29 Apr 2010, 18:21:52 UTC - in response to Message 992751.  

I noticed a great deal of -9 errors that occur within 10 seconds of starting a WU. This usually means there's something wrong with the GPU. I wouldn't be surprised if running SETI stresses the GPU into errors.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 992753
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992755 - Posted: 29 Apr 2010, 18:28:49 UTC - in response to Message 992753.  

I noticed a great deal of -9 errors that occur within 10 seconds of starting a WU. This usually means there's something wrong with the GPU. I wouldn't be surprised if running SETI stresses the GPU into errors.


Yes, they happen after a first crashed unit, and if I'm not around to stop it, the card will ruin my whole cache, producing those 0.01-credit errored units :(
ID: 992755
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992756 - Posted: 29 Apr 2010, 18:30:50 UTC

If that can't be fixed, can I use a cc_config file that lets me ignore GPU 0 or GPU 1? But only if there is no better workaround, because that would leave me with a one-GPU 295....
ID: 992756
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 992761 - Posted: 29 Apr 2010, 18:44:33 UTC - in response to Message 992756.  

I don't think a cc_config is going to fix a hardware problem. This appears to be beyond the BOINC barrier.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 992761
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992764 - Posted: 29 Apr 2010, 19:16:27 UTC

That memory usage is far beyond what my other 295 uses. Besides setting the visual effects to best performance on Win 7, what else can I do to minimize memory usage?
ID: 992764
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 992778 - Posted: 29 Apr 2010, 20:48:13 UTC - in response to Message 992764.  

By better video performance I assume you mean you've turned Windows 7 Aero off?

I'd swap the card into one of your other systems that runs the same cards. Does it continue to produce bad results?

Using over 500 MB of memory? Are you saying the card is using 500 MB of system RAM?


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 992778
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 992786 - Posted: 29 Apr 2010, 21:05:30 UTC - in response to Message 992756.  

If that can't be fixed, can I use a cc_config file that lets me ignore GPU 0 or GPU 1? But only if there is no better workaround, because that would leave me with a one-GPU 295....

I don't know if that would fix it, but there's the <ignore_cuda_dev>N</ignore_cuda_dev> option for the cc_config.xml.
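
For reference, a minimal cc_config.xml using that option might look like the sketch below. It assumes the misbehaving GPU is device 1 (the value is only an example; use the device number the BOINC client reports in its startup messages) and that the file holds no other settings. The option goes inside the <options> section:

```xml
<cc_config>
  <options>
    <!-- Assumption: device 1 is the GPU to skip. Replace it with the
         number BOINC reports for the faulty GPU at startup. -->
    <ignore_cuda_dev>1</ignore_cuda_dev>
  </options>
</cc_config>
```

The client has to be restarted after the file is saved, and it should then schedule CUDA work only on the remaining GPU.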

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 992786
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1647
Credit: 12,921,799
RAC: 89
New Zealand
Message 992838 - Posted: 30 Apr 2010, 3:44:24 UTC

Out of interest, how old is your 295?
ID: 992838
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992907 - Posted: 30 Apr 2010, 12:25:12 UTC - in response to Message 992778.  

Yes, skildude, I've turned Windows Aero off, and in visual effects I set it to better performance, hoping for less video card memory use.
And yes, when I fit the card in my other machines, it does the same thing.
No, by using more than 500 MB (503 to 514, depending on the installed driver) I mean that EVGA Precision, which I use to overclock and monitor the card, reports that amount of video card memory in use. I know there is plenty of memory available (over 880 MB per GPU), but still, using that much memory doesn't seem right, and it looks to me like a memory-related problem.

TY Gundolf for the tip.

Speedy, this GTX is about 8 months old, about the same as my other one, which works perfectly.
ID: 992907
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6663
Credit: 121,090,076
RAC: 0
United States
Message 992909 - Posted: 30 Apr 2010, 13:10:55 UTC

Mine was new last August. It is good you had a good one to compare to. It took a long time for me to be convinced mine was actually defective, as I was not sure what to expect, never having done GPU crunching before. I didn't even know the 295 was the hottest card out there at the time. With Gatekeeper's experience, and yours, I am now sure some of these cards were defective from the get-go, but because they play games just fine, Nvidia seems to be paying attention to the other items on their plate.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 992909
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 992914 - Posted: 30 Apr 2010, 13:37:57 UTC

I'd say that the card is shot. If you get the same results on different machines, this is pretty obvious. It's time to return the card.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 992914
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992919 - Posted: 30 Apr 2010, 13:51:11 UTC - in response to Message 992914.  

I'd say that the card is shot. If you get the same results on different machines, this is pretty obvious. It's time to return the card.


I did. But as SciManStev says, if the card works in games (which it does), they won't replace it. The card spent almost 45 days in RMA service, and they refused to exchange it.
I guess I need to find a game, 3DMark, or something else that shows the card has failed.
But I will keep trying to make it work for another week or two before that...
ID: 992919
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 992921 - Posted: 30 Apr 2010, 13:55:50 UTC - in response to Message 992919.  

There was a GPU stress test that someone linked a few months ago. Perhaps they'll point it out again.

Have you looked at GPU-Z and checked the GPU temperatures under stress? My first ATI 5850 ran hot and failed after a couple of weeks. My new card runs much cooler, even under stress.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 992921
Fulvio Cavalli
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 992932 - Posted: 30 Apr 2010, 14:22:09 UTC - in response to Message 992921.  
Last modified: 30 Apr 2010, 14:24:56 UTC

Yep, temperatures are OK, about 70°C under full load, and power is OK too (Seventeam 850 W power supply).
Heh... I can't make that ignore CUDA device option work... omg, I really suck at software...
ID: 992932
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 992937 - Posted: 30 Apr 2010, 14:44:33 UTC - in response to Message 992932.  
Last modified: 30 Apr 2010, 14:45:33 UTC

Heh... I can't make that ignore CUDA device option work...

The cc_config.xml file must go into the BOINC data directory (see Messages tab after restart of client).

Regards,
Gundolf
ID: 992937
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6663
Credit: 121,090,076
RAC: 0
United States
Message 992943 - Posted: 30 Apr 2010, 14:59:49 UTC - in response to Message 992921.  
Last modified: 30 Apr 2010, 15:00:16 UTC

There was a GPU stress test that someone linked a few months ago. Perhaps they'll point it out again.

Have you looked at GPU-Z and checked the GPU temperatures under stress? My first ATI 5850 ran hot and failed after a couple of weeks. My new card runs much cooler, even under stress.


The software is called Furmark. Here is a link.

http://www.ozone3d.net/benchmarks/fur/

My card did fine with this intensive test, but won't crunch properly with optimized apps. On regular apps, it works fine.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 992943