Noobie CUDA GPU temperature worry

Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 946215 - Posted: 9 Nov 2009, 22:19:48 UTC - in response to Message 946213.  

... a single error while using CUDA for SETI crunching can cause a computation error or prevent your work unit from validating.


I agree with you, but a few errors will not do any harm to SETI as a project. The result will just be sent to another computer, and then you have a choice: buy a new card, or keep this one for gaming :)

I am cruncher :)
I LOVE SETI BOINC :)
ID: 946215
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 946220 - Posted: 9 Nov 2009, 22:57:50 UTC - in response to Message 946213.  

Let me ask you this: how would the manufacturer of the card know that you used your GPU for CUDA and not for gaming? :))))
I have a 9800, and the temperature under CUDA is 58°C.


The real issue (especially if you don't tell them!) is not how they would know what you have been running, but whether they will agree that there is a fault.

If you are gaming and you get a few pixel errors, it's no big deal; but a single error while using CUDA for SETI crunching can cause a computation error or prevent your work unit from validating.

If the manufacturer tests the card under typical gaming use, they will find "no fault" with it when there are just some pixel errors or a few artifacts, yet there may still be enough wrong with it that it never gives you a valid or complete work unit to report.

Exactly the problem I have with my XFX GTX295. Although I told the supplier (Scan Computers) that it was used for CUDA processing, and explained how I had localised the fault (using MemtestG80), they tested it only with games and reported back NFF (no fault found). After a discussion on the phone, during which I reiterated HOW I had located the fault, they agreed to return it to XFX anyway, and XFX have also reported NFF. Having got it back, I can still get MemtestG80 to log a memory fault in fewer than 500 iterations. I have now loaded the latest NVidia drivers and it is throwing only 2 or 3 errors per day (all from GPU #2) on S@H and none on Milkyway, so I will keep it crunching (with apologies to my wingmen) until it REALLY falls over, in the hope that that will happen before the warranty expires.
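
For anyone wondering what a tool like MemtestG80 actually does, the idea is simple: repeatedly fill the card's memory with known bit patterns, read them back, and count mismatches. The sketch below is NOT MemtestG80's code - the buffer size, patterns and iteration count are arbitrary, illustrative choices - but it shows the principle in plain CUDA C++.

    // Minimal sketch of a GPU memory soak test, in the spirit of MemtestG80.
    // NOT MemtestG80's actual code: buffer size, bit patterns and iteration
    // count here are illustrative assumptions only.
    // Note: the global-memory atomicAdd needs compute capability 1.1 or later
    // (e.g. compile with -arch=sm_13 for a GTX 295).
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fill(unsigned int *buf, size_t n, unsigned int pattern)
    {
        for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += (size_t)gridDim.x * blockDim.x)
            buf[i] = pattern;
    }

    __global__ void check(const unsigned int *buf, size_t n,
                          unsigned int pattern, unsigned int *errors)
    {
        for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += (size_t)gridDim.x * blockDim.x)
            if (buf[i] != pattern)
                atomicAdd(errors, 1u);
    }

    int main()
    {
        const size_t n = 64u * 1024u * 1024u;  // 64M words = 256 MB under test
        const int iterations = 500;            // faults here showed up within 500 iterations

        cudaSetDevice(0);  // on a GTX 295 each half of the card appears as its own CUDA device

        unsigned int *buf, *d_errors, h_errors;
        cudaMalloc(&buf, n * sizeof(unsigned int));
        cudaMalloc(&d_errors, sizeof(unsigned int));

        for (int it = 0; it < iterations; ++it) {
            // alternate two complementary bit patterns across iterations
            unsigned int pattern = (it & 1) ? 0xAAAAAAAAu : 0x55555555u;
            cudaMemset(d_errors, 0, sizeof(unsigned int));
            fill<<<256, 256>>>(buf, n, pattern);
            check<<<256, 256>>>(buf, n, pattern, d_errors);
            cudaMemcpy(&h_errors, d_errors, sizeof(unsigned int),
                       cudaMemcpyDeviceToHost);
            if (h_errors != 0)
                printf("iteration %d: %u mismatched words\n", it, h_errors);
        }

        cudaFree(buf);
        cudaFree(d_errors);
        return 0;
    }

MemtestG80 itself runs a much larger battery of test patterns, but any non-zero mismatch count from even a simple test like this points at the hardware rather than the SETI application.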

F.
ID: 946220
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 946221 - Posted: 9 Nov 2009, 23:04:24 UTC - in response to Message 946220.  


... I have now loaded the latest NVidia drivers and it is throwing only 2 or 3 errors per day (all from GPU #2) on S@H and none on Milkyway, so I will keep it crunching (with apologies to my wingmen) until it REALLY falls over ...
F.


Every computer part can be defective. But how many cards work flawlessly? That is a question nobody asks. Since CUDA is a stress test for the GPU, it will, as in your case, show up a card that is half defective, but I think that is only a small percentage of all cards.
So we can say: you just had bad luck.

I am cruncher :)
I LOVE SETI BOINC :)
ID: 946221
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 946225 - Posted: 9 Nov 2009, 23:20:15 UTC - in response to Message 946221.  
Last modified: 9 Nov 2009, 23:20:43 UTC


Every computer part can be defective. But how many cards work flawlessly? That is a question nobody asks. Since CUDA is a stress test for the GPU, it will, as in your case, show up a card that is half defective, but I think that is only a small percentage of all cards.
So we can say: you just had bad luck.

I agree that it is bad luck :-( but I would make 2 points:

1. The GTX295 crunched perfectly for over 3 months (no errors at all) before going faulty. Note that, despite the noise, I keep the GPU fan running at 100% while crunching to control the temps - and have taken the side off the case until I get round to cutting a hole in the side and adding a 120mm fan directly over the graphics card.

2. Almost invariably, where a CUDA cruncher is returning errors from a GTX295 card, all the errors will come from the same GPU (and, curiously, it almost always seems to be GPU#2). I would bet that all of them, like mine, are the older 2-board version - so I believe there is a fundamental flaw with the older 295s when used for crunching.

F.
ID: 946225
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 946228 - Posted: 9 Nov 2009, 23:31:49 UTC - in response to Message 946225.  

...
2. Almost invariably, where a CUDA cruncher is returning errors from a GTX295 card, all the errors will come from the same GPU (and, curiously, it almost always seems to be GPU#2). I would bet that all of them, like mine, are the older 2-board version - so I believe there is a fundamental flaw with the older 295s when used for crunching.
F.


So the other processor, GPU#1, is working OK? It also crunches 24/7, but it is not broken?
That is why I say to you: bad luck :(
Nothing is perfect, and we all know that.
I am cruncher :)
I LOVE SETI BOINC :)
ID: 946228
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 946235 - Posted: 9 Nov 2009, 23:45:17 UTC - in response to Message 946228.  

...
2. Almost invariably, where a CUDA cruncher is returning errors from a GTX295 card, all the errors will come from the same GPU (and, curiously, it almost always seems to be GPU#2). I would bet that all of them, like mine, are the older 2-board version - so I believe there is a fundamental flaw with the older 295s when used for crunching.
F.


So the other processor, GPU#1, is working OK? It also crunches 24/7, but it is not broken?
That is why I say to you: bad luck :(
Nothing is perfect, and we all know that.

Yes - but it is the kind of bad luck that RMA was designed for (or so I thought!!)

F.
ID: 946235
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 946241 - Posted: 10 Nov 2009, 0:00:20 UTC - in response to Message 946235.  

I hope that the new board revisions, as well as the new chip revisions, will be better.
I am cruncher :)
I LOVE SETI BOINC :)
ID: 946241
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 946512 - Posted: 11 Nov 2009, 7:23:36 UTC - in response to Message 946185.  

Small add-on:

It seems some stock GPU fans don't like 24/7 operation either.
The fan on my Club3D HD4890 "Superclocked Edition" has now begun to make rattling noises in the speed range it usually sits in (~50% RPM).

That's a mere 2 weeks into GPU crunching with an otherwise fairly new card (<3 months old), thanks to the fan's rather frequent speed changes (the card has a rather sensitive fan control logic that quickly alternates speed levels depending on GPU load).

...another example of how 24/7 operation can easily hurt a video card, in my case because of a ~$1.50 stock cooler part - the fan - which obviously isn't up to the job due to its cheap design.

I'll have to look into an alternative GPU cooler; otherwise I can see that video card going dead in less than 14 days, killed by its weakest part.

The kids came in complaining that their computer no longer worked. I checked my BOINC computer list and SETI was still running, but there was no video. They said it hadn't worked for a couple of days. I opened it up and it was so full of dust that a huge dust bunny had completely clogged the video card, an ATI HD4350. I blew it out but figured it was dead; to my surprise it booted right up and works great. The kids are back to games and YouTube.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 946512
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 946588 - Posted: 11 Nov 2009, 21:09:15 UTC - in response to Message 946188.  
Last modified: 11 Nov 2009, 21:15:48 UTC

Hi Sutaru, I know your post is quite an old one, but here in the UK, EVGA are also offering a 10 year warranty on their GTX260's. I actually ordered one because of the warranty, and because it was clocked a lot higher than a standard GTX260, but the online supplier took too many orders for it and it went out of stock. So he let me change to the Gigabyte super-overclock version, which was clocked even higher than the EVGA, but only comes with a 3 year warranty. I'd check direct with EVGA if I were you, and the place you purchased them from. I hope you haven't purchased a 2 year warranty extension for nothing...

regards, Gizbar.


AFAIK (IIRC), I looked on the US EVGA site and they list the 10-year warranty only for the USA and Canada.
Maybe I will look at the European EVGA site (also in English); maybe they offer it in Europe as well.. ;-)

But I hope I'll never need the warranty. ;-)

Yes, I looked around and GIGABYTE has the highest factory OC for a GTX260-216.
I bought the GIGABYTE GTX260(-216) SOC. It's a new release.

But I'm a little confused.

A stock GTX260-216 runs at 576/1242/999 MHz [GPU/shader/RAM].
The EVGA GTX260-216 SSC runs at 675/1458/1152.

For the GIGABYTE GTX260(-216) SOC, the GIGABYTE site says 680/1466/1250.
A review on the internet and GPU-Z both say 680/1500/1250.
The opt._CUDA_6.08_V12_app says 1512 MHz shader.

I looked at your PC; do you also have the GIGABYTE GTX260(-216) SOC?
The CUDA_app reports 'only' 1500 MHz shader speed on yours.

Hmm.. confused.. what's correct?

What should you/I believe?

If we both have the same GPU, why do I get a different shader speed in the CUDA_app output?
And why do we both have a higher shader speed than the GIGABYTE site says?

What do you see if you look with GPU-Z?

ID: 946588
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 946591 - Posted: 11 Nov 2009, 21:25:39 UTC


Ahh.. BTW..

I can compare the EVGA GTX260-216 SSC and the GIGABYTE GTX260(-216) SOC.. ;-)

At ~82 °C on the GPU, the EVGA runs its fan at ~60% speed.
At ~82 °C on the GPU, the GIGABYTE runs its fan at only ~40% speed.

So the GIGABYTE is quieter, but runs hotter, at the same ambient temperature.
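
If you want to log temperature and fan speed over time instead of glancing at GPU-Z, one option on newer drivers is a small monitor written against NVIDIA's NVML library. This is only a sketch under that assumption - back in 2009 GPU-Z or the vendor tools were the usual route - and while the NVML calls shown are the standard ones, everything else here is illustrative.

    // Minimal temperature / fan-speed logger using NVML (ships with newer
    // NVIDIA drivers; link with -lnvidia-ml). Illustrative sketch only.
    #include <cstdio>
    #include <nvml.h>

    int main()
    {
        if (nvmlInit() != NVML_SUCCESS) {
            printf("NVML not available\n");
            return 1;
        }

        unsigned int count = 0;
        nvmlDeviceGetCount(&count);

        for (unsigned int i = 0; i < count; ++i) {
            nvmlDevice_t dev;
            if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS)
                continue;

            char name[NVML_DEVICE_NAME_BUFFER_SIZE];
            unsigned int tempC = 0, fanPct = 0;
            nvmlDeviceGetName(dev, name, sizeof(name));
            nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &tempC);
            nvmlDeviceGetFanSpeed(dev, &fanPct);   // percent of maximum, like the GPU-Z reading

            printf("GPU %u (%s): %u C, fan %u%%\n", i, name, tempC, fanPct);
        }

        nvmlShutdown();
        return 0;
    }
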
ID: 946591
gizbar
Joined: 7 Jan 01
Posts: 586
Credit: 21,087,774
RAC: 0
United Kingdom
Message 947430 - Posted: 15 Nov 2009, 10:18:50 UTC - in response to Message 946588.  



AFAIK (IIRC), I looked on the US EVGA site and they list the 10-year warranty only for the USA and Canada.
Maybe I will look at the European EVGA site (also in English); maybe they offer it in Europe as well.. ;-)

But I hope I'll never need the warranty. ;-)

Yes, I looked around and GIGABYTE has the highest factory OC for a GTX260-216.
I bought the GIGABYTE GTX260(-216) SOC. It's a new release.

But I'm a little confused.

A stock GTX260-216 runs at 576/1242/999 MHz [GPU/shader/RAM].
The EVGA GTX260-216 SSC runs at 675/1458/1152.

For the GIGABYTE GTX260(-216) SOC, the GIGABYTE site says 680/1466/1250.
A review on the internet and GPU-Z both say 680/1500/1250.
The opt._CUDA_6.08_V12_app says 1512 MHz shader.

I looked at your PC; do you also have the GIGABYTE GTX260(-216) SOC?
The CUDA_app reports 'only' 1500 MHz shader speed on yours.

Hmm.. confused.. what's correct?

What should you/I believe?

If we both have the same GPU, why do I get a different shader speed in the CUDA_app output?
And why do we both have a higher shader speed than the GIGABYTE site says?

What do you see if you look with GPU-Z?


Hi Sutaru, just seen your message.

My GPU-Z reports correctly, AFAIK. It reports 680 core, 1500 shaders, and 2500 RAM. Gigabyte claim that this is because they cherry-pick the GPUs. I must admit I had never looked at the Gigabyte website; I just checked, and it does say 1466, but GPU-Z is definitely reporting 1500.

The website I bought it from (Overclockers UK) stated 650 core, 1500 shaders and 2500 RAM, which is why I originally went for the EVGA at 675/1466/1152, plus they offered a 10-year warranty. It was only when I checked the details in the description that I found it was 680/1500/2500.

I'm not sure why it's reported differently; it's only 12 MHz. A reporting error? Or maybe it is running just that shade higher - a slight over-overclock?

Where do you get the information on the speed in the CUDA app? I checked BOINC Manager and a completed task, and didn't see the speed reported in either.

EVGA in Europe state this on their website http://www.evga.de/warranty/ , which might mean it's too late to get the warranty extended for free.

Oh, and my GTX260 reports a GPU temperature of 72-75°C while running CUDA WUs.

regards, Gizbar.



A proud GPU User Server Donor!
ID: 947430
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 947449 - Posted: 15 Nov 2009, 15:02:25 UTC - in response to Message 947430.  
Last modified: 15 Nov 2009, 15:05:36 UTC

Hi Sutaru, just seen your message.

My GPU-Z reports correctly, AFAIK. It reports 680 core, 1500 shaders, and 2500 RAM. Gigabyte claim that this is because they cherry-pick the GPUs. I must admit I had never looked at the Gigabyte website; I just checked, and it does say 1466, but GPU-Z is definitely reporting 1500.

The website I bought it from (Overclockers UK) stated 650 core, 1500 shaders and 2500 RAM, which is why I originally went for the EVGA at 675/1466/1152, plus they offered a 10-year warranty. It was only when I checked the details in the description that I found it was 680/1500/2500.

I'm not sure why it's reported differently; it's only 12 MHz. A reporting error? Or maybe it is running just that shade higher - a slight over-overclock?

Where do you get the information on the speed in the CUDA app? I checked BOINC Manager and a completed task, and didn't see the speed reported in either.
[...]


Thanks for the reply!

Every GPU manufacturer that sells OCed GPUs cherry-picks the chips..

GPU-Z reports 680/1500/1250 [GPU/shader/RAM] for my GIGABYTE GTX260(-216) SOC.

It's factory OCed, with no additional OC of my own.

The opt._CUDA_6.08_V12_app reports 1512 MHz shader speed.
Here is an example from my GPU:
[http://setiathome.berkeley.edu/result.php?resultid=1419718179]
clockRate = 1512000

Here is an example from your GPU:
[http://setiathome.berkeley.edu/result.php?resultid=1418978902]
clockRate = 1500000


Maybe there is a BUG somewhere..
One of the programs must be reporting the wrong shader speed.
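
For what it's worth, the clockRate figure in the app's stderr output looks like it comes straight from the CUDA runtime: cudaGetDeviceProperties() fills in a clockRate field in kHz, which is where values like 1512000 and 1500000 come from. A minimal sketch of that query (the printout format is mine, not the opt. app's):

    // Print the shader clock exactly as the CUDA runtime reports it,
    // i.e. the same kind of figure as the "clockRate = ..." stderr line.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            // clockRate is given in kHz, so 1512000 means a 1512 MHz shader clock
            printf("Device %d: %s, clockRate = %d kHz (%.0f MHz)\n",
                   i, prop.name, prop.clockRate, prop.clockRate / 1000.0);
        }
        return 0;
    }

Whether GPU-Z or the runtime is the more accurate of the two I can't say; comparing both readings on more cards, as suggested below, seems the right way to find out.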

Maybe others could also report their GPU-Z and CUDA_app shader speeds, and say whether they see any differences?

ID: 947449
gizbar
Joined: 7 Jan 01
Posts: 586
Credit: 21,087,774
RAC: 0
United Kingdom
Message 947460 - Posted: 15 Nov 2009, 16:46:22 UTC - in response to Message 947449.  


Thanks for the reply!

Every GPU manufacturer that sells OCed GPUs cherry-picks the chips..

GPU-Z reports 680/1500/1250 [GPU/shader/RAM] for my GIGABYTE GTX260(-216) SOC.

It's factory OCed, with no additional OC of my own.

The opt._CUDA_6.08_V12_app reports 1512 MHz shader speed.
Here is an example from my GPU:
[http://setiathome.berkeley.edu/result.php?resultid=1419718179]
clockRate = 1512000

Here is an example from your GPU:
[http://setiathome.berkeley.edu/result.php?resultid=1418978902]
clockRate = 1500000


Maybe there is a BUG somewhere..
One of the programs must be reporting the wrong shader speed.

Maybe others could also report their GPU-Z and CUDA_app shader speeds, and say whether they see any differences?


Hi again.

Found what you were looking at just before I got called into work for an emergency.

Mine definitely does say that it is running at the correct speed, i.e. 1500, in both GPU-Z and the Cuda_app.

I don't try to overclock it any more than it is already either.

Maybe we need to start a new thread, and encourage people to post their findings? And does it make a difference at all? For example, if the speed is being read wrongly, does that mean the flops calculation is wrong too?

FYI, I'm running BOINC 6.10.17 and Lunatics 32-bit v0.2 on Windows 7 64-bit. I had problems with the 64-bit Lunatics, and I'm still not sure why, so I went back to the 32-bit version because I knew it worked.

regards, Gizbar.




A proud GPU User Server Donor!
ID: 947460