Message boards : Number crunching : Noobie CUDA GPU temperature worry
Crun-chi · Joined: 3 Apr 99 · Posts: 174 · Credit: 3,037,232 · RAC: 0
... a single error using CUDA for SETI crunching can cause a computation error or maybe prevent your unit validating.
I agree with you, but a few errors will not do any harm to SETI as a project. The result will be sent to another computer, and you have the choice: buy a new card, or use that one for playing :) I am cruncher :) I LOVE SETI BOINC :)
Fred W · Joined: 13 Jun 99 · Posts: 2524 · Credit: 11,954,210 · RAC: 0
Let some of you say to me: how manufacturer of card may know that you use your GPU for CUDA, and not for playing? :))))
Exactly the problem I have with my XFX GTX295. Although I told the supplier (Scan Computers) that it was used for CUDA processing, and how I had localised the fault (using MemtestG80), they tested only with games and reported back NFF (no fault found). After a discussion on the phone during which I reiterated HOW I had located the fault, they agreed to return it to XFX anyway, and XFX have also reported NFF. Having got it back, I can still get MemtestG80 to log a memory fault in fewer than 500 iterations. I have now loaded the latest NVIDIA drivers and it is throwing only 2 or 3 errors per day (all from GPU #2) on S@H and none on Milkyway, so I will keep it crunching (with apologies to my wingmen) until it REALLY falls over, in the hope that that will happen before the warranty expires. F.
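[For readers unfamiliar with the tool: MemtestG80 runs memtest86-style test patterns over GPU memory. As a rough illustration of the kind of pattern involved, here is a CPU-side Python sketch of one classic pattern, "moving inversions" — this is only an illustration of the idea, not MemtestG80's actual code, and the function name is made up for the example.]

```python
# Sketch of the "moving inversions" memory-test idea: write a bit
# pattern to every cell, read it all back, then repeat with the
# bitwise complement. A bit stuck at 0 or 1 will mismatch on one of
# the two passes, so every stuck bit is caught.

def moving_inversions(memory, pattern=0xAAAAAAAA):
    errors = []
    for p in (pattern, ~pattern & 0xFFFFFFFF):
        for i in range(len(memory)):
            memory[i] = p              # write pass
        for i, word in enumerate(memory):
            if word != p:              # read/verify pass
                errors.append(i)       # record the faulty cell index
    return errors

ram = [0] * 1024                       # stand-in for a region of GPU RAM
print(moving_inversions(ram))          # [] on healthy "memory"
```

MemtestG80 runs patterns like this on the card itself, which is why it can flag faults that game testing never provokes.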
Crun-chi · Joined: 3 Apr 99 · Posts: 174 · Credit: 3,037,232 · RAC: 0
Every computer part can be defective. But how many cards work flawlessly? That is the question that nobody asks. Since CUDA is a stress for the GPU, it will, as in your case, end up half defective, but I think that happens to only a small percentage of all cards. So we can say: you just had bad luck. I am cruncher :) I LOVE SETI BOINC :)
Fred W · Joined: 13 Jun 99 · Posts: 2524 · Credit: 11,954,210 · RAC: 0
I agree that it is bad luck :-( but I would make 2 points:
1. The GTX295 crunched perfectly for over 3 months (no errors at all) before going faulty. Note that, despite the noise, I keep the GPU fan running at 100% while crunching to control the temps - and I have taken the side off the case until I get round to cutting a hole in it and adding a 120mm fan directly over the graphics card.
2. Almost invariably, where a CUDA cruncher is returning errors from a GTX295 card, all the errors will come from the same GPU (and, curiously, it almost always seems to be GPU #2). I would bet that all of them, like mine, are the older 2-board version - so I believe there is a fundamental flaw with the older 295s when used for crunching.
F.
Crun-chi · Joined: 3 Apr 99 · Posts: 174 · Credit: 3,037,232 · RAC: 0
... So the other processor, GPU #1, is working OK? It also crunches 24/7, but it is not broken? That is what I said to you: bad luck :( Nothing is perfect, and we all know that. I am cruncher :) I LOVE SETI BOINC :)
Fred W · Joined: 13 Jun 99 · Posts: 2524 · Credit: 11,954,210 · RAC: 0
... Yes - but it is the kind of bad luck that RMA was designed for (or so I thought!!) F.
Crun-chi · Joined: 3 Apr 99 · Posts: 174 · Credit: 3,037,232 · RAC: 0
I hope that new board revisions, as well as chip revisions, will be better. I am cruncher :) I LOVE SETI BOINC :)
hiamps · Joined: 23 May 99 · Posts: 4292 · Credit: 72,971,319 · RAC: 0
Small add-on: The kids came in complaining their computer no longer worked. I checked my BOINC computer list and SETI was still running, but no video. They said it hadn't worked for a couple of days. I opened it up and it was so full of dust that a huge dust bunny had clogged up the video card, an ATI HD4350. I blew it out but figured it was dead; to my surprise, it booted right up and works great. The kids are back to games and YouTube. Official Abuser of Boinc Buttons... And no good credit hound!
Sutaru Tsureku · Joined: 6 Apr 07 · Posts: 7105 · Credit: 147,663,825 · RAC: 5
Hi Sutaru, I know your post is quite an old one, but here in the UK, EVGA are also offering a 10-year warranty on their GTX260s. I actually ordered one because of the warranty, and because it was clocked a lot higher than a standard GTX260, but the online supplier took too many orders for it and it went out of stock. So he let me change to the Gigabyte super-overclock version, which was clocked even higher than the EVGA, but only comes with a 3-year warranty. I'd check direct with EVGA if I were you, and with the place you purchased them from. I hope you haven't purchased a 2-year warranty extension for nothing...
AFAIK/IIRC, I looked on the US EVGA site and there they wrote that the 10-year warranty is only for the USA and Canada. Maybe I will look at the European EVGA site (also in English); maybe they offer it in Europe too.. ;-) But I hope I'll never need the warranty. ;-)
Yes, I looked around and GIGABYTE has the highest OC for a GTX260-216, so I bought the GIGABYTE GTX260(-216) SOC. It's a new release. But I'm a little confused. A stock GTX260-216 has 576/1242/999 [GPU/shader/RAM, MHz], and the EVGA GTX260-216 SSC has 675/1458/1152. For the GIGABYTE GTX260(-216) SOC, the GIGABYTE site says 680/1466/1250, while an internet review and GPU-Z say 680/1500/1250, and the opt._CUDA_6.08_V12_app says 1512 MHz shader. I looked at your PC - do you also have the GIGABYTE GTX260(-216) SOC? The CUDA app says yours has 'only' 1500 MHz shader speed. Hmm.. confusing.. which is correct? What should you/I believe? If we both have the same GPU, why do I have a different shader speed in the CUDA app output? And why do we both have higher shader speeds than the GIGABYTE site says? What do you see if you look with GPU-Z?
Sutaru Tsureku · Joined: 6 Apr 07 · Posts: 7105 · Credit: 147,663,825 · RAC: 5
Ahh.. BTW.. I can compare the EVGA GTX260-216 SSC and the GIGABYTE GTX260(-216) SOC.. ;-) The EVGA sits at ~82 °C GPU with ~60% fan speed; the GIGABYTE sits at ~82 °C GPU with ~40% fan speed. So the GIGABYTE is quieter, but hotter, at the same ambient temperature.
gizbar · Joined: 7 Jan 01 · Posts: 586 · Credit: 21,087,774 · RAC: 0
Hi Sutaru, just seen your message. My GPU-Z reports correctly AFAIK: 680 core, 1500 shaders, and 2500 RAM. Gigabyte claim that this is because they cherry-pick the GPUs. I must admit I never looked at the Gigabyte website. Just checked, and it does say 1466, but GPU-Z definitely reports 1500. The website I bought it from (Overclockers UK) stated 650 core, 1500 shaders and 2500 RAM, which is why I originally went for the EVGA at 675/1466/1152, plus they said 10-year warranty. It was only when I checked the details in the description that I found it was 680/1500/2500. Not sure why it's reported differently - it's only 12 MHz. A reporting error? Or maybe it is running just that shade higher, a slight over-overclock? Where do you get the information on the speed in the Cuda app? I checked Boinc Manager and a completed task, and didn't see the speed reported in either. EVGA in Europe state this on their website http://www.evga.de/warranty/ , which might mean it's too late to get the warranty extended for free. Oh, and my GTX260 reports a GPU temp of 72-75 °C while running Cuda WUs. regards, Gizbar. A proud GPU User Server Donor!
Sutaru Tsureku · Joined: 6 Apr 07 · Posts: 7105 · Credit: 147,663,825 · RAC: 5
Hi Sutaru, just seen your message.
Thanks for the reply! Every GPU manufacturer which sells OCed GPUs cherry-picks the chips.. GPU-Z reports 680/1500/1250 [GPU/shader/RAM] for my GIGABYTE GTX260(-216) SOC. It's manufacturer-OCed, with no additional self-made OC. The opt._CUDA_6.08_V12_app reports 1512 MHz shader speed. Here is an example from my GPU: [http://setiathome.berkeley.edu/result.php?resultid=1419718179] clockRate = 1512000. Here is an example from your GPU: [http://setiathome.berkeley.edu/result.php?resultid=1418978902] clockRate = 1500000. Maybe there is a BUG somewhere.. one of the programs must be reporting the wrong shader speed. Maybe others could also report their GPU-Z and CUDA app shader speeds, to see if there are differences?
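[A note on the units: the `clockRate` field that the CUDA app prints comes from `cudaGetDeviceProperties`, and CUDA documents it in kilohertz. So the two log values above convert to MHz as follows - a small sketch with a made-up helper name; the 12 MHz gap itself remains unexplained here.]

```python
# cudaDeviceProp.clockRate is documented in kilohertz, so the raw
# values from the two task logs convert to MHz by dividing by 1000.

def clock_rate_mhz(clock_rate_khz):
    """Convert a CUDA clockRate value (kHz) to MHz."""
    return clock_rate_khz / 1000.0

gigabyte_soc = clock_rate_mhz(1512000)  # Sutaru's card, per the app log
gizbar_card = clock_rate_mhz(1500000)   # gizbar's card, per the app log

print(gigabyte_soc, gizbar_card)        # 1512.0 1500.0
print(gigabyte_soc - gizbar_card)       # the 12.0 MHz discrepancy
```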
gizbar · Joined: 7 Jan 01 · Posts: 586 · Credit: 21,087,774 · RAC: 0
Hi again. Found what you were looking at just before I got called into work for an emergency. Mine definitely does say that it is running at the correct speed, i.e. 1500, in both GPU-Z and the Cuda app. I don't try to overclock it any more than it already is, either. Maybe we need to start a new thread and encourage people to post their findings? And does it make a difference at all? For example, if the speed is being read wrongly, does that mean the flops calculation is wrong too? FYI, I'm running Boinc 6.10.17 and Lunatics 32-bit v0.2 on Windows 7 64-bit. I had problems with the 64-bit Lunatics, and I'm still not sure why, so I went back to the one I knew worked. regards, Gizbar. A proud GPU User Server Donor!
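[On the flops question: if an estimate is derived as shader clock × shader count × flops per shader per cycle, then a misread clock skews the result in exact proportion. A back-of-the-envelope sketch - the 3 flops/shader/cycle dual-issue figure often quoted for GT200-class chips is an assumption here, and the function name is made up; SETI@home's own benchmark may estimate flops differently.]

```python
# Hypothetical peak-throughput estimate for a GTX260-216, showing that
# a 12 MHz clock misreading propagates linearly into the GFLOPS figure.

SHADER_COUNT = 216  # stream processors on a GTX260-216

def peak_gflops(shader_mhz, flops_per_cycle=3):
    """Peak GFLOPS = clock (MHz) x shaders x flops/cycle / 1000."""
    return shader_mhz * SHADER_COUNT * flops_per_cycle / 1000.0

print(peak_gflops(1500))   # 972.0  (using the GPU-Z reading)
print(peak_gflops(1512))   # 979.776 (using the CUDA app reading)
```

Either way the gap is only 1512/1500, i.e. 0.8% - real, but small compared to the run-to-run variation in crunch times.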
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.