Can a CUDA core burn out?

Author	Message
shizaru Volunteer tester Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0	Message 1173826 - Posted: 26 Nov 2011, 8:57:08 UTC
Or better yet, a group of cores? My 16-core card used to take about 7000 secs to do a 100+ credit task. Now it's taking 12000 :(

MarkJ Volunteer tester Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 1173832 - Posted: 26 Nov 2011, 9:46:43 UTC - in response to Message 1173826.
If it's not failing them, it sounds more like it's downclocked or has fallen back to using the CPU. Can you provide a link to the WU?

Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1173840 - Posted: 26 Nov 2011, 11:00:35 UTC - in response to Message 1173832. Last modified: 26 Nov 2011, 11:02:27 UTC
Or perhaps he is falling victim to the fact that WUs seem to be getting less credit now than they did fairly recently (my APs of similar length have mostly gone from the mid-700 credits to the 600s). So to get 100 credits, he would have to compute for longer now. This is causing the RAC on both my machines to decrease slowly even though they have been running solidly over the last few weeks, with no downtime due to running out of WUs. Does that make sense?

Iona Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0	Message 1173854 - Posted: 26 Nov 2011, 12:14:49 UTC - in response to Message 1173840.
You might well be right. I'm running my 'main' for longer and my RAC has dropped back quite a bit, just in the last few weeks. I may get the odd 'reward', which is quite nice, but in the main, it's working longer for less... a bit like pensions in the UK! lol
Don't take life too seriously, as you'll never come out of it alive!

shizaru Volunteer tester Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0	Message 1173875 - Posted: 26 Nov 2011, 14:59:11 UTC
It's not one WU in particular, it's all of them. And as far as I can tell it's been going on for about three weeks. The card just seems to be running at half speed (fortunately without producing any errors). I haven't caught it downclocking, the temps are normal and so is the GPU load. I'm pretty sure it's got nothing to do with how credit is granted since "Estimated task size in GFLOPs" vs eventual granted credit look unchanged. I just hope there's nothing wrong with the GPU. What about the GPU memory? Could that be half gone?
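To put a number on "half speed", the run times from the first post (about 7000 secs before, 12000 secs now) can be compared directly. This is only a minimal Python sketch; the function name is made up for illustration and is not part of BOINC, and it assumes the old and new tasks represent the same amount of work.

```python
def relative_speed(old_seconds: float, new_seconds: float) -> float:
    """Return current throughput as a fraction of the earlier throughput.

    Assumes both tasks represent the same amount of work, so speed is
    inversely proportional to elapsed time.
    """
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return old_seconds / new_seconds


# Figures from the first post: about 7000 secs before, 12000 secs now.
speed = relative_speed(7000, 12000)
print(f"relative speed: {speed:.2f}")        # fraction of the earlier speed
print(f"slowdown factor: {1 / speed:.2f}")   # how much longer each task takes
```

With those figures the card works out to roughly 58% of its earlier pace, so "half speed" is a fair description, provided the tasks being compared are of similar size.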

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1173879 - Posted: 26 Nov 2011, 15:20:30 UTC
Get hold of a copy of GPU-Z, a little free utility that allows you to monitor a GPU in real time. This will show what speed the various bits of the GPU are running at, how much of the GPU's resources you are using, and what temperature it is running at. It will also report the driver version you are actually using - some are known to give the sort of problems you are reporting.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Fred J. Verster Volunteer tester Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1173881 - Posted: 26 Nov 2011, 15:36:52 UTC - in response to Message 1173875. Last modified: 26 Nov 2011, 15:38:24 UTC
I haven't heard of partially burned-out CUDA cores or parts of memory yet; it also sounds a bit weird. It could be possible for memory or CUDA cores (shaders), but I haven't heard of or seen an example of this. I have experienced some drops in GPU core/memory/shader speed, depending on the driver used. And particular DC projects don't work flawlessly with SETI's (optimized?) and (Beta?) apps for ATI (2x EAH5870), and even cause memory (waiting?) problems and crashes.

For the moment I have only CPU projects installed on this host: Rosetta, LHC and Malaria Control, plus Einstein@home (without GPU!). No work from Milkyway, and I don't like Collatz C. (personal view!). This is after installing BOINC (6.12.34, 64-bit) again on a new HDD, with a first new install of Win 7 (64-bit), of course on a new set of SAMSUNG drives, HD503HJ (500GB) and 2 HD103SJ (1TB), plus 2 500GB USB 3.0 drives (really high throughput of ~100MB/s to 150MB/s from the SATA II system disks to the USB 3.0-linked HDDs). I'm also using ReadyBoost, through a 133MB/s SD card of 8 GByte, limiting HDD access! I also upped the system clock from 100MHz to 102MHz; it runs at 3.6GHz, and at 3.9GHz in turbo mode if 2 cores are used. The system is stable; I ran a variety of benchmark tests.

In the next week I'll install SETI (Beta) again and hope all goes well and behaves well... ;-) Sorry if I slipped too far off topic, but life expectancy may drop significantly, to unwanted values, when non-Fermi cards are used at 100%, 24x7. Cooling and sufficient power are a necessity in avoiding faults and burned-out cores/shaders or memory, usually affecting all of the card's contents. Also apologies to my wingmates, who had to wait longer because a lot of tasks timed out or were faulty!

BilBg Volunteer tester Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1174013 - Posted: 27 Nov 2011, 7:16:42 UTC - in response to Message 1173875.
"The card just seems to be running at half speed ..."
Why "seems"? Use MSI Afterburner or EVGA Precision to check and report the exact speeds (MHz) of GPU Core/Shader/Memory: http://setiathome.berkeley.edu/forum_thread.php?id=64917&nowrap=true#1173453
GPU-Z also reports them: http://www.techpowerup.com/gpuz/
- ALF - "Find out what you don't do well ..... then don't do it!" :)

shizaru Volunteer tester Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0	Message 1174069 - Posted: 27 Nov 2011, 15:37:54 UTC - in response to Message 1174013.
I'm using GPU-Z and even tried CUDA-Z. A couple of things I just noticed and am not sure of:
GPU-Z says Memory Size 512MB, then in the Sensors tab says 180MB Memory Usage (Dedicated). Is this normal?
CUDA-Z says 2 multiprocessors. Anybody know what that means?
Oh, and thank you all for your replies!

skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 1174332 - Posted: 28 Nov 2011, 18:43:35 UTC - in response to Message 1174069.
I'm willing to bet this is probably due to an nVidia driver update. Perhaps one of the nVidia folks could recommend an older driver that would maximize his GPU cycles. I assume this is an onboard chip, so you'll have dedicated video; that's no biggie.
In a rich man's house there is no place to spit but his face. Diogenes Of Sinope

shizaru Volunteer tester Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0	Message 1174434 - Posted: 29 Nov 2011, 0:34:11 UTC
I'll offer you indispensable troubleshooters a fresh angle at the problem I'm facing:
Event log - 35 GFLOPS peak
Task properties - Estimated app speed 13.62 GFLOPs/sec
Any ideas? And for the life of me, I can't catch my GPU downclocking no matter how hard I try!

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1174440 - Posted: 29 Nov 2011, 0:51:57 UTC - in response to Message 1174434. Last modified: 29 Nov 2011, 0:52:44 UTC
"Event log - 35 GFLOPS peak Task properties - Estimated app speed 13.62 GFLOPs/sec Any ideas?"
The 'GFLOPS peak' figures are theoretical 'marketing FLOPS' and only achievable in code under specific, not very realistic conditions. Typical real-world throughput of a little below half that is about right. When you've been comparing task elapsed times, have you been taking the task 'Angle Range' into consideration? Processing times vary a lot depending on the rate the telescope was moving during the observations, so comparing tasks with roughly the same angle range is essential.
Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
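The two figures quoted above can be checked against Jason's "a little below half" rule of thumb with a line of arithmetic. The following is a minimal Python sketch; the function name is hypothetical and is not anything BOINC provides.

```python
def fraction_of_peak(estimated_gflops: float, peak_gflops: float) -> float:
    """Return the estimated app speed as a fraction of the advertised peak."""
    if peak_gflops <= 0:
        raise ValueError("peak GFLOPS must be positive")
    return estimated_gflops / peak_gflops


# Values from the posts above: 13.62 GFLOPs/sec estimated, 35 GFLOPS peak.
ratio = fraction_of_peak(13.62, 35.0)
print(f"{ratio:.1%} of peak")  # prints "38.9% of peak"
```

At about 39% of the theoretical peak, the estimate sits where Jason says a typical real-world figure should, so on its own it does not point to a damaged GPU.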

shizaru Volunteer tester Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0	Message 1174500 - Posted: 29 Nov 2011, 8:37:38 UTC - in response to Message 1174440.
Thanx Jason, I know nVIDIA's numbers are "creative" (much like stereo Watts and TFT contrast ratios, to name a few), and even though I can't remember the old numbers, I was under the impression I was running higher (25 GFLOPS, for example). But if you say less than half is normal, then that's good enough for me. As for the angles, if you are referring to VLARs, SETI doesn't send those to my GPU. For all intents and purposes I consider this question answered: pretty much "No, a CUDA core/group of cores cannot burn out" (not without the card producing errors, anyway). Skildude is probably right, it may be driver related. If/when I figure out what I messed up, I'll post an update. Thanx everybody!

Fred J. Verster Volunteer tester Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1174541 - Posted: 29 Nov 2011, 14:50:05 UTC - in response to Message 1174500. Last modified: 29 Nov 2011, 14:52:09 UTC
NVIDIA ION GPU. (Some added information) NVIDIA's numbers for CUDA cores aren't "creative" ;-) and you certainly can't compare them to "Watts on audio devices", which involve R.M.S.* (sine) power and a number of irrelevant Watt figures, i.e. music power, total Watts, etc. An (audio) Watt figure is the R.M.S. (output) voltage squared divided by the impedance (Ohm), measured when the device is not clipping, i.e. running from max. to min. Volt, and at a given distortion. But this is getting way off topic :)
* Root Mean Square.

BilBg Volunteer tester Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1174612 - Posted: 30 Nov 2011, 0:12:02 UTC - in response to Message 1174541. Last modified: 30 Nov 2011, 1:06:40 UTC
When comparing with "stereo Watts" ("Music Power") he was talking about the advertised (inflated/synthetic/theoretical) GFLOPS, not about the number of CUDA cores or multiprocessors.
Here is a table of Multiprocessors / CUDA cores for different GPUs: http://www.geeks3d.com/20100606/gpu-computing-nvidia-cuda-compute-capability-comparative-table/
For Fermi: 1 Multiprocessor = 32 or 48 CUDA cores (depends on model - Compute Capability 2.0 or 2.1)
For older CUDA GPUs: 1 Multiprocessor = 8 CUDA cores
More info (PDF): http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
"Fermi: Third Generation Streaming Multiprocessor (SM) 32 CUDA cores per SM, 4x over GT200"
http://en.wikipedia.org/wiki/CUDA
- ALF - "Find out what you don't do well ..... then don't do it!" :)
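The per-multiprocessor figures above also answer the earlier question about CUDA-Z reporting "2 multiprocessors". A minimal Python sketch follows, assuming the ION counts as one of the "older CUDA GPUs" in the list above (an assumption, not something CUDA-Z output in the thread confirms); the table keys and function name are made up for illustration.

```python
# CUDA cores per multiprocessor, using only the figures quoted in the post above.
# The key names are illustrative labels, not official NVIDIA identifiers.
CORES_PER_MULTIPROCESSOR = {
    "pre-Fermi": 8,    # "older CUDA GPUs"
    "Fermi 2.0": 32,   # Compute Capability 2.0
    "Fermi 2.1": 48,   # Compute Capability 2.1
}


def cuda_cores(multiprocessors: int, generation: str) -> int:
    """Total CUDA cores = multiprocessors x cores per multiprocessor."""
    if multiprocessors < 0:
        raise ValueError("multiprocessor count cannot be negative")
    return multiprocessors * CORES_PER_MULTIPROCESSOR[generation]


# CUDA-Z reported 2 multiprocessors; assuming a pre-Fermi part:
print(cuda_cores(2, "pre-Fermi"))  # prints 16
```

Under that assumption, 2 multiprocessors x 8 cores gives 16 cores, which matches the "16-core" card described in the first post, so the CUDA-Z reading does not suggest that any cores have gone missing.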


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.