Message boards : Number crunching : GTX 690 conundrum - only SETI Multibeam doesn't run on GPUs
Morten Ross | Joined: 30 Apr 01 | Posts: 183 | Credit: 385,664,915 | RAC: 0
Both BOINC and the science app correctly discover all GTX 690 GPUs, but computing doesn't even start on the GPUs! BOINC 7.0.28:

26/09/2012 23:51:10 | | NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 8382373MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 2: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1952MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 3: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 8382373MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1957MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 2: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1952MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 3: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1957MB available)

BOINC assigns work to each GPU, but every task on every GPU terminates shortly after it is supposed to load onto the GPU. The same happens with or without app_info - the app seemingly sees that all is well. Here is the stderr from the default app:

setiathome_CUDA: Found 4 CUDA device(s):
  Device 1 : GeForce GTX 690
    totalGlobalMem = -2147483648
    sharedMemPerBlock = 49152
    regsPerBlock = 65536
    warpSize = 32
    memPitch = 2147483647
    maxThreadsPerBlock = 1024
    clockRate = 1019500
    totalConstMem = 65536
    major = 3
    minor = 0
    textureAlignment = 512
    deviceOverlap = 1
    multiProcessorCount = 8
  Device 2 : GeForce GTX 690
    totalGlobalMem = -2147483648
    sharedMemPerBlock = 49152
    regsPerBlock = 65536
    warpSize = 32
    memPitch = 2147483647
    maxThreadsPerBlock = 1024
    clockRate = 1019500
    totalConstMem = 65536
    major = 3
    minor = 0
    textureAlignment = 512
    deviceOverlap = 1
    multiProcessorCount = 8
  Device 3 : GeForce GTX 690
    totalGlobalMem = -2147483648
    sharedMemPerBlock = 49152
    regsPerBlock = 65536
    warpSize = 32
    memPitch = 2147483647
    maxThreadsPerBlock = 1024
    clockRate = 1019500
    totalConstMem = 65536
    major = 3
    minor = 0
    textureAlignment = 512
    deviceOverlap = 1
    multiProcessorCount = 8
  Device 4 : GeForce GTX 690
    totalGlobalMem = -2147483648
    sharedMemPerBlock = 49152
    regsPerBlock = 65536
    warpSize = 32
    memPitch = 2147483647
    maxThreadsPerBlock = 1024
    clockRate = 1019500
    totalConstMem = 65536
    major = 3
    minor = 0
    textureAlignment = 512
    deviceOverlap = 1
    multiProcessorCount = 8
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 690 is okay
SETI@home using CUDA accelerated device GeForce GTX 690
setiathome_enhanced 6.09 Visual Studio/Microsoft C++
libboinc: 6.3.22
Work Unit Info:
...............
WU true angle range is : 2.570755
Optimal function choices:
-----------------------------------------------------
    name
-----------------------------------------------------
    v_BaseLineSmooth (no other)
    v_GetPowerSpectrum 0.00010 0.00000
    v_ChirpData 0.00581 0.00000
    v_Transpose4 0.00163 0.00000
    FPU opt folding 0.00116 0.00000

The task initiates on the CPU, but as soon as it's assigned to a GPU it terminates. I've set up so many crunchers and never experienced this.
I'm unable to see what exactly is failing, except that MB CUDA is. Astropulse is working normally, but that is OpenCL. The motherboard and CPU are brand new, and I have no such problem running either Milkyway or Einstein, so what's the tweak here? Both GTX 690 cards have been busy on another host, successfully crunching thousands of tasks.

PS: no tasks will be reported until this is resolved.

EDIT: Even stranger, SetiPerformance64.exe and Lunatics_x41g are working without a hitch on all GPUs!

Morten Ross
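One detail in the stderr above is worth a note: totalGlobalMem = -2147483648 is simply what 2 GB per GPU looks like when printed through a signed 32-bit integer, so the negative number is a quirk of the logging rather than evidence of a memory fault in itself. Below is a minimal, illustrative sketch (not part of the SETI or BOINC code) of how a standalone host program can query the same properties with the proper size_t type:

```cpp
// Illustrative sketch: enumerate CUDA devices and print totalGlobalMem as a
// size_t, avoiding the signed 32-bit overflow that shows 2 GB as -2147483648.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        std::printf("Device %d : %s\n", i, prop.name);
        // totalGlobalMem is a size_t; printing it through an int wraps to a negative value.
        std::printf("  totalGlobalMem      = %zu bytes (%.0f MB)\n",
                    prop.totalGlobalMem, prop.totalGlobalMem / (1024.0 * 1024.0));
        std::printf("  multiProcessorCount = %d, compute capability %d.%d\n",
                    prop.multiProcessorCount, prop.major, prop.minor);
    }
    return 0;
}
```

Compiled with nvcc, this would report each GTX 690 GPU as a 2048 MB device; it does not explain why the tasks exit, but it shows the negative readout is only a display artifact.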
Jason | Joined: 24 Nov 06 | Posts: 7489 | Credit: 91,093,184 | RAC: 0
There is a known incompatibility between legacy Cuda 3.0 applications (stock 6.10) and Cuda 5 drivers. There are several options:
- go back to a Cuda 4.2 driver (where available)
- use a third-party Cuda 3.2 or newer build
- use the newest Cuda 5 WHQL driver and set a system environment variable, CUDA_GRID_SIZE_COMPAT=1

The last option is a workaround, introduced in the release Cuda 5 drivers after I reported the issue to nVidia, for a quality problem in Cuda 3's CUFFT DLL. With Multibeam v7 on the near horizon, it's unlikely this will be addressed in stock v6 builds.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
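For anyone applying the CUDA_GRID_SIZE_COMPAT=1 workaround: it is described above as a system environment variable, and the science app only sees it if the process that launches it (BOINC) was started with the variable already defined. A throwaway check like the one below (hypothetical helper, not part of BOINC or the SETI apps) confirms whether the variable is visible in a process environment:

```cpp
// Hypothetical helper: confirm that the CUDA_GRID_SIZE_COMPAT workaround
// variable is present in the current process environment.
#include <cstdio>
#include <cstdlib>

int main() {
    const char* value = std::getenv("CUDA_GRID_SIZE_COMPAT");
    if (value != nullptr && value[0] == '1') {
        std::puts("CUDA_GRID_SIZE_COMPAT=1 is set for this process.");
    } else {
        std::puts("CUDA_GRID_SIZE_COMPAT is not set to 1 here; set it as a "
                  "system environment variable and restart BOINC so it is inherited.");
    }
    return 0;
}
```

The same idea applies to any per-machine driver setting: child processes inherit the environment of their parent, so BOINC has to be restarted after the variable is defined.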
Morten Ross | Joined: 30 Apr 01 | Posts: 183 | Credit: 385,664,915 | RAC: 0
"There is a known incompatibility between Legacy Cuda 3.0 applications (stock 6.10) and Cuda 5 drivers."

Thanks for your suggestions, Jason. Unfortunately it is not that easy: it also happens with the latest 306.23 driver and the latest science app and app_info. I have now eliminated all other components and arrived at a corrupt Windows or BOINC install.

The disk/OS installation previously hosted the same CPU (i7-3930K) and motherboard (Rampage IV Formula). I RMA-ed the board due to this CUDA-specific behaviour as well as frequent system freezes. There was nothing wrong with the board, so I then RMA-ed the CPU. The CPU was indeed faulty and I got a new one - my first, actually - and next time I will RMA both CPU and board to save a lot of time. When I fired up the system with the new CPU, I was really annoyed to find the problem unchanged: CUDA tasks still weren't hitting the GPU. I therefore tested CUDA/MB tasks with several backups of the BOINC directory, but all failed the same way, with or without app_info/the default app.

The last component I had not considered until now was the software/OS, so I set up a new disk, and that has resolved the problem - but not the conundrum! What can become so selectively corrupt in a Windows installation? It was running with a bad CPU, but again, how can that selectively corrupt only CUDA?

Morten Ross
Arvid Almstrom | Joined: 23 Mar 00 | Posts: 98 | Credit: 137,331,372 | RAC: 0
I have read many threads where people have been trying to get 4 x 690s to work and have not succeeded due to hardware resource limits. In one thread - can't remember who it was, sorry - someone had been getting custom firmware from nVidia / the card manufacturer to try and overcome this problem, but that only killed his card.

Have you tried removing one of the cards to see if 3 x 690s would work? It would save you from re-installing Windows. Just a thought.

Arvid Almstrom
Morten Ross | Joined: 30 Apr 01 | Posts: 183 | Credit: 385,664,915 | RAC: 0
Hi Arvid,

As you can see, I have a long-time stable 8-GPU rig, so I'd say I know what it takes to set up a stable 4x GTX 690. This is also not a problem on that 8-GPU rig, although the hardware is identical. In this case a new OS installation on a new disk resolved the problem. It seems I had two root causes:

1: a rare malfunctioning i7-3930K (RMA-ed and replaced with a new one).
2: a bad disk controller on the hard disk itself.

After the new disk/OS was set up, the MB CUDA tasks have been 100% stable with 2x GTX 690, so I once again tested with the old disk, re-formatted it and reinstalled the OS. The problem started when I was installing Windows updates - they simply hung at various and illogical stages. At that point I retired the disk; perhaps it was time anyway, as it's from 2007.

As for general instability issues with 3 or more GTX x90s: I always recommend a board for which the manufacturer has a 3- or 4-way SLI certification, plus a single-rail PSU with enough 12V amperage for all the cards and then some; if not, you need to daisy-chain 2 PSUs. If you run with sufficient amperage, but the board is certified for 2-way SLI and you're running 3-way, you're most likely going to experience instability despite having disabled unused resources in the BIOS and disabled the HD audio controller, etc. in Windows Device Manager.

Morten Ross
Morten Ross | Joined: 30 Apr 01 | Posts: 183 | Credit: 385,664,915 | RAC: 0
Hi Arvid,

I've daisy-chained 2 PSUs, as I have a surplus of PSUs at my disposal: one Corsair 1200 and one 950W from another manufacturer.

If you're worried about space and which case will hold all your gear, I recommend opting for a test bench from DimasTech. It gives optimal accessibility and plenty of air, and you can choose between the model that holds the PSUs and one that only holds the board and the GTXs, leaving the PSUs on the side.

Morten Ross
Grant (SSSF) | Joined: 19 Aug 99 | Posts: 13918 | Credit: 208,696,464 | RAC: 304
"They should support 32 GB of RAM. Corsair's newest memory kit (or card) series supports 8 GB of RAM on one such kit. You will need 4 such kits in order to utilize a new motherboard to its maximum memory capacity."

Why bother? Unless you're doing some serious Photoshop work, the system won't make use of much more than 6GB. I've got 8GB in my present system, and even with several browsers open, multiple tabs in each, the odd video playing and Seti crunching away, the system is only using 2.3GB; another 2.3GB is being used for system caching, leaving 3.5GB completely unused.

If it's a dual-channel system, then get 4 x 2GB modules or 2 x 4GB modules. If it's a 3-channel system, then 3 x 2GB modules; if a 4-channel system, then 4 x 2GB. Unless you're using a programme such as Photoshop (or similar) that will actually use the extra RAM, any more than 6GB generally goes unused.

Grant
Darwin NT