GTX 690 conundrum - only Seti Multibeam don't run on GPUs


Message boards : Number crunching : GTX 690 conundrum - only Seti Multibeam don't run on GPUs

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1288249 - Posted: 27 Sep 2012, 0:01:05 UTC
Last modified: 27 Sep 2012, 0:12:44 UTC

Both BOINC and the science app correctly discover all GTX 690 GPUs, but computing doesn't even start on the GPUs!

BOINC 7.0.28:
26/09/2012 23:51:10 | | NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 8382373MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 2: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1952MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 3: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 8382373MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1957MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 2: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1952MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 3: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1957MB available)

BOINC assigns tasks to each GPU, but every task on every GPU terminates shortly after it is supposed to load on the GPU.

With or without app_info the same happens - the app seemingly sees all is well.

Here is the stderr from the default app:

setiathome_CUDA: Found 4 CUDA device(s):
   Device 1 : GeForce GTX 690
           totalGlobalMem = -2147483648
           sharedMemPerBlock = 49152
           regsPerBlock = 65536
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 1024
           clockRate = 1019500
           totalConstMem = 65536
           major = 3
           minor = 0
           textureAlignment = 512
           deviceOverlap = 1
           multiProcessorCount = 8
   Device 2 : GeForce GTX 690
           totalGlobalMem = -2147483648
           sharedMemPerBlock = 49152
           regsPerBlock = 65536
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 1024
           clockRate = 1019500
           totalConstMem = 65536
           major = 3
           minor = 0
           textureAlignment = 512
           deviceOverlap = 1
           multiProcessorCount = 8
   Device 3 : GeForce GTX 690
           totalGlobalMem = -2147483648
           sharedMemPerBlock = 49152
           regsPerBlock = 65536
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 1024
           clockRate = 1019500
           totalConstMem = 65536
           major = 3
           minor = 0
           textureAlignment = 512
           deviceOverlap = 1
           multiProcessorCount = 8
   Device 4 : GeForce GTX 690
           totalGlobalMem = -2147483648
           sharedMemPerBlock = 49152
           regsPerBlock = 65536
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 1024
           clockRate = 1019500
           totalConstMem = 65536
           major = 3
           minor = 0
           textureAlignment = 512
           deviceOverlap = 1
           multiProcessorCount = 8
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 690 is okay
SETI@home using CUDA accelerated device GeForce GTX 690
setiathome_enhanced 6.09 Visual Studio/Microsoft C++
libboinc: 6.3.22
Work Unit Info:
...............
WU true angle range is : 2.570755
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.00010 0.00000
v_ChirpData 0.00581 0.00000
v_Transpose4 0.00163 0.00000
FPU opt folding 0.00116 0.00000
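
A side note on the stderr above: the negative totalGlobalMem is what you would get if a 2048MB byte count were printed through a signed 32-bit integer, and memPitch equals the largest signed 32-bit value. The sketch below only illustrates that arithmetic. It assumes, without confirmation from the app's source, that this is how the number was produced, and it says nothing about whether it is related to the failure. The helper name as_int32 is made up for the illustration.

```python
import ctypes

# 2048 MB per GPU, as reported in the BOINC startup log above
TOTAL_BYTES = 2048 * 1024 * 1024
INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    # 2048 MB is exactly one byte more than a signed 32-bit integer can hold,
    # so it wraps around to the most negative 32-bit value.
    print("bytes as unbounded int :", TOTAL_BYTES)
    print("bytes as signed 32-bit :", as_int32(TOTAL_BYTES))
    print("largest signed 32-bit  :", INT32_MAX)
```

If that assumption holds, the value is a display artifact of a 32-bit field rather than a sign that the card's memory is misdetected.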

The task initiates on CPU, but as soon as it's assigned to GPU it terminates.

I've set up so many crunchers and never experienced this.

I'm unable to see exactly what is failing, except that it is MB CUDA. Astropulse is working normally, but it uses OpenCL.

The motherboard and CPU are brand new, and I have no such problem running either Milkyway or Einstein, so what's the tweak here?

Both GTX 690 cards have been busy on another host, successfully crunching thousands of tasks.

PS: no tasks reported until this is resolved.

EDIT:

Even stranger, SetiPerformance64.exe and Lunatics_x41g are working without a hitch on all GPUs!
____________
Morten Ross

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,799,928
RAC: 12,934
Australia
Message 1288254 - Posted: 27 Sep 2012, 0:14:15 UTC - in response to Message 1288249.

There is a known incompatibility between Legacy Cuda 3.0 applications (stock 6.10) and Cuda 5 drivers.

There are several options:
- go back to a Cuda 4.2 driver (Where available)
- Use a third party Cuda 3.2 or newer build
- Use newest Cuda 5 WHQL driver and set a system environment variable, CUDA_GRID_SIZE_COMPAT=1

The last is a workaround for a quality problem found in Cuda 3's CUFFT DLL; the workaround was introduced in the release Cuda 5 drivers after I reported the issue to nVidia. With multibeam v7 on the near horizon, it's unlikely this will be addressed in stock V6 builds.
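
For anyone trying the third option, here is a minimal Python sketch for checking whether CUDA_GRID_SIZE_COMPAT is visible to a process. The variable name and value come from the list above; the helper names compat_enabled and env_with_compat are hypothetical, and none of this is part of BOINC or the science app. It only inspects the environment of the process it runs in. The post calls for a system environment variable, so BOINC would presumably need to be restarted after the variable is set system-wide.

```python
import os
import subprocess
import sys

# Variable name and value taken from the workaround described above
VAR = "CUDA_GRID_SIZE_COMPAT"


def compat_enabled(env=None) -> bool:
    """Return True if the workaround variable is set to "1" in env."""
    env = os.environ if env is None else env
    return env.get(VAR) == "1"


def env_with_compat(env=None) -> dict:
    """Return a copy of env (default: this process's) with the variable set."""
    base = dict(os.environ if env is None else env)
    base[VAR] = "1"
    return base


if __name__ == "__main__":
    print(f"{VAR} set in this process: {compat_enabled()}")
    # Start a child process with the variable set and have it echo the value
    # back, to show that child processes inherit it.
    child = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get('{VAR}'))"],
        env=env_with_compat(),
        capture_output=True,
        text=True,
        check=True,
    )
    print("value seen by child process:", child.stdout.strip())
```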

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1288484 - Posted: 27 Sep 2012, 16:55:24 UTC - in response to Message 1288254.

Thanks for your suggestions Jason.

Unfortunately it is not that easy.

It also happens with the latest 306.23 driver and the latest science app, with app_info.

I have now eliminated all other components and arrived at a corrupt Windows or BOINC install.

The disk/OS installation previously hosted the same CPU model (i7-3930K) and motherboard (Rampage IV Formula). I RMA-ed the board due to this CUDA-specific behaviour as well as frequent system freezes.

There was nothing wrong with the board, so I RMA-ed the CPU. The CPU was indeed faulty and I got a new one. My first, actually, and next time I will RMA both CPU and board to save lots of time.

When I fired up the system with the new CPU I was really annoyed that the problem was just the same regarding CUDA tasks not hitting the GPU.

I therefore tested CUDA/MB tasks with several backups of the BOINC directory, but all failed the same way, both with app_info and with the default app.

The last component I did not consider until now was the software/OS, so I set up a new disk and that has resolved the problem, but not the conundrum!

What can become so selectively corrupt on a Windows installation? It was running with a bad CPU, but again, how can that selectively corrupt CUDA?
____________
Morten Ross

Arvid Almstrom
Joined: 23 Mar 00
Posts: 98
Credit: 137,325,665
RAC: 3,845
Australia
Message 1289009 - Posted: 28 Sep 2012, 20:23:11 UTC - in response to Message 1288484.

I have read many threads where people have been trying to get 4 x 690's to work and have not succeeded due to hardware resource limits.

In one thread (I can't remember who it was, sorry), he had been getting custom firmware from nVidia / the card manufacturer to try and overcome this problem, but that only killed his card.

Have you tried removing one of the cards to see if 3 x 690's would work? It would save you from re-installing Windows.

Just a thought.

Arvid
____________
Arvid Almstrom

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1289058 - Posted: 28 Sep 2012, 21:49:19 UTC - in response to Message 1289009.

Hi Arvid,

As you can see, I have a long-time stable 8GPU rig, so I'd say I know what it takes to set up a stable 4xGTX690. This is also not a problem with that 8GPU rig, although the hardware is identical.

In this case a new OS installation on a new disk resolved the problem.

It seems I had 2 root causes:

1: a rare malfunctioning i7-3930K (RMA-ed and replaced with new).
2: a bad disk controller on the hard disk itself.

After the new disk/OS was set up, the MB CUDA tasks have been 100% stable with 2xGTX 690, so I once again tested with the old disk: re-formatted it and reinstalled the OS. The problem started when I was installing Windows updates! They simply hung at various and illogical stages. At that point I retired the disk, and perhaps just in time, as it's from 2007.

As for general instability issues with 3 or more GTX x90 cards, I always recommend checking that the motherboard has 3- or 4-way SLI certification from the manufacturer. Then use a single-rail PSU with enough 12V amperage for all the cards and then some; if that's not available, daisy-chain 2 PSUs. If you run with sufficient amperage, but the board is certified for 2-way SLI and you're running 3-way, you're most likely going to experience instability, despite having disabled unused resources in BIOS and disabled the HD audio controller, etc. in Windows Device Manager.
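
The "enough 12V amperage and then some" rule can be put into numbers with a rough sketch like the one below. Everything in it is an assumption for illustration: the 300W per-card figure, the 20% headroom, and the function names are not from this thread, and summing the 12V ratings of two daisy-chained PSUs is a simplification, since each card draws from whichever PSU it is actually cabled to. It also ignores what the CPU and the rest of the system draw from the 12V rail.

```python
def required_12v_amps(card_watts, headroom=0.2, rail_volts=12.0):
    """12V current (amps) needed for the given cards plus a safety margin.

    card_watts: per-card power draw in watts (assumed values, not measured)
    headroom:   extra fraction on top of the total (0.2 means +20%)
    """
    return sum(card_watts) * (1.0 + headroom) / rail_volts


def psus_sufficient(psu_12v_amps, card_watts, headroom=0.2):
    """Naive check: do the combined 12V ratings cover the cards plus margin?"""
    return sum(psu_12v_amps) >= required_12v_amps(card_watts, headroom)


if __name__ == "__main__":
    # Assumption: roughly 300W per dual-GPU card; check your card's spec sheet.
    two_cards = [300, 300]
    four_cards = [300, 300, 300, 300]
    print("2 cards need about", round(required_12v_amps(two_cards), 1), "A at 12V")
    print("4 cards need about", round(required_12v_amps(four_cards), 1), "A at 12V")
    # 136A is the 12V rating quoted later in this thread for a 1500W unit.
    print("136A covers 4 cards:", psus_sufficient([136], four_cards))
```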
____________
Morten Ross

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46489
Credit: 36,835,693
RAC: 5,125
United States
Message 1289062 - Posted: 28 Sep 2012, 21:58:25 UTC - in response to Message 1289058.

Ok, then what short of a Nuclear power plant are You powering this PC with? I'm slowly building a 590 version, as quickly as My present income allows that is...
____________
My Facebook, War Commander, 2015

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1289074 - Posted: 28 Sep 2012, 22:34:27 UTC - in response to Message 1289062.

I've daisy-chained 2 PSUs, as I have a surplus of PSUs at my disposal. One Corsair 1200 and one 950W from another manufacturer.

If you're worried about space and which case will hold all your gear, I recommend opting for a test bench from DimasTech. It will give optimal accessibility and plenty of air, and you can decide whether to have the model that holds the PSUs, or one that only holds the board and GTXs and leaves the PSUs on the side.




____________
Morten Ross

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46489
Credit: 36,835,693
RAC: 5,125
United States
Message 1289079 - Posted: 28 Sep 2012, 22:55:57 UTC - in response to Message 1289074.

I live in a dusty area and I'm going to water cool My Monster cruncher. I have a HAF-X case, and I need to protect the PC's innards from My Cat; She likes cables to chew on... I'm either going to use a 1300W Rosewill Lightning PSU plus an FSP 450W Booster X5 video card PSU for the PC, or an EVGA Nex1500, but that 1500 needs to be on 230VAC before it truly comes into its own at 136A on the 12V rail, and that thing @ $450 isn't inexpensive... As to motherboards, I have an Asus Rampage 3 Extreme and an i7 940 CPU, plus the blocks too. I just lack the 590's to equip Her with, plus some other bits.
____________
My Facebook, War Commander, 2015

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5861
Credit: 60,359,770
RAC: 48,461
Australia
Message 1289092 - Posted: 29 Sep 2012, 0:14:10 UTC - in response to Message 1289084.

They should support 32 GB of RAM. Corsair's newest memory kit (or card) series supports 8 GB of RAM on one such kit. You will need 4 such kits in order to utilize a new motherboard to its maximum memory capacity.

Why bother?
Unless you're doing some serious Photoshop work the system won't make use of much more than 6GB. I've got 8GB in my present system & even with several browsers open, multiple tabs in each, the odd video playing & Seti crunching away the system is only using 2.3GB, another 2.3GB is being used for system caching leaving 3.5GB completely unused.
If it's a dual channel system, then get 4 * 2GB modules or 2 * 4GB modules. If it's a 3 channel system then 3 * 2GB modules, or if it's a 4 channel system then 4 * 2GB.

Unless you're using a programme such as Photoshop (or similar) that will actually use the extra RAM, any more than 6GB generally goes unused.
____________
Grant
Darwin NT.

Copyright © 2014 University of California