GTX 690 conundrum - only Seti Multibeam don't run on GPUs

Message boards : Number crunching : GTX 690 conundrum - only Seti Multibeam don't run on GPUs

Author Message
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13028
Credit: 208,696,464
RAC: 304
Australia
Message 1289092 - Posted: 29 Sep 2012, 0:14:10 UTC - in response to Message 1289084.

They should support 32 GB of RAM. Corsair's newest memory kit (or card) series supports 8 GB of RAM on one such kit. You will need 4 such kits in order to utilize a new motherboard to its maximum memory capacity.

Why bother?
Unless you're doing some serious Photoshop work the system won't make use of much more than 6GB. I've got 8GB in my present system & even with several browsers open, multiple tabs in each, the odd video playing & Seti crunching away the system is only using 2.3GB, another 2.3GB is being used for system caching leaving 3.5GB completely unused.
If it's a dual channel system, then get 4 * 2GB modules or 2 * 4GB modules. If it's a 3 channel system then 3 * 2GB modules, or if it's a 4 channel system then 4 * 2GB.

Unless you're using a programme such as Photoshop (or similar) that will actually use the extra RAM, any more than 6GB generally goes unused.
Grant
Darwin NT
ID: 1289092
zoom3+1=4
Volunteer tester

Joined: 30 Nov 03
Posts: 63296
Credit: 55,293,173
RAC: 49
United States
Message 1289079 - Posted: 28 Sep 2012, 22:55:57 UTC - in response to Message 1289074.

I live in a dusty area and I'm going to water cool My Monster cruncher. I have a HAF-X case and I need to protect the PC's innards from My Cat; She likes cables to chew on... I'm either going to use a 1300W Rosewill Lightning PSU plus an FSP 450W Booster X5 video card PSU for the PC, or an EVGA Nex1500, but that 1500 needs to be on 230 VAC before it truly comes into its own at 136A on the 12V rail, and at $450 that thing isn't inexpensive... As to motherboards, I have an Asus Rampage 3 Extreme and an i7 940 CPU, plus the blocks too. I just lack the 590s to equip Her with, plus some other bits.
My Amazon Wishlist
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1289079
Morten Ross
Volunteer tester

Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1289074 - Posted: 28 Sep 2012, 22:34:27 UTC - in response to Message 1289062.

I've daisy-chained 2 PSUs, as I have a surplus of PSUs at my disposal. One Corsair 1200 and one 950W from another manufacturer.

If you're worried about space and what case will hold all your gear, I recommend opting for a test bench from DimasTech. It will give optimal accessibility and plenty of air, and you can decide whether to have the model that holds PSUs, or one that only holds the board and GTXs and then leave the PSUs on the side.




Morten Ross
ID: 1289074
zoom3+1=4
Volunteer tester

Joined: 30 Nov 03
Posts: 63296
Credit: 55,293,173
RAC: 49
United States
Message 1289062 - Posted: 28 Sep 2012, 21:58:25 UTC - in response to Message 1289058.

Ok, then what short of a Nuclear power plant are You powering this PC with? I'm slowly building a 590 version, as quickly as My present income allows that is...
My Amazon Wishlist
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1289062
Morten Ross
Volunteer tester

Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1289058 - Posted: 28 Sep 2012, 21:49:19 UTC - in response to Message 1289009.

Hi Arvid,

As you can see, I have a long-time stable 8GPU rig, so I'd say I know what it takes to set up a stable 4xGTX690. This is also not a problem with this 8GPU rig, although the hardware is identical.

In this case a new OS installation on a new disk resolved the problem.

It seems I had 2 root causes:

1: a rare malfunctioning i7-3930K (RMA-ed and replaced with a new one).
2: a bad disk controller on the hard disk itself.

After the new disk/OS was set up, the MB CUDA tasks have been 100% stable with 2xGTX 690, so I tested once again with the old disk: re-formatted it and reinstalled the OS. The problem started when I was installing Windows updates! They simply hung at various and illogical stages. At that point I retired the disk, perhaps just in time, as it's from 2007.

As for general instability issues with 3 or more GTX x90 cards, I always recommend checking that the motherboard manufacturer has a 3- or 4-way SLI certification for the board. Then a single-rail PSU with enough 12V amperage for all the cards and then some. If not, you need to daisy-chain 2 PSUs. If you run with sufficient amperage, but the board is certified for 2-way SLI and you're running 3-way, you're most likely going to experience instability despite having disabled unused resources in BIOS and disabled the HD audio controller, etc. in Windows Device Manager.
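
To put rough numbers on the "enough 12V amperage for all the cards and then some" rule, here is a minimal Python sketch. The 300 W per card is the GTX 690's rated TDP; the 150 W allowance for the rest of the system and the 20% headroom are assumptions chosen for illustration, not measured values, and the function names are hypothetical.

```python
def required_12v_amps(num_cards, card_watts=300.0, system_watts=150.0, headroom=0.20):
    """Estimate the 12 V current (in amps) needed for a multi-GPU rig.

    card_watts:   assumed per-card board power (300 W is the GTX 690's rated TDP).
    system_watts: assumed 12 V draw of CPU, fans and drives (illustrative guess).
    headroom:     fractional safety margin, the "and then some" part.
    """
    total_watts = (num_cards * card_watts + system_watts) * (1.0 + headroom)
    return total_watts / 12.0


def psu_is_sufficient(rail_amps, num_cards, **kwargs):
    """True if a single 12 V rail rated at rail_amps covers the estimate."""
    return rail_amps >= required_12v_amps(num_cards, **kwargs)


if __name__ == "__main__":
    for cards in (2, 3, 4):
        amps = required_12v_amps(cards)
        print(f"{cards} cards: roughly {amps:.0f} A on the 12 V rail")
```

With these assumed figures, four cards come out at roughly 135 A on the 12 V rail, which is why a single PSU often isn't enough and two get daisy-chained.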
Morten Ross
ID: 1289058
Arvid Almstrom

Joined: 23 Mar 00
Posts: 98
Credit: 137,331,372
RAC: 0
Australia
Message 1289009 - Posted: 28 Sep 2012, 20:23:11 UTC - in response to Message 1288484.

I have read many threads where people have been trying to get 4 x 690's to work and have not succeeded due to hardware resource limits.

In one thread (can't remember who it was, sorry), he had been getting custom firmware from nVidia / the card manufacturer to try and overcome this problem, but that only killed his card.

Have you tried removing one of the cards to see if 3 x 690s would work? That would save you from re-installing Windows.

Just a thought.

Arvid
Arvid Almstrom
ID: 1289009
Morten Ross
Volunteer tester

Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1288484 - Posted: 27 Sep 2012, 16:55:24 UTC - in response to Message 1288254.

Thanks for your suggestions Jason.

Unfortunately it is not that easy.

It also happens with the latest 306.23 driver, the latest science app and app_info.

I have now eliminated all components and arrived at a corrupt Windows or BOINC install.

The disk/OS installation previously hosted the same CPU - i7-3930K and motherboard - Rampage IV Formula. I RMA-ed the board due to this CUDA-specific behaviour as well as frequent system freezes.

There was nothing wrong with the board, so I RMA-ed the CPU. The CPU was indeed faulty and I got a new one. My first, actually, and next time I will RMA both CPU and board to save lots of time.

When I fired up the system with the new CPU I was really annoyed that the problem was just the same regarding CUDA tasks not hitting the GPU.

I therefore tested CUDA/MB tasks with several backups of BOINC directory, but all failed the same way - with or without App_info/default app.

The last component I did not consider until now was the software/OS, so I set up a new disk and that has resolved the problem, but not the conundrum!

What can become so selectively corrupt on a Windows installation? It was running with a bad CPU, but again, how can it selectively corrupt the CUDA?
Morten Ross
ID: 1288484
jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1288254 - Posted: 27 Sep 2012, 0:14:15 UTC - in response to Message 1288249.

There is a known incompatibility between Legacy Cuda 3.0 applications (stock 6.10) and Cuda 5 drivers.

There are several options:
- Go back to a Cuda 4.2 driver (where available)
- Use a third-party Cuda 3.2 or newer build
- Use the newest Cuda 5 WHQL driver and set a system environment variable, CUDA_GRID_SIZE_COMPAT=1

The last is a workaround, introduced in the release Cuda 5 drivers after I reported the issue to nVidia, for a quality problem found in Cuda 3's CUFFT DLL. With multibeam v7 on the near horizon, it's unlikely this will be addressed in stock V6 builds.
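
For the third option, a minimal Python sketch of checking whether the variable is visible to a process, and of launching a child process with it set. The variable name and value come from the list above; the helper names and the use of `subprocess` are illustrative assumptions. In practice the variable has to be set system-wide so that the BOINC client and the science app inherit it.

```python
import os
import subprocess
import sys

VAR_NAME = "CUDA_GRID_SIZE_COMPAT"  # name taken from the post above


def compat_enabled(env=None):
    """Return True if the workaround variable is set to "1" in env."""
    env = os.environ if env is None else env
    return env.get(VAR_NAME) == "1"


def env_with_compat(base_env=None):
    """Return a copy of the environment with the workaround enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env[VAR_NAME] = "1"
    return env


def run_with_compat(cmd):
    """Run cmd (a list of arguments) with the variable set for that child only."""
    return subprocess.run(cmd, env=env_with_compat(), capture_output=True, text=True)


if __name__ == "__main__":
    print("set in this process:", compat_enabled())
    # The child here is a stand-in for the real application (hypothetical usage).
    child = run_with_compat(
        [sys.executable, "-c", f"import os; print(os.environ.get('{VAR_NAME}'))"]
    )
    print("seen by child:", child.stdout.strip())
```

If `compat_enabled()` reports False in a freshly opened shell, the system-wide setting has not taken effect yet; already running programs generally need a restart to pick up a changed system environment variable.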

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1288254
Morten Ross
Volunteer tester

Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1288249 - Posted: 27 Sep 2012, 0:01:05 UTC
Last modified: 27 Sep 2012, 0:12:44 UTC

Both BOINC and the science app correctly discover all GTX 690 GPUs, but computing doesn't even start on the GPUs!

BOINC 7.0.28:
26/09/2012 23:51:10 | | NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 8382373MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 2: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1952MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | NVIDIA GPU 3: GeForce GTX 690 (driver version 306.23, CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 8382373MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1957MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 2: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1952MB available)
26/09/2012 23:51:10 | | OpenCL: NVIDIA GPU 3: GeForce GTX 690 (driver version 306.23, device version OpenCL 1.1 CUDA, 2048MB, 1957MB available)
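
One detail in the log above is easy to miss: GPU 0 reports 8382373MB available on a 2048MB device. Whether that is related to the failing tasks is not established here; it may only be a reporting glitch. A minimal Python sketch for flagging that kind of implausible value, assuming the startup line format shown above (the regular expression and function name are illustrative, not part of BOINC):

```python
import re

# Matches "GPU <n>: <name> (... <total>MB, <available>MB available" in a startup line.
GPU_LINE = re.compile(
    r"GPU (?P<index>\d+): (?P<name>[^(]+?) \(.*?"
    r"(?P<total>\d+)MB, (?P<avail>\d+)MB available"
)


def implausible_gpus(log_text):
    """Return (index, total_mb, avail_mb) for each line reporting more free than total memory."""
    flagged = []
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match is None:
            continue
        total = int(match.group("total"))
        avail = int(match.group("avail"))
        if avail > total:
            flagged.append((int(match.group("index")), total, avail))
    return flagged


if __name__ == "__main__":
    sample = (
        "26/09/2012 23:51:10 | | NVIDIA GPU 0: GeForce GTX 690 (driver version 306.23, "
        "CUDA version 5.0, compute capability 3.0, 2048MB, 8382373MB available, 3132 GFLOPS peak)\n"
        "26/09/2012 23:51:10 | | NVIDIA GPU 1: GeForce GTX 690 (driver version 306.23, "
        "CUDA version 5.0, compute capability 3.0, 2048MB, 1957MB available, 3132 GFLOPS peak)"
    )
    for index, total, avail in implausible_gpus(sample):
        print(f"GPU {index}: {avail}MB available exceeds {total}MB total")
```

The function returns one entry per matching line, so a GPU that appears in both the CUDA and the OpenCL listing is reported twice.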

BOINC assigns tasks to each GPU, but every task on every GPU terminates shortly after it is supposed to load on the GPU.

With or without app_info the same happens - the app seemingly sees all is well.

Here is the stderr from the default app:
setiathome_CUDA: Found 4 CUDA device(s):
   Device 1 : GeForce GTX 690 
           totalGlobalMem = -2147483648 
           sharedMemPerBlock = 49152 
           regsPerBlock = 65536 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 1024 
           clockRate = 1019500 
           totalConstMem = 65536 
           major = 3 
           minor = 0 
           textureAlignment = 512 
           deviceOverlap = 1 
           multiProcessorCount = 8 
   Device 2 : GeForce GTX 690 
           totalGlobalMem = -2147483648 
           sharedMemPerBlock = 49152 
           regsPerBlock = 65536 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 1024 
           clockRate = 1019500 
           totalConstMem = 65536 
           major = 3 
           minor = 0 
           textureAlignment = 512 
           deviceOverlap = 1 
           multiProcessorCount = 8 
   Device 3 : GeForce GTX 690 
           totalGlobalMem = -2147483648 
           sharedMemPerBlock = 49152 
           regsPerBlock = 65536 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 1024 
           clockRate = 1019500 
           totalConstMem = 65536 
           major = 3 
           minor = 0 
           textureAlignment = 512 
           deviceOverlap = 1 
           multiProcessorCount = 8 
   Device 4 : GeForce GTX 690 
           totalGlobalMem = -2147483648 
           sharedMemPerBlock = 49152 
           regsPerBlock = 65536 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 1024 
           clockRate = 1019500 
           totalConstMem = 65536 
           major = 3 
           minor = 0 
           textureAlignment = 512 
           deviceOverlap = 1 
           multiProcessorCount = 8 
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 690 is okay
SETI@home using CUDA accelerated device GeForce GTX 690
setiathome_enhanced 6.09 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is :  2.570755
Optimal function choices:
-----------------------------------------------------
name                
-----------------------------------------------------
              v_BaseLineSmooth (no other)
            v_GetPowerSpectrum 0.00010 0.00000 
                   v_ChirpData 0.00581 0.00000 
                  v_Transpose4 0.00163 0.00000 
               FPU opt folding 0.00116 0.00000 

The task initiates on CPU, but as soon as it's assigned to GPU it terminates.
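
A note on the stderr above: totalGlobalMem = -2147483648 is what 2048MB (2^31 bytes) looks like when the byte count is stored or printed as a signed 32-bit integer, and memPitch = 2147483647 is the largest value such an integer can hold. That the app prints the value this way is an assumption, and the negative number is probably cosmetic rather than the cause of the failure. A minimal Python sketch of the wrap-around (the helper name is illustrative):

```python
import ctypes


def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's-complement wrap)."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    total_bytes = 2048 * 1024 * 1024   # 2048MB, as reported by BOINC
    print(total_bytes)                  # 2147483648, i.e. 2**31
    print(as_int32(total_bytes))        # -2147483648, as in the stderr
    print(as_int32(2**31 - 1))          # 2147483647, the int32 maximum
```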

I've set up so many crunchers and never experienced this.

I'm unable to see what exactly is failing, except that MB CUDA is. Astropulse is working normally, but it uses OpenCL.

The motherboard and CPU are brand new, and I have no such problem running either Milkyway or Einstein, so what's the tweak here?

Both GTX 690 cards have been busy on another host, successfully crunching thousands of tasks.

PS: no tasks reported until this is resolved.

EDIT:

Even stranger, SetiPerformance64.exe and Lunatics_x41g are working without a hitch on all GPUs!
Morten Ross
ID: 1288249

©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.