OK, I'm on XP so the path is a little different, but I found it.
When it runs, this is the output:
Starting automatic test: (lunatics_x41z_win32_cuda42.exe)
26 August 2012 - 09:17:24 Start, devices: 1, device count: 1
26 August 2012 - 09:19:06 Runtime: Device: 0, count: 0, 101 seconds
26 August 2012 - 09:19:06 Device: 0, finished
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 1, average time / count: 101, average time on device: 101 Seconds (1 Minutes, 41 Seconds)
Next ---------------------------------------------------------------------
26 August 2012 - 09:19:06 Start, devices: 1, device count: 2
26 August 2012 - 09:22:31 Runtime: Device: 0, count: 1, 202 seconds
26 August 2012 - 09:22:31 Runtime: Device: 0, count: 0, 202 seconds
26 August 2012 - 09:22:31 Device: 0, finished
26 August 2012 - 09:22:31 Device: 0, finished
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 2, average time / count: 202, average time on device: 101 Seconds (1 Minutes, 41 Seconds)
The best average time found: 101 Seconds (1 Minutes, 41 Seconds), with count: 1.00 (1)
Then in slot 0:
Can't open init data file - running in standalone mode
Can't open init data file - running in standalone mode
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 590, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 16
pciBusID = 3, pciSlotID = 0
clockRate = 1260 MHz
Device 2: GeForce GTX 590, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 16
pciBusID = 4, pciSlotID = 0
clockRate = 1260 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 590 is okay
SETI@home using CUDA accelerated device GeForce GTX 590
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
setiathome enhanced x41z, Cuda 4.20
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is : 0.775000
VRAM: cudaMalloc((void**) &dev_cx_DataArray, 1048576x 8bytes = 8388608bytes, offs256=0, rtotal= 8388608bytes
VRAM: cudaMalloc((void**) &dev_cx_ChirpDataArray, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 17825792bytes
VRAM: cudaMalloc((void**) &dev_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 17825800bytes
VRAM: cudaMalloc((void**) &dev_WorkData, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 27262984bytes
VRAM: cudaMalloc((void**) &dev_PowerSpectrum, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 31457288bytes
VRAM: cudaMalloc((void**) &dev_t_PowerSpectrum, 1048584x 4bytes = 1048608bytes, offs256=0, rtotal= 32505896bytes
VRAM: cudaMalloc((void**) &dev_GaussFitResults, 1048576x 16bytes = 16777216bytes, offs256=0, rtotal= 49283112bytes
VRAM: cudaMalloc((void**) &dev_PoT, 1572864x 4bytes = 6291456bytes, offs256=0, rtotal= 55574568bytes
VRAM: cudaMalloc((void**) &dev_PoTPrefixSum, 1572864x 4bytes = 6291456bytes, offs256=0, rtotal= 61866024bytes
VRAM: cudaMalloc((void**) &dev_NormMaxPower, 16384x 4bytes = 65536bytes, offs256=0, rtotal= 61931560bytes
VRAM: cudaMalloc((void**) &dev_flagged, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 66125864bytes
VRAM: cudaMalloc((void**) &dev_outputposition, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 70320168bytes
VRAM: cudaMalloc((void**) &dev_PowerSpectrumSumMax, 262144x 12bytes = 3145728bytes, offs256=0, rtotal= 73465896bytes
VRAM: cudaMallocArray( &dev_gauss_dof_lcgf_cache, 1x 8192bytes = 8192bytes, offs256=184, rtotal= 73474088bytes
VRAM: cudaMallocArray( &dev_null_dof_lcgf_cache, 1x 8192bytes = 8192bytes, offs256=0, rtotal= 73482280bytes
VRAM: cudaMalloc((void**) &dev_find_pulse_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 73482288bytes
VRAM: cudaMalloc((void**) &dev_t_funct_cache, 1966081x 4bytes = 7864324bytes, offs256=0, rtotal= 81346612bytes
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Flopcounter: 12287736759182.770000
Spike count: 13
Pulse count: 1
Triplet count: 0
Gaussian count: 0
Worker preemptively acknowledging a normal exit.->
called boinc_finish
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
In slot 1:
Can't open init data file - running in standalone mode
Can't open init data file - running in standalone mode
Can't open init data file - running in standalone mode
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 590, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 16
pciBusID = 3, pciSlotID = 0
clockRate = 1260 MHz
Device 2: GeForce GTX 590, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 16
pciBusID = 4, pciSlotID = 0
clockRate = 1260 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 590 is okay
SETI@home using CUDA accelerated device GeForce GTX 590
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
setiathome enhanced x41z, Cuda 4.20
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is : 0.775000
VRAM: cudaMalloc((void**) &dev_cx_DataArray, 1048576x 8bytes = 8388608bytes, offs256=0, rtotal= 8388608bytes
VRAM: cudaMalloc((void**) &dev_cx_ChirpDataArray, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 17825792bytes
VRAM: cudaMalloc((void**) &dev_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 17825800bytes
VRAM: cudaMalloc((void**) &dev_WorkData, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 27262984bytes
VRAM: cudaMalloc((void**) &dev_PowerSpectrum, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 31457288bytes
VRAM: cudaMalloc((void**) &dev_t_PowerSpectrum, 1048584x 4bytes = 1048608bytes, offs256=0, rtotal= 32505896bytes
VRAM: cudaMalloc((void**) &dev_GaussFitResults, 1048576x 16bytes = 16777216bytes, offs256=0, rtotal= 49283112bytes
VRAM: cudaMalloc((void**) &dev_PoT, 1572864x 4bytes = 6291456bytes, offs256=0, rtotal= 55574568bytes
VRAM: cudaMalloc((void**) &dev_PoTPrefixSum, 1572864x 4bytes = 6291456bytes, offs256=0, rtotal= 61866024bytes
VRAM: cudaMalloc((void**) &dev_NormMaxPower, 16384x 4bytes = 65536bytes, offs256=0, rtotal= 61931560bytes
VRAM: cudaMalloc((void**) &dev_flagged, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 66125864bytes
VRAM: cudaMalloc((void**) &dev_outputposition, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 70320168bytes
VRAM: cudaMalloc((void**) &dev_PowerSpectrumSumMax, 262144x 12bytes = 3145728bytes, offs256=0, rtotal= 73465896bytes
VRAM: cudaMallocArray( &dev_gauss_dof_lcgf_cache, 1x 8192bytes = 8192bytes, offs256=208, rtotal= 73474088bytes
VRAM: cudaMallocArray( &dev_null_dof_lcgf_cache, 1x 8192bytes = 8192bytes, offs256=56, rtotal= 73482280bytes
VRAM: cudaMalloc((void**) &dev_find_pulse_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 73482288bytes
VRAM: cudaMalloc((void**) &dev_t_funct_cache, 1966081x 4bytes = 7864324bytes, offs256=0, rtotal= 81346612bytes
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Flopcounter: 12287736759182.770000
Spike count: 13
Pulse count: 1
Triplet count: 0
Gaussian count: 0
Worker preemptively acknowledging a normal exit.->
called boinc_finish
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 590, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 16
pciBusID = 3, pciSlotID = 0
clockRate = 1260 MHz
Device 2: GeForce GTX 590, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 16
pciBusID = 4, pciSlotID = 0
clockRate = 1260 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 590 is okay
SETI@home using CUDA accelerated device GeForce GTX 590
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
setiathome enhanced x41z, Cuda 4.20
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is : 0.775000
VRAM: cudaMalloc((void**) &dev_cx_DataArray, 1048576x 8bytes = 8388608bytes, offs256=0, rtotal= 8388608bytes
VRAM: cudaMalloc((void**) &dev_cx_ChirpDataArray, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 17825792bytes
VRAM: cudaMalloc((void**) &dev_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 17825800bytes
VRAM: cudaMalloc((void**) &dev_WorkData, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 27262984bytes
VRAM: cudaMalloc((void**) &dev_PowerSpectrum, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 31457288bytes
VRAM: cudaMalloc((void**) &dev_t_PowerSpectrum, 1048584x 4bytes = 1048608bytes, offs256=0, rtotal= 32505896bytes
VRAM: cudaMalloc((void**) &dev_GaussFitResults, 1048576x 16bytes = 16777216bytes, offs256=0, rtotal= 49283112bytes
VRAM: cudaMalloc((void**) &dev_PoT, 1572864x 4bytes = 6291456bytes, offs256=0, rtotal= 55574568bytes
VRAM: cudaMalloc((void**) &dev_PoTPrefixSum, 1572864x 4bytes = 6291456bytes, offs256=0, rtotal= 61866024bytes
VRAM: cudaMalloc((void**) &dev_NormMaxPower, 16384x 4bytes = 65536bytes, offs256=0, rtotal= 61931560bytes
VRAM: cudaMalloc((void**) &dev_flagged, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 66125864bytes
VRAM: cudaMalloc((void**) &dev_outputposition, 1048576x 4bytes = 4194304bytes, offs256=0, rtotal= 70320168bytes
VRAM: cudaMalloc((void**) &dev_PowerSpectrumSumMax, 262144x 12bytes = 3145728bytes, offs256=0, rtotal= 73465896bytes
VRAM: cudaMallocArray( &dev_gauss_dof_lcgf_cache, 1x 8192bytes = 8192bytes, offs256=208, rtotal= 73474088bytes
VRAM: cudaMallocArray( &dev_null_dof_lcgf_cache, 1x 8192bytes = 8192bytes, offs256=56, rtotal= 73482280bytes
VRAM: cudaMalloc((void**) &dev_find_pulse_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 73482288bytes
VRAM: cudaMalloc((void**) &dev_t_funct_cache, 1966081x 4bytes = 7864324bytes, offs256=0, rtotal= 81346612bytes
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Flopcounter: 12287736759182.770000
Spike count: 13
Pulse count: 1
Triplet count: 0
Gaussian count: 0
Worker preemptively acknowledging a normal exit.->
called boinc_finish
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
No work was done on the second GPU or in the other slots.
Could this be the reason: "Can't open init data file - running in standalone mode"?
Any clue?
Sorry to take up your time.
____________
Any clue?
Set devices to 2 and try again.
It's one board, but with 2 devices.
For comparison, I'd like to see the automatic test log.
I have a 590 on a Win 7 system.
Hi, is this what you need?
Starting automatic test: (lunatics_x41z_win32_cuda42.exe)
26 August 2012 - 13:28:26 Start, devices: 2, device count: 1
26 August 2012 - 13:30:23 Runtime: Device: 0, count: 0, 114 seconds
26 August 2012 - 13:30:23 Device: 0, finished
26 August 2012 - 13:30:23 Runtime: Device: 1, count: 0, 114 seconds
26 August 2012 - 13:30:23 Device: 1, finished
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 1, average time / count: 114, average time on device: 114 Seconds (1 Minutes, 54 Seconds)
Device: 1, device count: 1, average time / count: 114, average time on device: 114 Seconds (1 Minutes, 54 Seconds)
Next ---------------------------------------------------------------------
26 August 2012 - 13:30:25 Start, devices: 2, device count: 2
26 August 2012 - 13:33:34 Runtime: Device: 1, count: 1, 184 seconds
26 August 2012 - 13:33:34 Device: 1, finished
26 August 2012 - 13:33:48 Runtime: Device: 1, count: 0, 198 seconds
26 August 2012 - 13:33:48 Device: 1, finished
26 August 2012 - 13:34:02 Runtime: Device: 0, count: 0, 212 seconds
26 August 2012 - 13:34:02 Device: 0, finished
26 August 2012 - 13:34:02 Runtime: Device: 0, count: 1, 212 seconds
26 August 2012 - 13:34:02 Device: 0, finished
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 2, average time / count: 212, average time on device: 106 Seconds (1 Minutes, 46 Seconds)
Device: 1, device count: 2, average time / count: 191, average time on device: 95 Seconds (1 Minutes, 35 Seconds)
Next ---------------------------------------------------------------------
26 August 2012 - 13:34:03 Start, devices: 2, device count: 3
26 August 2012 - 13:34:49 Runtime: Device: 1, count: 2, 38 seconds
26 August 2012 - 13:34:49 ERROR: Device: 1, finished
26 August 2012 - 13:37:57 Runtime: Device: 1, count: 1, 226 seconds
26 August 2012 - 13:37:57 Device: 1, finished
26 August 2012 - 13:37:57 Runtime: Device: 1, count: 0, 226 seconds
26 August 2012 - 13:37:57 Device: 1, finished
26 August 2012 - 13:38:49 Runtime: Device: 0, count: 0, 278 seconds
26 August 2012 - 13:38:49 Device: 0, finished
26 August 2012 - 13:39:17 Runtime: Device: 0, count: 2, 306 seconds
26 August 2012 - 13:39:17 Device: 0, finished
26 August 2012 - 13:39:17 Runtime: Device: 0, count: 1, 306 seconds
26 August 2012 - 13:39:17 Device: 0, finished
Ready ---------------------------------------------------------------------
Results:
Device: 0, device count: 3, average time / count: 296, average time on device: 98 Seconds (1 Minutes, 38 Seconds)
Device: 1, device count: 3, average time / count: 163, average time on device: 54 Seconds (0 Minutes, 54 Seconds)
Next ---------------------------------------------------------------------
26 August 2012 - 13:39:19 Start, devices: 2, device count: 4
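A note on reading the "device count: 3" pass above: the summary figures look like they are derived from the individual runtimes by integer averaging, and the 38-second run that ended with ERROR on Device 1 seems to be included in that device's average. The sketch below reproduces the arithmetic in Python. The truncating division is an assumption inferred from the numbers in the log, not taken from the tool's source, and the names `runs` and `summarize` are made up for illustration.

```python
# Runtimes in seconds copied from the "device count: 3" pass in the log.
# Each entry is (seconds, errored); the ERROR flag comes from the 13:34:49 line.
runs = {
    0: [(278, False), (306, False), (306, False)],
    1: [(38, True), (226, False), (226, False)],
}

def summarize(entries):
    """Return (average per batch, average per task, average of clean runs only).

    Integer (truncating) division is assumed because it matches the log output.
    """
    count = len(entries)
    avg_all = sum(seconds for seconds, _ in entries) // count
    per_task = avg_all // count
    clean = [seconds for seconds, errored in entries if not errored]
    avg_clean = sum(clean) // len(clean) if clean else None
    return avg_all, per_task, avg_clean

for device, entries in runs.items():
    avg_all, per_task, avg_clean = summarize(entries)
    print(f"Device {device}: batch avg {avg_all}s, per task {per_task}s, "
          f"clean-run avg {avg_clean}s")
```

If this reading is right, the 54 seconds reported for Device 1 is pulled down by the failed task and should not be compared directly with the 98 seconds on Device 0.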
____________
These are my results:
Both tests on Win 7 Pro 32-bit.
i7-3770 (constant 3.5 GHz, turbo boosted) on a Z68 chipset. BOINC fully stopped.
Driver 301.42
GTX 680 at default clock (core at 1111 MHz, full boost; memory at 3004 MHz), PCIe 3.0 x16
Device: 0, device count: 1, average time / count: 170, average time on device: 170 Seconds (2 Minutes, 50 Seconds)
Device: 0, device count: 2, average time / count: 244, average time on device: 122 Seconds (2 Minutes, 2 Seconds)
Device: 0, device count: 3, average time / count: 355, average time on device: 118 Seconds (1 Minutes, 58 Seconds)
Device: 0, device count: 4, average time / count: 477, average time on device: 119 Seconds (1 Minutes, 59 Seconds)
The best average time found: 118 Seconds (1 Minutes, 58 Seconds), with count: 0.33 (3)
GTX 680 OC'ed (core at 1215 MHz, full boost; memory at 3105 MHz), PCIe 3.0 x16
Device: 0, device count: 1, average time / count: 163, average time on device: 163 Seconds (2 Minutes, 43 Seconds)
Device: 0, device count: 2, average time / count: 235, average time on device: 117 Seconds (1 Minutes, 57 Seconds)
Device: 0, device count: 3, average time / count: 340, average time on device: 113 Seconds (1 Minutes, 53 Seconds)
Device: 0, device count: 4, average time / count: 456, average time on device: 114 Seconds (1 Minutes, 54 Seconds)
The best average time found: 113 Seconds (1 Minutes, 53 Seconds), with count: 0.33 (3)
Weird thing: my GTX 680 seems to be slower than others, even OC'ed, and nothing else is running on the host other than MSI Afterburner and Process Lasso to set the apps at normal priority.
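For anyone comparing against these numbers, here is a small Python sketch of how the "best average time" lines above appear to be derived: the batch time is divided by the number of concurrent tasks, and the smallest per-task time wins. The truncating division and the reading of "0.33" as 1/3 (one third of a GPU per task) are assumptions based on the figures shown, and the function names are hypothetical.

```python
# (concurrent tasks -> average seconds per batch), copied from the results above.
default_clock = {1: 170, 2: 244, 3: 355, 4: 477}
overclocked = {1: 163, 2: 235, 3: 340, 4: 456}

def per_task_times(batch_times):
    """Seconds per task for each concurrency level (truncating division assumed)."""
    return {n: seconds // n for n, seconds in batch_times.items()}

def best_count(batch_times):
    """Return (tasks per GPU, seconds per task, GPU fraction per task) for the fastest setting."""
    per_task = per_task_times(batch_times)
    n = min(per_task, key=per_task.get)
    return n, per_task[n], round(1 / n, 2)

print("default:", per_task_times(default_clock), best_count(default_clock))
print("OC'ed:  ", per_task_times(overclocked), best_count(overclocked))
```

Both runs come out the same way: three tasks at once is marginally better than two or four, and going from 3 to 4 tasks no longer helps.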
____________