Need help with this one

Message boards : Number crunching : Need help with this one

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617002 - Posted: 21 Dec 2014, 17:36:35 UTC

So my computer has stopped crunching, and I get the following error on the task in BOINC Manager:

(Scheduler wait: Cuda runtime, memory related failure, threadsafe temporary Exit)

I tried turning the computer off and restarting it, but the message keeps appearing.

Any ideas?

Zalster
ID: 1617002

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1617003 - Posted: 21 Dec 2014, 17:45:39 UTC - in response to Message 1617002.  

Going by older threads on this warning, try reseating the video card first.
ID: 1617003

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617008 - Posted: 21 Dec 2014, 18:01:22 UTC - in response to Message 1617003.  
Last modified: 21 Dec 2014, 18:06:56 UTC

OK,

I found this. Now to decipher it.

slot 0


setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
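
The stderr above is the same start-up and failure sequence repeated once per restart. A minimal Python sketch for summarising a pasted log like this follows; it assumes the text is already in a string, and `summarize_stderr` is a hypothetical helper, not part of BOINC or the SETI@home application. It only recognises the line formats that appear in this thread.

```python
import re
from collections import Counter

# Patterns copied from the line formats in the stderr quoted in this thread.
ERROR_RE = re.compile(
    r"Error on launch \((?P<kernel>\w+)<<<.*line (?P<line>\d+): (?P<reason>.+)$"
)
DEVICE_RE = re.compile(r"Boinc passed DevPref (\d+)")


def summarize_stderr(text):
    """Count restart attempts, chosen devices, and kernel-launch errors."""
    attempts = 0
    devices = Counter()
    errors = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        # Each run of the application begins by listing the CUDA devices.
        if line.startswith("setiathome_CUDA: Found"):
            attempts += 1
        match = DEVICE_RE.search(line)
        if match:
            devices[int(match.group(1))] += 1
        match = ERROR_RE.search(line)
        if match:
            key = (match.group("kernel"), int(match.group("line")), match.group("reason"))
            errors[key] += 1
    return {"attempts": attempts, "devices": dict(devices), "errors": dict(errors)}
```

If every attempt ends with the same kernel, source line, and reason, whichever device BOINC picked, that points away from a single faulty card.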
ID: 1617008

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617010 - Posted: 21 Dec 2014, 18:07:08 UTC - in response to Message 1617008.  

slot 6

setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1123 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1123 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
ID: 1617010

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1617011 - Posted: 21 Dec 2014, 18:07:58 UTC - in response to Message 1617008.  

Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument

Link to the WU, please.

Possibly related to:

WUs with very unusual Autocorr parameters

Yesterday I got 4 tasks from the 10fe09ab.22806.* splitter run with short estimates, today 2 more from 10fe09ab.1847.* . All have

<autocorr_thresh>1000</autocorr_thresh>
<autocorr_per_spectrum>0</autocorr_per_spectrum>
<autocorr_fftlen>8</autocorr_fftlen>

rather than the usual settings. I suspect that's an accident.

The splitter estimate has a term which depends on the autocorr_fftlen, and doing DCTs at those short lengths will actually be a lot faster, so the reduced estimates may be OK.

In theory, all app versions ought to be able to handle the unusual processing, but probably none have actually been previously tested that way so there may be surprises. Beta testing is often interesting.

I'll ask Eric to comment if he has time.


Claggy
ID: 1617011

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617012 - Posted: 21 Dec 2014, 18:08:13 UTC - in response to Message 1617010.  

slot 9

setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 5, pciSlotID = 0
Device 2: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 780, 3072 MiB, regsPerBlock 65536
computeCap 3.5, multiProcs 12
pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 0k elements.
Work Unit Info:
...............
WU true angle range is : 0.545577

Kepler GPU current clockRate = 1071 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 256 bytes
re-using dev_GaussFitResults+32x8 array for dev_AutoCorrOut, 256 bytes
Thread call stack limit is: 1k
Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu, line 200: invalid configuration argument
Exiting
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit->
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
ID: 1617012

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1617016 - Posted: 21 Dec 2014, 18:14:53 UTC - in response to Message 1617008.  

So, which machine is this on? The one with three GTX 780s appears to be host 7135571, but that one has no work in progress on this project (not even any reported today).

I'm asking, because the symptom

Error on launch (ac_reducePartial<<<grid3, block3,blksize*sizeof(float3)>>>( (float *)dev_AutoCorrIn, dev_ac_partials )),
file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_autocorr.cu,
line 200: invalid configuration argument

possibly matches Beta message 53309 by Joe Segur - in which case, it might have been data related.
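
For readers decoding the error quoted above: CUDA returns "invalid configuration argument" when a kernel launch asks for a grid, block, or dynamic shared-memory size that the device can never satisfy, and a zero in any dimension counts. The thread does not show how the application computes grid3, block3, or blksize, so the Python sketch below only illustrates the kind of check involved. The limits are the published ones for compute capability 3.5 (the GTX 780s in the log); the function name and the sample numbers are hypothetical.

```python
# Published limits for compute capability 3.5; treat them as assumptions
# for any other device.
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)
MAX_GRID_DIM = (2**31 - 1, 65535, 65535)
MAX_SHARED_BYTES = 48 * 1024


def launch_config_problems(grid, block, shared_bytes):
    """List the reasons a <<<grid, block, shared_bytes>>> launch would be rejected."""
    problems = []
    checks = (("grid", grid, MAX_GRID_DIM), ("block", block, MAX_BLOCK_DIM))
    for name, dims, limits in checks:
        for axis, dim, limit in zip("xyz", dims, limits):
            if dim < 1:
                problems.append(f"{name}.{axis} is {dim}; every dimension must be at least 1")
            elif dim > limit:
                problems.append(f"{name}.{axis} is {dim}; the limit is {limit}")
    threads = block[0] * block[1] * block[2]
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append(f"{threads} threads per block; the limit is {MAX_THREADS_PER_BLOCK}")
    if shared_bytes > MAX_SHARED_BYTES:
        problems.append(f"{shared_bytes} bytes of shared memory; the limit is {MAX_SHARED_BYTES}")
    return problems


# Hypothetical numbers: a block width derived from a very short length
# by integer division rounds down to zero.
print(launch_config_problems((1, 1, 1), (8 // 256, 1, 1), 0))
```

Whether something like this is what happens at line 200 of cudaAcc_autocorr.cu is a guess; only the application's source can confirm it.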

Link to one of the WUs in question, please?
ID: 1617016

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617017 - Posted: 21 Dec 2014, 18:18:08 UTC - in response to Message 1617016.  

Sorry,

Yes, it is the 780 machine, but over on Beta. I was running it there since things here went bonkers. Sorry, I'm at work and going back and forth; I'll see if I can add a link.

Zalster
ID: 1617017

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617018 - Posted: 21 Dec 2014, 18:20:12 UTC - in response to Message 1617017.  
Last modified: 21 Dec 2014, 18:21:36 UTC

I tried resetting the project and getting new work. Before that, I had one task go invalid:

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=6781709

Since then, the ones that I downloaded have also stalled.

Edit..

Here is a link to a new work unit:

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=6782596
ID: 1617018

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617020 - Posted: 21 Dec 2014, 18:24:43 UTC - in response to Message 1617018.  

I didn't see Joe's message until right now. Interesting, but this is over my head. Hmm.
ID: 1617020

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617022 - Posted: 21 Dec 2014, 18:34:16 UTC - in response to Message 1617020.  
Last modified: 21 Dec 2014, 18:37:23 UTC

Ok,

Now the machine with the 980s is doing it as well.

That makes me think it's the Multibeams over at Beta that are the problem.

I'm going to suspend them and try MB here on Main, and if those work then I'll know 100% what the issue is.

Thanks Everyone


Zalster

Edit..

Yup, that is it. MB here on Main is working just fine. GPUs are kicking in and ramping up.
ID: 1617022

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1617026 - Posted: 21 Dec 2014, 18:40:45 UTC - in response to Message 1617022.  
Last modified: 21 Dec 2014, 18:48:21 UTC

Well, it's the same tape as Joe's report, and - more crucially - it has the same autocorr configuration:

<analysis_cfg>
<spike_thresh>24</spike_thresh>
<spikes_per_spectrum>1</spikes_per_spectrum>
<autocorr_thresh>1000</autocorr_thresh>
<autocorr_per_spectrum>0</autocorr_per_spectrum>
<autocorr_fftlen>8</autocorr_fftlen>
<gauss_null_chi_sq_thresh>2.9097061157227</gauss_null_chi_sq_thresh>
...
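
A quick way to check whether another task carries the same settings is to pull the three autocorr tags out of its header and compare them with the values quoted here. This is a minimal Python sketch, assuming the header has been copied into a string; `read_tags` and `matches_reported_pattern` are hypothetical helpers. It uses a regular expression instead of an XML parser because a pasted fragment like the one above is truncated and would not parse as a complete document.

```python
import re

# The three values quoted in this thread for the affected work units.
REPORTED_PATTERN = {
    "autocorr_thresh": 1000.0,
    "autocorr_per_spectrum": 0.0,
    "autocorr_fftlen": 8.0,
}


def read_tags(text, names):
    """Read numeric <name>value</name> pairs from a possibly truncated header."""
    found = {}
    for name in names:
        match = re.search(rf"<{name}>\s*([^<]*?)\s*</{name}>", text)
        if match:
            found[name] = float(match.group(1))
    return found


def matches_reported_pattern(text):
    """True only if all three autocorr tags are present with the quoted values."""
    return read_tags(text, REPORTED_PATTERN) == REPORTED_PATTERN
```

A header that lacks any of the three tags, or has a different value for one of them, does not match.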

Since it doesn't seem likely that these tasks are going to finish and report cleanly, could you please:

1. Move the discussion to Joe's thread at Beta.
2. Post that std_err you gathered manually.
3. Switch to a newer version of BOINC that preserves std_err in the report if you have to abort the tasks.
4. See if you can wake Jason up and point him to these reports. He needs to chat with Joe and Eric to find out if these configurations are a mistake, or something he needs to prepare a new application for.

Edit - re my third point (newer version of BOINC) - any version 7.3.14 or later, where "client: read stderr file if abort non-running job." came in. Personally, I'd strongly advise v7.4.36 - the current alpha release candidate.
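
Checking an installed client against that threshold takes a numeric comparison, since comparing the dotted strings directly would rank "7.3.9" above "7.3.14". A minimal Python sketch follows, assuming plain three-part version strings like the ones in this post; the function names are hypothetical, and the 7.3.14 threshold is taken from the post, not verified independently.

```python
def parse_version(text):
    """Turn a dotted version such as '7.4.36' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# First version named in the post as reading stderr when a
# non-running job is aborted.
MIN_VERSION = parse_version("7.3.14")


def preserves_stderr_on_abort(client_version):
    """True if the client version is at or above the threshold from the post."""
    return parse_version(client_version) >= MIN_VERSION
```

Tuples compare element by element, so (7, 3, 9) sorts below (7, 3, 14) as intended.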
ID: 1617026

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1617035 - Posted: 21 Dec 2014, 18:46:32 UTC - in response to Message 1617026.  

Ok, I'll get started on some of those.

Wake Jason up? lol I'd need his cell phone number for that, but I'll see what I can do.


Zalster
ID: 1617035

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1617037 - Posted: 21 Dec 2014, 18:48:55 UTC - in response to Message 1617035.  

(see edit to previous re the BOINC version)
ID: 1617037
