NVIDIA GPU CUDA 32 & 42 WU Errors: Task Postponed 180.000000 Sec: CuFFT Plan Failure, Temporary Exit

HankPa Project Donor

Joined: 12 Mar 13
Posts: 5
Credit: 1,955,592
RAC: 3
United States
Message 1730530 - Posted: 1 Oct 2015, 4:33:28 UTC

Started experiencing this problem with all CUDA 32/42 WUs about a week after upgrading WIN7 Home to WIN10 Home. CUDA 23s, MBs and a very rare AP were all processing normally (with my "NVIDIA G200 kind of normal", that is). Since all CUDA 23s have been processed, I only have CUDA 32/42 varieties left in my queue.
The error repeats in a cycle: a WU runs for 15-27 seconds, errors out, the next CUDA 32/42 WU starts, and the cycle continues. When the same WU comes back around, its elapsed and remaining times reset to zero.

Have suspended all further GPU processing for CUDA 32/42s.
Reset BOINC -- same problem.
Rolled the NVIDIA driver back to 341.74 from 341.81 -- same problem. (The WIN10 upgrade had taken the driver from 341.44 to 341.81.)
Have aborted some CUDA 42s, but hated to do it.
Under WIN7 I had 1 active CPU core; under WIN10, 2 active cores.

Current sampling of various portions of the event log -
9/29/2015 11:29:31 PM | Starting BOINC client version 7.6.9 for windows_x86_64
9/29/2015 11:29:31 PM | log flags: file_xfer, sched_ops, task, cpu_sched, cpu_sched_status
9/29/2015 11:29:31 PM | Libraries: libcurl/7.39.0 OpenSSL/1.0.2a zlib/1.2.8
9/29/2015 11:29:31 PM | Data directory: C:\ProgramData\BOINC
9/29/2015 11:29:31 PM | Running under account Henry
9/29/2015 11:29:36 PM | CUDA: NVIDIA GPU 0: GeForce G200 (driver version 341.74, CUDA version 6.5, compute capability 1.1, 256MB, 144MB available, 53 GFLOPS peak)
9/29/2015 11:29:36 PM | OpenCL: NVIDIA GPU 0: GeForce G200 (driver version 341.74, device version OpenCL 1.0 CUDA, 256MB, 144MB available, 53 GFLOPS peak)
9/29/2015 11:29:36 PM | Host name: Henry-W10PC
9/29/2015 11:29:36 PM | Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU T6500 @ 2.10GHz [Family 6 Model 23 Stepping 10]
9/29/2015 11:29:36 PM | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm tm2 pbe
9/29/2015 11:29:36 PM | OS: Microsoft Windows 10: Core x64 Edition, (10.00.10240.00)
9/29/2015 11:29:36 PM | Memory: 7.75 GB physical, 15.50 GB virtual
9/29/2015 11:29:36 PM | Disk: 684.99 GB total, 496.93 GB free
9/29/2015 11:29:36 PM | Local time is UTC -7 hours
9/29/2015 11:29:36 PM | VirtualBox version: 4.3.12

MORE log -
9/30/2015 12:17:44 AM | SETI@home | [css] running 05no11ae.10548.14426.438086664197.12.23_0 ( )
9/30/2015 12:18:01 AM | SETI@home | task postponed 180.000000 sec: CuFFT Plan Failure, temporary exit
9/30/2015 12:18:01 AM | SETI@home | [cpu_sched] Restarting task 05oc11ak.3048.12848.438086664200.12.76_1 using setiathome_v7 version 700 (cuda42) in slot 5
9/30/2015 12:18:01 AM | SETI@home | [css] running 05oc11ak.3048.12848.438086664200.12.76_1 (0.0097 CPUs + 1 NVIDIA GPU)
9/30/2015 12:18:01 AM | SETI@home | [css] running 05oc11ak.3048.12848.438086664200.12.189_0 ( )
9/30/2015 12:18:01 AM | SETI@home | [css] running 05no11ae.10548.14426.438086664197.12.23_0 ( )
9/30/2015 12:18:17 AM | SETI@home | task postponed 180.000000 sec: CuFFT Plan Failure, temporary exit
9/30/2015 12:18:17 AM | SETI@home | [cpu_sched] Restarting task 05no11ae.10548.14426.438086664197.12.57_1 using setiathome_v7 version 700 (cuda42) in slot 6
9/30/2015 12:18:17 AM | SETI@home | [css] running 05no11ae.10548.14426.438086664197.12.57_1 (0.0097 CPUs + 1 NVIDIA GPU)
9/30/2015 12:18:17 AM | SETI@home | [css] running 05oc11ak.3048.12848.438086664200.12.189_0 ( )
9/30/2015 12:18:17 AM | SETI@home | [css] running 05no11ae.10548.14426.438086664197.12.23_0 ( )
9/30/2015 12:18:34 AM | SETI@home | task postponed 180.000000 sec: CuFFT Plan Failure, temporary exit
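
To confirm that every postponement has the same cause, the event log can be tallied with a short script. This is only a minimal sketch: the regular expression is an assumption based on the log lines quoted above, and `count_postponements` is a hypothetical helper name, not part of BOINC.

```python
import re
from collections import Counter

# Matches event-log lines such as:
#   9/30/2015 12:18:01 AM | SETI@home | task postponed 180.000000 sec: CuFFT Plan Failure, temporary exit
# The pattern is an assumption derived from the lines quoted in this post.
POSTPONED = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*task postponed "
    r"(?P<seconds>\d+(?:\.\d+)?) sec: (?P<reason>.+)$"
)


def count_postponements(lines):
    """Return a Counter mapping (project, reason) to the number of postponements."""
    counts = Counter()
    for line in lines:
        match = POSTPONED.search(line.strip())
        if match:
            counts[(match["project"], match["reason"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "9/30/2015 12:18:01 AM | SETI@home | task postponed 180.000000 sec: CuFFT Plan Failure, temporary exit",
        "9/30/2015 12:18:01 AM | SETI@home | [css] running 05oc11ak.3048.12848.438086664200.12.189_0 ( )",
        "9/30/2015 12:18:17 AM | SETI@home | task postponed 180.000000 sec: CuFFT Plan Failure, temporary exit",
    ]
    for (project, reason), n in count_postponements(sample).items():
        print(f"{project}: {n} x {reason}")
```

Feeding it the lines copied from the BOINC Manager event log shows whether anything other than the CuFFT plan failure is postponing tasks.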

Any suggestions on how to resolve this would be greatly appreciated.
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1730542 - Posted: 1 Oct 2015, 5:05:12 UTC

The log doesn't tell the whole story; what you need is the contents of the stderr output file. Fortunately one of your tasks had written one (the rest you had abandoned, so they were empty). I've captured it below so others can see what it said.
The block of data below repeats until the task was abandoned.

Stderr output

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
d: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 4.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 10.719985
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum][0], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file 'c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_fft.cu' in line 21 : out of memory.
A cuFFT plan FAILED, Initiating Boinc temporary exit (180 secs)
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Preemptively Acknowledging temporary exit -> Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce G200, 256 MiB, regsPerBlock 8192
computeCap 1.1, multiProcs 2
pciBusID = 3, pciSlotID = 0
clockRate = 1100 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce G200 is okay
SETI@home using CUDA accelerated device GeForce G200
pulsefind: blocks per SM 1 (Pre-Fermi default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
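
Because the same block repeats many times, it can help to pull out just the error lines. The following is a rough sketch, assuming the "Cuda error '...' in file '...' in line N : reason." layout seen in the stderr above; `find_cuda_errors` is a hypothetical helper, not something shipped with the application.

```python
import re

# Layout assumed from the stderr line quoted above:
#   Cuda error '<call>' in file '<path>' in line <N> : <reason>.
CUDA_ERROR = re.compile(
    r"Cuda error '(?P<call>.+?)' in file '(?P<file>.+?)' "
    r"in line (?P<line>\d+) : (?P<reason>.+?)\.?$"
)


def find_cuda_errors(stderr_text):
    """Return one dict per 'Cuda error' line found in the stderr text."""
    errors = []
    for raw in stderr_text.splitlines():
        match = CUDA_ERROR.search(raw.strip())
        if match:
            errors.append({
                # Keep only the function name, dropping the argument list.
                "function": match["call"].split("(")[0],
                "file": match["file"],
                "line": int(match["line"]),
                "reason": match["reason"],
            })
    return errors


if __name__ == "__main__":
    text = (
        "WU true angle range is : 10.719985\n"
        "Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum][0], FftLen, CUFFT_C2C, "
        "NumDataPoints / FftLen)' in file 'c:/[Projects]/__Sources/sah_v7_opt/Xbranch/"
        "client/cuda/cudaAcc_fft.cu' in line 21 : out of memory.\n"
    )
    for err in find_cuda_errors(text):
        print(err["function"], "->", err["reason"])
```

For the output above, the only error reported is the `cufftPlan1d` call failing with "out of memory".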

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
arkayn
Volunteer tester

Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1730642 - Posted: 1 Oct 2015, 14:41:52 UTC
Last modified: 1 Oct 2015, 14:45:39 UTC

I have sent the CUDA Guru a message about this thread.

For posterity, the CUDA23 Stderr.

Stderr output

<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce G200, 256 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2 
     clockRate = 1100 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce G200 is okay
SETI@home using CUDA accelerated device GeForce G200
pulsefind: blocks per SM 1 (Pre-Fermi default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 2.30

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.368790
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): worker didn't respond to exit request within 2 seconds, exiting anyway.
Cuda threadsafe ExitProcess() initiated, rval 0
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
  Worker Acknowledging exit request, spinning-> boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce G200, 256 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2 
     clockRate = 1100 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce G200 is okay
SETI@home using CUDA accelerated device GeForce G200
pulsefind: blocks per SM 1 (Pre-Fermi default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
Restarted at 47.60 percent, with Lunatics x41zc, Cuda 2.30
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
  Worker Acknowledging exit request, spinning-> boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): worker didn't respond to exit request within 2 seconds, exiting anyway.
Cuda threadsafe ExitProcess() initiated, rval 0
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce G200, 256 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2 
     clockRate = 1100 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce G200 is okay
SETI@home using CUDA accelerated device GeForce G200
pulsefind: blocks per SM 1 (Pre-Fermi default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully
Restarted at 65.28 percent, with Lunatics x41zc, Cuda 2.30
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.

Flopcounter: 54918334103992.250000

Spike count:    5
Autocorr count: 0
Pulse count:    4
Triplet count:  0
Gaussian count: 0
Worker preemptively acknowledging a normal exit.->
called boinc_finish
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0

</stderr_txt>
]]>


Just in case the WU gets removed before Jason gets here.

jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1730660 - Posted: 1 Oct 2015, 15:33:18 UTC

If those CUFFT plans fail, you are likely running out of video memory, in a way that bothers the CUFFT library's internally managed resources (these are in the scope of the Cuda libraries, not the app code).

You will probably need to ensure you free as much VRAM as possible, which can be tricky with Cuda 3.2 onwards, because those libraries tend to bloat and use more memory with each Cuda version.
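
One way to see how much VRAM is free is the figure BOINC prints at startup (the "256MB, 144MB available" in the first post). Below is a minimal sketch for reading it, assuming the startup line keeps the format quoted earlier in this thread; `vram_from_startup_line` is a hypothetical name. Note that it only reflects free memory when the client started, not what the CUFFT library will ask for.

```python
import re

# Assumed from the BOINC startup line quoted in the first post, e.g.
#   ... compute capability 1.1, 256MB, 144MB available, 53 GFLOPS peak)
VRAM = re.compile(r"(?P<total>\d+)MB, (?P<free>\d+)MB available")


def vram_from_startup_line(line):
    """Return (total_mb, free_mb) from a BOINC GPU startup line, or None if absent."""
    match = VRAM.search(line)
    if match is None:
        return None
    return int(match["total"]), int(match["free"])


if __name__ == "__main__":
    line = (
        "9/29/2015 11:29:36 PM | CUDA: NVIDIA GPU 0: GeForce G200 "
        "(driver version 341.74, CUDA version 6.5, compute capability 1.1, "
        "256MB, 144MB available, 53 GFLOPS peak)"
    )
    total_mb, free_mb = vram_from_startup_line(line)
    print(f"{free_mb} of {total_mb} MB free ({free_mb / total_mb:.0%})")
```

Comparing this figure before and after closing other programs that use the display gives a rough idea of how much VRAM has actually been freed.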

The other option is to force use of the Cuda23 build by using the Lunatics Installer.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
HankPa Project Donor

Joined: 12 Mar 13
Posts: 5
Credit: 1,955,592
RAC: 3
United States
Message 1731655 - Posted: 3 Oct 2015, 23:28:40 UTC

Thanks for the information Jason --
My technical knowledge is limited, so my understanding of your explanation may be on the order of you discussing Apples while I'm discussing Oranges (could possibly be an LOL, too).

In regards to option one, "free as much VRAM as possible," the NVIDIA GeForce 9400 (G200 chip type) adapter reports these memory allocations in MB: 4063 total, 224 dedicated, 64 system video and 3775 shared system. I do not know of a way to alter, or otherwise dumb down, the monitor's color management or display settings that would free VRAM. Likewise, the system BIOS is locked or otherwise inaccessible, and I am not sure that adjustable BIOS settings would alter anything considering the hardware limits. From what I've been able to ascertain, the adapter can neither be replaced with a more capable one nor upgraded with additional RAM.

In regards to using the Lunatics Installer option, the same lack of technical knowledge may hinder my using or installing it without considerable assistance. I've read the latest v0.43b installer instructions more than once, but the "fog" remains. I am convinced though that tinkering with the settings/instruction sets can raise havoc. Additionally, any benefit gained may not significantly alter the limitation of the hardware configuration.

I am perfectly satisfied to process CUDA23s, MBs and an occasional AP. With the additional CPU core that WIN10 opened, I clunk along fairly well using both cores for processing. In addition, it would be preferable to be able to make use of the graphics adapter.

Again, any suggestions offered by anyone with the knowledge would be greatly appreciated.
BilBg
Volunteer tester

Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1731905 - Posted: 4 Oct 2015, 19:44:17 UTC - in response to Message 1731655.  

In regards to using the Lunatics Installer option, the same lack of technical knowledge may hinder my using or installing it without considerable assistance. I've read the latest v0.43b installer instructions more than once, but the "fog" remains.

- You may be less concerned if you see the screenshots beforehand:
http://prikachi.com/gallery_view.php?user=18390&gal=17578

(They are from an older installer, but the new one looks similar.)

- You may stop the install at any time - no changes will be made until you press the last [Install] button:
http://prikachi.com/images/819/7531819a.png


I am perfectly satisfied to process CUDA23s

And you can choose this on the NVIDIA page:
http://prikachi.com/images/817/7531817u.png
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
HankPa Project Donor

Joined: 12 Mar 13
Posts: 5
Credit: 1,955,592
RAC: 3
United States
Message 1731916 - Posted: 4 Oct 2015, 20:49:07 UTC

Thank you Sir!
I'll get into your instructions later, but first yard duties (mowing, trimming, edging, and the like) will occupy my energies for most of the day. Woe is me!
