GPU task stuck - cannot process anymore GPU work

Message boards : Number crunching : GPU task stuck - cannot process anymore GPU work

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 755
Credit: 5,040,916
RAC: 28
United Kingdom
Message 1888888 - Posted: 9 Sep 2017, 20:39:30 UTC

Many thanks. For the first run I used the mid-range card suggestions from the readme file. The first such GPU work unit completed OK:

https://setiathome.berkeley.edu/result.php?result_name=10se08ad.30634.8661.7.34.162_1

Is there any way to check that those command-line switches were actually read OK? I did press "Read config files" in BOINC Manager, but would like the comfort of confirming that something did change.

Once I have a few work units completed I will try your recommended command line, but again I would like a way to double-check that things are being changed, so I am comfortable I am doing this right.

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1888890 - Posted: 9 Sep 2017, 20:43:14 UTC - in response to Message 1888888.  

You will see them in the stderr output of the returned work unit. But since your computers are hidden, only you can check whether they are there.

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1888892 - Posted: 9 Sep 2017, 20:52:06 UTC

You can check stderr.txt in your local slots folder before the task has finished.
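
For anyone who wants to automate that check, here is a minimal Python sketch. It assumes the default Windows BOINC data directory C:\ProgramData\BOINC (the same root that appears in the SoG stderr warnings in this thread), with one numbered subfolder per running task under "slots". It treats any line containing "set to:" or "TUNE:" as an echo of a command-line switch, based only on the stderr excerpt posted in this thread; other app versions may word this differently, and the function name is hypothetical.

```python
from pathlib import Path

# Assumption: default BOINC data directory on Windows. Each running task
# gets its own numbered subfolder under "slots" with its own stderr.txt.
SLOTS_DIR = Path(r"C:\ProgramData\BOINC\slots")

# Substrings taken from the stderr excerpt in this thread; other app
# versions may word their switch echoes differently.
MARKERS = ("set to:", "TUNE:")


def find_switch_echoes(slots_dir: Path = SLOTS_DIR) -> dict[str, list[str]]:
    """Return {slot folder name: matching stderr lines} for each slot that has any."""
    found: dict[str, list[str]] = {}
    for stderr in sorted(slots_dir.glob("*/stderr.txt")):
        lines = stderr.read_text(errors="replace").splitlines()
        hits = [ln.strip() for ln in lines if any(m in ln for m in MARKERS)]
        if hits:
            found[stderr.parent.name] = hits
    return found


if __name__ == "__main__":
    for slot, hits in find_switch_echoes().items():
        print(f"slot {slot}:")
        for line in hits:
            print("   ", line)
```

If a slot running a GPU task prints nothing, the switches were probably not picked up (or the app echoes them in a different form).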


With each crime and every kindness we birth our future.

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 755
Credit: 5,040,916
RAC: 28
United Kingdom
Message 1888894 - Posted: 9 Sep 2017, 21:15:42 UTC

Thanks Mike and Zalster, I found a GPU work unit's stderr.txt in my slots folder. It looks good:

Maximum single buffer size set to:192MB
SpikeFind FFT size threshold override set to:2048
TUNE: kernel 1 now has workgroup size of (64,1,4)

So, now to try the recommended cmd line switches from Keith.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1888904 - Posted: 9 Sep 2017, 21:43:12 UTC

If it's a dedicated cruncher, put
-tt 1500 -hp -period_iterations_num 1 -high_perf -high_prec_timer -sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -cpu_lock
in mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt, found in the project directory.

If it's not a dedicated cruncher, or the system becomes too sluggish, set -tt to 600, remove -high_perf, and see how responsive the system is then.
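
The non-dedicated variant can be derived mechanically from the dedicated line. A minimal Python sketch follows; it assumes every switch starts with "-" and that no value is a negative number (true for the line above). The function names are hypothetical helpers, not part of BOINC or the SoG app, and the result would still need to be pasted into mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt by hand.

```python
def parse_switches(cmdline: str) -> list[tuple[str, list[str]]]:
    """Split a command line into ordered (switch, [values]) pairs.

    Assumes every switch starts with '-' and no value is a negative number,
    which holds for the command line quoted in this post."""
    pairs: list[tuple[str, list[str]]] = []
    for token in cmdline.split():
        if token.startswith("-"):
            pairs.append((token, []))
        elif pairs:
            pairs[-1][1].append(token)  # value belongs to the most recent switch
        else:
            raise ValueError(f"value {token!r} appears before any switch")
    return pairs


def join_switches(pairs: list[tuple[str, list[str]]]) -> str:
    """Turn (switch, [values]) pairs back into a single command line."""
    return " ".join(" ".join([switch, *values]) for switch, values in pairs)


def non_dedicated(cmdline: str, tt: str = "600") -> str:
    """Apply the advice above: set -tt to the given value and drop -high_perf."""
    out: list[tuple[str, list[str]]] = []
    for switch, values in parse_switches(cmdline):
        if switch == "-high_perf":
            continue  # dropped for a machine that is also used interactively
        if switch == "-tt":
            values = [tt]
        out.append((switch, values))
    return join_switches(out)


DEDICATED = (
    "-tt 1500 -hp -period_iterations_num 1 -high_perf -high_prec_timer "
    "-sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 "
    "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 "
    "-oclfft_tune_bn 64 -oclfft_tune_cw 64 -cpu_lock"
)

if __name__ == "__main__":
    print(non_dedicated(DEDICATED))
```

Running it prints the dedicated line with -tt 600 and without -high_perf; every other switch is passed through unchanged. Keeping the (switch, values) pairs in order means multi-value switches such as -tune stay intact.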

Also in the project directory I have an app_config.xml file:

<app_config>
 <app>
  <name>setiathome_v8</name>
  <gpu_versions>
   <gpu_usage>1.00</gpu_usage>
   <cpu_usage>1.00</cpu_usage>
  </gpu_versions>
 </app>
</app_config>

That reserves 1 CPU core for each GPU WU, but if you run out of GPU work, it will process CPU work again.
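
To confirm the file says what you intended, a small Python sketch can parse it and estimate the CPU budgeted for GPU tasks. It assumes BOINC runs floor(1/gpu_usage) tasks per GPU and counts cpu_usage cores for each, which matches the "1 CPU core for each GPU WU" description above for the 1.00/1.00 case; this is a simplified model, and the helper names are hypothetical. A parse error here would likely mean BOINC cannot read the file either.

```python
import math
import xml.etree.ElementTree as ET

# Same content as the app_config.xml above; in practice read it from the project directory.
APP_CONFIG = """<app_config>
 <app>
  <name>setiathome_v8</name>
  <gpu_versions>
   <gpu_usage>1.00</gpu_usage>
   <cpu_usage>1.00</cpu_usage>
  </gpu_versions>
 </app>
</app_config>"""


def gpu_settings(xml_text: str) -> dict[str, tuple[float, float]]:
    """Map each app name to its (gpu_usage, cpu_usage) from <gpu_versions>."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the XML is malformed
    settings: dict[str, tuple[float, float]] = {}
    for app in root.findall("app"):
        gv = app.find("gpu_versions")
        if gv is None:
            continue  # this app has no GPU overrides
        settings[app.findtext("name")] = (
            float(gv.findtext("gpu_usage")),
            float(gv.findtext("cpu_usage")),
        )
    return settings


def cpu_budget(gpu_usage: float, cpu_usage: float, n_gpus: int = 1) -> tuple[int, float]:
    """Assumed model: floor(1 / gpu_usage) tasks per GPU, each counted as cpu_usage cores."""
    per_gpu = math.floor(1 / gpu_usage + 1e-9)  # epsilon guards against float round-off
    tasks = n_gpus * per_gpu
    return tasks, tasks * cpu_usage


if __name__ == "__main__":
    for name, (gpu, cpu) in gpu_settings(APP_CONFIG).items():
        tasks, cores = cpu_budget(gpu, cpu)
        print(f"{name}: {tasks} GPU task(s) on 1 GPU, about {cores:g} CPU core(s) budgeted")
```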
Grant
Darwin NT

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 755
Credit: 5,040,916
RAC: 28
United Kingdom
Message 1888909 - Posted: 9 Sep 2017, 22:01:34 UTC - in response to Message 1888904.  

Seeing several odd warning messages in the stderr file now:

WARNING: can't open binary kernel file for oclFFT plan: C:\ProgramData\BOINC/projects/setiathome.berkeley.edu\MB_clFFTplan_GeForceGTX750Ti_524288_gr256_lr16_wg256_tw0_ls512_bn32_cw32_r3557.bin_37653, continue with recompile...

See this work unit for an example:

https://setiathome.berkeley.edu/result.php?result_name=10oc08af.2724.81755.13.40.207_1

Is this a problem or can it be ignored?

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1888911 - Posted: 9 Sep 2017, 22:05:19 UTC

No worries, that's just because some kernels are being compiled for the first time.
It shouldn't happen on the next tasks.
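
If you want to confirm that, a minimal Python sketch can pull the plan file names out of the warnings and check whether the recompiled binaries now exist. It assumes the app writes the rebuilt binary back to the file named in the warning, in the project directory C:\ProgramData\BOINC\projects\setiathome.berkeley.edu (both inferred from the warning text, not documented behaviour); the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: project directory inferred from the warning text in this thread.
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")

# Matches the warning format quoted above and captures the full file path.
WARNING = re.compile(
    r"can't open binary kernel file for oclFFT plan: (.+?), continue with recompile"
)


def plans_warned_about(stderr_text: str) -> list[str]:
    """Return the plan file names named in 'can't open binary kernel file' warnings."""
    # The logged path mixes backslashes and forward slashes, so split on both.
    return [re.split(r"[\\/]", m.group(1))[-1] for m in WARNING.finditer(stderr_text)]


def still_missing(stderr_text: str, project_dir: Path = PROJECT_DIR) -> list[str]:
    """Return warned-about plan files that do not (yet) exist in project_dir."""
    return [name for name in plans_warned_about(stderr_text)
            if not (project_dir / name).exists()]
```

Feed it the stderr text of the task that printed the warning; once that task has finished, an empty list from still_missing() is consistent with the kernels having been cached for later tasks.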

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 755
Credit: 5,040,916
RAC: 28
United Kingdom
Message 1888912 - Posted: 9 Sep 2017, 22:06:41 UTC - in response to Message 1888911.  

Good news, thanks.

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 755
Credit: 5,040,916
RAC: 28
United Kingdom
Message 1888914 - Posted: 9 Sep 2017, 22:11:22 UTC - in response to Message 1888904.  

Thanks Grant,

I will try those tomorrow, as it is getting late now. I have seen other posts about reserving a CPU core for the GPU, but have been puzzled by it. I assume the GPU work units need a helping hand from the CPU for certain tasks, but should we dedicate a whole CPU core to this, or let it context-switch along with all the other Windows background tasks? Dedicating a core sounds like losing some useful CPU work unit crunching time.

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1888916 - Posted: 9 Sep 2017, 22:20:34 UTC
Last modified: 9 Sep 2017, 22:21:26 UTC

The -tt switch will not help much on a 750 Ti.
Just try different values with -period_iterations_num xx; 50 is the default.
Decrease it to speed up, or increase it to get rid of screen lag.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1888917 - Posted: 9 Sep 2017, 22:34:23 UTC - in response to Message 1888916.  
Last modified: 9 Sep 2017, 22:36:13 UTC

The -tt switch will not help much on a 750 Ti.

Probably depends on the system.
When I tried SoG on my C2D (32-bit OS, 2x GTX 750 Ti and 4 GB RAM), dropping -high_perf and changing -tt to 600 made the system usable; prior to that, screen and input lag was 30-60 seconds.
As it says in the sample settings, experimentation is required.

It will be interesting to see how David's system does with aggressive settings. Plenty of cores, plenty of RAM and plenty of clock speed (compared to my poor old C2D) will more than likely offset them.
You can only try it and see how it goes.



David@home
I assume the GPU work units need a helping hand from the CPU for certain tasks but should we dedicate a whole CPU to this or let it context switch along with all the other Windows background tasks? Dedicating a CPU sounds like losing some useful CPU workunit crunching time?

Even with a lowly GTX 750 Ti, the loss of CPU processing is more than offset by the boost in GPU processing, and it's just 1 core that is used unless you add more GPUs. And if you run out of GPU work, it will pick up CPU work until you get more GPU work.
It's an issue with the NVIDIA OpenCL implementation. AFAIK the OpenCL application on AMD cards doesn't need a CPU core to support each GPU WU being processed. However, even the Linux CUDA special application, which doesn't use much CPU time at all, gives its best output when you give it a whole CPU core to support it.
The fact is, the faster you process the GPU work, the more CPU support is needed to keep the GPU fed with WUs.

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1888921 - Posted: 9 Sep 2017, 22:47:33 UTC

I always make suggestions based on the hardware, and I also know why it says that in the readme.
We shouldn't confuse people more than necessary.
I only make suggestions when they are helpful; otherwise I stay out of the discussion.

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 755
Credit: 5,040,916
RAC: 28
United Kingdom
Message 1889268 - Posted: 11 Sep 2017, 16:39:36 UTC

Hi All,

Just a note to say many thanks to everybody who helped me get through the GPU tasks aborting issue and on to tuning my graphics card.

I am amazed at the performance increase this has provided, and no doubt it helped a lot with the silver badge I have now earned for my RAC.

Many thanks

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.