Posts by Harri Liljeroos

1) Message boards : Number crunching : BOINC does not identify GPUs correctly (Message 1816606)
Posted 19 days ago by Harri Liljeroos
I won't worry. I just have to remember it for future reference. :)
2) Message boards : Number crunching : BOINC does not identify GPUs correctly (Message 1816604)
Posted 19 days ago by Harri Liljeroos
I have a similar situation. My GPUs are factory overclocked. The newer GTX970 has a slightly higher clock than the older one, and Boinc sees the faster one as GPU0.
3) Message boards : Number crunching : BOINC does not identify GPUs correctly (Message 1816339)
Posted 19 days ago by Harri Liljeroos
I am not using the latest driver but version 361.91 (the same as when the GTX650 Ti was in use). I did reinstall it after changing the GPU, though. Anyway, if you have to debug a system, it is good to know that what Boinc shows you may not be the whole truth.
4) Message boards : Number crunching : BOINC does not identify GPUs correctly (Message 1816178)
Posted 20 days ago by Harri Liljeroos
Boinc identifies GPU0 and GPU1 incorrectly.

Here's what I see: one of my hosts used to have two Nvidia GPUs, a GTX970 (GPU0) and a GTX650 Ti (GPU1). Boinc, GPU-Z and Nvidia Inspector agreed on which one was GPU0 and which was GPU1. Then this week I bought a new GTX970 that is identical to the old GTX970 (Asus Strix-GTX970-DC2OC-4GD5) and replaced the GTX 650 Ti with it. Now Boinc disagrees with GPU-Z and Nvidia Inspector about which one is GPU0 and which is GPU1.

Below is a link to a picture of a situation when one GPU is starting a new task. Boinc shows that GPU1 (d1) is starting the new task but Nvidia Inspector shows that it is GPU0 which has the load at 0%.

Does anybody else see a similar situation on a host that has two identical GPUs?

https://1drv.ms/i/s!AsrPYtj_-FTpgZRD_5gO3nNmY_e3dQ
5) Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help (Message 1805930)
Posted 30 Jul 2016 by Harri Liljeroos
New set of builds (r3500) available here: https://cloud.mail.ru/public/LJ8s/c3WyRR8ip
Please test


I don't see much difference from the previous version. Slightly more lag, but not too bad. Running with defaults gives the same result. When I freed two CPU cores and tried a minimal command line:
-sbs 1024 -hp -instances_per_device 1 -cpu_lock

I got a driver restart. Here's the WU where it happened: http://setiathome.berkeley.edu/result.php?resultid=5065787291
6) Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help (Message 1805568)
Posted 29 Jul 2016 by Harri Liljeroos
The test with freeing a CPU core for feeding the GPUs led to the already known conclusion: one free CPU core is not enough for two GPUs to maximize the GPU load. For the best result you should free one CPU core for each GPU (while running one WU per GPU at a time). This is for my host, but YMMV. Now I'm back to using my old app_config and command line parameters.


I would also recommend adding -hp -cpu_lock to your tuning line to improve GPU load with a busy CPU.


The -hp was there already, but I'm adding -cpu_lock now. So the new command line is:
-sbs 1024 -use_sleep -hp -period_iterations_num 5 -instances_per_device 1 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32 -tt 75 -cpu_lock


The -sbs 1024 applies to the GTX970 only, as the application seems to limit that value to 25% of the GPU's memory. So the GTX 650Ti only uses -sbs 256.
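
The 25% limit described above can be sketched in a few lines of Python. This is only an illustration of the observed behaviour, not the application's actual code: the helper name effective_sbs_mb and the cap_fraction default are assumptions, and the memory sizes are the 4096 MB and 1024 MB that these two cards report.

```python
def effective_sbs_mb(requested_mb: int, global_mem_mb: int,
                     cap_fraction: float = 0.25) -> int:
    """Return the single-buffer size in MB after the assumed memory cap.

    cap_fraction=0.25 reflects the observation in the post, not a
    documented constant of the application.
    """
    cap_mb = int(global_mem_mb * cap_fraction)
    return min(requested_mb, cap_mb)


# Global memory in MB as reported by the two cards on this host.
for name, mem_mb in (("GTX970", 4096), ("GTX650Ti", 1024)):
    print(name, effective_sbs_mb(1024, mem_mb))
```

With -sbs 1024 requested, this gives 1024 for the GTX970 and 256 for the GTX650Ti, which matches the values seen on this host.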
7) Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help (Message 1805554)
Posted 29 Jul 2016 by Harri Liljeroos
The test with freeing a CPU core for feeding the GPUs led to the already known conclusion: one free CPU core is not enough for two GPUs to maximize the GPU load. For the best result you should free one CPU core for each GPU (while running one WU per GPU at a time). This is for my host, but YMMV. Now I'm back to using my old app_config and command line parameters.
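
For anyone wanting to try the one-core-per-GPU setup, here is a minimal Python sketch that builds an app_config.xml budgeting one full CPU core for each GPU task. The app_config used on this host is not shown in the thread, so the values (1.0 GPU and 1.0 CPU per task) and the helper name build_app_config are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Build app_config.xml text for one application.

    gpu_usage=1.0 means one task per GPU; cpu_usage=1.0 budgets one
    full CPU core for every GPU task.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed values: one WU per GPU and one reserved core per WU.
print(build_app_config("setiathome_v8", 1.0, 1.0))
```

The resulting text would go into app_config.xml in the project's folder, after which BOINC has to re-read its config files.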
8) Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help (Message 1805323)
Posted 28 Jul 2016 by Harri Liljeroos
And what about running with defaults?

Today I started running with default settings, so no command line and, lately, no app_config either. I am positively surprised by how well the host is behaving. The load on the GPUs is of course fluctuating, as there are no free CPU cores to feed them constantly, but I have no big lags or driver restarts. The CPU is running 6 setiathome_v8 tasks and 2 CPDN wah2_eu25 tasks, so CPU load is constantly at 100%.

I can also use the computer for email and surfing the net, and I could watch the new Seti video on YouTube without a problem (Firefox HW acceleration is turned off).

I am using the r3486 release with date 19.7.2016 on the exe.

Here are a few WUs I have crunched so far with default settings:
http://setiathome.berkeley.edu/result.php?resultid=5063178279 a Guppi VLAR on GTX970.
http://setiathome.berkeley.edu/result.php?resultid=5063162557 normal Arecibo WU on GTX970.
http://setiathome.berkeley.edu/result.php?resultid=5063120898 normal Arecibo WU on GTX650Ti.
http://setiathome.berkeley.edu/result.php?resultid=5063178278 a Guppi VLAR on GTX650Ti.

So good job Raistmer on this application.

(edit)Next I will try freeing one CPU core and see what that does.(/edit)
9) Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help (Message 1805206)
Posted 27 Jul 2016 by Harri Liljeroos
Well, my experience without the -use_sleep parameter was a driver restart within 5 minutes of starting the application, so I was forced to go back to -use_sleep. The application does use a lot less CPU compared to previous versions. I am currently running with -period_iterations_num 5 with no lag on this host: http://setiathome.berkeley.edu/results.php?hostid=7043787 (decreasing it to 3 makes lags appear). The host has two GPUs, a GTX650Ti and a GTX970, both running one Seti task at a time with three CPU cores reserved for GPU feeding. The host's GPUs also run Einstein WUs, one or two at a time depending on how BOINC schedules them (the Seti share is 45 and the Einstein share is 25).
10) Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help (Message 1802989)
Posted 16 Jul 2016 by Harri Liljeroos
Please test this build: https://cloud.mail.ru/public/9ifo/XQTdqmfJJ
Look for CPU usage vs r3486.


http://setiathome.berkeley.edu/result.php?resultid=5042270028 Here's one that was crunched with this version. Stderr is missing the normal output of the parameters used but includes a lot of other information. The CPU use doesn't seem to be any different from the previous r3486 version. The parameters used were:
-sbs 384 -use_sleep -hp -instances_per_device 1 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32


It was run on my GTX970.

In general, all these SoG applications are very sensitive to CPU load. If it reaches 100%, a driver reset is bound to happen, even though I now have TdrDelay set to 8 seconds. So I have reserved 1.6 CPU cores per MB task; this is also my daily driver, so extra headroom is required. I will now set the non-Boinc use limit to 18% to ease the effect of other running applications.
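
For reference, TdrDelay is a DWORD value, counted in seconds, under the GraphicsDrivers key in the Windows registry. The Python sketch below only renders the text of a .reg file for the 8 second setting mentioned above; it does not touch the registry. The function name tdr_delay_reg_text is made up for illustration, and importing such a file needs administrator rights, so back up the registry first.

```python
# Registry key that holds the TDR (Timeout Detection and Recovery) values.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg_text(seconds: int) -> str:
    """Render .reg file text that sets TdrDelay to the given number of seconds."""
    if seconds < 0:
        raise ValueError("TdrDelay must not be negative")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # .reg files store a DWORD as eight hexadecimal digits.
        f'"TdrDelay"=dword:{seconds:08x}',
        "",
    ]
    return "\n".join(lines)


print(tdr_delay_reg_text(8))
```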
11) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1799316)
Posted 29 Jun 2016 by Harri Liljeroos
There is new set of RC builds available here: https://cloud.mail.ru/public/8RM1/LMYTwvGYp
If you experience any issues with v8.12 please try new RC build instead.


Here's a WU done with the new app. http://setiathome.berkeley.edu/result.php?resultid=5010274940

The output on this task looks a lot different than before, can't even see which GPU was used. I see quite a lot of these.

On the other hand, here's another task where the output looks normal: http://setiathome.berkeley.edu/result.php?resultid=5010142578
12) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1798223)
Posted 23 Jun 2016 by Harri Liljeroos
Harri wrote:
I will reduce the -period_iterations_num value until the driver restarts stop.

Double-check that.

Raistmer wrote:
Param should be increased not reduced. Default is 50. Task you list has 30. It can be the reason of driver restart. Try to set it let say to 100.


Ah, a logical typo; of course I meant increase. So far with value 35 I've had no restarts.

I'm off to celebrate midsummer, so the next comments from me will come on Sunday or Monday. I'll leave the computer crunching.
13) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1798189)
Posted 23 Jun 2016 by Harri Liljeroos
That's the plan. I will reduce the -period_iterations_num value until the driver restarts stop.

For the TDR registry, should I specify only the TdrDelay value or should I create all of those mentioned on the Microsoft page you linked? Currently I cannot find any of those in my registry.
14) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1798149)
Posted 23 Jun 2016 by Harri Liljeroos

- The SoG application does not terminate if you stop Boinc, you have to kill the process manually to recover the WU, otherwise it's locked up and cannot be accessed when you restart Boinc

BOINC restart was done after driver restart or in usual conditions?


The Boinc restart was done an hour and a half after the driver restart, when I came to the computer this morning and noticed the situation. Shutting down Boinc resulted in a message (from BoincTasks) that Boinc could not be stopped. When I looked at Task Manager, Boinc had actually stopped, but one SoG application was still running. So I killed the SoG process and restarted Boinc. The stuck WU was then resumed and run to the end on the other GPU (970).

Edit: While the WU was stuck, the elapsed time kept running; it was showing about 1.5 hours when Boinc was stopped. After the restart the elapsed time was showing about 10 minutes, so the time that ticked by while the application was stuck got lost.

I have witnessed the same before but then I didn't notice the orphaned SoG process when I restarted Boinc. This left the stuck WU locked and Boinc trying to run it again. After about 30 seconds Boinc switched to a new WU and message line said something like "Acquiring lock" & "task postponed for 600 seconds". Finally I aborted that WU manually.
15) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1798142)
Posted 23 Jun 2016 by Harri Liljeroos
I am using the 3472 build of SoG that should help with application hangup when using -use_sleep parameter. I downloaded it from Raistmer's site.

Unfortunately the hangups still happen on my host that has a GTX970 and a GTX650 Ti. Here is a link to the latest WU with problem: http://setiathome.berkeley.edu/result.php?resultid=4999412070 The WU was started with 650 Ti and finished with the 970 after I restarted Boinc. The Nvidia driver is 361.91 and commandline was -sbs 256 -period_iterations_num 30 -use_sleep_ex 4 -hp. I am running one WU at a time on both GPUs.

This is what I have noticed about these hangups:
- They happen when driver was not responding and was restarted (but not at every restart)
- Happens only on my 650 Ti when doing Arecibo WU (but not on every WU)
- The SoG application does not terminate if you stop Boinc, you have to kill the process manually to recover the WU, otherwise it's locked up and cannot be accessed when you restart Boinc
- If the hangup goes on long enough, the WU errors out with TIME_LIMIT_EXCEEDED. I don't know whether the SoG application is terminated in this situation.

I have now set -period_iterations_num to 35 (up from 30) to see when the driver restarts go away.

This probably didn't provide any new info, but anyway that's what I am seeing.
16) Message boards : Number crunching : BOINC assigns device X - Problem (Message 1797548)
Posted 20 Jun 2016 by Harri Liljeroos
I have the problem. Here http://setiathome.berkeley.edu/result.php?resultid=4994650280 is a task that was run today. It has the line
BOINC assigns device 0
but is missing the line
Running on device number: 0
And it was actually running on device 1 based on the reported compute units (4).

This machine has two GPUs, device 0 = GTX970 and device 1 is GTX650Ti. The application used was SoG r3472 from Lunatics 0.45b3. Boinc is 7.6.22 on Win7 64 bit.
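
The check described above can be done mechanically. Below is a rough Python sketch that pulls the relevant lines out of a task's stderr text and maps the reported compute units back to a card. The compute-unit table (13 for the GTX970, 4 for the GTX650Ti) is what this host's device listing reports; the function name and the returned dictionary layout are made up for illustration.

```python
import re

# Compute units as reported in this host's OpenCL device listing.
COMPUTE_UNITS = {13: "GeForce GTX 970", 4: "GeForce GTX 650 Ti"}


def check_device_lines(stderr_text: str) -> dict:
    """Extract the device BOINC assigned, the device the app says it
    runs on (if that line is present) and the card implied by the
    compute-unit count."""
    assigned = re.search(r"BOINC assigns device (\d+)", stderr_text)
    running = re.search(r"Running on device number: (\d+)", stderr_text)
    units = re.search(r"Number of compute units: (\d+)", stderr_text)
    return {
        "assigned": int(assigned.group(1)) if assigned else None,
        "running": int(running.group(1)) if running else None,
        "used_gpu": COMPUTE_UNITS.get(int(units.group(1))) if units else None,
    }


# Shortened stand-in for the stderr of the task linked above.
sample = "BOINC assigns device 0\nNumber of compute units: 4\n"
print(check_device_lines(sample))
```

A result with "assigned" 0, no "running" value and the GTX 650 Ti as the used GPU is exactly the mismatch reported in this post.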
17) Message boards : Number crunching : 1073741205 Error Code (Unknown Error) (Message 1796891)
Posted 17 Jun 2016 by Harri Liljeroos
Today I had 26 WUs fail with this error while I was updating Windows. One program refused to close while Windows was shutting down, and Windows showed that this program was blocking the shutdown. So I had to cancel the Windows restart, kill the stuck program and then do the restart. Unfortunately 26 WUs were trashed during this.

I'm now uninstalling KB3145739 to see what this does.
18) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794635)
Posted 9 Jun 2016 by Harri Liljeroos
Thank you for the information. I'll keep it in mind for the next time. For now I don't have time to experiment more.
19) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794507)
Posted 8 Jun 2016 by Harri Liljeroos
If "1" never shown what about app's device capabilities listing? Does it show sometime other GPU selected?


Yes, sometimes it has used the device 0 and shown it correctly on both lines of stderr.

Unfortunately I had to revert to the cuda applications: too many driver and computer crashes while running SoG. I may try again after tuning becomes easier. Maybe these cards (GTX970 and GTX650 Ti) are too different to run SoG smoothly on the same computer.
20) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1794361)
Posted 8 Jun 2016 by Harri Liljeroos
I have a host with two NV GPUs, d0= GTX970 and d1= GTX650Ti. The stderr output shows the following:

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
Number of period iterations for PulseFind set to:100
Maximum single buffer size set to:256MB
Sleep() argument set to: 2; use_sleep enabled
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: NVIDIA Corporation
[color=red]BOINC assigns device 0[/color]
Info: BOINC provided OpenCL device ID used
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW USE_SSE3 x86
CPUID: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX
OpenCL-kernels filename : MultiBeam_Kernels_r3430.cl
ar=0.423898 NumCfft=196705 NumGauss=1114679782 NumPulse=226372973048 NumTriplet=452748356604
Currently allocated 357 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
Windows optimized setiathome_v8 application
Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan
SSE3xj Win32 Build 3430 , Ported by : Raistmer, JDWhale
SETI8 update by Raistmer
OpenCL version by Raistmer, r3430
Number of OpenCL platforms: 2
OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 1
Max compute units: 16
Max work group size: 512
Max clock frequency: 350Mhz
Max memory allocation: 190840832
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 763363328
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: Intel(R) HD Graphics 4000
Vendor: Intel(R) Corporation
Driver version: 9.18.10.3165
Version: OpenCL 1.2
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_depth_images cl_khr_gl_depth_images cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_image2d_from_buffer
OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 13
Max work group size: 1024
Max clock frequency: 1253Mhz
Max memory allocation: 1073741824
Cache type: Read/Write
Cache line size: 128
Cache size: 212992
Global memory size: 4294967296
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 970
Vendor: NVIDIA Corporation
Driver version: 361.91
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Max compute units: 4
Max work group size: 1024
Max clock frequency: 928Mhz
Max memory allocation: 268435456
Cache type: Read/Write
Cache line size: 128
Cache size: 65536
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 650 Ti
Vendor: NVIDIA Corporation
Driver version: 361.91
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.423898
Used GPU device parameters are:
Number of compute units: 4
Single buffer allocation size: 256MB
[color=red]Total device global memory: 1024MB[/color]
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: yes
LowPerformanceGPU path: no
period_iterations_num=100
Spike: peak=24.21453, time=33.55, d_freq=1420445367.72, chirp=-8.2481, fft_len=128k
Spike: peak=24.62073, time=33.55, d_freq=1420445367.72, chirp=-8.2527, fft_len=128k
Spike: peak=24.28191, time=43.62, d_freq=1420446142.43, chirp=-22.286, fft_len=64k
Spike: peak=24.99428, time=43.62, d_freq=1420446142.42, chirp=-22.289, fft_len=64k
Spike: peak=24.54694, time=43.62, d_freq=1420446142.41, chirp=-22.293, fft_len=64k
Pulse: peak=2.569161, time=101, period=0.6728, d_freq=1420449811.82, score=1.023, chirp=-23.578, fft_len=256
Triplet: peak=9.798454, time=46.64, period=0.4719, d_freq=1420447396.73, chirp=-38.251, fft_len=512
Triplet: peak=10.44948, time=46.64, period=0.4719, d_freq=1420447398.26, chirp=-38.628, fft_len=512
Pulse: peak=1.646489, time=63.31, period=0.3199, d_freq=1420452020.71, score=1.001, chirp=-76.252, fft_len=32
Pulse: peak=9.856218, time=63.31, period=4.148, d_freq=1420451296.07, score=1.013, chirp=-85.281, fft_len=128
Pulse: peak=10.87323, time=63.31, period=4.148, d_freq=1420451283.61, score=1.117, chirp=-90.297, fft_len=128
Pulse: peak=11.17374, time=63.31, period=4.148, d_freq=1420451296.35, score=1.148, chirp=-91.301, fft_len=128
Pulse: peak=10.0039, time=63.31, period=4.148, d_freq=1420451309.15, score=1.028, chirp=-92.304, fft_len=128
Best spike: peak=24.99428, time=43.62, d_freq=1420446142.42, chirp=-22.289, fft_len=64k
Best autocorr: peak=16.21737, time=20.13, delay=0.5461, d_freq=1420449022.03, chirp=-9.7713, fft_len=128k
Best gaussian: peak=3.597618, mean=0.5139577, ChiSq=1.088942, time=84.72, d_freq=1420445953.86, score=-0.3936186, null_hyp=2.063037, chirp=7.9949, fft_len=16k
Best pulse: peak=11.17374, time=63.31, period=4.148, d_freq=1420451296.35, score=1.148, chirp=-91.301, fft_len=128
Best triplet: peak=10.44948, time=46.64, period=0.4719, d_freq=1420447398.26, chirp=-38.628, fft_len=512
Flopcounter: 7199838776822.398400
Spike count: 5
Autocorr count: 0
Pulse count: 6
Triplet count: 2
Gaussian count: 0
Wallclock time elapsed since last restart: 1630.1 seconds
class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0
class Gaussian_new_best: total=75, N=75, <>=1, min=1 max=1
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0
class PC_triplet_find_hit: total=25237, N=25237, <>=1, min=1 max=1
class PC_triplet_find_miss: total=250, N=250, <>=1, min=1 max=1
class PC_pulse_find_hit: total=12728, N=12728, <>=1, min=1 max=1
class PC_pulse_find_miss: total=15, N=15, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=13, N=13, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=1, N=1, <>=1, min=1 max=1
class PoT_transfer_not_needed: total=25227, N=25227, <>=1, min=1 max=1
class PoT_transfer_needed: total=261, N=261, <>=1, min=1 max=1
GPU device sync requested... ...GPU device synched
08:55:20 (7728): called boinc_finish(0)
</stderr_txt>
]]>


At the top, the first red line suggests that d0 (GTX970) was selected, but in the end section (Used GPU device parameters) the red line refers to the GTX650Ti with 1 GByte of memory. So is the first line a mistake or what? I am not at that host currently so I can't check on it. Browsing through the returned WUs, the line "BOINC assigns device 0" is in every stderr and none have "BOINC assigns device 1".

Edit: the color red doesn't show inside the [code] tags.


