OpenCL NV MultiBeam v8 SoG edition for Windows
Author | Message |
---|---|
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 66202 Credit: 55,293,173 RAC: 49 |
Do you know where your seti@home folder is? Thanks Zalster, My i7 3820 cpu likes that, a lot, so far the cpu is at 6.5% and dropping, better than 99% any day. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
2) Do additional commands have to be on the same line or can they be listed sequentially on following lines? . . Thanks again, you have been most helpful. Strange thing though since I added that command I haven't received a single CPU WU. I might remove it and see if that changes. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Can you point me to the doc file (*.txt) that explains all this? . . Will do right now |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . I have implemented the -use_sleep command and the CPU load has dropped far enough to permit returning to 3 CPU tasks instead of just two. Downside: SoG runtimes are up from 12-13 mins to around 20 mins. GPU usage is now more sporadic, which would account for the increase in runtime. Some messages have indicated using a parameter with -use_sleep, implying that it sets the length of the sleep interval in ms. The doc file does not show it supporting any parameters. Is the use of a parameter supported or not? And if it is, does the value relate to the sleep interval in ms? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
2) Do additional commands have to be on the same line or can they be listed sequentially on following lines? . . Problem solved, turns out that when I suspended the Guppi WUs in my cache, so I could test with like type of WUs over different settings, BOINC saw that as a halt to requests for new work. Timing is everything and mine sucks as usual, I discovered this as I was about to go out, so I was not able to investigate and resolve it until I got home, but sorted nicely now. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . P.S. Note to self: when you have found the topic you wanted in a text document, keep scrolling, there may be more useful information further down. . . Turns out there is a -use_sleep_ex 'N' command, but the doc does not explain the value or function of 'N'. I set it to 1 as one message suggested but cannot say there is any appreciable difference. |
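One way to picture why a sleep interval could make GPU load sporadic is the toy model below. It is not the app's actual code: it assumes the host thread only checks for kernel completion when it wakes from a sleep of `sleep_ms` milliseconds, and the function names are made up for illustration. The stderr output quoted later in this thread does print "Sleep() argument set to: 2; use_sleep enabled", which suggests the value ends up as a Sleep() argument; on Windows, Sleep() takes milliseconds.

```python
import math


def detect_time_ms(kernel_ms: float, sleep_ms: float) -> float:
    """Time at which the host notices that a GPU kernel has finished.

    Assumed model: the host wakes every `sleep_ms` and checks once per wake-up.
    With sleep_ms <= 0 (busy-wait) completion is noticed immediately.
    """
    if sleep_ms <= 0:
        return kernel_ms
    return math.ceil(kernel_ms / sleep_ms) * sleep_ms


def gpu_idle_fraction(kernel_ms: float, sleep_ms: float) -> float:
    """Fraction of each cycle the GPU sits idle waiting for the host to wake."""
    total = detect_time_ms(kernel_ms, sleep_ms)
    return (total - kernel_ms) / total


if __name__ == "__main__":
    # Hypothetical kernel length of 2.5 ms; real kernel times are not given in the thread.
    for sleep_ms in (0, 1, 3):
        idle = gpu_idle_fraction(2.5, sleep_ms)
        print(f"sleep {sleep_ms} ms -> GPU idle {idle:.0%} of each cycle")
```

In this model the CPU is free while the thread sleeps, but every kernel is padded out to the next wake-up, so short kernels lose the most. The real timer granularity on Windows may be coarser than the requested value, which would widen the gap further.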
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It's the GBT data that is requiring such large amount of CPU usage. . . Actually on my setup SoG runs slightly faster than CUDA50 on nonVLARs but still steals a CPU, taking 100% of that CPU's time. But it takes only 30 mins for a Guppi compared to over 50 mins with CUDA50. This is with the sleep option OFF. . . With sleep ON everything changes. CPU loads return to normal and I can again run 3 CPU WUs, BUT runtimes for nonVLARs blow out to 20 mins, significantly longer than CUDA50. Guppis take much longer again. This is because GPU utilisation becomes sporadic and averages only about 30%. . . To compensate I started running threesies (3 WUs at a time) on GPU. With nonVLARs this worked as anticipated and raised GPU utilisation to a good level, nonVLARs running in about 36 mins (12 per WU). BUT again Guppis are the fly in the ointment and Guppi runtimes blew out to 3 hrs 15-20 mins. I am currently experimenting with -use_sleep_ex and its parameter. I have managed to increase the GPU load cycle marginally but run times remain exaggerated. |
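The per-WU figures above are easier to compare once batch runtimes are divided by the number of tasks running at once. A minimal sketch using only the runtimes quoted in the post (the helper name is made up, and the 3 hr 15 min Guppi figure is taken as 195 minutes):

```python
def minutes_per_wu(batch_minutes: float, concurrent: int) -> float:
    """Effective wall-clock minutes per work unit when `concurrent` run together."""
    return batch_minutes / concurrent


# (batch runtime in minutes, tasks run at once), as reported in the post above
runs = {
    "nonVLAR, sleep ON, 1 at a time": (20, 1),
    "nonVLAR, sleep ON, 3 at a time": (36, 3),
    "Guppi, sleep OFF, 1 at a time": (30, 1),
    "Guppi, sleep ON, 3 at a time": (195, 3),
}

for label, (minutes, concurrent) in runs.items():
    print(f"{label}: {minutes_per_wu(minutes, concurrent):.1f} min per WU")
```

On these numbers, running three nonVLARs at once brings the effective time back to 12 minutes per WU, while three Guppis at once work out to about 65 minutes each, more than double the 30 minutes seen with sleep OFF.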
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 66202 Credit: 55,293,173 RAC: 49 |
I'd tried to run 4 or 5 cpu wu's with 3 gpu wu's, big mistake, now I have 3 cpu and 3 gpu wu's being crunched and I have 3 gpu wu's waiting to run, cpu i7 3820, gpu gtx 580, driver is 353.06 on Windows 7 Pro sp1 x64. I'm using the open beta, I thought somebody would like to know. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I'd tried to run 4 or 5 cpu wu's with 3 gpu wu's, big mistake, now I have 3 cpu and 3 gpu wu's being crunched and I have 3 gpu wu's waiting to run, cpu i7 3820, gpu gtx 580, driver is 353.06 on Windows 7 Pro sp1 x64. I'm using the open beta, I thought somebody would like to know. . . Normally I would say 1 CPU core is enough to service a GPU even when running multiple tasks, but with SoG it seems to be more like 1 CPU core per running WU (if you turn ON sleep that changes), plus one for general admin/overflow. That would leave 4 free to crunch with. I wouldn't suggest running multiple GPU WUs with sleep OFF. But the proof of the pudding, they say, is in the eating. . . I am running 3 CPU cores crunching and three GPU tasks under SoG (GTX950) but with sleep turned ON. The one non-crunching CPU core is coping well with that arrangement. |
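The rule of thumb in this reply can be written down as a small calculation. This is only a sketch of the reasoning in the post, not a BOINC setting: the function name is hypothetical, and it assumes 8 logical cores for the i7 3820 and 4 for the second host (3 crunching plus 1 non-crunching).

```python
def cores_free_for_cpu_tasks(logical_cores: int, gpu_tasks: int,
                             sleep_on: bool, overhead_cores: int = 1) -> int:
    """Cores left for CPU work units under the rule of thumb from the post.

    Sleep OFF: each running SoG task is assumed to tie up one core.
    Sleep ON: the single overhead core is assumed to service all GPU tasks.
    """
    reserved_for_gpu = 0 if sleep_on else gpu_tasks
    return max(0, logical_cores - reserved_for_gpu - overhead_cores)


# i7 3820 (assumed 8 logical cores), 3 GPU tasks, sleep OFF
print(cores_free_for_cpu_tasks(8, 3, sleep_on=False))
# 4-core host, 3 GPU tasks, sleep ON
print(cores_free_for_cpu_tasks(4, 3, sleep_on=True))
```

The first call gives the "4 free to crunch with" figure from the post; the second matches the 3 CPU tasks plus 3 GPU tasks arrangement described with sleep ON.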
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
I need a command line for my gtx560m, it works fine when I don't use the -use_sleep option and running 2 WUs at a time, when I do use the -use_sleep option, it will only run 1 WU. The command line I need is one that doesn't stress the laptop too much, since I also use it for other things than SETI. Also, I don't run any CPU tasks. |
Mike Send message Joined: 17 Feb 01 Posts: 34348 Credit: 79,922,639 RAC: 80 |
I need a command line for my gtx560m, it works fine when I don't use the -use_sleep option and running 2 WUs at a time, when I do use the -use_sleep option, it will only run 1 WU. -sbs 256 -spike_fft_thresh 2048 -tune 1 64 1 4 With each crime and every kindness we birth our future. |
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
I need a command line for my gtx560m, it works fine when I don't use the -use_sleep option and running 2 WUs at a time, when I do use the -use_sleep option, it will only run 1 WU. This is what I've tried, but when I added -use_sleep, which I need, it will only run 1 WU instead of the 2 it should. |
Mike Send message Joined: 17 Feb 01 Posts: 34348 Credit: 79,922,639 RAC: 80 |
I need a command line for my gtx560m, it works fine when I don't use the -use_sleep option and running 2 WUs at a time, when I do use the -use_sleep option, it will only run 1 WU. So add this to the command line: -total_GPU_instances_num 2. Post a link to a finished task with those params so I can check. With each crime and every kindness we birth our future. |
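Putting the suggestions so far together, the full set of options would look like the output of the sketch below. Only flags that appear in this thread are used. The helper name is made up, and joining everything onto a single line is an assumption: the thread does not settle whether options may be spread over several lines.

```python
def build_cmdline(base: str, extra_options) -> str:
    """Append extra options to a base command line, normalising whitespace."""
    tokens = base.split()
    for option in extra_options:
        tokens.extend(option.split())
    return " ".join(tokens)


# Base line suggested earlier in the thread for the gtx560m
base = "-sbs 256 -spike_fft_thresh 2048 -tune 1 64 1 4"

# Options added in the follow-up posts
cmdline = build_cmdline(base, ["-use_sleep", "-total_GPU_instances_num 2"])
print(cmdline)
```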
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
It's not working, 1 WU runs fine but the other, while it runs, there's no progress at all. It's been running for 30 mins with 0% in progress and time remaining isn't moving. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It's not working, 1 WU runs fine but the other, while it runs, there's no progress at all. It's been running for 30 mins with 0% in progress and time remaining isn't moving. There is a known issue with -use_sleep in the SoG app (see the support thread). Here is a possible fix: https://cloud.mail.ru/public/3X4g/HwEhUWHCE |
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
Here's the 1 WU run with the SoG app http://setiathome.berkeley.edu/result.php?resultid=4970988937 And here 3 WUs run with the app that Raistmer just posted, which seems to run fine http://setiathome.berkeley.edu/result.php?resultid=4970983131 http://setiathome.berkeley.edu/result.php?resultid=4971019196 http://setiathome.berkeley.edu/result.php?resultid=4971040584 |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It's a quick fix attempt, so something else could be broken. Watch the inconclusive rate closely for this build. |
Harri Liljeroos Send message Joined: 29 May 99 Posts: 4645 Credit: 85,281,665 RAC: 126 |
I have a host with two NV GPUs, d0= GTX970 and d1= GTX650Ti. The stderr output shows the following: <core_client_version>7.6.22</core_client_version> <![CDATA[ <stderr_txt> Number of period iterations for PulseFind set to:100 Maximum single buffer size set to:256MB Sleep() argument set to: 2; use_sleep enabled Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: Intel(R) Corporation OpenCL platform detected: NVIDIA Corporation [color=red]BOINC assigns device 0[/color] Info: BOINC provided OpenCL device ID used Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW USE_SSE3 x86 CPUID: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Cache: L1=64K L2=256K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX OpenCL-kernels filename : MultiBeam_Kernels_r3430.cl ar=0.423898 NumCfft=196705 NumGauss=1114679782 NumPulse=226372973048 NumTriplet=452748356604 Currently allocated 357 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Windows optimized setiathome_v8 application Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan SSE3xj Win32 Build 3430 , Ported by : Raistmer, JDWhale SETI8 update by Raistmer OpenCL version by Raistmer, r3430 Number of OpenCL platforms: 2 OpenCL Platform Name: Intel(R) OpenCL Number of devices: 1 Max compute units: 16 Max work group size: 512 Max clock frequency: 350Mhz Max memory allocation: 190840832 Cache type: Read/Write Cache line size: 64 Cache size: 2097152 Global memory size: 763363328 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 65536 Queue properties: Out-of-Order: No Name: Intel(R) HD Graphics 4000 Vendor: Intel(R) Corporation Driver version: 9.18.10.3165 Version: OpenCL 1.2 Extensions: cl_khr_icd 
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_depth_images cl_khr_gl_depth_images cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_image2d_from_buffer OpenCL Platform Name: NVIDIA CUDA Number of devices: 2 Max compute units: 13 Max work group size: 1024 Max clock frequency: 1253Mhz Max memory allocation: 1073741824 Cache type: Read/Write Cache line size: 128 Cache size: 212992 Global memory size: 4294967296 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: Yes Name: GeForce GTX 970 Vendor: NVIDIA Corporation Driver version: 361.91 Version: OpenCL 1.2 CUDA Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts Max compute units: 4 Max work group size: 1024 Max clock frequency: 928Mhz Max memory allocation: 268435456 Cache type: Read/Write Cache line size: 128 Cache size: 65536 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: Yes Name: GeForce GTX 650 Ti Vendor: NVIDIA Corporation Driver version: 361.91 Version: OpenCL 1.2 CUDA Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing 
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts Work Unit Info: ............... Credit multiplier is : 2.85 WU true angle range is : 0.423898 Used GPU device parameters are: Number of compute units: 4 Single buffer allocation size: 256MB [color=red]Total device global memory: 1024MB[/color] max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: yes LowPerformanceGPU path: no period_iterations_num=100 Spike: peak=24.21453, time=33.55, d_freq=1420445367.72, chirp=-8.2481, fft_len=128k Spike: peak=24.62073, time=33.55, d_freq=1420445367.72, chirp=-8.2527, fft_len=128k Spike: peak=24.28191, time=43.62, d_freq=1420446142.43, chirp=-22.286, fft_len=64k Spike: peak=24.99428, time=43.62, d_freq=1420446142.42, chirp=-22.289, fft_len=64k Spike: peak=24.54694, time=43.62, d_freq=1420446142.41, chirp=-22.293, fft_len=64k Pulse: peak=2.569161, time=101, period=0.6728, d_freq=1420449811.82, score=1.023, chirp=-23.578, fft_len=256 Triplet: peak=9.798454, time=46.64, period=0.4719, d_freq=1420447396.73, chirp=-38.251, fft_len=512 Triplet: peak=10.44948, time=46.64, period=0.4719, d_freq=1420447398.26, chirp=-38.628, fft_len=512 Pulse: peak=1.646489, time=63.31, period=0.3199, d_freq=1420452020.71, score=1.001, chirp=-76.252, fft_len=32 Pulse: peak=9.856218, time=63.31, period=4.148, d_freq=1420451296.07, score=1.013, chirp=-85.281, fft_len=128 Pulse: peak=10.87323, time=63.31, period=4.148, d_freq=1420451283.61, score=1.117, chirp=-90.297, fft_len=128 Pulse: peak=11.17374, time=63.31, period=4.148, d_freq=1420451296.35, score=1.148, chirp=-91.301, fft_len=128 Pulse: peak=10.0039, time=63.31, period=4.148, d_freq=1420451309.15, score=1.028, chirp=-92.304, fft_len=128 Best spike: peak=24.99428, time=43.62, d_freq=1420446142.42, chirp=-22.289, fft_len=64k Best autocorr: peak=16.21737, time=20.13, delay=0.5461, d_freq=1420449022.03, chirp=-9.7713, 
fft_len=128k Best gaussian: peak=3.597618, mean=0.5139577, ChiSq=1.088942, time=84.72, d_freq=1420445953.86, score=-0.3936186, null_hyp=2.063037, chirp=7.9949, fft_len=16k Best pulse: peak=11.17374, time=63.31, period=4.148, d_freq=1420451296.35, score=1.148, chirp=-91.301, fft_len=128 Best triplet: peak=10.44948, time=46.64, period=0.4719, d_freq=1420447398.26, chirp=-38.628, fft_len=512 Flopcounter: 7199838776822.398400 Spike count: 5 Autocorr count: 0 Pulse count: 6 Triplet count: 2 Gaussian count: 0 Wallclock time elapsed since last restart: 1630.1 seconds class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0 class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0 class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0 class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0 class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0 class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0 class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0 class Gaussian_new_best: total=75, N=75, <>=1, min=1 max=1 class Gaussian_report: total=0, N=0, <>=0, min=0 max=0 class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0 class PC_triplet_find_hit: total=25237, N=25237, <>=1, min=1 max=1 class PC_triplet_find_miss: total=250, N=250, <>=1, min=1 max=1 class PC_pulse_find_hit: total=12728, N=12728, <>=1, min=1 max=1 class PC_pulse_find_miss: total=15, N=15, <>=1, min=1 max=1 class PC_pulse_find_early_miss: total=13, N=13, <>=1, min=1 max=1 class PC_pulse_find_2CPU: total=1, N=1, <>=1, min=1 max=1 class PoT_transfer_not_needed: total=25227, N=25227, <>=1, min=1 max=1 class PoT_transfer_needed: total=261, N=261, <>=1, min=1 max=1 GPU device sync requested... ...GPU device synched 08:55:20 (7728): called boinc_finish(0) </stderr_txt> ]]> At the top the first red line suggest that d0 (GTX970) was selected but the end section (Used GPU parameters) the red line refers to GTX650Ti with 1GByte memory. 
So is the first line a mistake or what? I am not at that host currently so can't check from it. Browsing thru the returned WUs the line "BOINC assigns device 0" is on every stderr and none have "BOINC assigns device 1". Edit: the color red doesn't show inside the [code] tags. |
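Checking many returned results for this by eye is tedious, so a small script can pull out the relevant fields. A minimal sketch, assuming the stderr text has been saved as a string; the regular expressions are based only on the lines quoted above, and the function name is made up.

```python
import re

ASSIGNED_RE = re.compile(r"BOINC assigns device (\d+)")
UNITS_RE = re.compile(r"Number of compute units:\s*(\d+)")
MEMORY_RE = re.compile(r"Total device global memory:\s*(\d+)MB")
COLOR_TAG_RE = re.compile(r"\[/?color(?:=\w+)?\]")


def summarize_stderr(text: str) -> dict:
    """Extract the device BOINC assigned and the device parameters the app used."""
    text = COLOR_TAG_RE.sub("", text)  # drop forum colour tags if present

    def first_int(pattern):
        match = pattern.search(text)
        return int(match.group(1)) if match else None

    return {
        "assigned_device": first_int(ASSIGNED_RE),
        "compute_units": first_int(UNITS_RE),
        "global_memory_mb": first_int(MEMORY_RE),
    }


sample = (
    "[color=red]BOINC assigns device 0[/color] Info: BOINC provided OpenCL device ID used "
    "Used GPU device parameters are: Number of compute units: 4 "
    "Single buffer allocation size: 256MB "
    "[color=red]Total device global memory: 1024MB[/color]"
)
print(summarize_stderr(sample))
```

In the output quoted above, 4 compute units and 1024MB match the GTX 650 Ti entry in the device listing rather than the GTX 970 (13 units), so a result reporting device 0 together with those values is the mismatch in question.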
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
If "1" is never shown, what about the app's device capabilities listing? Does it sometimes show the other GPU selected? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Raistmer, . . I hope I am not boring you with my attempts to tweak things, but SoG has been running wonderfully with these settings:- . . -use_sleep_ex 3 -sbs 384 -period_iterations_num 4 . . But the distinct lag/stutter effect was annoying me so I tried this change:- . . -use_sleep_ex 3 -sbs 384 -period_iterations_num 5 . . This seemed to achieve a modest improvement in the lag effect but also caused a noticeable blowout in run times, appearing to add around 15% to CPU and nonVLAR GPU WUs but more to VLAR/Guppies, where it seems to be about 50% or more. It is most noticeable on bc5 Guppies. . . I am returning to -period_iterations_num 4. |