BOINC assigns device X - Problem
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And ? BOINCs_device should contain enumerated from zero device ID. boinc_device_id is opaque OpenCL device handler. Actually, pointer. SO, suggestions? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I let my host 8064262 run all night with SoG and with the "<api_version>" line for the SoG app removed from app_info.xml. Everything looks fine, with device numbers being consistently reported correctly in the Stderr. So that's what should be fixed - app_info.aistub in installer and standalone pack perhaps if I adopted that tag too. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Better to cure at source. What cure you speak about? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
You're using boinc_device_id extracted (by the API) from init_data.xml. You're reporting the BOINCs_device parsed from the command line. They are not (necessarily) the same - and this thread proves that they aren't in practice, either. You need to report the same variable that you're using. Edit: //R: now we know on what device program will be executed so query this particular device for capabilities And that is boinc_device_id: so, fprintf(stderr,"BOINC assigns device %d\n",boinc_device_id); (with exception testing for the other enumeration routes) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
fprintf(stderr,"BOINC assigns device %d\n",boinc_device_id); I would appreciate if you would read all my messages before responding, not just cite them. boinc_device_id is opaque OpenCL device handler. Actually, pointer. SETI apps news We're not gonna fight them. We're gonna transcend them. |
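Since an opaque handle is a pointer, formatting it with %d would not produce a zero-based device number. One possible way to get a printable number is to look the handle up in the enumerated device list and report its position. The sketch below only illustrates that idea: DeviceHandle and index_of_handle are hypothetical stand-ins (a real build would use cl_device_id and the array filled in by clGetDeviceIDs), and none of this is taken from the application's source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an opaque OpenCL handle; like cl_device_id,
// it is just a pointer to a type whose contents are never inspected.
struct OpaqueDevice;
using DeviceHandle = OpaqueDevice*;

// Returns the zero-based position of `wanted` in the enumerated list,
// or -1 if the handle does not appear in it.
int index_of_handle(const std::vector<DeviceHandle>& enumerated,
                    DeviceHandle wanted) {
    for (std::size_t i = 0; i < enumerated.size(); ++i) {
        if (enumerated[i] == wanted) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The returned index, not the handle itself, is what could safely be passed to a "%d" format.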
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
BOINCs_device should contain enumerated from zero device ID. But it doesn't, because you haven't put it there in the native OpenCL case. Instead, you're printing 'BOINC assigns device 0', even when this is untrue - BOINC has assigned a different device. OK, let's leave it for now - I need a break for maintenance. I'll start reading from http://boinc.berkeley.edu/trac/wiki/AppCoprocessor#Deviceselection in the morning, and see what I can come up with. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
OK, another search, and I think I have it. No pointers, honest.

But before I go on (and before you feel the urge to shout at me in bold large font again) please remember that I have been trained in understanding the logic of program execution, but I have not been taught the formal syntax of C++ - I don't think it had been invented when they taught me BCPL around 1973. So any examples I suggest will be in some form of meta-logic code, and will need translating to the implementation language of choice.

So, I found Building BOINC applications with Cuda and OpenCL, and some guidance on device numbers in a sample application:

// * Code to determine the correct device assigned by BOINC. It needs to get
// the device number from the gpu_opencl_dev_index field of init_data.xml
// if it exists, else from the gpu_device_num field of init_data.xml if that
// exists, else from the --device or -device argument passed by the client.
// See api/boinc_opencl.cpp for code which does this.

That makes sense to my logic-trained mind: try the newest, most fully-defined case first, and only fall back to the alternatives if that fails.

We don't want a pointer to gpu_opencl_dev_index or gpu_device_num, but we do have a structure containing them, and you're using it already:

extern struct APP_INIT_DATA app_init_data; //R: for project directory query

As well as the project directory implied by your comment, you're also using it already for app_init_data.host_info.m_nbytes, so I'd suggest something along the lines of

IF app_init_data.gpu_opencl_dev_index >=0 THEN
    printf "Info: BOINC provided OpenCL device ID" app_init_data.gpu_opencl_dev_index
ELSEIF app_init_data.gpu_device_num >=0 THEN
    printf "Info: BOINC provided gpu device ID" app_init_data.gpu_device_num
ELSEIF BOINCs_device >=0 THEN
    printf "Info: BOINC'S command line provided GPU device ID" BOINCs_device
ELSE
    [something went wrong - shouldn't reach here if we don't have a valid ID]
ENDIF

Note that implies that you need to initialise BOINCs_device to something like -1 - not a good idea to default a variable to zero when it's going to be used for a zero-based enumeration: if you initialise it to something out of band, you can test (as here) if a valid value has been substituted during processing. I was caught that way sooooo many times before I discovered 'option explicit'... |
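The fallback order in the pseudocode above could be translated to C++ roughly as follows. This is a minimal sketch, not the application's code: InitDataStub is a hypothetical stand-in that carries only the two fields named in the post (a real app would read them from BOINC's APP_INIT_DATA), and DeviceChoice and pick_reported_device are names invented here for illustration.

```cpp
#include <string>

// Hypothetical stand-in holding just the two init_data.xml fields discussed
// above. Both default to -1, an out-of-band value meaning "not supplied".
struct InitDataStub {
    int gpu_opencl_dev_index = -1;
    int gpu_device_num = -1;
};

// The ID to report plus a label saying which route supplied it.
struct DeviceChoice {
    int id;              // -1 means no route supplied a valid ID
    std::string source;
};

// Try the newest, most specific field first and fall back step by step.
DeviceChoice pick_reported_device(const InitDataStub& init, int cmdline_device) {
    if (init.gpu_opencl_dev_index >= 0) {
        return {init.gpu_opencl_dev_index, "init_data.xml gpu_opencl_dev_index"};
    }
    if (init.gpu_device_num >= 0) {
        return {init.gpu_device_num, "init_data.xml gpu_device_num"};
    }
    if (cmdline_device >= 0) {
        return {cmdline_device, "command line"};
    }
    return {-1, "none"};  // caller decides how to handle the failure
}
```

The caller would print both members, for example "BOINC assigns device 1 (from init_data.xml gpu_opencl_dev_index)", so the Stderr shows where the number came from.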
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
Hi Richard, excellent, so should it be.. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
0 was taken as the default to ensure the first GPU device will be used even if BOINC can't provide anything usable (and at the time of writing that code BOINC indeed had no adequate GPU support. Don't forget that SETI supported ATi GPU computations way before BOINC did). It can indeed be initialized to a negative number, but it should be reverted to 0 after a possible BOINC failure. I'll add the logic you suggested into some new rev though currently it's cosmetic only - real device still handled by OpenCL device handler passed by BOINC (that is, 2 different entities will be used still for actual device selection and eye-candy device ID printing). SETI apps news We're not gonna fight them. We're gonna transcend them. |
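The two requirements here (an out-of-band initial value, but device 0 as the last resort) can coexist. A minimal sketch, assuming the ID is kept as a plain int; kNoDevice and device_or_first_gpu are hypothetical names, not identifiers from the application:

```cpp
#include <cstdio>

// Out-of-band sentinel: a zero-based device ID can never be negative.
constexpr int kNoDevice = -1;

// Returns the BOINC-supplied ID when it is valid; otherwise warns on stderr
// and falls back to the first GPU (device 0).
int device_or_first_gpu(int boinc_supplied) {
    if (boinc_supplied >= 0) {
        return boinc_supplied;
    }
    std::fprintf(stderr,
                 "Warning: no usable device ID from BOINC, using device 0\n");
    return 0;
}
```

Starting from kNoDevice keeps "BOINC said 0" distinguishable from "BOINC said nothing", while the final call still guarantees a usable device number.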
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
... currently it's cosmetic only - real device still handled by OpenCL device handler passed by BOINC (that is, 2 different entities will be used still for actual device selection and eye-candy device ID printing). Thanks - sounds good. I don't think the 'real device' will be different from the 'eye candy' device in practice, because boinc_get_opencl_ids() will use the gpu_opencl_dev_index passed in init_data.xml to identify the boinc_device_id called for by the client. So we should be good to go :-) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Check if device ID printed correctly now: https://cloud.mail.ru/public/HQXF/SQ6bU8XpL SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Check if device ID printed correctly now: Got nearly all the way - but it needs somebody with an ATi_HD5, please. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is the NV flavor: https://cloud.mail.ru/public/Libc/k4c5kay12 When -total_GPU_instances_num N is provided (in CPUlock mode), this build attempts to skip hyperthreaded CPUs and use different cores when the number of instances allows. Worth testing as well. SETI apps news We're not gonna fight them. We're gonna transcend them. |
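How the build chooses CPUs is not shown in the thread. Purely as an illustration, one way such a scheme could work is sketched below, under the assumption (not confirmed here) that logical CPUs 2k and 2k+1 are hyperthread siblings of the same core. cpu_for_instance and its parameters are hypothetical.

```cpp
// Picks a logical CPU for one app instance.
// Assumption: logical CPUs 2k and 2k+1 share a physical core, so using only
// even numbers puts each instance on its own core.
// Returns -1 for invalid input.
int cpu_for_instance(int instance, int total_instances, int logical_cpus) {
    if (logical_cpus <= 0 || total_instances <= 0 ||
        instance < 0 || instance >= total_instances) {
        return -1;
    }
    const int even_cpus = (logical_cpus + 1) / 2;  // count of even-numbered CPUs
    if (total_instances <= even_cpus) {
        return 2 * instance;          // enough cores: skip the odd siblings
    }
    return instance % logical_cpus;   // too many instances: use every CPU
}
```

With 8 logical CPUs and 4 instances this yields CPUs 0, 2, 4 and 6; with 6 instances it falls back to 0 through 5.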
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
No change on the "BOINC assigns device 0" front, I'm afraid. This is a freshly started task after swapping applications: as previous example from this machine (this thread), task is running on BOINC's NV device 1, the 750Ti with 5 compute units. Planning to head to bed very soon, but can test again first thing in the morning.

Edit - checked the init_data.xml for that job.

<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>1</gpu_device_num>
<gpu_opencl_dev_index>1</gpu_opencl_dev_index>

Maximum single buffer size set to:256MB
SpikeFind FFT size threshold override set to:2048
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:256
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:32
oclFFT minimal memory coalesce width set to:32
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW USE_SSE3 x86
CPUID: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl
ar=0.447463 NumCfft=191393 NumGauss=1056663158 NumPulse=226342285696 NumTriplet=452755819082
Currently allocated 357 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
Windows optimized setiathome_v8 application
Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan
SSE3xj Win32 Build 3500 , Ported by : Raistmer, JDWhale
SETI8 update by Raistmer
OpenCL version by Raistmer, r3500
Number of OpenCL platforms: 2

OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 1
Max compute units: 20
Max work group size: 512
Max clock frequency: 1200Mhz
Max memory allocation: 340158054
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 1360632218
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: Intel(R) HD Graphics 4600
Vendor: Intel(R) Corporation
Driver version: 10.18.10.3621
Version: OpenCL 1.2
Extensions: cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_accelerator cl_intel_ctz cl_intel_motion_estimation

OpenCL Platform Name: NVIDIA CUDA
Number of devices: 2
Max compute units: 13
Max work group size: 1024
Max clock frequency: 1228Mhz
Max memory allocation: 1073741824
Cache type: Read/Write
Cache line size: 128
Cache size: 212992
Global memory size: 4294967296
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49151
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 970
Vendor: NVIDIA Corporation
Driver version: 350.12
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Max compute units: 5
Max work group size: 1024
Max clock frequency: 1280Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 81920
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49151
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 750 Ti
Vendor: NVIDIA Corporation
Driver version: 350.12
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.447463
Used GPU device parameters are:
Number of compute units: 5
Single buffer allocation size: 256MB
Total device global memory: 2048MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: yes
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50 |
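When checking init_data.xml by hand like this, a tiny helper can pull the integer out of a tag so it can be compared with what the Stderr prints. The BOINC API normally does this parsing for the application; the sketch below is only a debugging aid under that assumption, and read_int_tag is a hypothetical name, not a BOINC function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns the integer found between <tag> and </tag> in `xml`,
// or `fallback` if the tag is missing or does not hold a number.
int read_int_tag(const std::string& xml, const std::string& tag, int fallback) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t start = xml.find(open);
    if (start == std::string::npos) {
        return fallback;
    }
    const std::size_t value_begin = start + open.size();
    const std::size_t value_end = xml.find(close, value_begin);
    if (value_end == std::string::npos) {
        return fallback;
    }
    const std::string text = xml.substr(value_begin, value_end - value_begin);
    char* stop = nullptr;
    const long value = std::strtol(text.c_str(), &stop, 10);
    if (stop == text.c_str()) {
        return fallback;  // no digits were consumed
    }
    return static_cast<int>(value);
}
```

Applied to the three lines quoted in the post, it returns 1 for both gpu_device_num and gpu_opencl_dev_index, which is the number the "BOINC assigns device" line would be expected to show.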
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
|
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
No change on the "BOINC assigns device 0" front, I'm afraid. I concur. I put the <api_version>7.5.0</api_version> line back into my app_info, copied the .exe and .cl files from the new package over, and restarted BOINC. The restarted tasks are all showing "BOINC assigns device 0" and the first new task, which is running on Device 3, also shows "BOINC assigns device 0". |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
https://cloud.mail.ru/public/5NAU/dtjuyXMcL SETI apps news We're not gonna fight them. We're gonna transcend them. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
https://cloud.mail.ru/public/5NAU/dtjuyXMcL
Looks much better. :^) All restarted tasks show correct device numbers in Stderr and the first newly started one shows:

Maximum single buffer size set to:256MB
SpikeFind FFT size threshold override set to:2048
TUNE: kernel 1 now has workgroup size of (64,1,4)
app_init_data.gpu_opencl_dev_index=3; app_init_data.gpu_device_num=3; selected_device=-1
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 3
Info: BOINC provided OpenCL device ID used |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Works for me here too. But I notice that file MultiBeam_Kernels_r3500.cl has grown in size yet again, to 200,639 bytes, dated 01 September 2016. That procedure (changing files without committing the changes and getting a new revision number) is causing download errors at Beta, so I won't add to the confusion with a Beta5 installer until there's a proper new build number with committed code, once the other changes have been tested too. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It's a sad story about spending a few days on debugging without success and ultimately splitting a single kernel into two on top of that. But that is a story for the AMD support forums, perhaps. This build also skips odd CPU numbers when possible, so it's time for a rev update soon. SETI apps news We're not gonna fight them. We're gonna transcend them. |