Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Bluerazor Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12
Message 2010688 - Posted: 4 Sep 2019, 23:07:07 UTC
Unfortunately, today's 19.9.1 drivers do not fix the situation. The problem is also very acute, because the RX 5700 "finishes" each task in 11 seconds and then reports it as an overflow. Naturally, I aborted the unfinished GPU tasks and blocked the GPU to prevent further problems.
Do you need the actual stderr copied, or just links? Here are some results that unfortunately are bound to come out invalid:
https://setiathome.berkeley.edu/result.php?resultid=8021927435
https://setiathome.berkeley.edu/result.php?resultid=8021927356
https://setiathome.berkeley.edu/result.php?resultid=8021927647
https://setiathome.berkeley.edu/result.php?resultid=8021927677
As for testing, I have the broken hardware, but I will be unavailable for a little while, so I can't help out in the short term. If there is a proposed fix or a better way of troubleshooting, I would be willing to do some testing.

elec999 Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141
Message 2010691 - Posted: 4 Sep 2019, 23:29:45 UTC
I recently purchased one of these cards, a 5700 XT. Guess I'll return it and stick to Nvidia.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 2010720 - Posted: 5 Sep 2019, 7:04:40 UTC - in response to Message 2010648.
And did someone start a thread about it on the AMD OpenCL forums? Anyone with the ability to do offline testing and in possession of such "broken" hardware+software?
Phoronix did testing and reviews of the RX 5700XT and could not get the card and drivers to pass the OpenCL parts of their standardized test suite.
Thanks. Shame on AMD. Natural dunces :/
SETI apps news We're not gonna fight them. We're gonna transcend them.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 2010724 - Posted: 5 Sep 2019, 8:31:39 UTC
https://community.amd.com/message/2928820
SETI apps news We're not gonna fight them. We're gonna transcend them.

Kissagogo27 Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62
Message 2010728 - Posted: 5 Sep 2019, 9:58:51 UTC
Number of devices: 1
Max compute units: 18
Max work group size: 256
Max clock frequency: 1625Mhz
Max memory allocation: 3221225472
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 3221225472
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: gfx1010
Vendor: Advanced Micro Devices, Inc.
Driver version: 2906.10 (PAL,LC)
Version: OpenCL 1.2 AMD-APP (2906.10)
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_gl_event cl_amd_liquid_flash cl_amd_copy_buffer_p2p
It seems there are no command line options in use, and the app autodetects only 18 of the 36 CUs of the 5700 (non-XT)?

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380
Message 2010731 - Posted: 5 Sep 2019, 10:19:31 UTC - in response to Message 2010728.
What is the computer number, what is the task number? Without these basic bits of information what you have just posted is fairly meaningless...
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 2010739 - Posted: 5 Sep 2019, 11:20:41 UTC
I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (Windows ready-built at foot of page; Linux needs, I think, building from sources) and posts the output from that. It will carry far more weight with AMD.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 2010741 - Posted: 5 Sep 2019, 11:37:29 UTC - in response to Message 2010739.
I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (Windows ready-built at foot of page: linux needs - I think - building from sources) and posts the output from that. It will carry far more weight with AMD.
+++
SETI apps news We're not gonna fight them. We're gonna transcend them.

Bluerazor Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12
Message 2010767 - Posted: 5 Sep 2019, 15:59:07 UTC - in response to Message 2010741.
Here's what I got from that utility, for what it's worth. Apologies in advance about any follow-up questions, as I will be unable to respond much until Tuesday or so.
Note: the 19.9.1 drivers fixed a crash bug with the card, and the card was crashing my PC with some regularity (per Windows Event Logs), so if that level of bug still exists in the drivers, I am not too surprised they haven't gotten to the OpenCL issue. Disappointed, yes, but not surprised.
I am running a fully patched Windows 10 system with the 19.9.1 driver that was released yesterday, and I have no overclocks on the card. I would be happy to go post to the AMD forums myself with this (next week).
Number of platforms  1
  Platform Name  AMD Accelerated Parallel Processing
  Platform Vendor  Advanced Micro Devices, Inc.
  Platform Version  OpenCL 2.1 AMD-APP (2906.10)
  Platform Profile  FULL_PROFILE
  Platform Extensions  cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution  100ns
  Platform Extensions function suffix  AMD
  Platform Name  AMD Accelerated Parallel Processing
Number of devices  1
  Device Name  gfx1010
  Device Vendor  Advanced Micro Devices, Inc.
  Device Vendor ID  0x1002
  Device Version  OpenCL 2.0 AMD-APP (2906.10)
  Driver Version  2906.10 (PAL,LC)
  Device OpenCL C Version  OpenCL C 2.0
  Device Type  GPU
  Device Board Name (AMD)  AMD Radeon RX 5700
  Device Topology (AMD)  PCI-E, 2f:00.0
  Device Profile  FULL_PROFILE
  Device Available  Yes
  Compiler Available  Yes
  Linker Available  Yes
  Max compute units  18
  SIMD per compute unit (AMD)  2
  SIMD width (AMD)  32
  SIMD instruction width (AMD)  1
  Max clock frequency  1625MHz
  Graphics IP (AMD)  10.10
  Device Partition  (core)
    Max number of sub-devices  18
    Supported partition types  None
    Supported affinity domains  (n/a)
  Max work item dimensions  3
  Max work item sizes  1024x1024x1024
  Max work group size  256
  Preferred work group size (AMD)  256
  Max work group size (AMD)  1024
  Preferred work group size multiple  32
  Wavefront width (AMD)  32
  Preferred / native vector sizes
    char  4 / 4
    short  2 / 2
    int  1 / 1
    long  1 / 1
    half  1 / 1 (cl_khr_fp16)
    float  1 / 1
    double  1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support  (cl_khr_fp16)
    Denormals  No
    Infinity and NANs  No
    Round to nearest  No
    Round to zero  No
    Round to infinity  No
    IEEE754-2008 fused multiply-add  No
    Support is emulated in software  No
  Single-precision Floating-point support  (core)
    Denormals  Yes
    Infinity and NANs  Yes
    Round to nearest  Yes
    Round to zero  Yes
    Round to infinity  Yes
    IEEE754-2008 fused multiply-add  Yes
    Support is emulated in software  No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support  (cl_khr_fp64)
    Denormals  Yes
    Infinity and NANs  Yes
    Round to nearest  Yes
    Round to zero  Yes
    Round to infinity  Yes
    IEEE754-2008 fused multiply-add  Yes
    Support is emulated in software  No
  Address bits  64, Little-Endian
  Global memory size  8573157376 (7.984GiB)
  Global free memory (AMD)  8306688 (7.922GiB)
  Global memory channels (AMD)  8
  Global memory banks per channel (AMD)  4
  Global memory bank width (AMD)  256 bytes
  Error Correction support  No
  Max memory allocation  4244635648 (3.953GiB)
  Unified memory for Host and Device  No
  Shared Virtual Memory (SVM) capabilities  (core)
    Coarse-grained buffer sharing  Yes
    Fine-grained buffer sharing  Yes
    Fine-grained system sharing  No
    Atomics  No
  Minimum alignment for any data type  128 bytes
  Alignment of base address  2048 bits (256 bytes)
  Preferred alignment for atomics
    SVM  0 bytes
    Global  0 bytes
    Local  0 bytes
  Max size for global variable  3820172032 (3.558GiB)
  Preferred total size of global vars  8573157376 (7.984GiB)
  Global Memory cache type  Read/Write
  Global Memory cache size  16384 (16KiB)
  Global Memory cache line size  64 bytes
  Image support  Yes
    Max number of samplers per kernel  16
    Max size for 1D images from buffer  134217728 pixels
    Max 1D or 2D image array size  2048 images
    Base address alignment for 2D image buffers  256 bytes
    Pitch alignment for 2D image buffers  256 pixels
    Max 2D image size  16384x16384 pixels
    Max 3D image size  2048x2048x2048 pixels
    Max number of read image args  128
    Max number of write image args  64
    Max number of read/write image args  64
  Max number of pipe args  16
  Max active pipe reservations  16
  Max pipe packet size  4244635648 (3.953GiB)
  Local memory type  Local
  Local memory size  65536 (64KiB)
  Local memory syze per CU (AMD)  65536 (64KiB)
  Local memory banks (AMD)  32
  Max number of constant args  8
  Max constant buffer size  4244635648 (3.953GiB)
  Preferred constant buffer size (AMD)  16384 (16KiB)
  Max size of kernel argument  1024
  Queue properties (on host)
    Out-of-order execution  No
    Profiling  Yes
  Queue properties (on device)
    Out-of-order execution  Yes
    Profiling  Yes
    Preferred size  262144 (256KiB)
    Max size  8388608 (8MiB)
  Max queues on device  1
  Max events on device  1024
  Prefer user sync for interop  Yes
  Number of P2P devices (AMD)  0
  P2P devices (AMD)  (n/a)
  Profiling timer resolution  1ns
  Profiling timer offset since Epoch (AMD)  1567638949549610000ns (Wed Sep 04 19:15:49 2019)
  Execution capabilities
    Run OpenCL kernels  Yes
    Run native kernels  No
    Thread trace supported (AMD)  Yes
    Number of async queues (AMD)  2
    Max real-time compute queues (AMD)  3
    Max real-time compute units (AMD)  8
  printf() buffer size  4194304 (4MiB)
  Built-in kernels  (n/a)
  Device Extensions  cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_copy_buffer_p2p cl_amd_planar_yuv
NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)  No platform
  clCreateContext(NULL, ...) [default]  No platform
  clCreateContext(NULL, ...) [other]  Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name  AMD Accelerated Parallel Processing
    Device Name  gfx1010
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name  AMD Accelerated Parallel Processing
    Device Name  gfx1010
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name  AMD Accelerated Parallel Processing
    Device Name  gfx1010

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 2010770 - Posted: 5 Sep 2019, 16:11:43 UTC - in response to Message 2010739.
I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (Windows ready-built at foot of page: linux needs - I think - building from sources) and posts the output from that. It will carry far more weight with AMD.
You don't need to build clinfo for Debian or Ubuntu. It is standard in the distros. Just install it:
sudo apt install clinfo
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380
Message 2010772 - Posted: 5 Sep 2019, 16:21:25 UTC - in response to Message 2010767. Last modified: 5 Sep 2019, 16:23:01 UTC
When posting results as you have, it is good practice to include the task number and computer ID; this allows others to quickly see whether it is a "one-off" event or one that is repeating. In the example below I grabbed one of yours from your "pending" list, as it will probably be around for a bit longer than one in the error or invalid lists.
(A quick glance at this result suggests to me that this one is going to end up in the "invalid" list eventually: there are a lot of signals detected, the run time was very short, and it has exited with exit state=9.)
Task 8021927435
Name blc35_2bit_guppi_58643_82791_HIP35821_0120.30609.818.23.46.89.vlar_0
Workunit 3639746008
Created 4 Sep 2019, 16:45:55 UTC
Sent 4 Sep 2019, 22:56:35 UTC
Report deadline 28 Oct 2019, 3:56:17 UTC
Received 4 Sep 2019, 22:58:04 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 8795938
Run time 11 sec
CPU time 8 sec
Validate state Initial
Credit 0.00
Device peak FLOPS 112.64 GFLOPS
Application version SETI@home v8 v8.22 (opencl_ati5_nocal) windows_intelx86
Peak working set size 93.35 MB
Peak swap size 81.73 MB
Peak disk usage 0.01 MB
Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
0 slot of 64 used for this instance
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 1; system mask is ffffff
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 FFTW AMD specific USE_SSE2 x86
CPUID: AMD Ryzen 9 3900X 12-Core Processor
Cache: L1=64K L2=512K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX SSE4A
OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl
ar=0.007156 NumCfft=113991 NumGauss=0 NumPulse=44570346368 NumTriplet=57539513504
Currently allocated 185 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
Windows optimized setiathome_v8 application
Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan
SSE2xj Win32 Build 3584 , Ported by : Raistmer, JDWhale
SETI8 update by Raistmer
OpenCL version by Raistmer, r3584
AMD HD5 version by Raistmer
Number of OpenCL platforms: 1
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 18
Max work group size: 256
Max clock frequency: 1625Mhz
Max memory allocation: 3221225472
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 3221225472
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: gfx1010
Vendor: Advanced Micro Devices, Inc.
Driver version: 2906.10 (PAL,LC)
Version: OpenCL 1.2 AMD-APP (2906.10)
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_gl_event cl_amd_liquid_flash cl_amd_copy_buffer_p2p
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.007156
Used GPU device parameters are:
Number of compute units: 18
Single buffer allocation size: 128MB
Total device global memory: 3072MB
max WG size: 256
local mem type: Real
LotOfMem path: no
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50
Spike: peak=25.92781, time=26.93, d_freq=2067034775.02, chirp=0, fft_len=2k
Spike: peak=26.86989, time=28.54, d_freq=2067037546.63, chirp=0, fft_len=2k
Spike: peak=26.08467, time=36.42, d_freq=2067031383.14, chirp=0, fft_len=2k
Spike: peak=24.27416, time=37.67, d_freq=2067031556.37, chirp=0, fft_len=2k
Spike: peak=25.10714, time=46.26, d_freq=2067028510.94, chirp=0, fft_len=2k
Spike: peak=25.66345, time=51.99, d_freq=2067034283.28, chirp=0, fft_len=2k
Spike: peak=24.48245, time=52.7, d_freq=2067039513.59, chirp=0, fft_len=2k
Spike: peak=24.8561, time=54.31, d_freq=2067034327.98, chirp=0, fft_len=2k
Spike: peak=32.07286, time=60.04, d_freq=2067034193.87, chirp=0, fft_len=2k
Spike: peak=24.13826, time=62.9, d_freq=2067034193.87, chirp=0, fft_len=2k
Spike: peak=25.58695, time=63.8, d_freq=2067028427.12, chirp=0, fft_len=2k
Spike: peak=25.65657, time=67.2, d_freq=2067028740.05, chirp=0, fft_len=2k
Spike: peak=26.55581, time=68.09, d_freq=2067037418.11, chirp=0, fft_len=2k
Spike: peak=25.71747, time=69.35, d_freq=2067037501.93, chirp=0, fft_len=2k
Spike: peak=24.31979, time=86.17, d_freq=2067033836.25, chirp=0, fft_len=2k
Spike: peak=26.3004, time=0.5369, d_freq=2067034193.87, chirp=0, fft_len=4k
Spike: peak=24.47003, time=1.611, d_freq=2067028642.26, chirp=0, fft_len=4k
Spike: peak=25.57119, time=3.4, d_freq=2067031740.77, chirp=0, fft_len=4k
Spike: peak=27.56359, time=3.758, d_freq=2067031777.09, chirp=0, fft_len=4k
Spike: peak=26.07256, time=5.548, d_freq=2067039826.51, chirp=0, fft_len=4k
Spike: peak=24.59409, time=7.337, d_freq=2067031461.37, chirp=0, fft_len=4k
Spike: peak=25.85529, time=9.485, d_freq=2067031550.78, chirp=0, fft_len=4k
Spike: peak=24.74949, time=11.63, d_freq=2067034280.49, chirp=0, fft_len=4k
Spike: peak=26.53337, time=13.42, d_freq=2067039779.01, chirp=0, fft_len=4k
Spike: peak=29.69875, time=14.85, d_freq=2067031338.44, chirp=0, fft_len=4k
Spike: peak=29.9281, time=16.64, d_freq=2067039784.6, chirp=0, fft_len=4k
Spike: peak=29.3638, time=17.72, d_freq=2067034199.46, chirp=0, fft_len=4k
Spike: peak=26.35707, time=18.79, d_freq=2067039868.42, chirp=0, fft_len=4k
Spike: peak=24.2139, time=21.3, d_freq=2067033970.36, chirp=0, fft_len=4k
Spike: peak=27.34326, time=22.01, d_freq=2067033917.27, chirp=0, fft_len=4k
OpenCL queue synchronized
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.
Best spike: peak=32.07286, time=60.04, d_freq=2067034193.87, chirp=0, fft_len=2k
Best autocorr: peak=0, time=-2.124e+011, delay=0, d_freq=0, chirp=0, fft_len=0
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+011, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=0.6354929, time=45.82, period=0.6218, d_freq=2067032897.47, score=0.8903, chirp=0, fft_len=64
Best triplet: peak=0, time=-2.124e+011, period=0, d_freq=0, chirp=0, fft_len=0
Spike count: 30
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0
Wallclock time elapsed since last restart: 7.0 seconds
Fftlength=32,pass=3:Tune: sum=16.7263(ms); min=16.73(ms); max=16.73(ms); mean=16.73(ms); s_mean=16.73; sleep=15(ms); delta=110; N=1; usual
Fftlength=64,pass=3:Tune: sum=216.164(ms); min=2.943(ms); max=7.725(ms); mean=6.755(ms); s_mean=5.449; sleep=0(ms); delta=103; N=32; usual
Fftlength=64,pass=4:Tune: sum=83.1424(ms); min=3.473(ms); max=6.462(ms); mean=5.939(ms); s_mean=5.975; sleep=0(ms); delta=85; N=14; usual
Fftlength=128,pass=3:Tune: sum=61.9831(ms); min=1.387(ms); max=4.058(ms); mean=3.444(ms); s_mean=2.894; sleep=0(ms); delta=89; N=18; usual
Fftlength=128,pass=4:Tune: sum=36.6409(ms); min=1.971(ms); max=4.156(ms); mean=3.331(ms); s_mean=3.038; sleep=0(ms); delta=118; N=11; usual
Fftlength=128,pass=5:Tune: sum=18.7126(ms); min=1.112(ms); max=2.713(ms); mean=2.339(ms); s_mean=2.173; sleep=0(ms); delta=151; N=8; usual
Fftlength=256,pass=3:Tune: sum=19.6036(ms); min=1.739(ms); max=2.416(ms); mean=2.178(ms); s_mean=2.139; sleep=0(ms); delta=80; N=9; usual
Fftlength=256,pass=4:Tune: sum=17.5097(ms); min=0.5999(ms); max=2.433(ms); mean=1.946(ms); s_mean=1.698; sleep=0(ms); delta=80; N=9; usual
Fftlength=256,pass=5:Tune: sum=9.6746(ms); min=0.5409(ms); max=1.414(ms); mean=1.209(ms); s_mean=1.116; sleep=0(ms); delta=79; N=8; usual
Fftlength=512,pass=3:Tune: sum=8.37276(ms); min=0.4706(ms); max=1.102(ms); mean=0.9303(ms); s_mean=0.851; sleep=0(ms); delta=44; N=9; usual
Fftlength=512,pass=4:Tune: sum=7.25688(ms); min=0.5584(ms); max=1.07(ms); mean=0.9071(ms); s_mean=0.8474; sleep=0(ms); delta=43; N=8; usual
Fftlength=512,pass=5:Tune: sum=4.4568(ms); min=0.465(ms); max=0.7165(ms); mean=0.6367(ms); s_mean=0.6359; sleep=0(ms); delta=42; N=7; usual
Fftlength=1024,pass=3:Tune: sum=4.11368(ms); min=0.3851(ms); max=0.5703(ms); mean=0.5142(ms); s_mean=0.5115; sleep=0(ms); delta=25; N=8; usual
Fftlength=1024,pass=4:Tune: sum=3.30268(ms); min=0.1552(ms); max=0.4931(ms); mean=0.4128(ms); s_mean=0.3729; sleep=0(ms); delta=25; N=8; usual
Fftlength=1024,pass=5:Tune: sum=2.01672(ms); min=0.1594(ms); max=0.3435(ms); mean=0.2881(ms); s_mean=0.2724; sleep=0(ms); delta=24; N=7; usual
Fftlength=2048,pass=3:Tune: sum=1.383(ms); min=1.383(ms); max=1.383(ms); mean=1.383(ms); s_mean=1.383; sleep=0(ms); delta=1; N=1; high_perf
class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0
class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0
class PC_triplet_find_hit: total=4, N=4, <>=1, min=1 max=1
class PC_triplet_find_miss: total=4, N=4, <>=1, min=1 max=1
class PC_pulse_find_hit: total=5, N=5, <>=1, min=1 max=1
class PC_pulse_find_miss: total=2, N=2, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=2, N=2, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=0, N=0, <>=0, min=0 max=0
class PoT_transfer_not_needed: total=4, N=4, <>=1, min=1 max=1
class PoT_transfer_needed: total=5, N=5, <>=1, min=1 max=1
class SleepQuantum: total=0, N=0, <>=0, min=0 max=0
GPU device sync requested... ...GPU device synched
18:57:39 (1936): called boinc_finish(0)
</stderr_txt>
]]>
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
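The "quick glance" described in this post (many signals, a very short run time, and a result_overflow message) can be sketched as a small check over a task's stderr text. This is only an illustration in Python: the function name `summarize_stderr`, the parameter `short_run_sec`, and the 60-second cutoff are assumptions made for the example, not part of BOINC or the SETI@home applications. Only the strings it searches for come from the stderr output above.

```python
import re

def summarize_stderr(stderr_text, run_time_sec, short_run_sec=60):
    """Summarise one task's stderr (hypothetical helper, not a BOINC API).

    short_run_sec is an arbitrary cutoff chosen for this sketch.
    """
    # The app prints this marker when the result storage fills up.
    overflow = "result_overflow" in stderr_text
    counts = {}
    for kind in ("Spike", "Autocorr", "Pulse", "Triplet", "Gaussian"):
        match = re.search(rf"{kind} count:\s*(\d+)", stderr_text)
        counts[kind] = int(match.group(1)) if match else None
    # An overflow alone is not proof of a fault; an overflow after only
    # a few seconds is what makes the result worth a closer look.
    suspicious = overflow and run_time_sec < short_run_sec
    return {"overflow": overflow, "counts": counts, "suspicious": suspicious}

# Excerpt of the stderr quoted above, with the reported 11 sec run time.
sample = ("SETI@Home Informational message -9 result_overflow "
          "Spike count: 30 Autocorr count: 0 Pulse count: 0 "
          "Triplet count: 0 Gaussian count: 0")
summary = summarize_stderr(sample, run_time_sec=11)
print(summary)
```

A flag from a check like this only says a task deserves a comparison against its wingmate or the canonical result; noisy data can legitimately overflow too.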

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 2010780 - Posted: 5 Sep 2019, 16:57:49 UTC - in response to Message 2010772.
Well, it overflows almost immediately, at zero chirp. Maybe the very first FFT was a bad one. The spike search is performed right after the FFT, and at zero chirp even the de-chirping kernel should not affect the result. It is a plain FFT followed by a comparison of each bin's power with the threshold.
SETI apps news We're not gonna fight them. We're gonna transcend them.
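The "plain FFT and comparison of bin's power with threshold" step can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the MultiBeam kernel code: it uses a naive DFT instead of a GPU FFT, `MAX_RESULTS = 30` is inferred from "Spike count: 30" in the stderr above, `THRESHOLD = 24.0` is a guess based on every logged spike having a peak above 24, and the "corrupted" spectrum is a hand-made stand-in, since the thread does not show what the faulty driver actually returned.

```python
import cmath
import math

MAX_RESULTS = 30   # assumption: inferred from "Spike count: 30" in the log
THRESHOLD = 24.0   # assumption: all logged spikes have peak > 24

def power_spectrum(samples):
    """Naive O(n^2) DFT power, normalised so that the mean power is 1."""
    n = len(samples)
    power = []
    for k in range(n):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power.append(abs(acc) ** 2)
    mean = sum(power) / n
    return [p / mean for p in power] if mean > 0 else power

def find_spikes(power, threshold=THRESHOLD, max_results=MAX_RESULTS):
    """Return (spike_bins, overflowed); stop once the result store is full."""
    spikes = []
    for k, p in enumerate(power):
        if p > threshold:
            spikes.append(k)
            if len(spikes) >= max_results:
                return spikes, True
    return spikes, False

# A clean tone in bin 5 produces exactly one spike and no overflow.
n = 64
tone = [cmath.exp(2j * math.pi * 5 * t / n) for t in range(n)]
clean_spikes, clean_overflow = find_spikes(power_spectrum(tone))

# Hand-made stand-in for a corrupted spectrum: every 8th bin is garbage
# above the threshold, so the 30-result limit is hit almost at once.
bad_power = [30.0 if k % 8 == 0 else 1.0 for k in range(1024)]
bad_spikes, bad_overflow = find_spikes(bad_power)

print(clean_spikes, clean_overflow)
print(len(bad_spikes), bad_overflow)
```

The point of the sketch is the early return: once enough bins exceed the threshold, the search stops, which is consistent with a task "finishing" in seconds if the FFT output is wrong from the start.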

Bluerazor Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12
Message 2010785 - Posted: 5 Sep 2019, 17:50:10 UTC Last modified: 5 Sep 2019, 17:51:10 UTC
I had previously inspected results that ended up invalid and seemed really fast, and basically it was the same thing: near-immediate overflow, on every task, regardless. The card never ran any task for more than about 15 seconds, usually 11. So of course I blocked it from computing in order to avoid further pollution. I just unblocked it briefly yesterday to see if the new drivers worked, then shut it back down.
I had also compared one of my previous results that went invalid to the canonical result, and sure enough, these were not supposed to overflow.
Also, thanks for the tip on posting the task and computer as well. Hopefully that won't be necessary. Normally I wouldn't really be doing this, as I would assume it was just an individual problem, but it seemed from other posts and threads that this is consistent for everyone with the same card.

Jord Volunteer tester Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3
Message 2010786 - Posted: 5 Sep 2019, 17:55:34 UTC
I wonder if it's to do with the changes to the compute units in the RX 5000 series. For the tech buffs:
The RDNA Compute Unit sees the bulk of AMD's innovation. Groups of two CUs make a "Dual Compute Unit" that shares a scalar data cache, shader instruction cache, and a local data share. Each CU is now split between two SIMD units of 32 stream processors, a vector register, and a scalar unit, each. This way, AMD doubled the number of scalar units on the silicon to 80, double the CU count. Each scalar unit is similar in concept to a CPU core, and is designed to handle heavy scalar indivisible workloads. Each SIMD unit has its own scheduler. Four TMUs are part of each CU. This massive redesign in SIMD and CU hierarchy achieves a doubling in scalar and vector instruction rates, and resource pooling between every two adjacent CUs.
The bulk of AMD's engineering effort with RDNA has been to increase the number of dedicated resources to avoid starvation by fewer components waiting for access to a resource. The "Navi 10" silicon has two Shader Engines sharing a centralized Command Processor that distributes workloads, a Geometry Processor, and ACEs (asynchronous compute engines). Each Shader Engine is further divided into two Graphics Engines. A graphics engine shares render backends, a Rasterizer, and a Prim Unit among five Workgroup Processors. This is where the core of RDNA begins. AMD figured it could merge two compute units (CUs) to share schedulers, scalar units, a data-share, instruction and data caches, and TMUs. The Workgroup Processor, or "dual-compute unit" as shown in the architecture block diagram, is for all intents and purposes indivisible, in that individual CUs cannot be disabled.
An RDNA compute unit packs 64 stream processors for vector operations and double the number of scalar units for localized serial processing. The stream processors in a CU are split into groups of two, each equipped with a scalar unit. According to AMD, this greatly reduces latency and improves the overall IPC of the compute unit. It also more efficiently utilizes local caches.
The vector execution units, or stream processors, are where much of the GPU's parallel processing happens. Due to the redesigned compute unit, two scalar processors pull two SIMD32 vector units made up of 32 stream processors, each, instead of a single scalar processor pulling four SIMD16 vector units. How is this important? On GCN, the way SIMD units are laid out, all items in a Wave64 operation get to do work once every four clocks due to hardware interleaving. With RDNA, Wave32 work items can do work every clock cycle. In all, RDNA minimizes wasted clock cycles by more efficiently and uniformly utilizing the hardware resources.
AMD examined previous generations of its graphics architecture to locate bottlenecks in the graphics pipeline. Besides increasing the number of dedicated resources, the company reworked the chip's cache hierarchy by cushioning data transfers at various stages. Each workgroup processor has dedicated 32 KB instruction and 16 KB data caches, which write back to a 128 KB L1 cache dedicated to each Graphics Engine. These L1 caches talk to 4 MB of L2 cache. The introduction of the L1 cache and doubling in bandwidth between the various caches contributes greatly to IPC as it minimizes memory accesses, which are much slower than cache accesses. AMD is also using faster (lower latency) SRAM that reduces cache latencies by around 20 percent on die and by 8 percent at the memory level. AMD also introduced new features to the ACEs that include async-compute tunneling.
Source: https://www.techpowerup.com/review/amd-radeon-rx-5700-xt/2.html
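The arithmetic behind the quoted article can be checked with a short Python sketch. The helper names `cycles_per_wavefront` and `cu_totals` are made up for this example. The per-CU figures (64 stream processors, two scalar units, two CUs per dual compute unit) come from the article text, and the 36-CU count for the RX 5700 comes from the question earlier in the thread. Reading the "Max compute units 18" line as a count of dual compute units is an inference, not something the thread confirms.

```python
def cycles_per_wavefront(wave_size, simd_width):
    """Clocks for one SIMD to step a whole wavefront through one instruction."""
    return wave_size // simd_width

def cu_totals(compute_units):
    """Totals implied by the article's per-CU figures (hypothetical helper)."""
    return {
        "stream_processors": compute_units * 64,   # 64 per CU
        "scalar_units": compute_units * 2,         # two per CU
        "dual_compute_units": compute_units // 2,  # two CUs per WGP
    }

gcn = cycles_per_wavefront(64, 16)    # Wave64 on a SIMD16: 4 clocks
rdna = cycles_per_wavefront(32, 32)   # Wave32 on a SIMD32: 1 clock

xt = cu_totals(40)       # 80 scalar units implies 40 CUs
non_xt = cu_totals(36)   # RX 5700 as described earlier in the thread

print(gcn, rdna)
print(xt)
print(non_xt)
```

The 36-CU card works out to 18 dual compute units, which matches the 18 compute units that both the SETI app and clinfo report for gfx1010. That suggests, without proving it, that the OpenCL driver exposes each dual compute unit as a single compute unit.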

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 2011295 - Posted: 8 Sep 2019, 22:05:10 UTC - in response to Message 2010786. Last modified: 8 Sep 2019, 22:05:58 UTC
Well, at least at the OpenCL runtime level this shouldn't matter. The runtime operates on logical entities like CUs, queues and work-items without knowledge of their implementation in hardware. The driver does have that knowledge, though. So it seems the AMD driver doesn't understand AMD hardware well enough.
SETI apps news We're not gonna fight them. We're gonna transcend them.

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 2015690 - Posted: 17 Oct 2019, 7:13:28 UTC
How much longer is SETI going to allow its database to be polluted by cross-validating hosts producing incorrect results? This is continuous, every day, all day. False results are being entered into the database.
https://setiathome.berkeley.edu/results.php?hostid=8826743&state=4
https://setiathome.berkeley.edu/results.php?hostid=8772813&state=4
https://setiathome.berkeley.edu/results.php?hostid=8828658&state=4
https://setiathome.berkeley.edu/results.php?hostid=8831881&state=4
https://setiathome.berkeley.edu/results.php?hostid=6692170&state=4
https://setiathome.berkeley.edu/results.php?hostid=8830944&state=4
https://setiathome.berkeley.edu/results.php?hostid=8550813&state=4
https://setiathome.berkeley.edu/results.php?hostid=8807116&state=4
https://setiathome.berkeley.edu/results.php?hostid=6168316&state=4
https://setiathome.berkeley.edu/results.php?hostid=8821720&state=4
Etc... Etc..... ETC......

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 2016129 - Posted: 21 Oct 2019, 5:31:43 UTC Last modified: 21 Oct 2019, 5:32:52 UTC
Well, look at that... it seems the new AMD RX 5700 actually works running the OpenCL App in macOS Catalina. I believe he's running a Hackintosh, though:
https://setiathome.berkeley.edu/results.php?hostid=8592369&offset=100
It appears that a day ago he was running a GTX 1080Ti on the OpenCL App in High Sierra... it has to be a Hackintosh. It seems the RX 5700 is about as fast as the GTX 1080 was running the OpenCL Apps. To put it in perspective, my 5+ year old 750Ti running the CUDA Special App is faster than both of them:
https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=1160
... frightening ;-)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304
Message 2017197 - Posted: 30 Oct 2019, 10:44:57 UTC
Just got mugged by a couple of RX 5700 XTs.
Grant Darwin NT

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380
Message 2017244 - Posted: 30 Oct 2019, 17:38:57 UTC
I feel your pain :-(
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
Message 2017249 - Posted: 30 Oct 2019, 18:05:28 UTC
It's been happening to me a lot also.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
