Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Bluerazor

Joined: 22 May 99
Posts: 15
Credit: 3,889,427
RAC: 12
United States
Message 2010688 - Posted: 4 Sep 2019, 23:07:07 UTC

Unfortunately, today's 19.9.1 drivers do not fix the situation. The problem will also be very acute, because the RX 5700 "finishes" each task in 11 seconds and reports it as an overflow. Naturally, I aborted the unfinished GPU tasks and blocked the GPU to prevent further problems.

Do you need the actual stderr copied, or just links? Here are some results that are unfortunately bound to come out invalid:
https://setiathome.berkeley.edu/result.php?resultid=8021927435
https://setiathome.berkeley.edu/result.php?resultid=8021927356
https://setiathome.berkeley.edu/result.php?resultid=8021927647
https://setiathome.berkeley.edu/result.php?resultid=8021927677

As for testing, I have the broken HW, but I will be unavailable for a little while so I can't help out in the short term. If there is a proposed fix or a way of doing better troubleshooting I would be willing to do some testing.
ID: 2010688
elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2010691 - Posted: 4 Sep 2019, 23:29:45 UTC

I recently purchased one of these cards, a 5700 XT. Guess I'll return it and stick to Nvidia.
ID: 2010691
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2010720 - Posted: 5 Sep 2019, 7:04:40 UTC - in response to Message 2010648.  

And did someone start a thread about it on the AMD OpenCL forums?
Is there anyone who has such "broken" hardware and software and is able to do offline testing?

Phoronix did testing and reviews of the RX 5700XT and could not get the card and drivers to pass the OpenCL parts of their standardized test suite.

Thanks. Shame on AMD. Natural dunces :/
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2010720
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2010724 - Posted: 5 Sep 2019, 8:31:39 UTC

https://community.amd.com/message/2928820
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2010724
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 2010728 - Posted: 5 Sep 2019, 9:58:51 UTC


Number of devices: 1
Max compute units: 18
Max work group size: 256
Max clock frequency: 1625Mhz
Max memory allocation: 3221225472
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 3221225472
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: gfx1010
Vendor: Advanced Micro Devices, Inc.
Driver version: 2906.10 (PAL,LC)
Version: OpenCL 1.2 AMD-APP (2906.10)
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_gl_event cl_amd_liquid_flash cl_amd_copy_buffer_p2p


There seem to be no command-line options in use, and it autodetects only 18 of the 36 CUs of the non-XT 5700?

ID: 2010728
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2010731 - Posted: 5 Sep 2019, 10:19:31 UTC - in response to Message 2010728.  

What is the computer number, what is the task number?
Without these basic bits of information what you have just posted is fairly meaningless...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2010731
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2010739 - Posted: 5 Sep 2019, 11:20:41 UTC

I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (a ready-built Windows binary is at the foot of the page; Linux needs, I think, building from source) and posts the output from that. It will carry far more weight with AMD.
ID: 2010739
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2010741 - Posted: 5 Sep 2019, 11:37:29 UTC - in response to Message 2010739.  

I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (a ready-built Windows binary is at the foot of the page; Linux needs, I think, building from source) and posts the output from that. It will carry far more weight with AMD.

+++
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2010741
Bluerazor

Joined: 22 May 99
Posts: 15
Credit: 3,889,427
RAC: 12
United States
Message 2010767 - Posted: 5 Sep 2019, 15:59:07 UTC - in response to Message 2010741.  

Here's what I got from that utility, for what it's worth. Apologies about any followup questions, as I will be unable to respond much until Tuesday or so. Note ... the 19.9.1 drivers fixed a crash bug with the card, and the card was crashing my PC with some regularity (per Windows Event Logs), so if that level of bug still exists in the drivers, I am not too surprised they haven't gotten to the OpenCL issue. Disappointed, yes, but not surprised.

I am running a fully patched Windows 10 system with the 19.9.1 driver that released yesterday, and I have no overclocks on the card. I would be happy to go post to the AMD forums myself with this (next week).

Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (2906.10)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 100ns
Platform Extensions function suffix AMD

Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx1010
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0 AMD-APP (2906.10)
Driver Version 2906.10 (PAL,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Radeon RX 5700
Device Topology (AMD) PCI-E, 2f:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 18
SIMD per compute unit (AMD) 2
SIMD width (AMD) 32
SIMD instruction width (AMD) 1
Max clock frequency 1625MHz
Graphics IP (AMD) 10.10
Device Partition (core)
Max number of sub-devices 18
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 32
Wavefront width (AMD) 32
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 8573157376 (7.984GiB)
Global free memory (AMD) 8306688 (7.922GiB)
Global memory channels (AMD) 8
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 4244635648 (3.953GiB)
Unified memory for Host and Device No
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 3820172032 (3.558GiB)
Preferred total size of global vars 8573157376 (7.984GiB)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 64
Max number of read/write image args 64
Max number of pipe args 16
Max active pipe reservations 16
Max pipe packet size 4244635648 (3.953GiB)
Local memory type Local
Local memory size 65536 (64KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 4244635648 (3.953GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 262144 (256KiB)
Max size 8388608 (8MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Number of P2P devices (AMD) 0
P2P devices (AMD) (n/a)
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1567638949549610000ns (Wed Sep 04 19:15:49 2019)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 2
Max real-time compute queues (AMD) 3
Max real-time compute units (AMD) 8
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_copy_buffer_p2p cl_amd_planar_yuv

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx1010
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx1010
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx1010
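
For anyone comparing several such dumps, the handful of fields that matter for this thread can be pulled out with a few lines of Python. This is only a sketch: it assumes the "label value" line layout shown in the dump above, and the helper name summarize_clinfo and the choice of fields are illustrative, not part of clinfo itself.

```python
import re

# Labels copied from the clinfo dump above; the selection is illustrative.
FIELDS = (
    "Device Name",
    "Device Board Name (AMD)",
    "Device Version",
    "Driver Version",
    "Max compute units",
    "Max clock frequency",
    "Wavefront width (AMD)",
)


def summarize_clinfo(text):
    """Return {label: value} for the first occurrence of each label in FIELDS."""
    summary = {}
    for line in text.splitlines():
        stripped = line.strip()
        for label in FIELDS:
            if label in summary:
                continue
            # The label must start the line and be followed by whitespace.
            match = re.match(re.escape(label) + r"\s+(.*)", stripped)
            if match:
                summary[label] = match.group(1).strip()
    return summary


# A few lines taken from the dump above, used as sample input.
SAMPLE = """\
Device Name gfx1010
Device Version OpenCL 2.0 AMD-APP (2906.10)
Driver Version 2906.10 (PAL,LC)
Device Board Name (AMD) AMD Radeon RX 5700
Max compute units 18
Max clock frequency 1625MHz
Wavefront width (AMD) 32
"""

if __name__ == "__main__":
    for label, value in summarize_clinfo(SAMPLE).items():
        print(f"{label}: {value}")
```

To run it against a live system instead of the sample, the text could be captured with subprocess.run(["clinfo"], capture_output=True, text=True).stdout, assuming clinfo is on the PATH.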
ID: 2010767
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2010770 - Posted: 5 Sep 2019, 16:11:43 UTC - in response to Message 2010739.  

I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (a ready-built Windows binary is at the foot of the page; Linux needs, I think, building from source) and posts the output from that. It will carry far more weight with AMD.

You don't need to build clinfo for Debian or Ubuntu. It is standard in the distros. Just install it.
sudo apt install clinfo

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2010770
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2010772 - Posted: 5 Sep 2019, 16:21:25 UTC - in response to Message 2010767.  
Last modified: 5 Sep 2019, 16:23:01 UTC

When posting results as you have, it is good practice to include the task number and computer ID. This allows others to quickly see whether it is a one-off event or one that is repeating. In the example below I grabbed one of yours from your "pending" pile, as it will probably be around for a bit longer than one in the error or invalid lists:

(A quick glance at this result suggests to me that it is going to end up in the "invalid" list eventually: a lot of signals were detected, the run time was very short, and it exited with exit state=9.)

Task 8021927435
Name blc35_2bit_guppi_58643_82791_HIP35821_0120.30609.818.23.46.89.vlar_0
Workunit 3639746008
Created 4 Sep 2019, 16:45:55 UTC
Sent 4 Sep 2019, 22:56:35 UTC
Report deadline 28 Oct 2019, 3:56:17 UTC
Received 4 Sep 2019, 22:58:04 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 8795938
Run time 11 sec
CPU time 8 sec
Validate state Initial
Credit 0.00
Device peak FLOPS 112.64 GFLOPS
Application version SETI@home v8 v8.22 (opencl_ati5_nocal)
windows_intelx86
Peak working set size 93.35 MB
Peak swap size 81.73 MB
Peak disk usage 0.01 MB
Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
0 slot of 64 used for this instance
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 1; system mask is ffffff

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 FFTW AMD specific USE_SSE2 x86
CPUID: AMD Ryzen 9 3900X 12-Core Processor

Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX SSE4A
OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl
ar=0.007156 NumCfft=113991 NumGauss=0 NumPulse=44570346368 NumTriplet=57539513504
Currently allocated 185 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768

Windows optimized setiathome_v8 application
Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan
SSE2xj Win32 Build 3584 , Ported by : Raistmer, JDWhale

SETI8 update by Raistmer

OpenCL version by Raistmer, r3584

AMD HD5 version by Raistmer

Number of OpenCL platforms: 1

OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 18
Max work group size: 256
Max clock frequency: 1625Mhz
Max memory allocation: 3221225472
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 3221225472
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: gfx1010
Vendor: Advanced Micro Devices, Inc.
Driver version: 2906.10 (PAL,LC)
Version: OpenCL 1.2 AMD-APP (2906.10)
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_gl_event cl_amd_liquid_flash cl_amd_copy_buffer_p2p

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.007156
Used GPU device parameters are:
Number of compute units: 18
Single buffer allocation size: 128MB
Total device global memory: 3072MB
max WG size: 256
local mem type: Real
LotOfMem path: no
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50
Spike: peak=25.92781, time=26.93, d_freq=2067034775.02, chirp=0, fft_len=2k
Spike: peak=26.86989, time=28.54, d_freq=2067037546.63, chirp=0, fft_len=2k
Spike: peak=26.08467, time=36.42, d_freq=2067031383.14, chirp=0, fft_len=2k
Spike: peak=24.27416, time=37.67, d_freq=2067031556.37, chirp=0, fft_len=2k
Spike: peak=25.10714, time=46.26, d_freq=2067028510.94, chirp=0, fft_len=2k
Spike: peak=25.66345, time=51.99, d_freq=2067034283.28, chirp=0, fft_len=2k
Spike: peak=24.48245, time=52.7, d_freq=2067039513.59, chirp=0, fft_len=2k
Spike: peak=24.8561, time=54.31, d_freq=2067034327.98, chirp=0, fft_len=2k
Spike: peak=32.07286, time=60.04, d_freq=2067034193.87, chirp=0, fft_len=2k
Spike: peak=24.13826, time=62.9, d_freq=2067034193.87, chirp=0, fft_len=2k
Spike: peak=25.58695, time=63.8, d_freq=2067028427.12, chirp=0, fft_len=2k
Spike: peak=25.65657, time=67.2, d_freq=2067028740.05, chirp=0, fft_len=2k
Spike: peak=26.55581, time=68.09, d_freq=2067037418.11, chirp=0, fft_len=2k
Spike: peak=25.71747, time=69.35, d_freq=2067037501.93, chirp=0, fft_len=2k
Spike: peak=24.31979, time=86.17, d_freq=2067033836.25, chirp=0, fft_len=2k
Spike: peak=26.3004, time=0.5369, d_freq=2067034193.87, chirp=0, fft_len=4k
Spike: peak=24.47003, time=1.611, d_freq=2067028642.26, chirp=0, fft_len=4k
Spike: peak=25.57119, time=3.4, d_freq=2067031740.77, chirp=0, fft_len=4k
Spike: peak=27.56359, time=3.758, d_freq=2067031777.09, chirp=0, fft_len=4k
Spike: peak=26.07256, time=5.548, d_freq=2067039826.51, chirp=0, fft_len=4k
Spike: peak=24.59409, time=7.337, d_freq=2067031461.37, chirp=0, fft_len=4k
Spike: peak=25.85529, time=9.485, d_freq=2067031550.78, chirp=0, fft_len=4k
Spike: peak=24.74949, time=11.63, d_freq=2067034280.49, chirp=0, fft_len=4k
Spike: peak=26.53337, time=13.42, d_freq=2067039779.01, chirp=0, fft_len=4k
Spike: peak=29.69875, time=14.85, d_freq=2067031338.44, chirp=0, fft_len=4k
Spike: peak=29.9281, time=16.64, d_freq=2067039784.6, chirp=0, fft_len=4k
Spike: peak=29.3638, time=17.72, d_freq=2067034199.46, chirp=0, fft_len=4k
Spike: peak=26.35707, time=18.79, d_freq=2067039868.42, chirp=0, fft_len=4k
Spike: peak=24.2139, time=21.3, d_freq=2067033970.36, chirp=0, fft_len=4k
Spike: peak=27.34326, time=22.01, d_freq=2067033917.27, chirp=0, fft_len=4k
OpenCL queue synchronized
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.


Best spike: peak=32.07286, time=60.04, d_freq=2067034193.87, chirp=0, fft_len=2k
Best autocorr: peak=0, time=-2.124e+011, delay=0, d_freq=0, chirp=0, fft_len=0
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+011, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=0.6354929, time=45.82, period=0.6218, d_freq=2067032897.47, score=0.8903, chirp=0, fft_len=64
Best triplet: peak=0, time=-2.124e+011, period=0, d_freq=0, chirp=0, fft_len=0
Spike count: 30
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0
Wallclock time elapsed since last restart: 7.0 seconds
Fftlength=32,pass=3:Tune: sum=16.7263(ms); min=16.73(ms); max=16.73(ms); mean=16.73(ms); s_mean=16.73; sleep=15(ms); delta=110; N=1; usual
Fftlength=64,pass=3:Tune: sum=216.164(ms); min=2.943(ms); max=7.725(ms); mean=6.755(ms); s_mean=5.449; sleep=0(ms); delta=103; N=32; usual
Fftlength=64,pass=4:Tune: sum=83.1424(ms); min=3.473(ms); max=6.462(ms); mean=5.939(ms); s_mean=5.975; sleep=0(ms); delta=85; N=14; usual
Fftlength=128,pass=3:Tune: sum=61.9831(ms); min=1.387(ms); max=4.058(ms); mean=3.444(ms); s_mean=2.894; sleep=0(ms); delta=89; N=18; usual
Fftlength=128,pass=4:Tune: sum=36.6409(ms); min=1.971(ms); max=4.156(ms); mean=3.331(ms); s_mean=3.038; sleep=0(ms); delta=118; N=11; usual
Fftlength=128,pass=5:Tune: sum=18.7126(ms); min=1.112(ms); max=2.713(ms); mean=2.339(ms); s_mean=2.173; sleep=0(ms); delta=151; N=8; usual
Fftlength=256,pass=3:Tune: sum=19.6036(ms); min=1.739(ms); max=2.416(ms); mean=2.178(ms); s_mean=2.139; sleep=0(ms); delta=80; N=9; usual
Fftlength=256,pass=4:Tune: sum=17.5097(ms); min=0.5999(ms); max=2.433(ms); mean=1.946(ms); s_mean=1.698; sleep=0(ms); delta=80; N=9; usual
Fftlength=256,pass=5:Tune: sum=9.6746(ms); min=0.5409(ms); max=1.414(ms); mean=1.209(ms); s_mean=1.116; sleep=0(ms); delta=79; N=8; usual
Fftlength=512,pass=3:Tune: sum=8.37276(ms); min=0.4706(ms); max=1.102(ms); mean=0.9303(ms); s_mean=0.851; sleep=0(ms); delta=44; N=9; usual
Fftlength=512,pass=4:Tune: sum=7.25688(ms); min=0.5584(ms); max=1.07(ms); mean=0.9071(ms); s_mean=0.8474; sleep=0(ms); delta=43; N=8; usual
Fftlength=512,pass=5:Tune: sum=4.4568(ms); min=0.465(ms); max=0.7165(ms); mean=0.6367(ms); s_mean=0.6359; sleep=0(ms); delta=42; N=7; usual
Fftlength=1024,pass=3:Tune: sum=4.11368(ms); min=0.3851(ms); max=0.5703(ms); mean=0.5142(ms); s_mean=0.5115; sleep=0(ms); delta=25; N=8; usual
Fftlength=1024,pass=4:Tune: sum=3.30268(ms); min=0.1552(ms); max=0.4931(ms); mean=0.4128(ms); s_mean=0.3729; sleep=0(ms); delta=25; N=8; usual
Fftlength=1024,pass=5:Tune: sum=2.01672(ms); min=0.1594(ms); max=0.3435(ms); mean=0.2881(ms); s_mean=0.2724; sleep=0(ms); delta=24; N=7; usual
Fftlength=2048,pass=3:Tune: sum=1.383(ms); min=1.383(ms); max=1.383(ms); mean=1.383(ms); s_mean=1.383; sleep=0(ms); delta=1; N=1; high_perf

class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0

class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0

class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0

class PC_triplet_find_hit: total=4, N=4, <>=1, min=1 max=1
class PC_triplet_find_miss: total=4, N=4, <>=1, min=1 max=1

class PC_pulse_find_hit: total=5, N=5, <>=1, min=1 max=1
class PC_pulse_find_miss: total=2, N=2, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=2, N=2, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=0, N=0, <>=0, min=0 max=0

class PoT_transfer_not_needed: total=4, N=4, <>=1, min=1 max=1
class PoT_transfer_needed: total=5, N=5, <>=1, min=1 max=1

class SleepQuantum: total=0, N=0, <>=0, min=0 max=0

GPU device sync requested... ...GPU device synched
18:57:39 (1936): called boinc_finish(0)

</stderr_txt>
]]>
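
The "quick glance" check described before this dump (many signals, a very short run time, an overflow message) can be roughed out in code. The following is a sketch only: the function name triage_stderr and the 60-second cut-off are made up for illustration and are not project values, and a flag from it only suggests a problem, since a genuinely noisy workunit can also overflow early. Validation against the other hosts' results is what actually decides.

```python
import re

SIGNAL_KINDS = ("Spike", "Pulse", "Triplet", "Gaussian", "Autocorr")


def triage_stderr(stderr_text, run_time_sec, short_run_sec=60.0):
    """Flag a task whose stderr shows an overflow after a very short run.

    short_run_sec is an arbitrary illustrative cut-off, not a project value.
    """
    # Count reported signals; "Best spike: ..." summary lines do not match
    # because the pattern is anchored to the start of the line.
    signals = {
        kind: len(re.findall(rf"^{kind}: peak=", stderr_text, flags=re.MULTILINE))
        for kind in SIGNAL_KINDS
    }
    overflow = "result_overflow" in stderr_text
    return {
        "signals": signals,
        "overflow": overflow,
        "suspicious": overflow and run_time_sec < short_run_sec,
    }


# Four lines taken from the stderr above, used as sample input.
SAMPLE = """\
Spike: peak=25.92781, time=26.93, d_freq=2067034775.02, chirp=0, fft_len=2k
Spike: peak=26.86989, time=28.54, d_freq=2067037546.63, chirp=0, fft_len=2k
SETI@Home Informational message -9 result_overflow
Best spike: peak=32.07286, time=60.04, d_freq=2067034193.87, chirp=0, fft_len=2k
"""

if __name__ == "__main__":
    print(triage_stderr(SAMPLE, run_time_sec=11))
```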

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2010772
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2010780 - Posted: 5 Sep 2019, 16:57:49 UTC - in response to Message 2010772.  

Well, it overflows almost immediately, at zero chirp.
Maybe the very first FFT was a bad one.
The spike search is performed right after the FFT, and at zero chirp even the de-chirping kernel should not affect the result.
It is just a plain FFT and a comparison of each bin's power with a threshold.
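
To make the "plain FFT plus threshold" step concrete, here is a toy sketch in pure Python. It is not the application's kernel code. The threshold of 24 is an assumption read off the stderr above (the lowest reported peak is just over 24), normalising by mean power is likewise assumed, and the limit of 30 results matches the "Spike count: 30" at which that task overflowed.

```python
import cmath


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def find_spikes(samples, threshold=24.0, max_results=30):
    """Return (spikes, overflowed) for one FFT of the input.

    A spike is a bin whose power exceeds threshold times the mean power.
    The search stops, reporting an overflow, once max_results are found.
    Both default values are assumptions for illustration.
    """
    power = [abs(x) ** 2 for x in fft(samples)]
    mean_power = sum(power) / len(power)
    if mean_power == 0:
        return [], False
    spikes = []
    for bin_index, p in enumerate(power):
        peak = p / mean_power
        if peak > threshold:
            spikes.append((bin_index, peak))
            if len(spikes) >= max_results:
                return spikes, True
    return spikes, False


if __name__ == "__main__":
    # A pure tone that lands exactly in bin 5 of a 64-point FFT.
    tone = [cmath.exp(2j * cmath.pi * 5 * t / 64) for t in range(64)]
    print(find_spikes(tone))
```

Nothing but the FFT and the comparison sits between the input data and the result list, which is why a faulty FFT on the device is a natural first suspect when a task overflows at zero chirp.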
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2010780
Bluerazor

Joined: 22 May 99
Posts: 15
Credit: 3,889,427
RAC: 12
United States
Message 2010785 - Posted: 5 Sep 2019, 17:50:10 UTC
Last modified: 5 Sep 2019, 17:51:10 UTC

I had previously inspected results that ended up invalid and seemed really fast, and it was basically the same thing: near-immediate overflow on every task, regardless. The card never ran any task for more than about 15 seconds, usually 11. So of course I blocked it from computing in order to avoid further pollution. I just unblocked it briefly yesterday to see if the drivers worked, then shut it back down. I had also compared one of my previous results that went invalid to the canonical result, and sure enough these tasks were not supposed to overflow.

Also, thanks for the tip on also posting task and computer. Hopefully that won't be necessary - normally I wouldn't really be doing this - assuming it was just an individual problem - but it seemed from other posts/threads that this is consistent for everyone with the same card.
ID: 2010785
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2010786 - Posted: 5 Sep 2019, 17:55:34 UTC

I wonder if it's to do with the changes to the compute units in the RX 5000 series. For the tech buffs:

The RDNA Compute Unit sees the bulk of AMD's innovation. Groups of two CUs make a "Dual Compute Unit" that shares a scalar data cache, shader instruction cache, and a local data share. Each CU is now split between two SIMD units of 32 stream processors, a vector register, and a scalar unit, each. This way, AMD doubled the number of scalar units on the silicon to 80, double the CU count. Each scalar unit is similar in concept to a CPU core, and is designed to handle heavy scalar indivisible workloads. Each SIMD unit has its own scheduler. Four TMUs are part of each CU. This massive redesign in SIMD and CU hierarchy achieves a doubling in scalar- and vector instruction rates, and resource pooling between every two adjacent CUs.

The bulk of AMD's engineering effort with RDNA has been to increase the number of dedicated resources to avoid starvation by fewer components waiting for access to a resource. The "Navi 10" silicon has two Shader Engines sharing a centralized Command Processor that distributes workloads, a Geometry Processor, and ACEs (asynchronous compute engines).

Each Shader Engine is further divided into two Graphics Engines. A graphics engine shares render backends, a Rasterizer, and a Prim Unit among five Workgroup Processors. This is where the core of RDNA begins. AMD figured it could merge two compute units (CUs) to share schedulers, scalar units, a data-share, instruction and data caches, and TMUs. The Workgroup Processor, or "dual-compute unit" as shown in the architecture block diagram, is for all intents and purposes indivisible, in that individual CUs cannot be disabled.

An RDNA compute unit packs 64 stream processors for vector operations and double the number of scalar units for localized serial processing. The stream processors in a CU are split into groups of two, each equipped with a scalar unit. According to AMD, this greatly reduces latency and improves the overall IPC of the compute unit. It also more efficiently utilizes local caches.

The vector execution units, or stream processors, are where much of the GPU's parallel processing happens. Due to the redesigned compute unit, two scalar processors pull two SIMD32 vector units made up of 32 stream processors, each, instead of a single scalar processor pulling four SIMD16 vector units. How is this important? On GCN, the way SIMD units are laid out, all items in a Wave64 operation get to do work once every four clocks due to hardware interleaving. With RDNA, Wave32 work items can do work every clock cycle. In all, RDNA minimizes wasted clock cycles by more efficiently and uniformly utilizing the hardware resources.
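
The "once every four clocks" versus "every clock cycle" comparison in the paragraph above is just the wave size divided by the SIMD width. A tiny sketch of that arithmetic follows; the function name is made up for illustration, and it models nothing beyond this one ratio.

```python
def cycles_per_wave(wave_size, simd_width):
    """Clock cycles one SIMD needs to push a single instruction through a whole wave."""
    if wave_size % simd_width != 0:
        raise ValueError("wave size must be a multiple of the SIMD width")
    return wave_size // simd_width


if __name__ == "__main__":
    print("GCN  Wave64 on SIMD16:", cycles_per_wave(64, 16))
    print("RDNA Wave32 on SIMD32:", cycles_per_wave(32, 32))
```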

AMD examined previous generations of its graphics architecture to locate bottlenecks in the graphics pipeline. Besides increasing the number of dedicated resources, the company reworked the chip's cache hierarchy by cushioning data transfers at various stages. Each workgroup processor has dedicated 32 KB instruction and 16 KB data caches, which write back to a 128 KB L1 cache dedicated to each Graphics Engine.

These L1 caches talk to 4 MB of L2 cache. The introduction of the L1 cache and doubling in bandwidth between the various caches contributes greatly to IPC as it minimizes memory accesses, which are much slower than cache accesses. AMD is also using faster (lower latency) SRAM that reduces cache latencies by around 20 percent on die and by 8 percent at the memory level. AMD also introduced new features to the ACEs that include async-compute tunneling.


Source: https://www.techpowerup.com/review/amd-radeon-rx-5700-xt/2.html
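
One possible reading of the "18 of 36 CU" question earlier in the thread, offered as an assumption rather than something confirmed here: if the driver reports each dual compute unit as a single OpenCL compute unit, then the 18 units in the stderr and clinfo output would correspond to 36 CUs. A short arithmetic sketch using the figures from the quoted article:

```python
# Figures from the quoted article.
CUS_PER_DUAL_CU = 2            # two CUs form one dual compute unit
STREAM_PROCESSORS_PER_CU = 64  # 64 stream processors per CU


def physical_cus(reported_units, cus_per_unit=CUS_PER_DUAL_CU):
    """CU count, ASSUMING each reported OpenCL compute unit is one dual compute unit."""
    return reported_units * cus_per_unit


if __name__ == "__main__":
    reported = 18  # "Max compute units" in the stderr and clinfo output
    cus = physical_cus(reported)
    print("CUs:", cus)
    print("Stream processors:", cus * STREAM_PROCESSORS_PER_CU)
```

If that assumption holds, the application is not missing half the card; it is only repeating the unit count the driver hands it.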
ID: 2010786
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2011295 - Posted: 8 Sep 2019, 22:05:10 UTC - in response to Message 2010786.  
Last modified: 8 Sep 2019, 22:05:58 UTC

Well, at least at the OpenCL runtime level this shouldn't matter.
The runtime operates on logical entities such as CUs, queues and work-items without knowledge of how they are implemented in hardware. The driver has that knowledge, though. So it seems the AMD driver doesn't understand AMD hardware well enough.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2011295
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2015690 - Posted: 17 Oct 2019, 7:13:28 UTC

ID: 2015690
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2016129 - Posted: 21 Oct 2019, 5:31:43 UTC
Last modified: 21 Oct 2019, 5:32:52 UTC

Well, look at that... it seems the new AMD RX 5700 actually works running the OpenCL app in macOS Catalina. I believe he's running a Hackintosh though: https://setiathome.berkeley.edu/results.php?hostid=8592369&offset=100
It appears that a day ago he was running a GTX 1080Ti on the OpenCL app in High Sierra, so it has to be a Hackintosh. The RX 5700 seems to be about as fast as the GTX 1080 was running the OpenCL apps. To put it in perspective, my 5+ year old 750Ti running the CUDA Special App is faster than both of them: https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=1160 ... frightening ;-)
ID: 2016129
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 2017197 - Posted: 30 Oct 2019, 10:44:57 UTC

Just got mugged by a couple of RX 5700 XTs
Grant
Darwin NT
ID: 2017197
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2017244 - Posted: 30 Oct 2019, 17:38:57 UTC

I feel your pain :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2017244
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017249 - Posted: 30 Oct 2019, 18:05:28 UTC

been happening to me a lot also.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017249


 