Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database
Author | Message |
---|---|
Bluerazor Send message Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12 |
Unfortunately, today's 19.9.1 drivers do not fix the situation. The problem is also very acute, because the RX 5700 "finishes" each task in 11 seconds and then reports it as an overflow. Naturally, I aborted the unfinished GPU tasks and blocked the GPU to prevent further problems. Do you need the actual stderr copied, or just links? Here are some results that are unfortunately bound to come out invalid: https://setiathome.berkeley.edu/result.php?resultid=8021927435 https://setiathome.berkeley.edu/result.php?resultid=8021927356 https://setiathome.berkeley.edu/result.php?resultid=8021927647 https://setiathome.berkeley.edu/result.php?resultid=8021927677 As for testing, I have the broken hardware, but I will be unavailable for a little while, so I can't help out in the short term. If there is a proposed fix or a better way of troubleshooting, I would be willing to do some testing. |
elec999 Send message Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
I recently purchased one of these cards, a 5700 XT. Guess I'll return it and stick to Nvidia. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And did someone start a thread about it on the AMD OpenCL forums? Thanks. Shame on AMD. Natural dunces :/ SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
https://community.amd.com/message/2928820 SETI apps news We're not gonna fight them. We're gonna transcend them. |
Kissagogo27 Send message Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62 |
|
rob smith Send message Joined: 7 Mar 03 Posts: 22491 Credit: 416,307,556 RAC: 380 |
What is the computer number, what is the task number? Without these basic bits of information what you have just posted is fairly meaningless... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14676 Credit: 200,643,578 RAC: 874 |
I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (Windows ready-built at foot of page: linux needs - I think - building from sources) and posts the output from that. It will carry far more weight with AMD. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (Windows ready-built at foot of page: linux needs - I think - building from sources) and posts the output from that. It will carry far more weight with AMD. +++ SETI apps news We're not gonna fight them. We're gonna transcend them. |
Bluerazor Send message Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12 |
Here's what I got from that utility, for what it's worth. Apologies about any followup questions, as I will be unable to respond much until Tuesday or so. Note ... the 19.9.1 drivers fixed a crash bug with the card, and the card was crashing my PC with some regularity (per Windows Event Logs), so if that level of bug still exists in the drivers, I am not too surprised they haven't gotten to the OpenCL issue. Disappointed, yes, but not surprised. I am running a fully patched Windows 10 system with the 19.9.1 driver that released yesterday, and I have no overclocks on the card. I would be happy to go post to the AMD forums myself with this (next week).
Number of platforms 1
  Platform Name AMD Accelerated Parallel Processing
  Platform Vendor Advanced Micro Devices, Inc.
  Platform Version OpenCL 2.1 AMD-APP (2906.10)
  Platform Profile FULL_PROFILE
  Platform Extensions cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution 100ns
  Platform Extensions function suffix AMD
  Platform Name AMD Accelerated Parallel Processing
Number of devices 1
  Device Name gfx1010
  Device Vendor Advanced Micro Devices, Inc.
  Device Vendor ID 0x1002
  Device Version OpenCL 2.0 AMD-APP (2906.10)
  Driver Version 2906.10 (PAL,LC)
  Device OpenCL C Version OpenCL C 2.0
  Device Type GPU
  Device Board Name (AMD) AMD Radeon RX 5700
  Device Topology (AMD) PCI-E, 2f:00.0
  Device Profile FULL_PROFILE
  Device Available Yes
  Compiler Available Yes
  Linker Available Yes
  Max compute units 18
  SIMD per compute unit (AMD) 2
  SIMD width (AMD) 32
  SIMD instruction width (AMD) 1
  Max clock frequency 1625MHz
  Graphics IP (AMD) 10.10
  Device Partition (core)
    Max number of sub-devices 18
    Supported partition types None
    Supported affinity domains (n/a)
  Max work item dimensions 3
  Max work item sizes 1024x1024x1024
  Max work group size 256
  Preferred work group size (AMD) 256
  Max work group size (AMD) 1024
  Preferred work group size multiple 32
  Wavefront width (AMD) 32
  Preferred / native vector sizes
    char 4 / 4
    short 2 / 2
    int 1 / 1
    long 1 / 1
    half 1 / 1 (cl_khr_fp16)
    float 1 / 1
    double 1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support (cl_khr_fp16)
    Denormals No
    Infinity and NANs No
    Round to nearest No
    Round to zero No
    Round to infinity No
    IEEE754-2008 fused multiply-add No
    Support is emulated in software No
  Single-precision Floating-point support (core)
    Denormals Yes
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero Yes
    Round to infinity Yes
    IEEE754-2008 fused multiply-add Yes
    Support is emulated in software No
    Correctly-rounded divide and sqrt operations Yes
  Double-precision Floating-point support (cl_khr_fp64)
    Denormals Yes
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero Yes
    Round to infinity Yes
    IEEE754-2008 fused multiply-add Yes
    Support is emulated in software No
  Address bits 64, Little-Endian
  Global memory size 8573157376 (7.984GiB)
  Global free memory (AMD) 8306688 (7.922GiB)
  Global memory channels (AMD) 8
  Global memory banks per channel (AMD) 4
  Global memory bank width (AMD) 256 bytes
  Error Correction support No
  Max memory allocation 4244635648 (3.953GiB)
  Unified memory for Host and Device No
  Shared Virtual Memory (SVM) capabilities (core)
    Coarse-grained buffer sharing Yes
    Fine-grained buffer sharing Yes
    Fine-grained system sharing No
    Atomics No
  Minimum alignment for any data type 128 bytes
  Alignment of base address 2048 bits (256 bytes)
  Preferred alignment for atomics
    SVM 0 bytes
    Global 0 bytes
    Local 0 bytes
  Max size for global variable 3820172032 (3.558GiB)
  Preferred total size of global vars 8573157376 (7.984GiB)
  Global Memory cache type Read/Write
  Global Memory cache size 16384 (16KiB)
  Global Memory cache line size 64 bytes
  Image support Yes
    Max number of samplers per kernel 16
    Max size for 1D images from buffer 134217728 pixels
    Max 1D or 2D image array size 2048 images
    Base address alignment for 2D image buffers 256 bytes
    Pitch alignment for 2D image buffers 256 pixels
    Max 2D image size 16384x16384 pixels
    Max 3D image size 2048x2048x2048 pixels
    Max number of read image args 128
    Max number of write image args 64
    Max number of read/write image args 64
  Max number of pipe args 16
  Max active pipe reservations 16
  Max pipe packet size 4244635648 (3.953GiB)
  Local memory type Local
  Local memory size 65536 (64KiB)
  Local memory syze per CU (AMD) 65536 (64KiB)
  Local memory banks (AMD) 32
  Max number of constant args 8
  Max constant buffer size 4244635648 (3.953GiB)
  Preferred constant buffer size (AMD) 16384 (16KiB)
  Max size of kernel argument 1024
  Queue properties (on host)
    Out-of-order execution No
    Profiling Yes
  Queue properties (on device)
    Out-of-order execution Yes
    Profiling Yes
    Preferred size 262144 (256KiB)
    Max size 8388608 (8MiB)
  Max queues on device 1
  Max events on device 1024
  Prefer user sync for interop Yes
  Number of P2P devices (AMD) 0
  P2P devices (AMD) (n/a)
  Profiling timer resolution 1ns
  Profiling timer offset since Epoch (AMD) 1567638949549610000ns (Wed Sep 04 19:15:49 2019)
  Execution capabilities
    Run OpenCL kernels Yes
    Run native kernels No
    Thread trace supported (AMD) Yes
    Number of async queues (AMD) 2
    Max real-time compute queues (AMD) 3
    Max real-time compute units (AMD) 8
  printf() buffer size 4194304 (4MiB)
  Built-in kernels (n/a)
  Device Extensions cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_copy_buffer_p2p cl_amd_planar_yuv
NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
  clCreateContext(NULL, ...) [default] No platform
  clCreateContext(NULL, ...) [other] Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
    Platform Name AMD Accelerated Parallel Processing
    Device Name gfx1010
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
    Platform Name AMD Accelerated Parallel Processing
    Device Name gfx1010
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
    Platform Name AMD Accelerated Parallel Processing
    Device Name gfx1010 |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I'd suggest that somebody with a card and some programming experience grabs https://github.com/Oblomov/clinfo (Windows ready-built at foot of page: linux needs - I think - building from sources) and posts the output from that. It will carry far more weight with AMD. You don't need to build clinfo for Debian or Ubuntu. It is standard in the distros. Just install it: sudo apt install clinfo Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
rob smith Send message Joined: 7 Mar 03 Posts: 22491 Credit: 416,307,556 RAC: 380 |
When posting results as you have, it is good practice to include the task number and computer ID; this allows others to quickly see whether it is a one-off event or one that is repeating. In the example below I grabbed one of yours from your "pending" pile, as it will probably be around for a bit longer than one in the error or invalid lists: (A quick glance at this result suggests to me that it is going to end up in the "invalid" list eventually: there are a lot of signals detected, the run time was very short, and it has exited with exit state=9.)
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, it overflows almost immediately, on zero chirp. Maybe the very first FFT was a bad one. The spike search is performed right after the FFT, and on zero chirp even the de-chirping kernel should not affect the result: it is a plain FFT followed by a comparison of each bin's power with a threshold. SETI apps news We're not gonna fight them. We're gonna transcend them. |
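To make the step Raistmer describes concrete, here is a minimal sketch of "plain FFT, then compare each bin's power with a threshold". It is not the SETI@home application code: the naive DFT stands in for a real FFT, and the names `find_spikes`, `is_overflow`, the threshold value and the `max_signals` cap are all assumptions chosen for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (a stand-in for a real FFT)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def find_spikes(samples, threshold):
    """Return (bin, normalised power) for bins above threshold x mean power."""
    powers = [abs(x) ** 2 for x in dft(samples)]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        return []
    return [
        (k, p / mean_power)
        for k, p in enumerate(powers)
        if p / mean_power > threshold
    ]


def is_overflow(spikes, max_signals=30):
    """Hypothetical overflow rule: too many signals reported for one task."""
    return len(spikes) >= max_signals


# A pure tone that lands exactly in bin 5 of a 64-point transform.
n = 64
tone = [cmath.exp(2j * math.pi * 5 * t / n) for t in range(n)]
spikes = find_spikes(tone, threshold=24.0)  # threshold value is illustrative
print(spikes, is_overflow(spikes))
```

With a correct transform only bin 5 crosses the threshold. If the transform itself returned garbage, many bins could cross it at once, which would be consistent with the near-immediate overflow described above; that remains a hypothesis, not a diagnosis.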
Bluerazor Send message Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12 |
I had previously inspected results that ended up invalid and seemed really fast, and basically it was the same thing... near immediate overflow, on every task, regardless. The card never ran any task for more than like 15sec, usually 11. And so of course I blocked it from computing in order to avoid further pollution. I just unblocked it briefly yesterday to see if the drivers worked, then shut it back down. I had also compared one of my previous results that went invalid to the canonical result, and sure enough these were not supposed to overflow. Also, thanks for the tip on also posting task and computer. Hopefully that won't be necessary - normally I wouldn't really be doing this - assuming it was just an individual problem - but it seemed from other posts/threads that this is consistent for everyone with the same card. |
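The pattern described above (every task overflowing after roughly 11 to 15 seconds) is easy to express as a rough screening rule. The sketch below is only an illustration: the `TaskResult` fields, the function name `looks_broken` and all thresholds are hypothetical, and this is not how the project's validator works.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: int        # made-up identifiers, not real result IDs
    run_time_s: float   # wall-clock run time in seconds
    overflowed: bool    # task ended with a result overflow


def looks_broken(results, max_run_time_s=15.0, min_fraction=0.9, min_tasks=5):
    """Flag a host when nearly all of its tasks are very short overflows."""
    if len(results) < min_tasks:
        return False  # too little data to say anything
    suspicious = sum(
        1 for r in results if r.overflowed and r.run_time_s <= max_run_time_s
    )
    return suspicious / len(results) >= min_fraction


# Hypothetical host where every task overflows after about 11 seconds.
bad_host = [TaskResult(i, 11.0, True) for i in range(1, 7)]
# Hypothetical healthy host with one genuine overflow among normal runs.
good_host = [TaskResult(i, 600.0, False) for i in range(1, 6)]
good_host.append(TaskResult(6, 12.0, True))

print(looks_broken(bad_host), looks_broken(good_host))
```

A rule like this only says that a host deserves a closer look. Comparing against the canonical result, as done above, is what actually shows the overflow was wrong.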
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
I wonder if it's to do with the changes to the compute units in the RX 5000 series. For the tech buffs: The RDNA Compute Unit sees the bulk of AMD's innovation. Groups of two CUs make a "Dual Compute Unit" that share a scalar data cache, shader instruction cache, and a local data share. Each CU is now split between two SIMD units of 32 stream processors, a vector register, and a scalar unit, each. This way, AMD doubled the number of scalar units on the silicon to 80, double the CU count. Each scalar unit is similar in concept to a CPU core, and is designed to handle heavy scalar indivisible workloads. Each SIMD unit has its own scheduler. Four TMUs are part of each CU. This massive redesign in SIMD and CU hierarchy achieves a doubling in scalar- and vector instruction rates, and resource pooling between every two adjacent CUs. The bulk of AMD's engineering effort with RDNA has been to increase the number of dedicated resources to avoid starvation by fewer components waiting for access to a resource. The "Navi 10" silicon has two Shader Engines sharing a centralized Command Processor that distributes workloads, a Geometry Processor, and ACEs (asynchronous compute engines). Each Shader Engine is further divided into two Graphics Engines. A graphics engine shares render backends, a Rasterizer, and a Prim Unit among five Workgroup Processors. This is where the core of RDNA begins. AMD figured it could merge two compute units (CUs) to share schedulers, scalar units, a data-share, instruction and data caches, and TMUs. The Workgroup Processor, or "dual-compute unit" as shown in the architecture block diagram, is for all intents and purposes indivisible, in that individual CUs cannot be disabled. An RDNA compute unit packs 64 stream processors for vector operations and double the number of scalar units for localized serial processing. The stream processors in a CU are split into groups of two, each equipped with a scalar unit. 
According to AMD, this greatly reduces latency and improves the overall IPC of the compute unit. It also more efficiently utilizes local caches. The vector execution units, or stream processors, are where much of the GPU's parallel processing happens. Due to the redesigned compute unit, two scalar processors pull two SIMD32 vector units made up of 32 stream processors, each, instead of a single scalar processor pulling four SIMD16 vector units. How is this important? On GCN, the way SIMD units are laid out, all items in a Wave64 operation get to do work once every four clocks due to hardware interleaving. With RDNA, Wave32 work items can do work every clock cycle. In all, RDNA minimizes wasted clock cycles by more efficiently and uniformly utilizing the hardware resources. AMD examined previous generations of its graphics architecture to locate bottlenecks in the graphics pipeline. Besides increasing the number of dedicated resources, the company reworked the chip's cache hierarchy by cushioning data transfers at various stages. Each workgroup processor has dedicated 32 KB instruction and 16 KB data caches, which write back to a 128 KB L1 cache dedicated to each Graphics Engine. These L1 caches talk to 4 MB of L2 cache. The introduction of the L1 cache and doubling in bandwidth between the various caches contributes greatly to IPC as it minimizes memory accesses, which are much slower than cache accesses. AMD is also using faster (lower latency) SRAM that reduces cache latencies by around 20 percent on die and by 8 percent at the memory level. AMD also introduced new features to the ACEs that include async-compute tunneling. Source: https://www.techpowerup.com/review/amd-radeon-rx-5700-xt/2.html |
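The Wave64 versus Wave32 point in the quoted article comes down to simple arithmetic: a wavefront wider than the SIMD unit has to be fed through it in several passes. The sketch below only restates the figures quoted above (Wave64 on SIMD16 for GCN, Wave32 on SIMD32 for RDNA); the function name is made up for illustration, and it models nothing beyond that ratio.

```python
def cycles_per_wave(wave_width, simd_width):
    """Clock cycles needed to push one wavefront through one SIMD unit."""
    return -(-wave_width // simd_width)  # ceiling division


gcn_cycles = cycles_per_wave(wave_width=64, simd_width=16)   # 4
rdna_cycles = cycles_per_wave(wave_width=32, simd_width=32)  # 1

print(f"GCN  Wave64 on SIMD16: {gcn_cycles} cycles per wave")
print(f"RDNA Wave32 on SIMD32: {rdna_cycles} cycle per wave")
```

This matches the "Wavefront width (AMD) 32" and "SIMD width (AMD) 32" values in the clinfo output posted earlier in the thread.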
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, at least at the OpenCL runtime level this shouldn't matter. It operates on logical entities like CU, queue and work-item without knowledge of their implementation in hardware. The driver does, though. So it seems the AMD driver doesn't understand AMD hardware well enough. SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
How much longer is SETI going to allow its database to be polluted by cross-validating hosts producing incorrect results? This is continuous, every day, all day. False results are being entered into the database. https://setiathome.berkeley.edu/results.php?hostid=8826743&state=4 https://setiathome.berkeley.edu/results.php?hostid=8772813&state=4 https://setiathome.berkeley.edu/results.php?hostid=8828658&state=4 https://setiathome.berkeley.edu/results.php?hostid=8831881&state=4 https://setiathome.berkeley.edu/results.php?hostid=6692170&state=4 https://setiathome.berkeley.edu/results.php?hostid=8830944&state=4 https://setiathome.berkeley.edu/results.php?hostid=8550813&state=4 https://setiathome.berkeley.edu/results.php?hostid=8807116&state=4 https://setiathome.berkeley.edu/results.php?hostid=6168316&state=4 https://setiathome.berkeley.edu/results.php?hostid=8821720&state=4 Etc... Etc..... ETC...... |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, look at that... it seems the new AMD RX 5700 actually works running the OpenCL app in macOS Catalina. I believe he's running a Hackintosh, though: https://setiathome.berkeley.edu/results.php?hostid=8592369&offset=100 It appears that a day ago he was running a GTX 1080Ti on the OpenCL app in High Sierra... it has to be a Hackintosh. It seems the RX5700 is about as fast as the GTX1080 was running the OpenCL apps. To put it in perspective, my 5+ year old 750Ti running the CUDA Special App is faster than both of them: https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=1160 ... frightening ;-) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
|
rob smith Send message Joined: 7 Mar 03 Posts: 22491 Credit: 416,307,556 RAC: 380 |
I feel your pain :-( Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
been happening to me a lot also. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |