Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
What I'd like to know is why Rob's 5700 XT is still reporting invalid tasks. The counts are way off from the canonical result. Is there still a problem? Or is Rob overclocking the card too far, causing the invalids? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Or maybe it’s temperature related. If it’s an early reference design, they tended to run hot, from what I remember of the early reviews. He’s still producing mostly valid results, so that’s good. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
"Got two invalids on RX 5700 with 8.24 beta, in case it is relevant." The first result is missing a pulse; the second one has several too many. That's not nearly as bad as what was happening before. Pulse finding is one of the more stressful portions of the code. @SETIEric@qoto.org (Mastodon) |
Rob Joined: 7 Apr 12 Posts: 9 Credit: 951,019 RAC: 0 |
Thanks, Eric! Appreciate the explanation. Glad to hear 🙂 |
Mr. Kevvy Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
It would appear it's now time to follow up with the people who were producing bad results and let them know that the solution is at hand. As far as I know, since 8.24 has gone to main, it will update automatically, and fixed drivers are available from https://www.amd.com/en/support. I don't think there's anything else I need to pass on, but please advise if otherwise. I'll give it a while and then start sending messages. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
You might state that they have to use the beta drivers; the Windows WHQL drivers are too old. [Edit] Also, maybe add a friendly tip that using DDU (Display Driver Uninstaller) is advised when installing the new drivers. |
Paul Joined: 17 May 99 Posts: 72 Credit: 42,977,964 RAC: 43 |
Long thread, trying to catch up. Sorry if I repeat something. I just got my AMD GPU working again on SETI after many years MIA, which was also related to driver issues on OSS Linux. I came back to the forums to report that and found this thread. 1) Is there a summary of this issue somewhere? 2) This was a problem with the 5700 XT exclusively, is that correct? No other cards are affected? 3) It affects all platforms, right? Linux too? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
"2) This was a problem with 5700 XT, exclusively, is that correct? No other cards are affected?" It affected all the 5000 series cards. There is now a new application which, when used with the latest driver release, appears to have fixed the problem. But both the new application and the new driver are required; one without the other won't do it. Grant Darwin NT |
rob smith Joined: 7 Mar 03 Posts: 22445 Credit: 416,307,556 RAC: 380 |
Answer to 2) It affected all RX5xxx GPUs. To get an overview of the situation, it is best to look for posts by Eric Korpela (https://setiathome.berkeley.edu/show_user.php?userid=24735). He's been working on a solution, which appears to be in two parts: first a new version of the drivers, and second a new version of the application (8.24). The last "official" news was that the new application was in beta test (https://setiathome.berkeley.edu/forum_thread.php?id=84508&postid=2027496), but there is no confirmation that the application has been released here on main yet. There were some server problems just after Eric posted that, so it is quite possible that he didn't have time to do everything required to release an application formally on the main site. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
1. A driver/software issue causing incorrect compute results to be produced by AMD Navi cards. These incorrect results would occasionally get compared against the result from another Navi GPU and be validated, and the correct result was discarded. 2. 5700 and 5700 XT for sure. Probably all Navi cards, but I'm not sure if anyone has identified any of the newer lower-end cards yet. 3. Unsure, but probably. I don't think we found any/many people running these cards on SETI in Linux; almost all of them were on Windows. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"but there is no confirmation that the application has been released here on main yet." It was released on main: https://setiathome.berkeley.edu/forum_thread.php?id=84983&postid=2027798 |
Wiggo Joined: 24 Jan 00 Posts: 36345 Credit: 261,360,520 RAC: 489 |
And it does affect Linux as well with the RX5xxx series. Cheers. |
rob smith Joined: 7 Mar 03 Posts: 22445 Credit: 416,307,556 RAC: 380 |
Thanks - I was expecting something from Eric, but that probably got lost in the mayhem of earlier this week. |
rob smith Joined: 7 Mar 03 Posts: 22445 Credit: 416,307,556 RAC: 380 |
One thing I'm not clear about is: does version 8.24 depend on having the "correct" driver version? I've just had a look at some of my recent "valid" results, and one user with an AMD GPU is suffering from errors while using version 8.24. https://setiathome.berkeley.edu/show_host_detail.php?hostid=8393099 One of the error tasks' stderr (first few lines; it gets very repetitive...): Task 8451164462 Name 25mr13ab.29473.97828.6.33.211_1 Workunit 3843027716 Created 17 Jan 2020, 1:19:39 UTC Sent 17 Jan 2020, 5:04:18 UTC Report deadline 6 Feb 2020, 16:14:00 UTC Received 17 Jan 2020, 8:29:50 UTC Server state Over Outcome Computation error Client state Compute error Exit status -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS Computer ID 8393099 Run time 20 sec CPU time 17 sec Validate state Invalid Credit 0.00 Device peak FLOPS 6.26 GFLOPS Application version SETI@home v8 v8.24 (opencl_ati5_cat132) windows_intelx86 Peak working set size 71.28 MB Peak swap size 96.04 MB Peak disk usage 0.37 MB Stderr output <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> too many boinc_temporary_exit()s</message> <stderr_txt> compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 128 local mem type: Real LotOfMem path: no LowPerformanceGPU path: yes HighPerformanceGPU path: no period_iterations_num=500 ERROR: OpenCL kernel/call 'Enqueueing kernel:pc_triplet_find_cl' call failed (-54) in file ..\analyzePoT.cpp near line 1393. Waiting 30 sec before restart... Running on device number: 0 Priority of worker thread raised successfully Priority of process adjusted successfully, below normal priority class used OpenCL platform detected: Intel(R) Corporation OpenCL platform detected: Advanced Micro Devices, Inc. 
BOINC assigns device 0 0 slot of 64 used for this instance Info: BOINC provided OpenCL device ID used Info: CPU affinity mask used: 1; system mask is ff Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 FFTW AMD specific USE_SSE2 x86 CPUID: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz Cache: L1=64K L2=256K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX Low-performance GPU detected, default period_iterations_num set to 500 OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl ar=2.594566 NumCfft=99685 NumGauss=0 NumPulse=28394048928 NumTriplet=28394048928 Currently allocated 185 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I believe it does indeed require the new driver as well. It’ll take some time, but eventually most people will update their drivers. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
"One thing I'm not clear about is, does version 8.24 depend on having the 'correct' driver version?" That was my understanding. AMD released a driver that specifically addressed the issue, explicitly mentioning Seti@Home. And Eric posted that it required the application to be recompiled with the changed compiler flag. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Answer to 2) . . I'm pretty sure Eric posted that it was now in main. Stephen . . |
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
And the list of applications for main at https://setiathome.berkeley.edu/apps.php shows Windows 8.24 as being on main. _\|/_ U r s |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
"And the list of applications at https://setiathome.berkeley.edu/apps.php for main shows windows 8.24 to be on main." I notice there isn't one for Linux there yet. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
I still can't see if the (luckily only) nine tasks I ran on my RX 5700 XT have validated, because of the trouble with the whole back-end. |
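To make the validation failure Ian&Steve C. describes concrete (a bad Navi result gets compared against another bad Navi result, they agree, and the correct result is discarded), here is a minimal Python sketch of quorum-based validation. It is a hypothetical simplification, not the SETI@home validator: each result is reduced to an assumed tuple of signal counts, the numbers are made up, and the names `pick_canonical` and `grade` are invented for illustration. The real validator compares individual signals within tolerances.

```python
from collections import Counter

# Hypothetical, simplified model of quorum validation (not the real validator).
# Each result is reduced to a tuple of signal counts, e.g. (spikes, pulses, triplets).
# Identical tuples count as "agreeing" results.


def pick_canonical(results, quorum=2):
    """Return the first signature reported by at least `quorum` hosts, else None."""
    tally = Counter(results.values())
    for signature, votes in tally.items():
        if votes >= quorum:
            return signature
    return None


def grade(results, quorum=2):
    """Return (canonical, {host: is_valid}), or (None, {}) if there is no quorum yet."""
    canonical = pick_canonical(results, quorum)
    if canonical is None:
        # No agreement yet: in this model, another replica would be sent out.
        return None, {}
    return canonical, {host: sig == canonical for host, sig in results.items()}


if __name__ == "__main__":
    correct = (3, 1, 2)  # made-up counts from a healthy host
    flaky = (3, 5, 2)    # made-up counts from a Navi card reporting extra pulses

    # The first two results disagree, so there is no canonical result yet.
    print(grade({"cpu_host": correct, "navi_a": flaky}))
    # A second flaky Navi card agrees with the first and outvotes the correct host.
    print(grade({"cpu_host": correct, "navi_a": flaky, "navi_b": flaky}))
```

In this toy model, whichever pair agrees first wins the vote, which is why a second flaky card can turn a correct result invalid, and why fixing both the driver and the application matters more than any single host's results.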
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.