Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Message boards : Number crunching : Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2027870 - Posted: 16 Jan 2020, 2:57:36 UTC

What I'd like to know is why Rob's 5700XT is reporting invalid tasks. The counts are way off from the canonical result. Is there still a problem? Or is Rob overclocking the card too far, causing the invalids?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2027870 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2027875 - Posted: 16 Jan 2020, 3:28:05 UTC - in response to Message 2027870.  

Or maybe temp related. If it’s an early reference design, they tended to run hot from what I remember of the early reviews.

He’s still producing mostly valid results. So that’s good.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2027875 · Report as offensive     Reply Quote
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 2027879 - Posted: 16 Jan 2020, 3:59:05 UTC - in response to Message 2027842.  

Got two invalids on RX 5700 with 8.24 beta, in case it is relevant


The first result is missing a pulse, the second one has several too many. That's not nearly as bad as what was happening before. Pulse finding is one of the more stressful portions of the code.
@SETIEric@qoto.org (Mastodon)

ID: 2027879 · Report as offensive     Reply Quote
Rob

Joined: 7 Apr 12
Posts: 9
Credit: 951,019
RAC: 0
Germany
Message 2027917 - Posted: 16 Jan 2020, 13:05:14 UTC - in response to Message 2027879.  

Thanks, Eric! Appreciate the explanation. Glad to hear 🙂
ID: 2027917 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2027994 - Posted: 16 Jan 2020, 23:39:03 UTC

It would appear it's now time to follow up with the people who were producing bad results and let them know that the solution is at hand. As far as I know, since 8.24 has gone to main, it will update automatically, and fixed drivers can be had at https://www.amd.com/en/support

I don't think there's anything else I need to pass on, but please advise if otherwise. I'll give it a while and then start sending messages.
ID: 2027994 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2027997 - Posted: 17 Jan 2020, 0:10:13 UTC
Last modified: 17 Jan 2020, 0:23:43 UTC

You might state that they have to use the Beta drivers. The Windows WHQL drivers are too old.
[Edit] Also maybe a friendly tip that using DDU (Display Driver Uninstaller) is advised when installing the new drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2027997 · Report as offensive     Reply Quote
Paul

Joined: 17 May 99
Posts: 72
Credit: 42,977,964
RAC: 43
United States
Message 2028180 - Posted: 17 Jan 2020, 20:53:02 UTC

Long thread, trying to catch up. Sorry if I repeat something. I just got my AMD GPU working again on SETI after many years MIA, also related to driver issues on OSS Linux. I came back to the forums to report and found this thread.

1) Is there a summary of this issue somewhere?

2) This was a problem with 5700 XT, exclusively, is that correct? No other cards are affected?

3) It affects all platforms, right? Linux too?
ID: 2028180 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 2028185 - Posted: 17 Jan 2020, 21:06:13 UTC - in response to Message 2028180.  

2) This was a problem with 5700 XT, exclusively, is that correct? No other cards are affected?

3) It affects all platforms, right? Linux too?
It affected all the 5000 series cards.
There is now a new application which, when used with the latest driver release, appears to have fixed the problem. But both the new application and the new driver are required; just one or the other won't do it.
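The "both are required" condition can be written down as a small check. This is only a sketch: `parse_app_version` and `fix_in_place` are hypothetical helper names, the version string format is the one shown on task pages (for example `SETI@home v8 v8.24 (opencl_ati5_cat132)`), and whether a driver contains AMD's fix is passed in as a plain boolean because no specific fixed driver version is assumed here. It looks only at the version number, not at which plan class a given card would receive.

```python
import re
from typing import Optional, Tuple

# Matches the "v8.24 (plan_class)" tail of an application version string.
_APP_RE = re.compile(r"v(\d+)\.(\d+)\s+\(([^)]+)\)")


def parse_app_version(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (major, minor, plan_class), or None if the text does not match."""
    match = _APP_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def fix_in_place(app_version_text: str, driver_has_fix: bool) -> bool:
    """True only when the app is 8.24 or newer AND the driver carries the fix."""
    parsed = parse_app_version(app_version_text)
    if parsed is None:
        return False  # unknown version string: assume not fixed
    major, minor, _plan_class = parsed
    return (major, minor) >= (8, 24) and driver_has_fix


if __name__ == "__main__":
    app = "SETI@home v8 v8.24 (opencl_ati5_cat132)"
    print(parse_app_version(app))
    print(fix_in_place(app, driver_has_fix=False))
```

Comparing `(major, minor)` as a tuple avoids the trap of treating "8.24" as a decimal number, where a hypothetical 8.3 would wrongly sort above 8.24.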
Grant
Darwin NT
ID: 2028185 · Report as offensive     Reply Quote
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2028186 - Posted: 17 Jan 2020, 21:07:02 UTC

Answer to 2)
It affected all RX5xxx GPUs.

To get an overview of the situation it is best to look for posts by Eric Korpela (https://setiathome.berkeley.edu/show_user.php?userid=24735)
He's been working on a solution, which appears to be in two parts: first a new version of the drivers, and second a new version of the application (8.24).
The last "official" news was that the new application was in Beta test https://setiathome.berkeley.edu/forum_thread.php?id=84508&postid=2027496 but there is no confirmation that the application has been released here on main yet. There were some server problems just after Eric posted that, so it is quite possible that he didn't have time to do all the things required to release an application formally on the main site.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2028186 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028187 - Posted: 17 Jan 2020, 21:08:16 UTC - in response to Message 2028180.  


1) Is there a summary of this issue somewhere?

2) This was a problem with 5700 XT, exclusively, is that correct? No other cards are affected?

3) It affects all platforms, right? Linux too?


1. A driver/software issue caused AMD Navi cards to produce incorrect compute results. These incorrect results would occasionally be compared against a result from another Navi GPU and validated, and the correct result was discarded.

2. 5700 and 5700XT for sure, and probably all Navi cards, but I'm not sure if anyone has identified any of the newer lower-end cards yet.

3. Unsure, but probably. I don't think we found many, if any, people running these cards on SETI in Linux. Almost all of them were on Windows.
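The cross-validation failure in point 1 can be illustrated with a toy model. This is a sketch under loud assumptions: it is not the project's validator, `Result`, `naive_validate` and `cautious_validate` are hypothetical names, and a single pulse count stands in for the full list of reported signals. It only shows why a quorum of two agreeing results is unsafe when both come from devices with the same fault.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    host: str
    gpu_family: str   # hypothetical label, e.g. "navi", "nvidia", "cpu"
    pulse_count: int  # stand-in for the full signal list


# Device families treated as untrustworthy until re-checked (assumption).
SUSPECT_FAMILIES = frozenset({"navi"})


def naive_validate(a: Result, b: Result) -> bool:
    """Quorum of two: accept as soon as the two results agree."""
    return a.pulse_count == b.pulse_count


def cautious_validate(a: Result, b: Result,
                      suspect: frozenset = SUSPECT_FAMILIES) -> bool:
    """Same check, but agreement between two suspect devices is not accepted."""
    if a.gpu_family in suspect and b.gpu_family in suspect:
        return False  # a third result from another device type would be needed
    return naive_validate(a, b)


if __name__ == "__main__":
    # Two faulty cards that happen to agree on the same wrong answer.
    bad_1 = Result("host-a", "navi", 31)
    bad_2 = Result("host-b", "navi", 31)
    print(naive_validate(bad_1, bad_2))     # the wrong pair validates itself
    print(cautious_validate(bad_1, bad_2))  # held back for another opinion
```

In the naive version the two matching wrong results form the quorum, so a correct third result would be the one marked invalid.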
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028187 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028188 - Posted: 17 Jan 2020, 21:10:05 UTC - in response to Message 2028186.  

but there is no confirmation that the application has been released here on main yet.


it was released on main

https://setiathome.berkeley.edu/forum_thread.php?id=84983&postid=2027798
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028188 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2028193 - Posted: 17 Jan 2020, 21:16:44 UTC
Last modified: 17 Jan 2020, 21:17:13 UTC

And it does affect Linux as well with the RX5xxx series.

Cheers.
ID: 2028193 · Report as offensive     Reply Quote
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2028205 - Posted: 17 Jan 2020, 21:59:42 UTC - in response to Message 2028188.  

Thanks - I was expecting something from Eric, but that probably got lost in the mayhem of earlier this week.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2028205 · Report as offensive     Reply Quote
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2028206 - Posted: 17 Jan 2020, 22:07:47 UTC
Last modified: 17 Jan 2020, 22:08:40 UTC

One thing I'm not clear about is, does version 8.24 depend on having the "correct" driver version?
I've just had a look at some of my recent "valid" results, and one user with an AMD GPU is suffering from errors and is using version 8.24.
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8393099

Stderr from one of the error tasks (first few lines; it gets very repetitive):
Task 8451164462
Name 	25mr13ab.29473.97828.6.33.211_1
Workunit 	3843027716
Created 	17 Jan 2020, 1:19:39 UTC
Sent 	17 Jan 2020, 5:04:18 UTC
Report deadline 	6 Feb 2020, 16:14:00 UTC
Received 	17 Jan 2020, 8:29:50 UTC
Server state 	Over
Outcome 	Computation error
Client state 	Compute error
Exit status 	-226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID 	8393099
Run time 	20 sec
CPU time 	17 sec
Validate state 	Invalid
Credit 	0.00
Device peak FLOPS 	6.26 GFLOPS
Application version 	SETI@home v8 v8.24 (opencl_ati5_cat132) windows_intelx86
Peak working set size 	71.28 MB
Peak swap size 	96.04 MB
Peak disk usage 	0.37 MB
Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
too many boinc_temporary_exit()s</message>
<stderr_txt>
compute units: 2
	Single buffer allocation size: 128MB
	Total device global memory: 512MB
	max WG size: 128
	local mem type: Real
	LotOfMem path: no
	LowPerformanceGPU path: yes
	HighPerformanceGPU path: no
period_iterations_num=500
ERROR: OpenCL kernel/call 'Enqueueing kernel:pc_triplet_find_cl' call failed (-54) in file ..\analyzePoT.cpp near line 1393.
Waiting 30 sec before restart...
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
0 slot of 64 used for this instance
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 1; system mask is ff

Build features: SETI8	Non-graphics	OpenCL	USE_OPENCL_HD5xxx	OCL_ZERO_COPY	OCL_CHIRP3	FFTW	AMD specific	USE_SSE2	x86	
     CPUID:         Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz 

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX 
Low-performance GPU detected, default period_iterations_num set to 500
OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl 
ar=2.594566  NumCfft=99685  NumGauss=0  NumPulse=28394048928  NumTriplet=28394048928
Currently allocated 185 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
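Two numeric codes in this report can be decoded mechanically. A sketch, assuming the standard OpenCL header values: exit status -226 and 0xFFFFFF1E are the same 32-bit value in signed and unsigned form, and -54 is the value the OpenCL headers give to CL_INVALID_WORK_GROUP_SIZE. The helper names and the deliberately tiny lookup table are illustrative only; why the kernel launch was rejected is not something this log excerpt shows.

```python
# Small subset of OpenCL status codes (values from the standard CL headers).
OPENCL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}


def as_uint32_hex(code: int) -> str:
    """Show a signed exit code the way the task page prints it in hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def from_uint32(value: int) -> int:
    """Convert an unsigned 32-bit value back to its signed interpretation."""
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def opencl_error_name(code: int) -> str:
    """Look up a code in the small table above; unknown codes are reported as such."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL code {code}")


if __name__ == "__main__":
    print(as_uint32_hex(-226))
    print(from_uint32(0xFFFFFF1E))
    print(opencl_error_name(-54))
```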

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2028206 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028208 - Posted: 17 Jan 2020, 22:19:41 UTC - in response to Message 2028206.  

I believe it does indeed require the new driver as well.

It’ll take some time, but eventually most people will update their drivers.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028208 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 2028212 - Posted: 17 Jan 2020, 22:48:06 UTC - in response to Message 2028206.  

One thing I'm not clear about is, does version 8.24 depend on having the "correct" driver version?
That was my understanding.
AMD released a driver that specifically addressed the issue, explicitly naming Seti@Home. And Eric posted that it required the application to be recompiled with the changed compiler flag.
Grant
Darwin NT
ID: 2028212 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2028239 - Posted: 18 Jan 2020, 0:55:20 UTC - in response to Message 2028186.  

Answer to 2)
It affected all RX5xxx GPUs.

To get an overview of the situation it is best to look for posts by Eric Korpela (https://setiathome.berkeley.edu/show_user.php?userid=24735)
He's been working on a solution, which appears to be in two parts: first a new version of the drivers, and second a new version of the application (8.24).
The last "official" news was that the new application was in Beta test https://setiathome.berkeley.edu/forum_thread.php?id=84508&postid=2027496 but there is no confirmation that the application has been released here on main yet. There were some server problems just after Eric posted that, so it is quite possible that he didn't have time to do all the things required to release an application formally on the main site.


. . I'm pretty sure Eric posted that it was now in main.

Stephen

. .
ID: 2028239 · Report as offensive     Reply Quote
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 2028245 - Posted: 18 Jan 2020, 1:19:48 UTC - in response to Message 2028239.  

And the list of applications at https://setiathome.berkeley.edu/apps.php shows the Windows 8.24 application to be on main.
_\|/_
U r s
ID: 2028245 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 2028249 - Posted: 18 Jan 2020, 1:34:56 UTC - in response to Message 2028245.  

And the list of applications at https://setiathome.berkeley.edu/apps.php shows the Windows 8.24 application to be on main.
I notice there isn't one for Linux there yet.
Grant
Darwin NT
ID: 2028249 · Report as offensive     Reply Quote
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2028333 - Posted: 18 Jan 2020, 16:03:49 UTC

I still can't see if the (luckily only) nine tasks I ran on my RX 5700 XT have validated, because of the trouble with the whole back-end.
ID: 2028333 · Report as offensive     Reply Quote


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.