Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2018156 - Posted: 8 Nov 2019, 9:38:00 UTC


Wiggo
Joined: 24 Jan 00
Posts: 36781
Credit: 261,360,520
RAC: 489
Australia
Message 2018157 - Posted: 8 Nov 2019, 9:56:51 UTC - in response to Message 2018156.  


TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2018182 - Posted: 8 Nov 2019, 17:07:09 UTC - in response to Message 2018156.  

AP app seems to be working well with the 5700XT!

https://setiathome.berkeley.edu/results.php?hostid=8772813&offset=0&show_names=0&state=0&appid=20
The RX5700 version works well on a Mac: https://setiathome.berkeley.edu/results.php?hostid=8592369&offset=120

You could try the other MB8 Windows app versions and see if they work with the RX5700XT. Just install Lunatics and then insert the apps from here: https://setiathome.berkeley.edu/forum_thread.php?id=79765&postid=1801541#1801541

BOINC will run the NV and Intel apps on the AMD 5700 if you just keep ATI in the <coproc> section instead of the other names. The current stock Mac NV app is really the Intel_gpu build, because the normal NV builds don't work very well on the NV GPUs. Use the app_info.xml included in the download to set it up, and just use ATI in the <coproc> section.

Make sure to set your cache very low first; something like 0.001 will download just a couple of tasks at a time in case it doesn't work. Also, disable networking and suspend all but one task so you can test all the app versions. Or just download and use the Benchmark package to test it offline, but the Benchmark test may not run the NV and Intel apps on the AMD card the way Anonymous platform will.
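As a rough illustration of the <coproc> change described above, here is a minimal Python sketch that rewrites the coprocessor type in an app_info.xml-style document. The XML fragment, the executable name, and the version number are placeholders, not the contents of the real download, and the `<type>` child element is assumed from BOINC's anonymous platform format, so check it against the app_info.xml that ships with the apps.

```python
import xml.etree.ElementTree as ET

# Hypothetical app_info.xml fragment. The file name and version number are
# placeholders; only the <coproc> section matters for this sketch.
APP_INFO = """\
<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>example_nv_opencl_app.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>example_nv_opencl_app.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""


def set_coproc_type(xml_text: str, new_type: str = "ATI") -> str:
    """Return xml_text with every <coproc><type> value replaced by new_type."""
    root = ET.fromstring(xml_text)
    for coproc in root.iter("coproc"):
        type_elem = coproc.find("type")
        if type_elem is not None:
            type_elem.text = new_type
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the patched document so it can be compared with the original.
    print(set_coproc_type(APP_INFO))
```

Editing the file by hand works just as well; the point is only that the app entry stays the same and the coprocessor type is the one thing that changes.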

Justin Turner Arthur
Joined: 20 Oct 03
Posts: 12
Credit: 3,929,052
RAC: 2
United States
Message 2018580 - Posted: 11 Nov 2019, 22:07:51 UTC - in response to Message 2018182.  

Just found another corrupt WU quorum from Windows Radeon 5700 XT users: 3734170312. I let Eric know.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 2019213 - Posted: 16 Nov 2019, 2:00:37 UTC
Last modified: 16 Nov 2019, 2:02:02 UTC

blc14_2bit_guppi_58691_76289_HIP40815_0077.5058.818.21.44.40.vlar
Mugged by RX 5700XTs, again.
*deep sigh*
Grant
Darwin NT

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 2019324 - Posted: 17 Nov 2019, 0:13:11 UTC

And now a double mugging, 2 WUs lost to RX 5700XTs
What's next, 4 WUs?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 2019325 - Posted: 17 Nov 2019, 0:15:05 UTC - in response to Message 2006525.  

Thanks for this, TBar. I had seen this mentioned before but didn't read it enough to see that they unfortunately aren't erroring out but instead running to boinc_finish completion.

I'm sure he's aware, but just in case, I'll collect the info and let Dr. Korpela know.

Edit: He's acknowledged... I am glad that I advised him as I don't think he had been yet.
It's been 3 months, and the amount of faulty data going into the database is increasing. Might be worth a follow-up.

Wiggo
Joined: 24 Jan 00
Posts: 36781
Credit: 261,360,520
RAC: 489
Australia
Message 2019326 - Posted: 17 Nov 2019, 0:16:43 UTC

They're certainly corrupting the science with their false results. :-(

Cheers.

Sean Project Donor
Volunteer tester
Joined: 10 Aug 00
Posts: 33
Credit: 125,775,158
RAC: 199
United States
Message 2019519 - Posted: 18 Nov 2019, 15:28:26 UTC - in response to Message 2019326.  


TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2019521 - Posted: 18 Nov 2019, 16:00:30 UTC
Last modified: 18 Nov 2019, 16:02:03 UTC

It's a problem with Windows in general. Look at these: they are all AP WUs, and the pattern is the same in each case, two bad Windows hosts and one good Mac. The good one loses every time:
https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71141&state=5
Task      Work unit  Sent                       Time reported              Status                        Run time (sec)  CPU time (sec)  Credit  Application
34836695  12179862   10 Nov 2019, 10:09:48 UTC  10 Nov 2019, 13:25:49 UTC  Completed, marked as invalid  2,066.68        234.68          0.00    AstroPulse v7 v7.07 (opencl_nvidia_mac)
34836702  12179865   10 Nov 2019, 10:09:48 UTC  10 Nov 2019, 12:28:36 UTC  Completed, marked as invalid  796.48          215.72          0.00    AstroPulse v7 v7.07 (opencl_nvidia_mac)
34823813  12171883   8 Nov 2019, 13:30:36 UTC   9 Nov 2019, 1:32:50 UTC    Completed, marked as invalid  780.72          206.05          0.00    AstroPulse v7 v7.01 (sse3)
34813010  12171346   7 Nov 2019, 21:28:13 UTC   8 Nov 2019, 7:11:54 UTC    Completed, marked as invalid  792.29          205.89          0.00    AstroPulse v7 v7.07 (opencl_ati_mac)
34812940  12171312   7 Nov 2019, 20:50:35 UTC   8 Nov 2019, 6:45:28 UTC    Completed, marked as invalid  804.41          222.15          0.00    AstroPulse v7 v7.07 (opencl_ati_mac)
34811733  12170908   7 Nov 2019, 17:05:44 UTC   7 Nov 2019, 22:07:30 UTC   Completed, marked as invalid  789.17          207.18          0.00    AstroPulse v7 v7.07 (opencl_ati_mac)
34811714  12170899   7 Nov 2019, 17:05:28 UTC   7 Nov 2019, 20:35:22 UTC   Completed, marked as invalid  788.46          207.53          0.00    AstroPulse v7 v7.07 (opencl_ati_mac)
34811724  12170904   7 Nov 2019, 17:05:28 UTC   7 Nov 2019, 18:30:37 UTC   Completed, marked as invalid  792.83          206.24          0.00    AstroPulse v7 v7.07 (opencl_ati_mac)
34811726  12170905   7 Nov 2019, 17:05:28 UTC   7 Nov 2019, 21:28:13 UTC   Completed, marked as invalid  793.05          208.26          0.00    AstroPulse v7 v7.07 (opencl_ati_mac)

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2019659 - Posted: 19 Nov 2019, 19:18:55 UTC

Just a refresher about the current state of AMD and their OpenCL drivers.
https://www.anandtech.com/show/14618/the-amd-radeon-rx-5700-xt-rx-5700-review/13
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 2020312 - Posted: 23 Nov 2019, 23:40:20 UTC

From time to time I check my invalid WUs.
Very often the other PCs involved have, e.g., an RX5700 GPU that returns errors for about 50% of its tasks.
When such a PC is the wingman for one of my GPUs, the WU ends up with 2 different results and is sent out to a third PC.
If the third user also has one of these faulty GPUs, it validates the wrong result and the good result is rejected.
I looked at these PCs and saw that about 50% of their WUs are errors.
But I am sure the other 50% contain errors as well.

Wouldn't it be better for the correctness of the results, and for the science, to block PCs with such high error rates?
Or to send them no more than 2-3 WUs per day until they return about 90+% correct results?
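The failure mode and the proposed throttle can be sketched in a few lines of Python. This is a toy model, not the project's validator: it assumes results are compared as opaque strings, that two matching results form a quorum, and that faulty hosts agree with each other (as reported elsewhere in this thread). The host names, quota numbers, and function names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    host: str
    signature: str  # stand-in for the signals a host reports


def validate(results, min_quorum=2):
    """Pick the most common signature as canonical once it has min_quorum votes.

    Returns (canonical_signature, valid_hosts, invalid_hosts), or None when
    no signature has enough matching results yet (a third host is needed).
    """
    counts = Counter(r.signature for r in results)
    signature, votes = counts.most_common(1)[0]
    if votes < min_quorum:
        return None
    valid = [r.host for r in results if r.signature == signature]
    invalid = [r.host for r in results if r.signature != signature]
    return signature, valid, invalid


def daily_quota(valid, total, normal_quota=100, probation_quota=3, threshold=0.9):
    """Limit hosts whose valid fraction is below threshold to a few WUs per day."""
    if total > 0 and valid / total < threshold:
        return probation_quota
    return normal_quota


if __name__ == "__main__":
    first_two = [Result("good_host", "correct"), Result("navi_a", "garbage")]
    print(validate(first_two))  # no quorum yet, so the WU goes to a third host
    print(validate(first_two + [Result("navi_b", "garbage")]))
    print(daily_quota(valid=5, total=10), daily_quota(valid=95, total=100))
```

Note the weakness in the throttle: if faulty hosts keep validating each other, their measured valid fraction overstates how often they are actually correct.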

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2020342 - Posted: 24 Nov 2019, 2:30:41 UTC - in response to Message 2020312.  

The "bad host detected" check is already in the server code (#3024), but it does not work very well. I'm fairly sure the project scientists are aware of the AMD 5700/XT problem, but the fix probably requires a fair bit of new coding. It should be handled by the project developers, but they have lots of more urgent code updates pending.

Tomcat雄猫
Joined: 20 Dec 14
Posts: 9
Credit: 391,588
RAC: 19
Canada
Message 2020877 - Posted: 28 Nov 2019, 4:13:30 UTC
Last modified: 28 Nov 2019, 4:41:18 UTC

I run an RX5700 and have noticed this issue. The task runs to completion and returns blatantly incorrect results. The only time my RX5700 GPU gets a valid result is when it is validated against another AMD RX5700-series GPU (both get the wrong result). I've currently stopped my computer from accepting GPU work units (it took me way too long to realize something was wrong, sorry). I believe this is an issue with the Navi architecture and not necessarily solely with AMD's OpenCL driver, as I see older AMD GPUs still returning "correct" results.

Someone has to redo all the work units where the results came from Navi AMD GPUs (RX5700, RX 5700XT, RX 5500M, RX 5500), and ban all AMD Navi GPUs until a fix is found.

Interestingly, my RX5700 has not been causing issues with other projects, like Einstein@home, Milkyway@home, Collatz, etc. Something about Navi and OpenCL really does not like Seti@home.

If any of you need any testing or logs on an AMD RX5700, hit me up.

edit: Corrected OpenGl to OpenCl, thanks Keith Myers

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2020883 - Posted: 28 Nov 2019, 4:37:33 UTC - in response to Message 2020877.  

Sounds like the issue might come down to the app needing an update rather than the drivers.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours


Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2020884 - Posted: 28 Nov 2019, 4:38:36 UTC - in response to Message 2020877.  

I believe this is an issue with the Navi architecture and not necessarily with AMD's OpenGL driver,

Correction. Distributed computing does not use the OpenGL component. It uses the OpenCL component of the drivers.

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2020885 - Posted: 28 Nov 2019, 4:40:05 UTC - in response to Message 2020883.  

Sounds like the issue might come down to the app needing an update rather than the drivers.

Anybody heard or seen any sign of Raistmer lately? He being the developer of the app.

Tomcat雄猫
Joined: 20 Dec 14
Posts: 9
Credit: 391,588
RAC: 19
Canada
Message 2020895 - Posted: 28 Nov 2019, 7:00:09 UTC - in response to Message 2020885.  
Last modified: 28 Nov 2019, 7:07:45 UTC

Sounds like the issue might come down to the app needing an update rather than the drivers.

Anybody heard or seen any sign of Raistmer lately? He being the developer of the app.

He was here. It was in September, though.
https://community.amd.com/thread/243179

Tomcat雄猫
Joined: 20 Dec 14
Posts: 9
Credit: 391,588
RAC: 19
Canada
Message 2020896 - Posted: 28 Nov 2019, 7:35:52 UTC - in response to Message 2020895.  
Last modified: 28 Nov 2019, 7:37:03 UTC

I have posted this issue in the AMD subReddit to hopefully get more attention and get owners of Navi GPUs to remove them from Seti@home.

catavalon21
Joined: 2 Nov 01
Posts: 13
Credit: 7,238,152
RAC: 48
United States
Message 2021395 - Posted: 2 Dec 2019, 2:19:57 UTC - in response to Message 2006538.  

The few 5700XT crunchers I have seen while manually perusing the results of folks who validate the same WUs I do largely error out. However, they occasionally return "valid" results, and those are STUPID fast: if correct, they are close to an order of magnitude faster than my 970 on the special sauce, apparently while running the default Windows AMD app. It would be really nice for solid drivers (if that's the issue) to show up eventually. Maybe with Big Navi <sigh>


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.