Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Message boards : Number crunching : Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Profile Wiggo
Joined: 24 Jan 00
Posts: 26945
Credit: 261,360,520
RAC: 489
Australia
Message 2023313 - Posted: 16 Dec 2019, 7:48:58 UTC - in response to Message 2023312.  

another one for me

Work Unit

https://setiathome.berkeley.edu/workunit.php?wuid=3789523858

Muggers

Rocky https://setiathome.berkeley.edu/show_host_detail.php?hostid=8865042

Dzoszi https://setiathome.berkeley.edu/show_host_detail.php?hostid=8643138

Stephen

:(
Both are listed, Stephen, but do report those that aren't on the list ASAP. ;-)

I go through my inconclusives here after 11am (EDST)/midnight UTC for the last 24hrs (it's much better than doing it half asleep after the cricket), and I keep a list here of those noted and update it at that time as well. ;-)

Cheers.
ID: 2023313
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3516
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023319 - Posted: 16 Dec 2019, 9:08:11 UTC
Last modified: 16 Dec 2019, 9:39:53 UTC

Some positive news finally: Dr. Anderson responded. The SETI@Home project is having a team meeting this morning and this issue is being raised. That's all I have for now. :^)

Edit: OK, one more... gunsnammo is the latest person to have replied and indicated they've disabled GPU computing, and alffrommars did so but did not reply (thanks both!)

I've updated the list and you finally have your wished-for strikethrough lol.
ID: 2023319
Profile Wiggo
Joined: 24 Jan 00
Posts: 26945
Credit: 261,360,520
RAC: 489
Australia
Message 2023323 - Posted: 16 Dec 2019, 10:08:13 UTC

Thanks for the update Mr. Kevvy, I just updated my text file list again, though I just use a * before the name for those that have replied and it may help those who're having problems with the italics. ;-)

Cheers.
ID: 2023323
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13351
Credit: 208,696,464
RAC: 304
Australia
Message 2023330 - Posted: 16 Dec 2019, 10:37:24 UTC - in response to Message 2023319.  

Some positive news finally: Dr. Anderson responded. The SETI@Home project is having a team meeting this morning and this issue is being raised. That's all I have for now. :^)
Worth pointing out to them that the issue is about to get a lot worse.

The RX 5500 XT has been released, and since it's a lot cheaper than the 5700 XT, and quite a bit cheaper than a similarly performing Nvidia card, we can expect a lot more of them to turn up than has occurred with the 5700 XT.

Any 5000 XT series AMD card needs to be blocked till the OpenCL driver issue is resolved.
Grant
Darwin NT
ID: 2023330
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15183
Credit: 4,362,181
RAC: 3
Netherlands
Message 2023338 - Posted: 16 Dec 2019, 14:14:41 UTC

Well, at least it's made the news: https://www.techpowerup.com/261603/amd-radeon-navi-opencl-bug-makes-it-unfit-for-seti-home. Perhaps try to get other tech sites to post it as well?

Meanwhile, most OpenCL problems at Folding@Home are being tested with a beta app and things are looking up. BOINC projects like Milkyway and Einstein don't have a problem running work on the RX 5700 (XT) with OpenCL. It's strictly something in the Seti apps that doesn't like it, perhaps the extreme optimisation.
ID: 2023338
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3516
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023347 - Posted: 16 Dec 2019, 14:41:47 UTC - in response to Message 2023338.  
Last modified: 16 Dec 2019, 15:33:29 UTC

BOINC projects like Milkyway and Einstein don't have a problem running work on the RX 5700 (XT) with OpenCL.


Please read this E@H thread. From the get-go in July right up to a post from yesterday, they are having the same issue. One person noted that these cards only validate against other Navi cards, a rather understated way of saying that they are putting junk results in the database: "As you have discovered, tasks completed with a 5700 will only validate if partnered with another machine with a 5700"
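
For anyone following along, here is a minimal Python sketch of why two faulty cards can validate each other. It is not the project's actual validator: the `Result` class, the `results_agree` tolerance rule, the host labels and the signal values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Result:
    host: str       # hypothetical host label
    signals: tuple  # simplified stand-in for the signals a task reports


def results_agree(a: Result, b: Result, tol: float = 1e-3) -> bool:
    """Assumed rule: same number of signals, each matching within a tolerance."""
    if len(a.signals) != len(b.signals):
        return False
    return all(abs(x - y) <= tol * max(abs(x), abs(y), 1.0)
               for x, y in zip(a.signals, b.signals))


def validate(results: list, quorum: int = 2) -> Optional[list]:
    """Return the hosts of the first group of `quorum` mutually agreeing
    results, or None if the workunit is still inconclusive."""
    for group in combinations(results, quorum):
        if all(results_agree(a, b) for a, b in combinations(group, 2)):
            return [r.host for r in group]
    return None


# Invented sample data: one sane result and two matching junk results.
good = Result("cpu_host", (12.5, 40.1, 7.3))
navi_a = Result("navi_host_a", (12.5, 99.0, 99.0, 99.0))
navi_b = Result("navi_host_b", (12.5, 99.0, 99.0, 99.0))

print(validate([good, navi_a]))    # inconclusive: the pair disagrees
print(validate([navi_a, navi_b]))  # the two junk results agree and "validate"
```

The point of the sketch is that a quorum check only measures agreement, not correctness, so a pair of hosts that are wrong in the same way passes while a good host paired with a bad one goes inconclusive.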

Also in the same thread is a link to a thread where even the coin miners can't use the Navi OpenCL drivers because they are broken. Also please see Keith's post further down here, where it was acknowledged that the drivers were broken from release day but were released regardless.

This project is possibly the only one with a high enough turnover that the issue has become prominent.

Edit: in the Techpowerup link, someone suggests:

There's a suggestion that Navi has bad FFT implementation.
So as of this moment Navi cards are unfit for almost all computing production systems... and rather pointless for development (even students).


This would be really bad, although as far as I know FFT isn't done completely on the hardware but via the OpenCL clFFT library. It also certainly would not affect coin mining... but there may be something to it, as (as far as I know) other AMD cards produce good results with OpenCL 2.0 while these ones do not, so something underlying may be broken.
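
If someone wanted to test the "bad FFT" suggestion, one approach is to compare the suspect transform against a slow but trustworthy reference on the same input. The sketch below shows the idea in plain Python only. It does not call clFFT or OpenCL, and `fft_radix2` merely stands in for whatever implementation is under test; the function names, test signal and threshold are assumptions.

```python
import cmath
import math


def dft_reference(x):
    """Direct O(n^2) discrete Fourier transform, used as the trusted reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 FFT standing in for the implementation under test.
    Requires len(x) to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def max_abs_error(a, b):
    """Largest element-wise difference between two transforms."""
    return max(abs(p - q) for p, q in zip(a, b))


# Assumed test input: a pure sine at bin 3 over 64 samples.
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
error = max_abs_error(fft_radix2(signal), dft_reference(signal))
print("FFT matches reference:", error < 1e-9)
```

A broken implementation would show up as an error far above rounding noise. On a real GPU the comparison would use single precision, so the threshold would need to be much looser than the one assumed here.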
ID: 2023347
Tomcat雄猫

Joined: 20 Dec 14
Posts: 9
Credit: 391,588
RAC: 19
Canada
Message 2023372 - Posted: 16 Dec 2019, 17:09:06 UTC - in response to Message 2023347.  
Last modified: 16 Dec 2019, 17:17:32 UTC

E@H's Gamma Ray app is affected, but their Gravitational Wave Search is fine (I have hundreds of results cross-validated by Intel and Nvidia GPUs, and they've all been fine). I've only received one Gamma Ray task amongst hundreds upon hundreds of Gravitational Wave Search tasks, and that was very early on, so I think they've applied a temporary fix.

I'm certain Milkyway@home does not require FFT; I don't know about E@H. However, knowing which apps are affected and which ones aren't can help us narrow down the culprit.

I think the only solution is to stop giving Navi GPUs work until AMD fixes their drivers.
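
To make that concrete, here is a rough Python sketch of what a name-based block could look like. None of it is taken from the BOINC or SETI@home scheduler: the function, the regular expressions and the idea that the cards report names like "gfx1010" are assumptions for illustration only.

```python
import re

# Assumed name patterns for the affected cards; a real block would have to
# be checked against the names hosts actually report.
BLOCKED_GPU_PATTERNS = [
    re.compile(r"RX\s*5[0-9]00", re.IGNORECASE),  # marketing names, e.g. "RX 5700 XT"
    re.compile(r"gfx101[0-9]", re.IGNORECASE),    # assumed OpenCL device names
]


def should_send_gpu_work(device_name: str, driver_fixed: bool = False) -> bool:
    """Hypothetical check: refuse GPU work to blocked devices until the
    driver issue is marked as fixed."""
    if driver_fixed:
        return True
    return not any(p.search(device_name) for p in BLOCKED_GPU_PATTERNS)


for name in ["AMD Radeon RX 5700 XT", "gfx1010", "Radeon RX 580 Series",
             "GeForce GTX 1080 Ti"]:
    print(name, "->", "send work" if should_send_gpu_work(name) else "blocked")
```

The `driver_fixed` flag is there so the block can be lifted in one place once AMD ships a working OpenCL driver. A real implementation would more likely key on driver version than on a single switch.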

Even a friend of mine who does not use BOINC knows about this issue and has brought it up to me (he only lurks on Chinese forums). So I guess the information is spreading, just not nearly fast enough.
ID: 2023372
Profile Wiggo
Joined: 24 Jan 00
Posts: 26945
Credit: 261,360,520
RAC: 489
Australia
Message 2023417 - Posted: 17 Dec 2019, 0:37:02 UTC

Other than those already reported I can report no new users today as well as no invalids caused by them.

Cheers.
ID: 2023417
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4248
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023429 - Posted: 17 Dec 2019, 2:28:51 UTC

I've crossposted my PSA posts to r/BOINC and r/AMD hopefully to grab the attention of anyone else who may be running these cards.

The r/BOINC thread is getting good activity, and a couple of people who were planning to put their RX 5700/XT to work have indicated that they won't.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023429
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3516
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023430 - Posted: 17 Dec 2019, 2:31:00 UTC - in response to Message 2023429.  
Last modified: 17 Dec 2019, 2:39:36 UTC

Excellent Ian and thanks. I only got mugged twice today by four usual suspects, and Wiggo not at all, so things seem to be noticeably improving.
Edit: ... until the wave of RX 5500s shows up of course!
ID: 2023430
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13141
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023438 - Posted: 17 Dec 2019, 2:59:23 UTC

I only got mugged twice today by four usual suspects

Ditto.

I would like a report from Richard at what was decided or discussed during the BOINC steering committee meeting this morning on this problem.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023438
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4248
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023508 - Posted: 17 Dec 2019, 23:55:10 UTC

The cross post in the AMD subreddit is BLOWING up. Over 3300 upvotes. And I know AMD employees browse and contribute to that sub.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023508
Profile Wiggo
Joined: 24 Jan 00
Posts: 26945
Credit: 261,360,520
RAC: 489
Australia
Message 2023562 - Posted: 18 Dec 2019, 10:05:29 UTC

Other than the usual culprits I did pick up a new nuisance, but sadly it's an anonymous user, Computer ID 8867726.

Cheers.
ID: 2023562
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3516
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023573 - Posted: 18 Dec 2019, 14:51:52 UTC
Last modified: 18 Dec 2019, 15:02:09 UTC

Happily MadMikeDelta, Earendil and Rocky have joined the list of participants who have replied indicating that they are disabling their affected GPU. Also had a mugging from a new one (Doc_Jebus 10863878) so sent a pesterpost. I'll update the list shortly.

@Iona: vleermuis also pestered and thank you! :^)
ID: 2023573
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 2023574 - Posted: 18 Dec 2019, 14:58:14 UTC

I've been swindled by a pair of RX 5700 XTs 'validating' against my CPU in the last month, but when it came to posting about it after work, the result had disappeared by the time I got home. However, I have picked up a recently commissioned machine, by virtue of me having this _2 task.
Don't take life too seriously, as you'll never come out of it alive!
ID: 2023574
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2023589 - Posted: 18 Dec 2019, 16:19:03 UTC

. . I got mugged by Rocky and Dzsozi again, so they are still running their 5700s

Stephen

:(
ID: 2023589
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4248
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023593 - Posted: 18 Dec 2019, 17:46:16 UTC

New drivers today, but don't expect a fix.

https://www.reddit.com/r/Amd/comments/ebz5vk/radeon_software_19123_tomorrow_december_18_2019/fb84hpx/?utm_source=share&utm_medium=web2x

tmakedon
Director of AMD Software Strategy

Does this reply prove to you that we are monitoring Reddit? ;-) Yes this will show up as a known issue and we are investigating it.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023593
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3516
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023594 - Posted: 18 Dec 2019, 17:55:15 UTC - in response to Message 2023593.  

1) Excellent work Ian!

2) @Stephen: Rocky just indicated that GPU computing has been suspended, so there may be many work units out there still awaiting wingie "validation".

3) Someone indicated that this issue does not affect Linux OpenCL (or Mac for that matter), but I have never seen any OS but Windows using these cards... the vast majority of Linux hosts here run Linux to take advantage of Petri's NVidia-only client. If anyone can find an actual host number of one, it would be much appreciated.
ID: 2023594
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4248
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023596 - Posted: 18 Dec 2019, 18:05:30 UTC - in response to Message 2023594.  

I've also seen several people on reddit mention that. Maybe on other projects; I haven't seen any Linux/Navi systems here at SETI.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023596
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14488
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2023602 - Posted: 18 Dec 2019, 19:40:44 UTC - in response to Message 2023438.  

I would like a report from Richard at what was decided or discussed during the BOINC steering committee meeting this morning on this problem.
I had a PM from Mr. Kevvy about that (I'm not an AMD user, so I don't usually bother reading this thread).

Mr. Kevvy described it as a SETI@Home team meeting, rather than a BOINC steering committee meeting, which sounds sensible for an issue like this. I'm sorry to say that I'm not invited to SETI@Home team meetings, and so far as I'm aware there are - currently - no published minutes or recordings. I wasn't even aware there was a meeting until Mr. Kevvy asked.
ID: 2023602


 