Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database
Author | Message |
---|---|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This is becoming much more common, similar to the problem with the APs: https://setiathome.berkeley.edu/workunit.php?wuid=3597951375

Task | Computer | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---
7934624541 | 8534188 | 8 Aug 2019, 5:33:40 UTC | 8 Aug 2019, 5:50:25 UTC | Completed and validated | 17.34 | 13.97 | 1.87 | SETI@home v8 v8.22 (opencl_ati5_SoG_nocal) windows_intelx86
7934624542 | 8757016 | 8 Aug 2019, 5:33:35 UTC | 8 Aug 2019, 9:29:35 UTC | Completed, marked as invalid | 176.24 | 149.68 | 0.00 | SETI@home v8 Anonymous platform (NVIDIA GPU)
7935750639 | 7060821 | 8 Aug 2019, 15:01:01 UTC | 8 Aug 2019, 16:11:08 UTC | Completed, marked as invalid | 35.12 | 17.83 | 0.00 | SETI@home v8 Anonymous platform (ATI GPU)
7936540283 | 6942127 | 8 Aug 2019, 21:19:24 UTC | 8 Aug 2019, 21:55:40 UTC | Completed and validated | 23.18 | 20.08 | 1.87 | SETI@home v8 v8.22 (opencl_ati_nocal) windows_intelx86

All the AMD GPUs have hundreds of invalids elsewhere. I suggest action sooner rather than later, as it will undoubtedly get worse as more RX 5700 XT GPUs arrive. Maybe allow just one AMD GPU per workunit? |
rob smith Joined: 7 Mar 03 Posts: 22540 Credit: 416,307,556 RAC: 380 |
The most likely action is akin to the one taken a few years ago when NVIDIA GPUs were having grief with VLARs: a blanket embargo on all such GPUs until a "rock solid" solution was found (in the current situation, "is found"). Naturally, some people will be significantly affected by such an action, but as this now appears to be jeopardising the integrity of the data (and thus the science), action should be taken sooner rather than later. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Mr. Kevvy Joined: 15 May 99 Posts: 3807 Credit: 1,114,826,392 RAC: 3,319 |
Thanks for this, TBar. I had seen this mentioned before but didn't read it closely enough to see that, unfortunately, these tasks aren't erroring out but are instead running to completion (boinc_finish). I'm sure Dr. Korpela is aware, but just in case, I'll collect the info and let him know. Edit: He's acknowledged it... I am glad that I advised him, as I don't think he had been told yet. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The new Navi RX 5700 and RX 5700 XT are currently useless for compute. The drivers are not ready for compute work. All projects that rely on the AMD OpenCL drivers are producing nothing but garbage results and invalids. The AMD developers and the Khronos Group are aware of the problem, but there's not a peep from either of them about what the real problem is or when to expect a fix. In the meantime, I think those cards should be banned on all projects until the drivers are fixed. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Mr. Kevvy Joined: 15 May 99 Posts: 3807 Credit: 1,114,826,392 RAC: 3,319 |
Looks like Dr. Korpela is on it, as the work unit linked in the OP has now been purged. This host's work units, linked in the "Invalid Host Messaging" thread, show it has only 64 in progress, none of which are opencl_ati_nocal, so either that platform has been banned or the owner acted on a private message. Another host from that thread shows active opencl_ati_nocal work, but it was sent five days ago. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
No blanket ban for these GPUs yet. I've noticed a huge jump in my Inconclusives with the start of WoW (to be expected), and in addition to the usual suspects, there are a few RX 5700/RX 5700 XTs with work that was sent out to them in the last day or two. Grant Darwin NT |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Definitely not banned. This one just popped up for me: I got edged out on this WU by two 5700s that cross-validated each other. https://setiathome.berkeley.edu/workunit.php?wuid=3619893627 Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Mr. Kevvy Joined: 15 May 99 Posts: 3807 Credit: 1,114,826,392 RAC: 3,319 |
And by checking the queues of the computers involved, others will inevitably be found. Well, the good word has been passed so it's out of our hands lol. ¯\_(ツ)_/¯ |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I PM'd Alexander, and he agreed to remove his 5700 from the project for now. |
Nikolaj_sofus Joined: 25 Aug 19 Posts: 7 Credit: 280,643 RAC: 0 |
I'm new to SETI@home and just started crunching with my Vega 56 today. Are there any issues with the Vega cards, or is it just an issue with the RDNA cards? Also, how does the validation process work? Can two Vega cards actually validate each other? |
Mr. Kevvy Joined: 15 May 99 Posts: 3807 Credit: 1,114,826,392 RAC: 3,319 |
Ian&Steve C. wrote: "I PM'd Alexander, and he agreed to remove his 5700 from the project for now." Also, that work unit I linked below only two days ago has now been purged, as was the last one, so there is definitely some admin involvement going on. @Nikolaj: You're fine... no issues with that card. As far as I know, the only check the project makes for the quorum is that both copies of a work unit aren't sent to the same participant. |
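That quorum rule is why cross-validation between identical faulty cards is so damaging. Below is a minimal Python sketch, not SETI@home's actual validator: it assumes a hypothetical, simplified result reduced to a tuple of signal counts and treats exact agreement between two results as validation (the project's real comparison is more involved). `Result` and `validate` are illustrative names only. It shows how two identically wrong results from the same broken driver can outvote a correct one, as described for the workunits linked in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical simplified result: one returned copy of a workunit."""
    host_id: int
    signals: tuple[int, ...]  # illustrative signal counts, not the real format


def validate(results: list[Result], quorum: int = 2):
    """Look for `quorum` results that agree exactly (a simplification).

    Returns (canonical_signals, {host_id: "valid" | "invalid"}), or None
    if no group of matching results is large enough yet (inconclusive).
    """
    groups: dict[tuple[int, ...], list[Result]] = {}
    for r in results:
        groups.setdefault(r.signals, []).append(r)
    for signals, members in groups.items():
        if len(members) >= quorum:
            status = {
                r.host_id: "valid" if r.signals == signals else "invalid"
                for r in results
            }
            return signals, status
    return None


# One correct result and two copies from hosts with the same broken driver.
correct = Result(host_id=1, signals=(0, 0, 3, 1))
amd_a = Result(host_id=2, signals=(0, 0, 0, 0))
amd_b = Result(host_id=3, signals=(0, 0, 0, 0))

print(validate([correct, amd_a]))  # None: inconclusive, another copy is sent
canonical, status = validate([correct, amd_a, amd_b])
print(status)  # {1: 'invalid', 2: 'valid', 3: 'valid'}
```

In this simplified model the correct result is only marked invalid once a matching wrong pair exists, so the real safeguard is keeping two copies from the same faulty hardware/driver combination from forming that pair.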
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There has been a problem with AstroPulse and the newer AMD cards for a couple of years. Any driver newer than around 1800 will produce wrong AstroPulse results. Supposedly only one AMD/ATI card is allowed for each AP workunit; however, I've noticed that sometimes that doesn't happen, and two AMD cards will cross-validate with the wrong results. Here's an example of the differences:

Application | Single pulses | Repetitive pulses | Percent blanked
---|---|---|---
AstroPulse v7 Anonymous platform (NVIDIA GPU) | 26 | 30 | 2.40
AstroPulse v7 v7.09 (opencl_ati_100) windows_intelx86 | 7 | 30 | 2.40

It's been like this for years. |
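The "one AMD/ATI card per AP workunit" rule described here could be expressed as a scheduler-side check. This is a hypothetical Python sketch, not BOINC's scheduler code: it assumes each copy already sent records the participant and a GPU vendor label, and `ONE_PER_WORKUNIT`, `Assignment` and `may_assign` are made-up names. It combines that limit with the same-participant rule mentioned earlier in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy setting: vendors limited to one copy per workunit.
ONE_PER_WORKUNIT = {"ATI"}


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record of one copy of a workunit already sent out."""
    user_id: int
    gpu_vendor: str  # illustrative labels, e.g. "ATI", "NVIDIA", "CPU"


def may_assign(existing: list[Assignment], user_id: int, gpu_vendor: str) -> bool:
    """Decide whether another copy of this workunit may go to this host."""
    # Never send two copies of one workunit to the same participant.
    if any(a.user_id == user_id for a in existing):
        return False
    # Limit restricted vendors to one copy, so two results from the same
    # vendor's (possibly faulty) driver can never pair up with each other.
    if gpu_vendor in ONE_PER_WORKUNIT and any(
        a.gpu_vendor == gpu_vendor for a in existing
    ):
        return False
    return True


sent = [Assignment(user_id=10, gpu_vendor="ATI")]
print(may_assign(sent, user_id=20, gpu_vendor="ATI"))     # False
print(may_assign(sent, user_id=20, gpu_vendor="NVIDIA"))  # True
```

A rule like this does not fix a bad result; it only ensures that an AMD result has to agree with one from different hardware before it can validate.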
Wiggo Joined: 24 Jan 00 Posts: 36873 Credit: 261,360,520 RAC: 489 |
Another pair of RX 5700s adding to poor science. :-( https://setiathome.berkeley.edu/workunit.php?wuid=3632373304 They really need to be looked at seriously. Cheers. |
Bluerazor Joined: 22 May 99 Posts: 15 Credit: 3,889,427 RAC: 12 |
Is there any specific information that RX 5700 and 5700 XT owners can pass along to AMD to complain about this issue? It's not great news to buy a new card and find out you're stuck with just slow CPU crunching. |
Wiggo Joined: 24 Jan 00 Posts: 36873 Credit: 261,360,520 RAC: 489 |
Bluerazor wrote: "Is there any specific information that RX 5700 and 5700 XT owners can pass along to AMD to complain about this issue? It's not great news to buy a new card and find out you're stuck with just slow CPU crunching." It's really very simple: the OpenCL part of AMD's latest drivers is broken for computational work. ;-) Cheers. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OpenCL is controlled by the Khronos Group, not AMD. They are the ones who need to be told that their driver component is broken. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Keith Myers wrote: "OpenCL is controlled by the Khronos Group. Not AMD. They are the ones that need to be contacted that their driver component is broken." Khronos might be responsible for OpenCL, but the hardware supplier is responsible for the function of its drivers. If there is an issue with the OpenCL specification, then things will become very ugly and protracted (lots of finger pointing). But if it's just an issue with the driver development, hopefully it shouldn't take too long for the manufacturer of the affected hardware to resolve it. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Has anyone started a thread about it on the AMD OpenCL forums? Is there anyone with the ability to do offline testing who owns such "broken" hardware and software? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
TBar wrote: "This is becoming much more common, similar to the problem with the APs." For future reference, could anyone posting such comparisons also grab the stderr outputs while they are still available, please? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Raistmer wrote: "And did someone thread about it on AMD OpenCL forums?" Phoronix tested and reviewed the RX 5700 XT and could not get the card and drivers to pass the OpenCL portions of their standardized test suite. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.