Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database
Author | Message |
---|---|
Tomcat雄猫 Joined: 20 Dec 14 Posts: 9 Credit: 391,588 RAC: 19 |
According to Ryan Smith from AnandTech, no. The OpenCL drivers are still garbage. |
lastsworder Joined: 9 Dec 19 Posts: 1 Credit: 13,014 RAC: 0 |
收到,已经禁用GPU,但是得解决这个问题,毕竟5700的计算能力应该还是很可观的。 |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 9 Credit: 391,588 RAC: 19 |
收到,已经禁用GPU,但是得解决这个问题,毕竟5700的计算能力应该还是很可观的。 Translation: I've received the message and disabled my GPU. However, this issue must be resolved, since the computational capabilities of the RX 5700 should be quite impressive. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
According to Ryan Smith from AnandTech, no. The OpenCL drivers are still garbage. Ryan Smith on RX 5xxx OpenCL support: "While I’m including compute performance for the sake of completeness here, the compute situation on Navi has not substantially changed since the launch of the Radeon RX 5700 series over 5 months ago. AMD’s Adrenalin 2020 software has improved the state of their OpenCL drivers slightly – there are fewer hard crashes and performance is up in some cases – but their drivers are still dysfunctional and not fit for production use. In particular, Folding@Home and parts of CompuBench are still unable to run." Grant Darwin NT |
Justin Turner Arthur Joined: 20 Oct 03 Posts: 12 Credit: 3,929,052 RAC: 2 |
Unfortunately, the stock multibeam client excludes the ROCm OpenCL runtime in its plan class at the moment, so you'd have to roll your own to use the card on ROCm. |
Mr. Kevvy Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
If you want to see how the plain RX 5700 would work at SETI, just look at this host; it seems the RX 5700s work fine in macOS Catalina: https://setiathome.berkeley.edu/results.php?hostid=8592369&offset=80 Unfortunately, it appears he stopped producing work on that machine some time ago. The best action to take on these cards is to simply restrict the WUs to allow only one AMD GPU per MB task in Windows, basically the same as is already being done with the Astropulse tasks on Main. It shouldn't be that difficult to extend the restriction to include Multibeam. If the problem is ever fixed, the same code would still allow AMD work to be produced. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I was kind of wondering how other platforms were performing. Are there any RX5700s on Linux? Are they producing bad results too? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Wiggo Joined: 24 Jan 00 Posts: 36339 Credit: 261,360,520 RAC: 489 |
I was kind of wondering how other platforms were performing. Are there any RX5700s on Linux? Are they producing bad results too? None that I've come across as yet; they've all been Windows rigs. Cheers. |
Wiggo Joined: 24 Jan 00 Posts: 36339 Credit: 261,360,520 RAC: 489 |
[AfZ]TomServo1 1483720 Damn, the mongrels are into me today, with a few doing me out of credit, and I've 5 more names to add to that list. :-) calendir 9663884, eryndel 10878567, PantherJon 9801065, Rocky 270621, T66 3336343 |
Mr. Kevvy Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
calendir 9663884 Pestered and thank you again! Er... no need in future to quote the whole list. It's getting annoyingly long and I don't see it getting shorter anytime soon. :^) Edit: I found half a dozen or so more from checking valids on the host queues of the known bad IDs, sent pesterposts and updated my original list on the last page. One of those computers (T66's) demonstrated perfectly why this is such a deceptively severe issue, which I think the admins here are missing. It only had 8 invalids, but of the valids, 2 were from the GPU, so it is managing to find another AMD RX to cross-validate with a quarter of the time, which seems very disproportionate given the small percentage of AMD RX Windows rigs out there. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
The longer this goes on, the more RX5xxx cards will be in the hands of users and the more cards will be on the project, further increasing the chances that they find another RX5xxx card to cross-validate with. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Wiggo Joined: 24 Jan 00 Posts: 36339 Credit: 261,360,520 RAC: 489 |
Pestered and thank you again! Er... no need in future to quote the whole list. It's getting annoyingly long and I don't see it getting shorter anytime soon. :^) Yeah, T66 and higemayuge teamed up against me for 3 invalids here, and yesterday's download problems meant that I've been teamed up with several users of those cards over quite a large number of tasks. Yes, I expect that list to get very long if these cards are allowed to continue on the way they currently are, and the science to be further corrupted the longer they stay. :-( Cheers. |
Mr. Kevvy Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
I had a look at the Stderr of these cross-validated results and I think that the reason that they are not being actioned to any great degree is that they are all finishing with result overflow -9. As far as I know, work units like this are thrown out as noise. So although we are unfortunately losing a lot of observations, as well as wasting large amounts of processing and bandwidth, I don't know if it's actually going to taint the end results (other than by their absence.) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
On my invalid tasks involving cross-validated AMD results I have had five tasks that were not an overflow. Granted, that was 5 out of 14, so there are a lot of overflowed tasks. But somewhere along the way either Eric or Richard stated that even overflows were useful science, even when they were noisy. They do inject noisy "birdies" on purpose, you know. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Mr. Kevvy Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
Perfect... thanks Keith. That certainly answers that: yes, bad data is going to get into the final database and eventually Nebula. Edit: Also, Niflhuem has responded and disabled GPU computing. It was a given, but I also found this post which shows that other BOINC projects are affected (as would be expected, since OpenCL, as far as I know, is used for all AMD GPU computing). |
rob smith Joined: 7 Mar 03 Posts: 22443 Credit: 416,307,556 RAC: 380 |
Hopefully when Nebula does its thing the "bad data" being puked by these RX5700s will be seen as single events and so slung into the waste bin. It only becomes a problem if the same frequency/location pair comes up on its second scan with a "valid" result, or if a perfectly good "old" result gets slung because it isn't paired due to results from these GPUs. It is worth remembering that the actual data "tapes" are, as far as I'm aware, kept, so it would (in theory) be possible to re-run the suspect data, having blocked RX5700s from getting anywhere near it. The sooner RX5700s are blocked as a class (the same as happened a few years back when there were issues with nVidia GPUs and VLARs) the better it will be for all (apart from RX5700 owners). Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
MagicEye Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
Is there any way to contact the people responsible for the project? Are they aware of the problem? Are any solutions in progress? In the last few days I have seen some tasks that were sent out not just to 1 other host but to 2. On the one hand, a very good idea. On the other hand, the 2 5700 GPUs still overruled the other 2 PCs. :( The good thing is that Astropulse seems to run without error and really quite fast, about 1 cr per second of run time. And with these tasks, and in most cases also quite good AMD CPUs, they get a lot of credits; maybe that's the reason they don't see that the 5700 cards are producing so much waste. |
rob smith Joined: 7 Mar 03 Posts: 22443 Credit: 416,307,556 RAC: 380 |
The person to contact is Eric Korpela, and I'm pretty sure from earlier correspondence he is already aware of the situation. I know he monitors the forum, but I also know he's been very busy with things outside the atmosphere just now. Next would be Jeff Cobb. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Wiggo Joined: 24 Jan 00 Posts: 36339 Credit: 261,360,520 RAC: 489 |
The buggers are coming out of the woodwork like termites looking for more to chew on here and I'll have more to add to the list in the morning. :-( Cheers. |
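
Two points from the thread lend themselves to a rough sketch: the worry that more RX 5xxx hosts means more chances of two of them validating each other, and TBar's suggestion to allow only one AMD GPU per Multibeam task. The Python below is only an illustration under loud assumptions. It is not the project's scheduler or validator code, the host names and counts are made up, and it assumes every two-host pairing is equally likely, which Mr. Kevvy's observation about T66 suggests is not true in practice.

```python
from typing import Callable, Iterable


def flaky_pair_fraction(n_hosts: int, n_flaky: int) -> float:
    """Fraction of all distinct two-host pairings where both hosts are flaky.

    Assumption: every pairing is equally likely (a simplification).
    """
    if n_hosts < 2 or not 0 <= n_flaky <= n_hosts:
        raise ValueError("need n_hosts >= 2 and 0 <= n_flaky <= n_hosts")
    total_pairs = n_hosts * (n_hosts - 1) // 2
    flaky_pairs = n_flaky * (n_flaky - 1) // 2
    return flaky_pairs / total_pairs


def can_assign(assigned: Iterable[str], candidate: str,
               is_restricted: Callable[[str], bool]) -> bool:
    """Hypothetical rule: at most one restricted-class host per workunit."""
    if not is_restricted(candidate):
        return True
    return not any(is_restricted(host) for host in assigned)


if __name__ == "__main__":
    # Made-up host counts, purely illustrative.
    for flaky in (10, 20, 40):
        print(flaky, flaky_pair_fraction(1000, flaky))

    # Hypothetical naming scheme used only to mark the restricted class.
    is_rx = lambda host: host.startswith("rx5700-")
    print(can_assign(["rx5700-a"], "rx5700-b", is_rx))  # second RX host is refused
    print(can_assign(["rx5700-a"], "nvidia-c", is_rx))  # mixed pairing is allowed
```

Under this simplified model, doubling the number of flaky hosts roughly quadruples the share of all-flaky pairings, which is the concern raised above. With the one-per-workunit rule in place, an all-flaky quorum cannot form at all, while the cards can still contribute work alongside other hardware.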
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.