Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database
Author | Message |
---|---|
rob smith Joined: 7 Mar 03 Posts: 22665 Credit: 416,307,556 RAC: 380 |
It certainly worked when there were issues with nVidia GPUs a few years back, so a bit of thinking around the logic identifying the GPU family and I can't see why it shouldn't work again (but on a different GPU family...) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
"It certainly worked when there were issues with nVidia GPUs a few years back, so a bit of thinking around the logic identifying the GPU family and I can't see why it shouldn't work again (but on a different GPU family...)" You're not thinking of the Fermi fiasco, are you? That happened because the programmers (NVidia themselves, for those early apps) used an undocumented shortcut which was removed for the later generations. Once identified, NVidia removed it from the SETI application and documented the problem for future programmers. |
Joined: 15 May 99 Posts: 3828 Credit: 1,114,826,392 RAC: 3,319 |
Dank and Rafael added and pestered, and thank you. :^) Also thanks to Swagstergo, who is the latest to reply and indicate that they disabled their affected GPU. I also found I was mugged for 3810416472, which is exactly what I was waiting for, as it was created after the indicated quorum fix was in place but still "validated" with two -9 overflow RX 5700s overpowering a non-overflow platform. I will be contacting Dr. Korpela and I'll include the work unit posted earlier as well. Edit: The pesterposts seem to be working. I went over the entire list checking work queues and found that an equal number or more people had disabled their affected GPUs (or stopped computing entirely) without replying than otherwise, so there are plenty more strikethroughs in there. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I still suggest the same procedure currently being used for the AstroPulse Tasks simply be extended to include Multibeam tasks. The only solution which will stop Cross-Validation while still allowing the GPUs to participate in SETI is to only allow One AMD/ATI Host per work unit. It seems to work well for the AstroPulse tasks on Main, while Beta still allows you to be robbed. The MacOS 19.1 Catalina Update provides support for the AMD RX 5xxx series, so, you will start seeing more of those GPUs on Macs soon, and they work on Macs. |
Joined: 24 Jan 00 Posts: 37345 Credit: 261,360,520 RAC: 489 |
Or was it also before then? (295.xx-296.xx were the "sleepy drivers", but I seem to remember something even before them [edit: wasn't there also a bad driver group back in the 140.xx-160.xx range as well?]). "It certainly worked when there were issues with nVidia GPUs a few years back, so a bit of thinking around the logic identifying the GPU family and I can't see why it shouldn't work again (but on a different GPU family...)" "You're not thinking of the Fermi fiasco, are you? That happened because the programmers (NVidia themselves, for those early apps) used an undocumented shortcut which was removed for the later generations. Once identified, NVidia removed it from the SETI application and documented the problem for future programmers." But IIRC it was Matt that used to work that kind of magic. Anyhow, here's a couple more bad SETI choice Xmas presents. Daniel Conrad Broom 8059986 rAttmAniA 9002301 Cheers. |
Joined: 19 Sep 99 Posts: 70 Credit: 40,327,877 RAC: 75 |
"Since yesterday, we've fallen back to the old server so anonymous platform apps should be able to get work. Since this morning we should have had the validator that requires 3 results for overflow results. Merry Christmas!" What is the difference between the old and the new validator setting? In the past I have always seen quorum 3 if 2 hosts have different results, and this practice is still the same when I look at today's results from some 5700 XTs. I just want to understand the effort; to me it still looks the same. From the science point of view, a quorum of 4 would be a very good idea to reduce the bad results marked as good by 2 5700 GPUs until a solution is found to exclude such hosts completely. Or will the results of the 5700 GPUs be deleted at a later stage of processing on the servers, so that we don't see any of these results in the science database? |
Joined: 24 Jan 00 Posts: 37345 Credit: 261,360,520 RAC: 489 |
Another 2 that have crossed my path. Alexandr Galushchenko 9609912 Richard Hartland 9781177 Cheers. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I suppose the recent Server swaps could have removed the AP code again... it has happened before. The AP results are different from the Multibeam results. The AP results actually DO result in bad Data being entered into the Database because the task has already passed the RFI test. In the Multibeam tasks, all of those overflows being done by the AMD 5xxx GPUs are at Chirp = Zero, and will be removed as RFI. So, the Multibeam overflows are just a waste of Volunteers' time, and missed observations for the Project, whereas the False AP results really do enter bad Data. Strange that all the results matched though, it could be something else... "I still suggest the same procedure currently being used for the AstroPulse Tasks simply be extended to include Multibeam tasks. The only solution which will stop Cross-Validation while still allowing the GPUs to participate in SETI is to only allow One AMD/ATI Host per work unit. It seems to work well for the AstroPulse tasks on Main, while Beta still allows you to be robbed. The MacOS 19.1 Catalina Update provides support for the AMD RX 5xxx series, so, you will start seeing more of those GPUs on Macs soon, and they work on Macs." |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
New one that robbed Zalster: https://setiathome.berkeley.edu/workunit.php?wuid=3816043248 AMD Jesus: https://setiathome.berkeley.edu/show_user.php?userid=70887 Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Kiska Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
"new one that robbed Zalster:" Do you mean https://setiathome.berkeley.edu/show_host_detail.php?hostid=8823664 ? 'Cause AMD Jesus has a Radeon VII and Vega is known good? |
rob smith Joined: 7 Mar 03 Posts: 22665 Credit: 416,307,556 RAC: 380 |
That's interesting - if you look at the task summaries for those two robbers, you will see that between them they have over 1000 "invalid" returns. Sadly "invalid" returns do not count "as strikes against" in the same way as errors do. Perhaps it is time for invalid returns to be treated in a similar way to errors (too many in a given period and the number of tasks sent out to that host is reduced). Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Joined: 15 May 99 Posts: 3828 Credit: 1,114,826,392 RAC: 3,319 |
Alexandr Galushchenko and Richard Hartland pestered... thank you! AMD Jesus is the "correct" one (rame is already in the list), as that host has many invalids, but, as noted, it is not using a card known to be an issue, and there is a mix of good and bad results in there, so I wonder whether it was some other issue or the card was changed. I will give that one a day to verify. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"That's interesting - if you look at the task summaries for those two robbers, you will see that between them they have over 1000 'invalid' returns. Sadly 'invalid' returns do not count 'as strikes against' in the same way as errors do. Perhaps it is time for invalid returns to be treated in a similar way to errors (too many in a given period and the number of tasks sent out to that host is reduced)." +1 Stephen ! ! ! |
Joined: 24 Jan 00 Posts: 37345 Credit: 261,360,520 RAC: 489 |
"Cause AMD Jesus has a Radeon VII and Vega is known good?" If you read the Stderr output (link may not last long), the 1st GPU listed is a gfx1010 (aka: RX 5700 XT and what the w/u was run on) while the 2nd is a gfx906 (the Radeon VII), but why the older card is listed as the primary card in that system instead of the RX is a bit strange to me. ;-) Cheers. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Yes, AMD Jesus has two cards, one being the Radeon VII, the other being the RX 5700 (XT). The reason the Radeon VII is seen as the better card is likely the amount of VRAM: 16GB on the RVII vs 8GB on the RX 5700. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Joined: 24 Jan 00 Posts: 37345 Credit: 261,360,520 RAC: 489 |
I didn't think of that, but you're likely correct. Cheers. |
Joined: 24 Jan 00 Posts: 37345 Credit: 261,360,520 RAC: 489 |
A new 1 for me. Richard 8565733 Cheers. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Cause AMD Jesus has a Radeon VII and Vega is known good?" "If you read the Stderr output (link may not last long), the 1st GPU listed is a gfx1010 (aka: RX 5700 XT and what the w/u was run on) while the 2nd is a gfx906 (the Radeon VII), but why the older card is listed as the primary card in that system instead of the RX is a bit strange to me. ;-)" . . These days it is often the case, I have seen it on many machines, that the older or lesser card gets primary listing. Stephen ? ? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13882 Credit: 208,696,464 RAC: 304 |
The Radeon VII outperforms the 5700 XT; it is the more powerful card (particularly for single precision work). Grant Darwin NT |
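
For readers trying to follow the three server-side ideas raised in the thread (three results required for overflows, only one AMD/ATI host per work unit, and invalids counting as strikes against a host's quota), here is a rough sketch of how they might fit together. This is only an illustration: every class, function, and threshold below is hypothetical and is not taken from the BOINC or SETI@home server code.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; not from the real server code.
NORMAL_QUORUM = 2     # two agreeing results normally validate a work unit
OVERFLOW_QUORUM = 3   # overflow results get a third opinion
INVALID_LIMIT = 10    # assumed number of invalids tolerated per period


@dataclass
class Host:
    host_id: int
    gpu_vendor: str            # e.g. "AMD", "NVIDIA" or "CPU"
    recent_invalids: int = 0
    daily_quota: int = 100


@dataclass
class Result:
    host: Host
    overflow: bool             # True if the task ended with a -9 overflow


@dataclass
class WorkUnit:
    wu_id: int
    results: list = field(default_factory=list)


def required_quorum(wu):
    """Ask for three results as soon as any returned result is an overflow."""
    if any(r.overflow for r in wu.results):
        return OVERFLOW_QUORUM
    return NORMAL_QUORUM


def can_assign(wu, host):
    """Allow at most one AMD/ATI host per work unit, so that two affected
    cards can never validate against each other."""
    if host.gpu_vendor != "AMD":
        return True
    return all(r.host.gpu_vendor != "AMD" for r in wu.results)


def adjusted_quota(host):
    """Halve the daily quota once invalids exceed the assumed limit,
    mirroring how errors are described as being treated."""
    if host.recent_invalids > INVALID_LIMIT:
        return max(1, host.daily_quota // 2)
    return host.daily_quota
```

The sketch also shows why the quorum change alone does not stop the "mugging" reported above: with a quorum of three, two matching overflow results from affected cards can still outvote one good result, whereas the one-AMD-host rule prevents the second affected card from ever receiving the task.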
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.