Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023087 - Posted: 14 Dec 2019, 10:28:21 UTC - in response to Message 2023077.  
Last modified: 14 Dec 2019, 16:30:45 UTC

The person to contact is Eric Korpela... Next would be Jeff Cobb.


Unfortunately I didn't get a reply from Dr. Korpela and haven't seen any acknowledgment, so I've written to Dr. David Anderson, cc Jeff Cobb. I think that the more people on the team who are aware, the better. Also, as Dr. Anderson has been working on Nebula for years and this has the potential to do serious damage to its output, I think he should definitely be aware. (I'm sure he probably is, but we haven't had any wide-ranging official word on this, so that would be nice too.)

Edit: And Jeffrey A. Smith is the latest person to reply indicating that they have disabled GPU computing.
ID: 2023087 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 36564
Credit: 261,360,520
RAC: 489
Australia
Message 2023139 - Posted: 14 Dec 2019, 20:50:18 UTC

Some more.
Anonymous Computer ID 6692170
Bigthor 480399
Eric 9157146
Haiko_N 9198068
Juraxell 10864786
Wojciech Knapczyk 31308
ID: 2023139 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023150 - Posted: 14 Dec 2019, 21:32:45 UTC - in response to Message 2023069.  

Yes, if you read the release-day reviews of the Radeon Navi 5700/XT cards that include business and scientific computing benchmarks, you can see the testers and reviewers comment that OpenCL compute was not working: applications would crash, fail to run, or produce invalid data. The consensus was that the Navi cards were "not ready for prime time" for OpenCL computing.

Yes, other projects are having issues with the cards. Some don't. Depends on the application and data structures I assume.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023150 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023166 - Posted: 14 Dec 2019, 23:18:38 UTC - in response to Message 2023139.  

Some more.


Thanks, Wiggo! Haiko_N and Juraxell were already in the list and pestered. Wojciech Knapczyk doesn't have an affected card, so he may have removed it if he was already notified in the "Invalid Host Messaging" thread or otherwise.

I am going to update my list with annotations and check whether the ones who haven't replied are still producing garbage; some of them may have offlined but not replied.
ID: 2023166 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 36564
Credit: 261,360,520
RAC: 489
Australia
Message 2023167 - Posted: 14 Dec 2019, 23:42:58 UTC
Last modified: 14 Dec 2019, 23:43:58 UTC

Wojciech Knapczyk doesn't have an affected card, so he may have removed it if he was already notified in the "Invalid Host Messaging" thread or otherwise.
I'm sorry about that one, but I rechecked that w/u and the culprit was actually Borktron 10682716 (I really shouldn't do this late at night), and alffrommars 9024750 also turned up this morning.

Cheers.
ID: 2023167 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023168 - Posted: 14 Dec 2019, 23:50:21 UTC - in response to Message 2023167.  
Last modified: 18 Dec 2019, 15:04:55 UTC

Excellent and no worries... I've pesterposted alffrommars. I also found a couple more. Here is the complete list with participant (not computer) IDs so it's fresh. Please, to all, don't quote the whole thing in a reply. Thanks.

[AfZ]TomServo1 1483720
achimbln 138625
alffrommars 9024750
antoi 10856207
aridhol 10288747
Baldarov 9438496
Bigthor 480399
Borktron 10682716
Brandon 8198367
calendir 9663884
Camiron 7449359
Carl 914781
Christopher 9894096
CoffeeSloth 10266313
Crisu 7833612
Daniel frederikson 9813817
Derrek 219419
Doc_Jebus 10863878
dsharbour 10858679
Dzsozi 8002127
Earendil 146007
egon.sauter 494566
Eric 9157146
eryndel 10878567
Esta 10624508
Foaming Mad Cow Industries 219464
fred 1935325
ghostbuster 564989
gunsnammo 137399
Haiko_N 9198068
HawkMedic 10838738
higemayuge 10790664
HMZ 9079227
Jeff 10639246
Jeffrey A. Smith 38247
Jerjes 1291426
Jorge Barrera 9650295
Juraxell 10864786
Kekke 46817
knutella 9880098
lastsworder 10878688
lupaslupas 10002927
MadMikeDelta 8221690
Maulwurf 1516335
MaximusPrometheus 10240426
mgg 279419
mnelsonx 272885
Niflhuem 113140
No Name@Extraterrestrial Intelligence 8116
NYX.consulting 10503661
Oriah 9838773
Otosan 8547502
PantherJon 9801065
Peter Furlong 7965665
phoenix7477 10773411
rgeens 10740140
Rocky 270621
Saint123 159425
Stephen Diem 36679
stogdan 10865456
StrayCat 177967
Strickland 34273
suhail ahmad 9878177
T66 3336343
toby 9442798
Tomik 8972653
Trezy 10367889
Tristan 9778349
vleermuis 1295921
VMS Software Inc 45538
werewolf_007 10880222
xakei 10823091
Zac 100334866

Italicized names have replied and indicated they are disabling their affected GPUs.
Struck-through names are confirmed to no longer be producing these bad results, i.e. via disabling GPU computing.
ID: 2023168 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 36564
Credit: 261,360,520
RAC: 489
Australia
Message 2023203 - Posted: 15 Dec 2019, 10:10:38 UTC - in response to Message 2023200.  

Another computer from toby:

toby 8772813

I don't PM people about this, it's usually not worth it.
99.9% never reply. Most of them don't even know about this forum.

This is out of control by now, and the database is badly corrupted, and the scientific results are totally unreliable.
The numbers beside the names, Sten, are user ID numbers, not computer numbers, unless they're "Anonymous" users. ;-)

Cheers.
ID: 2023203 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023204 - Posted: 15 Dec 2019, 10:16:20 UTC - in response to Message 2023200.  
Last modified: 15 Dec 2019, 10:17:04 UTC

Thanks! However, I think that was the one I already have in there, as the ID numbers I've listed are the participant IDs... toby's RX 5700 host is 8772813. Edit: And Wiggo beat me to it. :^)

I am hoping that, as with epidemiology, once a certain threshold in the population is crossed, "infectable hosts" will no longer statistically find each other, so a small percentage of responses as Wiggo noted may result in a much higher reduction in cross-validations. Unfortunately all we have is hope right now.
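
To put rough numbers on that hope, here is a minimal sketch under a deliberately simple assumption that is not taken from the project itself: each workunit goes to two hosts picked independently, and some fraction of hosts are faulty. A bad pair then forms with probability equal to that fraction squared, so taking a share of faulty hosts offline cuts bad cross-validations more than proportionally. The function names and the 2% starting fraction are made up for illustration only.

```python
def bad_pair_probability(faulty_fraction: float) -> float:
    """Chance that both hosts on a two-host workunit are faulty,
    assuming hosts are paired independently (a simplifying assumption)."""
    return faulty_fraction ** 2


def cross_validation_reduction(removed_share: float) -> float:
    """Approximate relative drop in bad pairs when `removed_share` of the
    faulty hosts go offline. Ignores the small change in total host count."""
    return 1.0 - (1.0 - removed_share) ** 2


if __name__ == "__main__":
    start = 0.02  # hypothetical: 2% of hosts are faulty
    for removed in (0.1, 0.3, 0.5):
        after = start * (1.0 - removed)
        print(f"{removed:.0%} of faulty hosts removed -> "
              f"bad-pair chance {bad_pair_probability(after):.6f}, "
              f"reduction about {cross_validation_reduction(removed):.0%}")
```

Under this toy model, getting 30% of the affected hosts offline would roughly halve the bad pairings. Real pairing is not independent, so treat this as an illustration of the trend rather than a prediction.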
ID: 2023204 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023209 - Posted: 15 Dec 2019, 10:40:08 UTC - in response to Message 2023207.  
Last modified: 15 Dec 2019, 10:52:02 UTC

I logged the participant IDs that I contacted to prevent any ambiguity about who was contacted, as there are many duplicate names; for example, there are dozens who are exactly "Eric".
Edit: Also in case any changed their display names.
ID: 2023209 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 36564
Credit: 261,360,520
RAC: 489
Australia
Message 2023211 - Posted: 15 Dec 2019, 10:48:19 UTC

And I just followed Mr. Kevvy's lead. ;-)

Cheers.
ID: 2023211 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023238 - Posted: 15 Dec 2019, 18:33:20 UTC

Ok, this is crazy. On ONE system alone (and my slowest one even) I have five (5) Invalid tasks from the last day alone that were cross-validated by the SAME two systems with RX5700s. How is that even possible? I was under the impression that WU distribution is more random than this.

https://setiathome.berkeley.edu/workunit.php?wuid=3788317704
https://setiathome.berkeley.edu/workunit.php?wuid=3788317722
https://setiathome.berkeley.edu/workunit.php?wuid=3788317634
https://setiathome.berkeley.edu/workunit.php?wuid=3788317640
https://setiathome.berkeley.edu/workunit.php?wuid=3788317810

The offenders:
No Name@Extraterrestrial Intelligence
PantherJon

I see they are already on the notification list; it's just wild that I would be matched up with the same two systems in such a short amount of time.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023238 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023240 - Posted: 15 Dec 2019, 18:42:34 UTC - in response to Message 2023238.  
Last modified: 15 Dec 2019, 18:45:27 UTC

Got a new one to send a message to:

gunsnammo

Kevvy, just a suggestion. I would use the strikethrough to indicate the users who have replied and agreed to remove the GPUs. It will be more visually clear.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023240 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023241 - Posted: 15 Dec 2019, 18:44:11 UTC - in response to Message 2023238.  
Last modified: 15 Dec 2019, 18:48:27 UTC

I see this constantly with blocks of consecutive work units cross-validating from the same two hosts due to the volume of work they are requesting.

Thanks, Ian! gunsnammo 137399 has been pestered. :^)

Also, I am saving strikethrough for when I confirm they are actually gone, including ones that didn't reply.
ID: 2023241 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2023256 - Posted: 15 Dec 2019, 22:45:59 UTC - in response to Message 2023240.  
Last modified: 15 Dec 2019, 22:51:34 UTC

got a new one to send a message to:

gunsnammo

Kevvy, just a suggestion. I would use the strikethrough to indicate the users who have replied and agreed to remove the GPUs. It will be more visually clear.


+1

{edit}
Well, maybe change them to a smaller font or underline them. Apparently Ian has the same problem I do: the difference between normal and italic is too small to readily identify in just the names.

Stephen

:)
ID: 2023256 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023257 - Posted: 15 Dec 2019, 22:53:55 UTC

I've also made PSA posts about this on overclockers.com and Reddit in r/SETI imploring anyone running RX5700/XT to please remove them from the project immediately. Hopefully at least a few more people see the posts and comply.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023257 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 36564
Credit: 261,360,520
RAC: 489
Australia
Message 2023264 - Posted: 16 Dec 2019, 0:57:06 UTC

Damn, I got swamped by the usual suspects as well as a few more new ones in the last 24 hours. :-(

Daniel frederikson 9813817
Esta 10624508
MadMikeDelta 8221690 (R9 390 with RX5700/XT)
mgg 279419
NYX.consulting 10503661
werewolf_007 10880222
ID: 2023264 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2023293 - Posted: 16 Dec 2019, 3:08:40 UTC - in response to Message 2023264.  

Excellent work again Wiggo and thanks! Pesterposted all five and list updated. :^)
ID: 2023293 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 2023306 - Posted: 16 Dec 2019, 6:23:13 UTC - in response to Message 2023238.  
Last modified: 16 Dec 2019, 6:24:34 UTC

Ok this is crazy. On ONE system alone (and my slowest one even) I have five (5) Invalid tasks from the last day alone, that were cross validated by the SAME two systems with RX5700s. how is that even possible. I was under the impression that WU distribution is more random than this.
That sort of thing actually happens a lot: as you both request work at the same time and get it allocated at the same time, you will end up sharing groups of WUs with a particular wingman.
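
A toy model makes the effect easier to see. This is only a sketch and not the actual scheduler code: it assumes the two copies of each workunit sit next to each other in one queue, and that a host is never handed both copies of the same workunit. The names `build_queue` and `request_work` are hypothetical.

```python
from collections import deque


def build_queue(num_workunits: int) -> deque:
    """Queue two replicas of each workunit, back to back (toy assumption)."""
    return deque((wu, replica) for wu in range(num_workunits) for replica in (0, 1))


def request_work(queue: deque, count: int) -> list:
    """Hand out up to `count` tasks from the front of the queue,
    never giving one host both replicas of the same workunit."""
    taken, seen, skipped = [], set(), []
    while queue and len(taken) < count:
        wu, replica = queue.popleft()
        if wu in seen:
            skipped.append((wu, replica))  # leave this copy for another host
        else:
            seen.add(wu)
            taken.append(wu)
    queue.extendleft(reversed(skipped))  # put skipped copies back, order preserved
    return taken


if __name__ == "__main__":
    queue = build_queue(20)
    host_a = request_work(queue, 5)  # first host asks for a batch
    host_b = request_work(queue, 5)  # second host asks right afterwards
    shared = sorted(set(host_a) & set(host_b))
    print("workunits shared by both hosts:", shared)
```

In this model the second host picks up exactly the copies the first host had to skip, so two hosts asking for large batches at about the same moment end up as wingmen on a whole run of consecutive workunits.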
Grant
Darwin NT
ID: 2023306 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 36564
Credit: 261,360,520
RAC: 489
Australia
Message 2023307 - Posted: 16 Dec 2019, 6:25:57 UTC - in response to Message 2023306.  

Ok this is crazy. On ONE system alone (and my slowest one even) I have five (5) Invalid tasks from the last day alone, that were cross validated by the SAME two systems with RX5700s. how is that even possible. I was under the impression that WU distribution is more random than this.
That sort of thing actually happens a lot- as you both request work at the same time, and get it allocated at the same time, so you will get to share groups of WUs with a particular Wingman.
And it's getting more frequent as time goes on. :-(

Cheers.
ID: 2023307 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2023312 - Posted: 16 Dec 2019, 7:21:30 UTC

ID: 2023312 · Report as offensive     Reply Quote


 