Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Message boards : Number crunching : Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022596 - Posted: 10 Dec 2019, 5:37:13 UTC
Last modified: 10 Dec 2019, 5:41:25 UTC

I get burned regularly too by the 5700's being used on the project.
https://setiathome.berkeley.edu/workunit.php?wuid=3779924575
https://setiathome.berkeley.edu/workunit.php?wuid=3781287501
https://setiathome.berkeley.edu/workunit.php?wuid=3781287513
https://setiathome.berkeley.edu/workunit.php?wuid=3780346096
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022596
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2022610 - Posted: 10 Dec 2019, 9:29:41 UTC

If one looks at the task summary for any of the computers that "burned" Keith, one would see that they have very high "invalid" scores. That is yet another case for counting "invalid" as "error", as such computers would then get substantially fewer tasks per day. Such a step might, in the short term, affect Keith too, but given his very high "valid" rate I guess he would escape the trap within the hour.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2022610
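The quota argument in the post above can be illustrated with a toy model. This is only a sketch: the rule used here (quota drops by one for each penalised result, doubles for each valid result, and is capped at a maximum) is an assumption made for illustration, not the project's confirmed scheduler behaviour, and `simulate_quota` and its parameters are hypothetical names.

```python
def simulate_quota(outcomes, start_quota=100, max_quota=100,
                   count_invalid_as_error=True):
    """Return the daily quota after each outcome in a toy model.

    ASSUMED rule (illustration only): a penalised result lowers the quota
    by 1 (never below 1); a valid result doubles it, capped at max_quota.
    """
    quota = start_quota
    history = []
    for outcome in outcomes:
        penalised = outcome == "error" or (
            count_invalid_as_error and outcome == "invalid"
        )
        if penalised:
            quota = max(1, quota - 1)
        elif outcome == "valid":
            quota = min(max_quota, quota * 2)
        history.append(quota)
    return history


if __name__ == "__main__":
    # A host returning only invalids is throttled step by step...
    print(simulate_quota(["invalid"] * 5, start_quota=10, max_quota=10))
    # ...while a mostly valid host recovers from an occasional invalid.
    print(simulate_quota(["invalid", "invalid", "valid"],
                         start_quota=10, max_quota=10))
```

Under these assumptions, a host that produces nothing but invalids keeps losing quota, while a host with a high valid rate regains its full quota after a single valid result.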
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022695 - Posted: 11 Dec 2019, 1:55:06 UTC
Last modified: 14 Dec 2019, 2:45:30 UTC

Because all the ones I posted earlier went away, I will repost these in the hope that they do as well. Here are the cross-validated work units so far posted by others that are still in the database:

3779924575, 3781287501, 3781287513, 3780346096, 3780179788, 3780408798, 3782950900, 3779723247, 3779723283

Here are more from me: 3781128179, 3781089000, 3781088834

And here are some I found when I went hunting for them: 3784413112, 3784300965, 3784224221, 378422425, 3783191211, 3773625441, 3782078981, 3783481584, 3782835475, 3782835493, 3783901857, 3783901627, 3782194669, 3783332726, 3783213329, 3782273057, 3784908129, 3699446653, 3784707978, 3784309715, 3785273063, 3784309634, 3784309640, 3784309580, 3784309586, 3784309552, 3784309558, 3784309563, 3784267651, 3784267716, 3784267728, 3784267626, 3784267669, 3784267687, 3784267699, 3785142714, 3761673400, 3784712782, 3781095787

3784267733 is the only work unit I have seen where the RX 5700 validated against a different platform (NVIDIA CUDA). This is because it was a -9 overflow; both hosts merely agreed that it could not complete.

And here are the participants I PMed (many thanks to Wiggo for providing a bunch of these names) advising that they have cards producing nothing but bad results (just so they don't get bothered twice I hope):

[AfZ]TomServo1 1483720
achimbln 138625
antoi 10856207
aridhol 10288747
Baldarov 9438496
Borktron 10682716
Brandon 8198367
calendir 9663884
Camiron 7449359
Carl 914781
Christopher 9894096
CoffeeSloth 10266313
Crisu 7833612
Derrek 219419
dsharbour 10858679
Dzsozi 8002127
Earendil 146007
egon.sauter 494566
eryndel 10878567
Foaming Mad Cow Industries 219464
fred 1935325
ghostbuster 564989
Haiko_N 9198068
HawkMedic 10838738
higemayuge 10790664
HMZ 9079227
Jeff 10639246
Jeffrey A. Smith 38247
Jerjes 1291426
Jorge Barrera 9650295
Juraxell 10864786
Kekke 46817
lastsworder 10878688
lupaslupas 10002927
Maulwurf 1516335
MaximusPrometheus 10240426
mnelsonx 272885
Niflhuem 113140
No Name@Extraterrestrial Intelligence 8116
Oriah 9838773
Otosan 8547502
PantherJon 9801065
Peter Furlong 7965665
phoenix7477 10773411
rgeens 10740140
Rocky 270621
Saint123 159425
Stephen Diem 36679
stogdan 10865456
StrayCat 177967
Strickland 34273
suhail ahmad 9878177
T66 3336343
toby 9442798
Tomik 8972653
Trezy 10367889
Tristan 9778349
VMS Software Inc 45538
xakei 10823091
Zac 100334866
ID: 2022695
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 2022705 - Posted: 11 Dec 2019, 2:58:23 UTC
Last modified: 11 Dec 2019, 2:58:38 UTC

My very stable Nvidia host got this invalid.
My 2 wingmen were both AMD hosts with mostly invalid results but they matched up to each other.
https://setiathome.berkeley.edu/workunit.php?wuid=3780408798
ID: 2022705
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2022711 - Posted: 11 Dec 2019, 3:36:54 UTC - in response to Message 2022705.  

My very stable Nvidia host got this invalid.
My 2 wingmen were both AMD hosts with mostly invalid results but they matched up to each other.
https://setiathome.berkeley.edu/workunit.php?wuid=3780408798


Yup, welcome to the club. All my invalids are matched up with ATI cards.
ID: 2022711
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022793 - Posted: 11 Dec 2019, 20:16:00 UTC
Last modified: 11 Dec 2019, 20:25:18 UTC

After hunting for bad cross-validations, I've concluded that it's hopeless to remove them manually. There are too many of these bad hosts, and they produce invalids every 10-20 seconds, so the cross-validation rate is horrific. Here is a computer showing 489 (edit: it went up to 508 in the time it took to write this!) valid tasks as of when I checked. 90% of them are from the RX 5700, and every single one of them cross-validated and is garbage... all from one single card in a few days. And there are probably hundreds of them like this.

Thus there are probably tens if not hundreds of thousands of garbage results in the science database now, which will eventually corrupt Nebula.

I think that this is the most serious threat to the data integrity of this project it has ever faced, as bad as a successful deliberately malicious attack on it. These cards need to be banned from getting any work ASAP... this was done with other broken platforms in the past so the capability is there.
ID: 2022793
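The manual hunt described in the post above could, in principle, be automated. The following is a minimal sketch that assumes per-result records with the work unit ID, host ID, GPU model and validation flag are available; the `Result` record, its field names and `suspect_workunits` are hypothetical and are not taken from the project's actual database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical per-result record; field names are assumptions."""
    wuid: int
    host_id: int
    gpu_model: str
    validated: bool


def suspect_workunits(results, suspect_models):
    """Return sorted work unit IDs whose validated results ALL came
    from suspect GPU models (at least two validated results needed)."""
    by_wu = {}
    for r in results:
        if r.validated:
            by_wu.setdefault(r.wuid, []).append(r)
    flagged = [
        wuid
        for wuid, valid in by_wu.items()
        if len(valid) >= 2
        and all(r.gpu_model in suspect_models for r in valid)
    ]
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up sample data, not real work units or hosts.
    sample = [
        Result(1, 10, "RX 5700 XT", True),
        Result(1, 11, "RX 5700", True),
        Result(1, 12, "GTX 1080", False),
        Result(2, 10, "RX 5700 XT", True),
        Result(2, 13, "GTX 1080", True),
    ]
    print(suspect_workunits(sample, {"RX 5700", "RX 5700 XT"}))
```

A work unit is flagged only when every validated result comes from a suspect model, so a case like 3784267733, where an RX 5700 agreed with a CUDA host, would not be flagged by this check.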
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022797 - Posted: 11 Dec 2019, 20:40:22 UTC - in response to Message 2022793.  

After hunting for bad cross-validations, I've concluded that it's hopeless to remove them manually. There are too many of these bad hosts, and they produce invalids every 10-20 seconds, so the cross-validation rate is horrific. Here is a computer showing 489 (edit: it went up to 508 in the time it took to write this!) valid tasks as of when I checked. 90% of them are from the RX 5700, and every single one of them cross-validated and is garbage... all from one single card in a few days. And there are probably hundreds of them like this.

Thus there are probably tens if not hundreds of thousands of garbage results in the science database now, which will eventually corrupt Nebula.

I think that this is the most serious threat to the data integrity of this project it has ever faced, as bad as a successful deliberately malicious attack on it. These cards need to be banned from getting any work ASAP... this was done with other broken platforms in the past so the capability is there.


+1

Stephen

! ! !
ID: 2022797
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022800 - Posted: 11 Dec 2019, 20:54:18 UTC

Has anyone emailed Eric about it? He was apparently responsible for the GPU/CPU limits change. So maybe he will act on this too.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022800
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022801 - Posted: 11 Dec 2019, 20:55:23 UTC - in response to Message 2022800.  

Has anyone emailed Eric about it?


Twice... I think he's going through this thread as well, as the ones that are posted here are getting removed (see the struck-out ones in my list).
ID: 2022801
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022802 - Posted: 11 Dec 2019, 21:03:24 UTC - in response to Message 2022801.  

Are you sure they are getting removed from the data set? Tasks only stay visible on the website for a limited time, right?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022802
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022803 - Posted: 11 Dec 2019, 21:11:24 UTC - in response to Message 2022802.  

You may be correct... some of them were disappearing almost immediately, but it is possible they were just purged. Even if they are being removed, it's a drop in the bucket.
ID: 2022803
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022804 - Posted: 11 Dec 2019, 21:16:42 UTC - in response to Message 2022803.  

Yeah, I thought validated tasks only hung around for 1-2 days on the website.

But I agree with you that something needs to be done about it.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022804
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022806 - Posted: 11 Dec 2019, 21:34:31 UTC - in response to Message 2022804.  
Last modified: 11 Dec 2019, 21:35:49 UTC

The very unfortunate part of the purging is that the platform info appears to be lost with the purged results, so only the work unit info and stderr would go into the science database. Thus it may be nigh-impossible to get rid of these garbage results a few days after they are submitted, because it can no longer be determined that two RX 5700s "validated" them.
ID: 2022806
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2022810 - Posted: 11 Dec 2019, 21:59:25 UTC - in response to Message 2022695.  
Last modified: 11 Dec 2019, 22:00:20 UTC

[AfZ]TomServo1 1483720
achimbln 138625
antoi 10856207
Baldarov 9438496
Camiron 7449359
CoffeeSloth 10266313
dsharbour 10858679
Dzsozi 8002127
Earendil 146007
egon.sauter 494566
Foaming Mad Cow Industries 219464
ghostbuster 564989
higemayuge 10790664
HMZ 9079227
Jeffrey A. Smith 38247
Jerjes 1291426
lupaslupas 10002927
Maulwurf 1516335
Niflhuem 113140
No Name@Extraterrestrial Intelligence 8116
Oriah 9838773
Otosan 8547502
phoenix7477 10773411
Stephen Diem 36679
StrayCat 177967
Strickland 34273
Tomik 8972653
Tristan 9778349
VMS Software Inc 45538
xakei 10823091
Zac 100334866
I can add several more users to that list. :-(

Brandon 8198367
Christopher 9894096
Derrek 219419
fred 1935325
grcpool.com 10434153
HawkMedic 10838738
Jeff 10639246
Jorge Barrera 9650295
Kekke 46817
mnelsonx 272885
Peter Furlong 7965665
rgeens 10740140
Saint123 159425
toby 9442798
Trezy 10367889

No cheers here.
ID: 2022810
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022811 - Posted: 11 Dec 2019, 22:02:24 UTC - in response to Message 2022810.  
Last modified: 12 Dec 2019, 10:07:40 UTC

Awesome work... thank you! I'll send them all some pesterpost. :^)
Edit: Well, not grcpool.com as that's a group ID lol.

All pestered... also, StrayCat has replied and indicated that GPU computing has been disabled! Yay!
Edit: and rgeens!
ID: 2022811
Profile MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 2022817 - Posted: 11 Dec 2019, 22:52:36 UTC

I sent private messages to some people, but didn't get any answer. :(
ID: 2022817
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2022825 - Posted: 11 Dec 2019, 23:01:10 UTC - in response to Message 2022817.  

I wrote to some people a private message - but didn't get any answer. :(
On average I only get 1 reply for about every 20 PMs that I send out, and I've sent out hundreds over the years, so don't feel disheartened about it. ;-)

Cheers.
ID: 2022825
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2022884 - Posted: 12 Dec 2019, 19:21:08 UTC - in response to Message 2022810.  

[AfZ]TomServo1 1483720
achimbln 138625
antoi 10856207
Baldarov 9438496
Camiron 7449359
CoffeeSloth 10266313
dsharbour 10858679
Dzsozi 8002127
Earendil 146007
egon.sauter 494566
Foaming Mad Cow Industries 219464
ghostbuster 564989
higemayuge 10790664
HMZ 9079227
Jeffrey A. Smith 38247
Jerjes 1291426
lupaslupas 10002927
Maulwurf 1516335
Niflhuem 113140
No Name@Extraterrestrial Intelligence 8116
Oriah 9838773
Otosan 8547502
phoenix7477 10773411
Stephen Diem 36679
StrayCat 177967
Strickland 34273
Tomik 8972653
Tristan 9778349
VMS Software Inc 45538
xakei 10823091
Zac 100334866
Brandon 8198367
Christopher 9894096
Derrek 219419
fred 1935325
grcpool.com 10434153
HawkMedic 10838738
Jeff 10639246
Jorge Barrera 9650295
Kekke 46817
mnelsonx 272885
Peter Furlong 7965665
rgeens 10740140
Saint123 159425
toby 9442798
Trezy 10367889
And a few more.

lastsworder 10878688
MaximusPrometheus 10240426
stogdan 10865456
suhail ahmad 9878177

Still no cheers here.
ID: 2022884
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2022919 - Posted: 13 Dec 2019, 0:23:33 UTC - in response to Message 2022884.  
Last modified: 13 Dec 2019, 7:16:35 UTC

And a few more.

lastsworder 10878688
MaximusPrometheus 10240426
stogdan 10865456
suhail ahmad 9878177


Pestered and thanks again! And also thanks to dsharbour and VMS Software Inc who have joined StrayCat and rgeens as the four who have confirmed they are turning off GPU computing for now!
Edit: Also lastsworder, Derrek and Camiron!
ID: 2022919
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022930 - Posted: 13 Dec 2019, 2:16:24 UTC

Apparently AMD released some new drivers today or yesterday. It was supposed to be a big update. I wonder if they helped anything?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022930


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.