Flakey AMD/ATI GPUs, including RX 5700 XT, Cross Validating, polluting the Database

Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028340 - Posted: 18 Jan 2020, 16:42:18 UTC - in response to Message 2028249.  

And the list of applications at https://setiathome.berkeley.edu/apps.php shows the Windows 8.24 app is now on main.
I notice there isn't one for Linux there yet.

I don't know if Eric can apply the same command-line fix to the Linux executable and strip out the offending math optimization, as he did for the Windows executable. At minimum he would probably need access to a Linux host, which I don't think he has.

But it needs to be done since Linux hosts had the same issues with the 5700XT as the Windows ones. They were just not as visible since the number of Linux hosts is a fraction of the Windows hosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028340
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028344 - Posted: 18 Jan 2020, 17:08:09 UTC - in response to Message 2028340.  

yeah I had the same thought, since Eric didn't recompile the app from scratch; he just pulled out the offending flags from the EXE. I'm not sure if it's possible to do that on the Linux apps.

it's also possible that it might not even be necessary if the linux apps weren't built with those offending options to begin with. also on the linux side, it doesn't appear that AMD has even released a driver for linux that addresses this issue. their release notes don't even mention it like the Windows driver notes do.

can anyone post a Linux system that is having this problem? or post any linux system running this card? or is it just speculation that the same issue exists on linux? I'd like to see proof of it.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028344
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2028381 - Posted: 18 Jan 2020, 20:22:56 UTC

I came across 2, fredi 7913572, and the other was Anonymous.

Cheers.
ID: 2028381
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028393 - Posted: 18 Jan 2020, 21:00:40 UTC - in response to Message 2028344.  

Half a dozen Einstein Linux users said the new drivers fixed the issues with the 5700 cards. They didn't need a new app, just new drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028393
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2028394 - Posted: 18 Jan 2020, 21:02:02 UTC - in response to Message 2028393.  

"Half a dozen Einstein Linux users said the new drivers fixed the issues with the 5700 cards. They didn't need a new app, just new drivers."
So the new Application requirement looks like just a Seti thing.
Grant
Darwin NT
ID: 2028394
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028400 - Posted: 18 Jan 2020, 21:12:47 UTC - in response to Message 2028381.  
Last modified: 18 Jan 2020, 21:14:10 UTC

it does indeed appear that fredi's system is/was having the problem on linux.

we need confirmation that the app was built with the bad options. it may not have been. but without a linux driver fix we can't know yet, we need to wait for the updated driver there. but the navi cards on Linux are incredibly scarce it seems and AMD might not be aware it's still an issue on Linux.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028400
rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2028403 - Posted: 18 Jan 2020, 21:24:39 UTC

Since, according to his stats page, fredi's Linux computer's last contact was 2nd Jan, it is fairly safe to say that any GPU tasks returned were done on the old (8.22) app.

Looking at his Windows computer I would hazard a guess that it is the same hardware and he may have abandoned Linux (for now) - the Windows one came to life 15th Jan, and has (on the face of it) the same hardware spec as his Linux one (which had been around for less than a month).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2028403
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028404 - Posted: 18 Jan 2020, 21:32:51 UTC - in response to Message 2028403.  

well considering there is no 8.24 Linux app, yeah, he couldn't run anything other than the 8.22 app, even now. that's the whole topic at hand: whether the Linux app even needs updating. I was just looking for an example of a Linux host with this issue. We still need confirmation on whether the Linux app needs a fix, or just a driver fix, or both. I still think that AMD is likely unaware that the problem is affecting Linux as well.

his Windows system does not appear to be processing any GPU work anyway.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028404
Paul
Joined: 17 May 99
Posts: 72
Credit: 42,977,964
RAC: 43
United States
Message 2028418 - Posted: 18 Jan 2020, 23:36:00 UTC

Thanks for all the sleuthing. I ask because I'm planning a GPU purchase soon, though it sounds like the solution is in progress, either way. 5700 XT is of course what is being recommended, though I have my eye out for a Vega 56. I've been running ATi/AMD GPUs for SETI on Linux for about as long as it has been possible. I've had a lot of problems specifically with BOINC and especially SETI in the past several years, but I think we are on the upswing now. I'm *always* willing to test if that is helpful.
ID: 2028418
fredi
Joined: 11 Aug 04
Posts: 4
Credit: 7,983,668
RAC: 1
Finland
Message 2028457 - Posted: 19 Jan 2020, 8:33:15 UTC - in response to Message 2028403.  

"Since, according to his stats page, fredi's Linux computer's last contact was 2nd Jan, it is fairly safe to say that any GPU tasks returned were done on the old (8.22) app.

"Looking at his Windows computer I would hazard a guess that it is the same hardware and he may have abandoned Linux (for now) - the Windows one came to life 15th Jan, and has (on the face of it) the same hardware spec as his Linux one (which had been around for less than a month)."


It is the same computer, just dual-booting, because gaming turned out not to work very well on Linux with the 5700XT compared to my old GTX970.
Since Windows is still needed, I installed BOINC on it also. Did I get it right that with new drivers and app version the computation should work with 5700XT card now?
ID: 2028457
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028458 - Posted: 19 Jan 2020, 8:39:52 UTC - in response to Message 2028457.  

"did I get it right that with new drivers and app version the computation should work with 5700XT card now?"

Yes, all indications are that the problem with the Navi cards and compute has been solved (for Windows at least) with the new 8.24 AMD app and the latest Adrenalin 2020 Edition 20.1.2 Optional drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028458
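
The requirement described in the post above (8.24 AMD app plus the Adrenalin 20.1.2 driver on Windows) could be checked roughly like this. It is only a sketch: the function and constant names are hypothetical, and treating any newer app or driver version as acceptable is an assumption, not something the thread confirms.

```python
def parse_version(text):
    """Turn a dotted version string such as '20.1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions quoted in the thread; the names are hypothetical.
MIN_AMD_APP = parse_version("8.24")
MIN_ADRENALIN_DRIVER = parse_version("20.1.2")


def navi_ready_on_windows(app_version, driver_version):
    """Return True if both the app and the driver meet the quoted minimums.

    Assumption: anything newer than the quoted versions is also fine.
    """
    app_ok = parse_version(app_version) >= MIN_AMD_APP
    driver_ok = parse_version(driver_version) >= MIN_ADRENALIN_DRIVER
    return app_ok and driver_ok
```

For example, `navi_ready_on_windows("8.22", "20.1.2")` is false because the app is too old, even though the driver is current.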
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028470 - Posted: 19 Jan 2020, 11:58:31 UTC - in response to Message 2028458.  

"did I get it right that with new drivers and app version the computation should work with 5700XT card now?"

"Yes, all indications are that the problem with the Navi cards and compute has been solved (for Windows at least) with the new 8.24 AMD app and the latest Adrenalin 2020 Edition 20.1.2 Optional drivers."

+1, if you're staying on Windows, just update to the latest drivers and re-enable the GPU for processing. you should be sent the 8.24 app automatically.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028470
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2028502 - Posted: 19 Jan 2020, 16:01:22 UTC

I've done a quick update to the Lunatics installer, to include the new ATI files and mention the NVidia driver bug in the documentation. If anyone with one of the affected cards is willing to give it a quick test, could they PM me for a download link, please? (64-bit only so far). Ta.
ID: 2028502
Mr. Kevvy (Crowdfunding Project Donor*, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2028510 - Posted: 19 Jan 2020, 16:32:24 UTC - in response to Message 2028502.  

Thanks, Richard... I'm busily PMing all the known hosts' owners and I just ran into an Anonymous Windows one doubtless on Lunatics so your update was fortuitous. I'll link to this post in the private message to them.
ID: 2028510
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028518 - Posted: 19 Jan 2020, 17:21:20 UTC - in response to Message 2028502.  

"I've done a quick update to the Lunatics installer, to include the new ATI files and mention the NVidia driver bug in the documentation. If anyone with one of the affected cards is willing to give it a quick test, could they PM me for a download link, please? (64-bit only so far). Ta."

that should be helpful. I was just looking at some of the bad systems and noticed a few were running anonymous platform and was thinking how they would get the new app if they stayed on it.

+1
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028518
Recedham
Joined: 20 May 99
Posts: 6
Credit: 1,577,424
RAC: 1
United Kingdom
Message 2028536 - Posted: 19 Jan 2020, 18:59:55 UTC

What about the World Community Grid program, which is just BOINC under a different name? Will this be updated as well?
I'm using the WCG program and not BOINC.
I'm a Drunk
ID: 2028536
Mr. Kevvy (Crowdfunding Project Donor*, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2028537 - Posted: 19 Jan 2020, 19:01:43 UTC - in response to Message 2028536.  
Last modified: 19 Jan 2020, 19:03:27 UTC

The BOINC platform code itself isn't affected; rather, it's the client executable that BOINC (or WCG) downloads. Each project will have to recompile their client for the new drivers to set the required flag. BOINC will then download the new client automatically (unless they have an "anonymous" manually installed client like the Lunatics one here).
ID: 2028537
Recedham
Joined: 20 May 99
Posts: 6
Credit: 1,577,424
RAC: 1
United Kingdom
Message 2028538 - Posted: 19 Jan 2020, 19:11:05 UTC - in response to Message 2028537.  

Thank you. You're a star!
I'm a Drunk
ID: 2028538
Mr. Kevvy (Crowdfunding Project Donor*, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2028540 - Posted: 19 Jan 2020, 19:20:14 UTC - in response to Message 2028538.  
Last modified: 20 Jan 2020, 1:44:36 UTC

Lol thanks... and a rotating one. ;^)

OK, here are the new bad host owners I contacted for the first time today, where required (for some it was not):

aplrapid 7807183
Leigh Green 123169
Vytautas Liesis 173783
Adam Tadian 9295811
firecrypt 10881832 (already had updated)
Capizzi 10504781
AshlandPony 9004257
cprince1977 10886783 (already had updated)
Illyria 10845292
jcr 3428
teargasm 10886461
BigDaddyDave 265982
killerepicprofurrygamer6969 10885981
dcox 10884993
iinkabob 102908
Simgiov 8796082
Hozer 61288 (no affected GPUs)
unbound 10885610

Thankfully I was able to give them a solution immediately rather than just to disable their GPU and wait. I also re-contacted every single one of the known bad host owners to advise them of the solution as well. And here is the updated list:

李溪伦 9302807
[AfZ]TomServo1 1483720
achimbln 138625
Adam Tadian 9295811
Alexandr Galushchenko 9609912
alffrommars 9024750
AMD Jesus 70887
antoi 10856207
aplrapid 7807183
aridhol 10288747
Arnab 10093567
AshlandPony 9004257
Baldarov 9438496
BigDaddyDave 265982
Bigthor 480399
Borktron 10682716
Brandon 8198367
calendir 9663884
Camiron 7449359
Capizzi 10504781
Carl 914781
Christopher 9894096
CoffeeSloth 10266313
cprince1977 10886783
Crisu 7833612
dalex 10881818
Daniel Conrad Broom 8059986
Daniel frederikson 9813817
Daniel Penz 91581
Dank 49802
dcox 10884993
Derrek 219419
Doc_Jebus 10863878
dsharbour 10858679
Dzsozi 8002127
Earendil 146007
egon.sauter 494566
Eirikafh 10883218
Eric 9157146
eryndel 10878567
Esta 10624508
firecrypt 10881832
Foaming Mad Cow Industries 219464
Ffred 1935325
fredi 7913572
George Ko 639539
ghostbuster 564989
gunsnammo 137399
Haiko_N 9198068
HawkMedic 10838738
higemayuge 10790664
HMZ 9079227
Hozer 61288
iinkabob 102908
Illyria 10845292
jcr 3428
Jeff 10639246
Jeffrey A. Smith 38247
Jerjes 1291426
JohnDoe 9166075
Jorge Barrera 9650295
Juraxell 10864786
Kekke 46817
killerepicprofurrygamer6969 10885981
knutella 9880098
lastsworder 10878688
Leigh Green 123169
lupaslupas 10002927
MadMikeDelta 8221690
Maulwurf 1516335
MaximusPrometheus 10240426
mgg 279419
mnelsonx 272885
Niflhuem 113140
No Name@Extraterrestrial Intelligence 8116
NYX.consulting 10503661
Oriah 9838773
Otosan 8547502
PantherJon 9801065
Peter Furlong 7965665
phoenix7477 10773411
Rafael 8249913
rame 10738
rAttmAniA 9002301
Recedham 954834
rgeens 10740140
Richard 8565733
Richard Hartland 9781177
Rocky 270621
Saint123 159425
Simgiov 8796082
SimplisticNerd 100334866 (formerly known as "Zac")
Stephen Diem 36679
stogdan 10865456
StrayCat 177967
Strickland 34273
suhail ahmad 9878177
Swagstergo 10882690
T66 3336343
Tanis 10773581
teargasm 10886461
toby 9442798
TomasFraus 8445239
Tomik 8972653
Trezy 10367889
Tristan 9778349
unbound 10885610
vleermuis 1295921
VMS Software Inc 45538
Vytautas Liesis 173783
werewolf_007 10880222
xakei 10823091

Italicized names have replied and indicated they are disabling their affected GPUs.
Struck-through names are confirmed to no longer be producing these bad results, i.e. via disabling GPU computing.
ID: 2028540
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029528 - Posted: 27 Jan 2020, 17:17:34 UTC

OK guys, let's make this an open Beta test.

Lunatics Installer v0.46 64-bit is available for testing.

For AMD/ATI: updated all apps to match the v8.24 stock release, with safety patch for RX 5700-series 'NAVI' GPUs. You must also upgrade to the Adrenalin 2020 Edition 20.1.2 Optional driver or later.

For NVidia: includes the 8.16 (opencl_nvidia_sah) application from Beta. This is intended as a temporary workround for drivers above 431.60 on Windows 10. If Microsoft has updated your drivers, or you have updated them yourself for gaming, use the default option 'MB8_win_x86_SSE3_OpenCL_NV_sah_r3486' (the original name for the same file). If you have older drivers, or if you are running a different version of Windows, go ahead and choose the 'SoG' app - with minor update to r3584.

The following link is for a Google Drive folder with two files: the main Installer, and a small configuration file. If you put both files in the same folder, and run the installer, it will run in 'test mode': it will extract the files you request and go through the installation process, but place the results into a separate test folder, leaving your main BOINC installation untouched.

If you want to perform the actual installation for real, simply delete the configuration file.

The installation process (which is unchanged) has proved itself over the years, so you shouldn't have any problems. The installer will locate your BOINC installation, stop it, install the new files, and restart it. The restart process sometimes fails: if that happens, just wait a few seconds after the installer has closed, and restart it manually. Or you may prefer to stop and then restart BOINC manually - either will do.

The installer is designed to preserve all SETI tasks in your cache, and run them with the new applications. This is the potentially difficult part, and I'd like to hear if there are any problems with disappearing caches. Because it's a possibility, and we're likely to go into a long maintenance outage within 24 hours, you may prefer to defer testing until SETI is back up and running.

Download from Lunatics v0.46 installer test, and enjoy.
ID: 2029528
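
The NVidia guidance in the post above boils down to a two-way choice, which might be sketched as follows. The function name and the way the Windows and driver versions are passed in are hypothetical; only the 431.60 threshold, the Windows 10 condition and the two option names come from the post, and "SoG" here is shorthand for whatever the installer labels that option.

```python
# Threshold from the post: drivers above 431.60 on Windows 10 need the workaround.
DRIVER_THRESHOLD = (431, 60)


def choose_nvidia_option(windows_version, driver_version):
    """Pick the installer option suggested in the post.

    windows_version: a string such as "10" or "7" (hypothetical input format).
    driver_version: a dotted string such as "441.66" (hypothetical input format).
    """
    driver = tuple(int(part) for part in driver_version.split("."))
    if windows_version == "10" and driver > DRIVER_THRESHOLD:
        # Temporary workaround: the 8.16 opencl_nvidia_sah build from Beta.
        return "MB8_win_x86_SSE3_OpenCL_NV_sah_r3486"
    # Older drivers, or a different Windows version: the SoG app.
    return "SoG"
```

A host on Windows 10 with driver 441.66 would get the workaround build, while the same driver on Windows 7, or driver 431.60 itself on Windows 10, would get the SoG app.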
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.