Computation error in AstroPulse v7 v7.08 (opencl_ati_100) x86_64-pc-linux-gnu

Message boards : Number crunching : Computation error in AstroPulse v7 v7.08 (opencl_ati_100) x86_64-pc-linux-gnu
Paul

Joined: 17 May 99
Posts: 72
Credit: 42,977,964
RAC: 43
United States
Message 1970709 - Posted: 16 Dec 2018, 20:10:35 UTC

For about three years, multiple BOINC projects have not been able to crunch on Linux with AMD GPU hardware and the OSS driver stack. I've reported this to SETI@Home, but mostly elsewhere.

Last Sunday, I got updates from my distribution to MESA and, I think, LLVM, and maybe some related kernel updates. Now the Einstein@Home GPU app is working again! So I tried SETI, but it is still broken.

<core_client_version>7.10.1</core_client_version>
<![CDATA[
<message>
too many boinc_temporary_exit()s</message>
<stderr_txt>
OpenCL platform detected: Mesa
OpenCL platform detected: The pocl project
WARNING: BOINC supplied wrong platform!
ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) in file ../../src/GPU_lock.cpp near line 819.
Waiting 30 sec before restart...
OpenCL platform detected: Mesa
OpenCL platform detected: The pocl project
WARNING: BOINC supplied wrong platform!
...skipping repeated lines...
ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) in file ../../src/GPU_lock.cpp near line 819.
Waiting 30 sec before restart...
OpenCL platform detected: Mesa
OpenCL platform detected: The pocl project
WARNING: BOINC supplied wrong platform!
ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) in file ../../src/GPU_lock.cpp near line 819.
Waiting 30 sec before restart...
OpenCL platform detected: Mesa
OpenCL platform detected: The pocl project
WARNING: BOINC supplied wrong platform!
ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) in file ../../src/GPU_lock.cpp near line 819.
Waiting 30 sec before restart...

</stderr_txt>
]]>
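
For readers who want to condense a log like the one above, here is a minimal Python sketch that counts the detected platforms, the failed OpenCL calls, and the restart attempts. The helper name `summarize_stderr` and the regular expressions are assumptions made for illustration; they are matched against the lines quoted in this post and are not part of BOINC or the AstroPulse app.

```python
import re
from collections import Counter

# Two restart cycles copied from the stderr text quoted above.
STDERR = """\
OpenCL platform detected: Mesa
OpenCL platform detected: The pocl project
WARNING: BOINC supplied wrong platform!
ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) in file ../../src/GPU_lock.cpp near line 819.
Waiting 30 sec before restart...
OpenCL platform detected: Mesa
OpenCL platform detected: The pocl project
WARNING: BOINC supplied wrong platform!
ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) in file ../../src/GPU_lock.cpp near line 819.
Waiting 30 sec before restart...
"""

# Patterns written against the log lines shown in this thread (assumption:
# other app versions may word these messages differently).
PLATFORM_RE = re.compile(r"^OpenCL platform detected: (.+)$")
ERROR_RE = re.compile(r"^ERROR: OpenCL kernel/call '(\w+)' call failed \((-?\d+)\)")


def summarize_stderr(text):
    """Hypothetical helper: count platforms, failed calls, and restarts."""
    platforms = Counter()
    failures = Counter()
    restarts = 0
    for line in text.splitlines():
        line = line.strip()
        match = PLATFORM_RE.match(line)
        if match:
            platforms[match.group(1)] += 1
            continue
        match = ERROR_RE.match(line)
        if match:
            failures[(match.group(1), int(match.group(2)))] += 1
            continue
        if line.startswith("Waiting") and "before restart" in line:
            restarts += 1
    return {
        "platforms": dict(platforms),
        "failures": dict(failures),
        "restarts": restarts,
    }


if __name__ == "__main__":
    print(summarize_stderr(STDERR))
```

A summary like this makes it easier to see that every restart cycle hits the same failing call with the same code, which is consistent with the "too many boinc_temporary_exit()s" message.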


Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1970765 - Posted: 17 Dec 2018, 6:19:13 UTC - in response to Message 1970709.  

Since you have your computers hidden, it is very hard for anyone to help you diagnose your issue.
A proud member of the OFA (Old Farts Association).

Paul
Message 1970766 - Posted: 17 Dec 2018, 6:23:59 UTC - in response to Message 1970765.  

I am more than willing to provide any information the community needs to help me. If you need something, just ask.

I have provided the error task output here, already. What else would you like to see?

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1970767 - Posted: 17 Dec 2018, 6:35:34 UTC

What GPU(s) do you have installed - each family behaves in a different manner. What applies to nVidia does not apply to AMD/ATI GPUs.
What CPU do you have?
What was the previous version of the OS?

The reason for asking you to un-hide your computers is that it allows most of these questions to be answered in the context of your computer - un-hiding does NOT give access to them, only the information stored by SETI about them.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Paul
Message 1970770 - Posted: 17 Dec 2018, 6:56:51 UTC - in response to Message 1970767.  

I understand. This is the current situation, as BOINC sees it.

Sun 09 Dec 2018 06:19:05 PM PST | | OpenCL: AMD/ATI GPU 0: AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.27.0, 4.19.6-300.fc29.x86_64, LLVM 7.0.0) (driver version 18.2.6, device version OpenCL 1.1 Mesa 18.2.6, 8192MB, 8192MB available, 3709 GFLOPS peak)
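
As an illustration of what this startup line contains, the following Python sketch pulls out the driver version, the OpenCL device version, and the memory and GFLOPS figures. The function name `parse_gpu_line` and the regular expression are assumptions written against this one line only; BOINC may format the line differently for other vendors or client versions.

```python
import re

# The BOINC startup line quoted above, copied verbatim.
LOG_LINE = (
    "Sun 09 Dec 2018 06:19:05 PM PST | | OpenCL: AMD/ATI GPU 0: "
    "AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.27.0, "
    "4.19.6-300.fc29.x86_64, LLVM 7.0.0) (driver version 18.2.6, "
    "device version OpenCL 1.1 Mesa 18.2.6, 8192MB, 8192MB available, "
    "3709 GFLOPS peak)"
)

# Pattern derived from the single line above (an assumption, not a
# documented BOINC log format).
GPU_RE = re.compile(
    r"OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<name>.+) "
    r"\(driver version (?P<driver>[^,]+), "
    r"device version (?P<device_version>[^,]+), "
    r"(?P<mem_mb>\d+)MB, (?P<avail_mb>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line):
    """Hypothetical helper: return the GPU fields as a dict, or None."""
    match = GPU_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("index", "mem_mb", "avail_mb", "gflops"):
        info[key] = int(info[key])
    # "Mesa" in the device version indicates the open-source OpenCL stack.
    info["uses_mesa"] = "Mesa" in info["device_version"]
    return info


if __name__ == "__main__":
    for key, value in parse_gpu_line(LOG_LINE).items():
        print(f"{key}: {value}")
```

The `uses_mesa` flag is the interesting part here: it shows that BOINC itself sees the card through the Mesa OpenCL 1.1 implementation rather than through a vendor runtime.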

My update was just a weekly update with some bug fixes. I didn't upgrade the major OS version to get this fix, AFAIK. It looks like LLVM and MESA got major version updates, however. A couple of months ago I upgraded from F28 to F29, though. It is possible this fix occurred then, but I really doubt it. The last time the SETI GPU app worked was F23 or something.

Paul
Message 1970891 - Posted: 18 Dec 2018, 6:59:18 UTC

Okay, so I've been working with Tom, a bit, and here is some information we discussed:

The S@H 8 app has been crunching along with no problem. AP sse2 is also good, as are other projects' CPU apps. I have only tried the E@H GPU app since I noticed it got fixed, not other projects' GPU apps, although some of those also used to work. I use E@H as my test case to see whether it is BOINC in general or just SETI that is broken. Not 100% conclusive, but... If you want, I can try MW.

I'm using BOINC GUI 7.10.1, but in all my nearly 20 years working on the project, I don't think the GUI ever caused a problem with the apps. I rebuilt the GUI, anyway, so, I'll update it soon.

One of the problems is that I cannot really instrument or debug the app, myself. I've tried to get help doing that, but I've never received instructions I could follow without a lot of research I would have to do by myself. I want to give you a crash dump or backtrace that you can understand, but I cannot figure out how to bring that about.

Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970904 - Posted: 18 Dec 2018, 8:50:48 UTC - in response to Message 1970770.  

DRM 3.27.0, 4.19.6-300.fc29.x86_64, LLVM 7.0.0
You like living on the bleeding edge I see. Yes there have been MAJOR changes to both LLVM and MESA. Clang compiler now in use. I believe the MESA updates are in a daily flux of commits and changes based on the notices and articles at Phoronix.com. You should be reading that site if you are installing the latest kernels, drivers and platforms.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Paul
Message 1971058 - Posted: 19 Dec 2018, 17:19:24 UTC - in response to Message 1970904.  

Well, I wouldn't say that. If I was on the bleeding edge, I'd be running the pre-release kernel from the AMDGPU integration branch, which I've been meaning to try, but I don't have the time to do that. That's actually what I was told to do the last time I asked the Phoronix community for help with this problem!

I'm running the current, stable release of Fedora; nothing special. There is a whole release that is even more current, called rawhide, that is ahead of me.

I used to have time for reading phoronix, too. I check in periodically, and there are always MESA and driver updates in the works. This problem started *before* AMDGPU got released, before kernel 4.10 or something. I thought it would get fixed in 4.12, then 4.14, and so on. I've lost track. Whatever state my system is in now is better than it has been in three years. And, there is no going back because I now have a POLARIS-based AMD card and it is supposed to run with AMDGPU driver, not radeonhd. So, I'm glad to hear that there were major updates.

My point is, I don't think it is the driver stack now. This stack used to work, it got broken, and now other BOINC apps are working again. The stack looks okay. So, whatever it is, it is specific to the implementation in opencl_ati_100 from S@H.

You guys who are official testers, how do you "test" for the project? Do you get new, development apps via the regular BOINC system? Or do you have to run a separate thing? I'm wondering if you know how to get work units to test with and how to run the S@H apps manually for testing purposes. Without that, I cannot debug further, myself.

The other option is that someone will recognize this error. I tried, but didn't get to the bottom of it. The error from clGetDeviceIDs() is actually CL_DEVICE_NOT_FOUND. Combine this with the fact that AP complains that BOINC gave it wrong information, and it makes me think the code doesn't understand the info it is getting from the API. But I cannot find the code to look at, so I'm not sure what to do.
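
To make the mapping from the numeric code to CL_DEVICE_NOT_FOUND explicit, here is a small Python sketch that decodes the `(-1)` in the error line. The numeric values are the standard constants from the OpenCL header `cl.h`; the table is deliberately partial, and the helper names `cl_error_name` and `decode_failed_call` are assumptions for illustration, not functions from the app.

```python
import re

# A partial table of OpenCL status codes as defined in CL/cl.h.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -31: "CL_INVALID_DEVICE_TYPE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
}

# Matches the "'<call>' call failed (<code>)" part of the app's error line.
FAILED_CALL_RE = re.compile(r"'(\w+)' call failed \((-?\d+)\)")


def cl_error_name(code):
    """Return the symbolic name for a status code, if it is in the table."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL error ({code})")


def decode_failed_call(line):
    """Hypothetical helper: return (call, code, name) or None."""
    match = FAILED_CALL_RE.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), code, cl_error_name(code)


if __name__ == "__main__":
    line = (
        "ERROR: OpenCL kernel/call 'clGetDeviceIDs' call failed (-1) "
        "in file ../../src/GPU_lock.cpp near line 819."
    )
    print(decode_failed_call(line))
```

Per the OpenCL specification, clGetDeviceIDs returns CL_DEVICE_NOT_FOUND when the queried platform has no device of the requested type, which fits the "wrong platform" warning printed just before it.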

Keith Myers
Message 1971063 - Posted: 19 Dec 2018, 18:02:42 UTC - in response to Message 1971058.  

One of our team members has developed a nice offline benchmark program. He runs ATI cards, and one of his reference apps is for ATI, along with its expected result. All of the Seti app developers are volunteers; there is no official support from the project. They develop the apps and post them to Beta for testing, or just test them on Main with close friends. Raistmer is the primary OpenCL developer. There was another volunteer developer who developed for Mac and Linux, but he has dropped development for Linux.

The app codebases are all available in the repository, so you could try to compile your own version.

https://github.com/BOINC/boinc
https://github.com/Ricks-Lab/benchMT

Keith Myers
Message 1971065 - Posted: 19 Dec 2018, 18:10:03 UTC

You also might want to post to the specific Linux/Unix Help forum about your problem. https://setiathome.berkeley.edu/forum_forum.php?id=13

There is not very good Linux support from the support staff as they are primarily Windows based. But fellow Linux users do jump in to assist when they can.

I forgot to ask for clarification: the problem is only with the AP OpenCL app and not the MB OpenCL app? Your thread title leads me to believe the former.

Paul
Message 1971071 - Posted: 19 Dec 2018, 18:47:31 UTC - in response to Message 1971065.  

Thanks for your help, Keith!

I don't have MB GPU app because there is not any official one, AFAIK. I've never got one through BOINC.

I used to run Lunatics optimized apps, but that was a lot of work, maintaining the xml file with the right override codes. I stopped doing that, and I just run the official apps now. I hoped this would provide better support, but that has not worked out.

Do you have a GPU MB app running?

I see the benchMT code. Thanks. That looks like something I can really use. There is a test WU in there, too. I bet this is what I need.

As for the official app code, I still don't see it. I have the BOINC repo already and I build the GUI from it so I have the latest one. But I don't see any apps there, if that's what you meant. There are some code stubs, which may also be useful. I still don't see how to get the code to build. I have some old Lunatics versions of the code, but not MB v8. I found an old note to myself that says "try compiling my own app from SVN source". So, maybe I had the SVN repo at one point, but I don't see it anywhere on my computer now.

Paul
Message 1971073 - Posted: 19 Dec 2018, 18:49:02 UTC - in response to Message 1971071.  

Oh, sorry. The only two GPU apps I've tried recently are SETI AP and the Einstein@Home one. One works and the other doesn't.

Keith Myers
Message 1971105 - Posted: 19 Dec 2018, 20:47:27 UTC - in response to Message 1971071.  

Thanks for your help, Keith!

I don't have MB GPU app because there is not any official one, AFAIK. I've never got one through BOINC.


There are most certainly stock MB Linux apps for ATI cards. From the Applications web page on the site.

Linux/x86_64  8.22 (opencl_ati5_cat132)      5 Jan 2017, 23:13:45 UTC   35 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati5_nocal)       5 Jan 2017, 23:13:45 UTC  279 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati5_sah)         5 Jan 2017, 23:13:45 UTC    0 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati5_SoG)         5 Jan 2017, 23:13:45 UTC    2 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati5_SoG_cat132)  5 Jan 2017, 23:13:45 UTC    4 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati5_SoG_nocal)   5 Jan 2017, 23:13:45 UTC  440 GigaFLOPS
Linux/x86_64  8.22 (opencl_atiapu_sah)       5 Jan 2017, 23:13:45 UTC  140 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati_cat132)       5 Jan 2017, 23:13:45 UTC   30 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati_nocal)        5 Jan 2017, 23:13:45 UTC   73 GigaFLOPS
Linux/x86_64  8.22 (opencl_ati_sah)          5 Jan 2017, 23:13:45 UTC    9 GigaFLOPS

I used to run Lunatics optimized apps, but that was a lot of work, maintaining the xml file with the right override codes. I stopped doing that, and I just run the official apps now. I hoped this would provide better support, but that has not worked out.

There is a direct access to the stock apps directory but I can't find the link for the life of me. I'll have to ask Richard or Jord.

Do you have a GPU MB app running?

Yes, I run the special Nvidia CUDA9.2 and CUDA10 apps on the gpus. But they won't help you with an OpenCL app and ATI card.

I see the benchMT code. Thanks. That looks like something I can really use. There is a test WU in there, too. I bet this is what I need.

Have you tried the stock ATI app that is in Rick's MT benchmark utility? It is in the Ref Apps directory. I know that there was somebody else that had compiled the ATI SoG app for the new Vega cards but I will have to search through Number Crunching to find it.

As for the official app code, I still don't see it. I have the BOINC repo already and I build the GUI from it so I have the latest one. But I don't see any apps there, if that's what you meant. There are some code stubs, which may also be useful. I still don't see how to get the code to build. I have some old Lunatics versions of the code, but not MB v8. I found an old note to myself that says "try compiling my own app from SVN source". So, maybe I had the SVN repo at one point, but I don't see it anywhere on my computer now.

I'll have to find the post by Jord or Richard or whoever answered my question about where the application codebase is located. I thought I had bookmarked it but I guess I didn't. The applications have to be published because of GPL and all that so the codebase is available. Let me do the sleuthing to find all the relevant posts again.

There are tons of people using RX480 cards in Seti. So the problem is with your software somewhere. Platform support. Driver support. Something. The history of ATI drivers working with BOINC has always been iffy from all my reading of posts in various project forums. One of the reasons I have always stuck with Nvidia cards. I didn't need the complications from trying to figure out what is required to get ATI cards to run.

Keith Myers
Message 1971109 - Posted: 19 Dec 2018, 20:57:47 UTC
Last modified: 19 Dec 2018, 21:24:52 UTC

At least found the thread where someone had compiled the stock OpenCL MB SoG app for ROCm.
how to install newly compiled seti app in linux?

Now to find the url for the application directory codebase.

Ok, finally found it. https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8?rev=2896&order=name

Paul
Message 1971120 - Posted: 19 Dec 2018, 23:10:24 UTC - in response to Message 1971109.  

Okay, now I'm really upset. I took another look at the apps page and I see the apps you are talking about, which is terrible, since I don't think I've ever received any MB GPU apps. I'm pretty sure they did not exist 10 years ago, but I could be wrong there, too. This is a disaster. I'll try over in the Linux forum.

However, I think the thread you linked shows someone with the same problem I am describing, which is that he had to compile his own GPU app and force BOINC to use it.

But, yes, now I see the SVN repo that I once had bookmarked, but lost. Thanks. I see how I could use this to test myself, I think.

Paul
Message 1971124 - Posted: 19 Dec 2018, 23:31:49 UTC - in response to Message 1971120.  

Okay, it seems like I keep forgetting stuff I learn. The OSS stack just isn't supported anymore, according to very recent posts in the Linux forum. I'm not sure what the nature of the issue is, but I think that ends this discussion.

Someday, when I have lots of time, I'll figure out what to do about it.

Keith Myers
Message 1971125 - Posted: 20 Dec 2018, 0:09:35 UTC - in response to Message 1971124.  

OK Paul. Yes, I have a terrible memory now too. If I don't bookmark something or make a text file, I can't remember anything. I don't know what an OSS stack is, but if that is what you have to use and it isn't supported anymore, I guess you are out of luck. The stock MB GPU apps seem to work well for most users here in Seti. I am paired with RX480 and 580 wingmen all the time and they produce good science. I just produce the same result about ten times faster with my Nvidia cards and CUDA9/10 apps.

I am slowly learning Linux. I have only been using it for about a year now. I was convinced by the performance gain from using a Linux-only optimized science application, and I was constantly having issues with Windows 10, so I decided to finally cut the cord with Microsoft. I had only been using Windows since 2011, as I was on OS/2 and eComStation for ten years before that. So going back to an OS that uses the command line was not as traumatic for me as it would be for anybody who has only ever used Windows.

So I can conclude this thread has been resolved for you and that any further discussion is better located over in the dedicated Linux forums.

Paul
Message 1971143 - Posted: 20 Dec 2018, 3:30:10 UTC - in response to Message 1971125.  

Yes, I agree with your conclusion.

Wow, OS/2. That's a blast from my past. I did a little OS/2 support in my youth.

I'm glad to hear about your Linux conversion and that there is some sense in the computational computing community that Linux is a good choice, enough to make some people try it, at least. That's a good sign.

OSS is Open Source Software. Just like nVidia and AMD have their own compute APIs (CUDA and CAL, now dead, replaced by ROCm), they also have their own OpenCL implementations. Even though OpenCL is an abstraction layer API that is not supposed to depend on the implementation, people often use implementation-specific OpenCL code. Now, this is often for good reason, as the best performance gains require this sort of hardware-specific tweak. But the OSS implementations lack all the hardware-specific optimizations and feature support. At least that is my understanding of the situation. The app I have is an ATI/AMD-specific version built with OpenCL bindings and *should* work on my system; that is the point of the whole OpenCL API. The Einstein app has the same library links as the SETI one, but it works and the SETI one doesn't.

Hey check this out. In the SETI project directory, I found a file with this in it:
Linux, 64bit, AMD/ATI OpenCL :
This executable works on 2.6.32 or newer kernel versions.
It is mandatory to have the AMD Catalyst fglrx driver for linux,
coming with an OpenCL runtime component, installed on your host
to run this application.

So, it's right there in black and white. My long history with the project has made me too comfortable, and I skip past the careful investigation a new user might go through. (Also, the project directory has many files in it and I didn't notice this file before.) This is a very old message, though, and I know the app worked before with OSS drivers. I guess that was a fluke and the current state is the expected one.
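
The requirement in that file can be checked against the task's own output. The Python sketch below collects the platform names the app printed and tests whether a proprietary AMD OpenCL runtime is among them. It assumes that AMD's proprietary runtimes identify themselves with the platform name "AMD Accelerated Parallel Processing"; that name, and the helper names `detected_platforms` and `has_amd_proprietary_runtime`, are assumptions for illustration and do not come from the thread or the app.

```python
# Assumption: platform name reported by AMD's proprietary OpenCL runtimes.
AMD_PROPRIETARY_PLATFORM = "AMD Accelerated Parallel Processing"

# Prefix used by the app's stderr lines quoted earlier in this thread.
PLATFORM_PREFIX = "OpenCL platform detected: "


def detected_platforms(stderr_text):
    """Hypothetical helper: unique platform names, in order of appearance."""
    names = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith(PLATFORM_PREFIX):
            name = line[len(PLATFORM_PREFIX):]
            if name not in names:
                names.append(name)
    return names


def has_amd_proprietary_runtime(platform_names):
    """True if the assumed proprietary AMD platform name is present."""
    return AMD_PROPRIETARY_PLATFORM in platform_names


if __name__ == "__main__":
    sample = (
        "OpenCL platform detected: Mesa\n"
        "OpenCL platform detected: The pocl project\n"
    )
    names = detected_platforms(sample)
    print(names, has_amd_proprietary_runtime(names))
```

On the output posted at the top of this thread, only Mesa and pocl are detected, so a check like this would report that the runtime the file asks for is not installed.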

In any case, cheers. Thanks again.

Keith Myers
Message 1971153 - Posted: 20 Dec 2018, 4:59:04 UTC - in response to Message 1971143.  
Last modified: 20 Dec 2018, 5:05:49 UTC

I'm almost positive fglrx is deprecated. I'm pretty sure I've read posts that state that. I think the current AMD drivers are AMDGPU-Pro for Linux. I keep seeing almost daily stories on the open source AMD drivers for Linux at Phoronix.com and the current development focus seems to be Vulkan. I know that is gaming focused but I hope the compute side is not being forgotten.

The one thing I give AMD kudos for is that they seem to have embraced the Open Software idea. Very different from Nvidia, which is still closed and proprietary.

This document seems to lay out the compatibility with ROCm and the AMD cards.
https://rocm.github.io/hardware.html

Paul
Message 1971166 - Posted: 20 Dec 2018, 6:33:33 UTC - in response to Message 1971153.  

Yes, that is exactly why I jumped on the AMD train: it stands to reason that, if AMD follows through with their commitment to OSS, my approach would work out, as they deliver the necessary implementation code support to the OSS development community. So far, it is only promises, and it has been a long time. I'm not asking for anything that hasn't been suggested by many other people. Theoretically, it should work, since it is OpenCL. And, AMD says it will work, at some point. I'm just trying to find out if it has happened yet or if I am making some error on my end.


 