Exception when running AstroPulse OpenCL unit with Mesa/Clover

Aaron Puchert

Joined: 28 Mar 08
Posts: 5
Credit: 432,715
RAC: 0
Germany
Message 1777570 - Posted: 9 Apr 2016, 19:17:29 UTC

AstroPulse GPU work units fail on my machine because a string is constructed from a null pointer. The GPU is an AMD Radeon HD 8570M, known to Linux as AMD HAINAN. Instead of the proprietary driver from AMD (fglrx), I run the open source driver (radeon) with Mesa, which has an OpenCL implementation (Clover).

The kernel sources didn't compile at first because the LLVM OpenCL compiler apparently doesn't support inlining. Hence I removed the "inline" specification of calc_chirp.

Then the code compiles but AstroPulse fails (https://setiathome.berkeley.edu/result.php?resultid=4844090137) with the following error:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct null not valid
SIGABRT: abort called
Stack trace (18 frames):
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100(boinc_catch_signal+0x4d)[0x4c6a6d]
/lib64/libpthread.so.0(+0x10d10)[0x7fc0e695cd10]
/lib64/libc.so.6(gsignal+0x38)[0x7fc0e5900bf8]
/lib64/libc.so.6(abort+0x13a)[0x7fc0e590204a]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7fc0e600a80d]
/usr/lib64/libstdc++.so.6(+0x94896)[0x7fc0e6008896]
/usr/lib64/libstdc++.so.6(+0x948e1)[0x7fc0e60088e1]
/usr/lib64/libstdc++.so.6(+0x94af8)[0x7fc0e6008af8]
/usr/lib64/libstdc++.so.6(_ZSt19__throw_logic_errorPKc+0x3f)[0x7fc0e603000f]
/usr/lib64/libstdc++.so.6(_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag+0x1f)[0x7fc0e6048f6f]
/usr/lib64/libstdc++.so.6(_ZNSsC2EPKcRKSaIcE+0x36)[0x7fc0e6049356]
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100[0x4844fc]
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100[0x484b62]
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100[0x47121a]
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100[0x4619fc]
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100[0x46a345]
/lib64/libc.so.6(__libc_start_main+0xf0)[0x7fc0e58ec5b0]
../../projects/setiathome.berkeley.edu/astropulse_7.08_x86_64-pc-linux-gnu__opencl_ati_100[0x40bda9]

Exiting...


The error is thrown in _ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag, which is
char* std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)
when the first argument (the begin iterator) is a null pointer. The parent function is
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
which is apparently called with a null pointer as first argument. Because there are no debug symbols for the AstroPulse executable, I cannot dig deeper.

Any idea how this happened?
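For reference, constructing a std::string from a null `const char*` is undefined behaviour in standard C++; libstdc++ reports it by throwing std::logic_error, which matches the trace above. Which call in AstroPulse produces the null pointer is not known from the trace. The sketch below only shows how such a call site can be guarded; `to_string_or_empty` is a hypothetical helper and is not taken from the AstroPulse sources.

```cpp
#include <string>

// Hypothetical helper, not part of AstroPulse: convert a C string that may be
// null into a std::string without hitting the std::logic_error shown above.
std::string to_string_or_empty(const char* s) {
    // std::string(const char*) requires a non-null pointer. Passing null is
    // undefined behaviour; libstdc++ happens to throw std::logic_error.
    return s ? std::string(s) : std::string();
}
```

A caller that receives a possibly null pointer from a C API would wrap it as `to_string_or_empty(ptr)` and then treat an empty result as "no value".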
Jord
Volunteer tester

Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1777597 - Posted: 9 Apr 2016, 21:03:20 UTC
Last modified: 9 Apr 2016, 21:05:41 UTC

You may want to read through this thread of the last endeavours of a soul trying to use the Mesa drivers. He ran into the same error as you did.

Error : Building Program (binary, clBuildProgram):main kernels: not OK code -43
CL file build log on device AMD HAINAN (DRM 2.43.0, LLVM 3.8.0)

Juha wrote:
-43 is CL_INVALID_BUILD_OPTIONS. The compiler is probably choking on some ATI specific compiler option. Too bad it doesn't actually include the build log even though it says so.

As the app isn't going to work I suppose you could just abort the AstroPulse tasks and move to anon platform to test the Multibeam app. I'm a bit afraid you'll have similar results with it.

To tell the truth, I would edit the program file with a hex editor and blank the bad compiler options just to see how far I can push it. That would make it necessary to switch to anon platform, since BOINC doesn't like it when files it has downloaded are changed.

Mesa uses LLVM and libclc uses Clang but those would be in library form. I think we can trust the packager to have done his/her job right and you have all the necessary dependencies installed.
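As a small aid for reading such logs, the numeric codes can be mapped back to the names used in the Khronos `CL/cl.h` header. The sketch below covers only a handful of codes relevant to program building; `cl_build_error_name` is a hypothetical helper, not a function of the OpenCL API. The build log itself, which the app did not print, can normally be queried with `clGetProgramBuildInfo` and `CL_PROGRAM_BUILD_LOG`.

```cpp
#include <string>

// Hypothetical helper: names for a few OpenCL error codes that can show up
// when building a program. Values are those defined in CL/cl.h.
std::string cl_build_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -33: return "CL_INVALID_DEVICE";
        case -42: return "CL_INVALID_BINARY";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}
```

With this table, the "code -43" in the log above reads as CL_INVALID_BUILD_OPTIONS, which is different from -11 (CL_BUILD_PROGRAM_FAILURE), the code for a kernel source that fails to compile.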

The user in that thread (Paul) has given up, as far as I can see from his returned tasks. But you could probably send him a message in PM and ask how far he got, and if the two of you can try work together getting it fixed.

Edit: you may also want to stroll through this thread at the BOINC boards, with the same user and the same problems. And with Juha and me. :)
Aaron Puchert

Joined: 28 Mar 08
Posts: 5
Credit: 432,715
RAC: 0
Germany
Message 1777641 - Posted: 9 Apr 2016, 23:24:06 UTC

Thanks for the links! I wasn't aware that there were other threads for this issue already.

Of course the proprietary driver and the open source driver are different, so it's no surprise that an application hand-tuned for one OpenCL platform doesn't work for another out-of-the-box.
Aaron Puchert

Joined: 28 Mar 08
Posts: 5
Credit: 432,715
RAC: 0
Germany
Message 1777649 - Posted: 9 Apr 2016, 23:47:56 UTC

I've noticed that compiling the AstroPulse kernels (without any options) yields the following error:

AstroPulse_Kernels_r2751.cl build error:
input.cl:254:22: warning: double precision constant requires cl_khr_fp64, casting to single precision
/usr/local/include/clc/float/definitions.h:50:25: note: expanded from macro 'M_PI'
unsupported call to function calc_chirp in dechirp_range1_kernel
error: build error


The warning is a little bit annoying, but apparently harmless. The error, however, is a bit mysterious. Both occurrences of calc_chirp are declared as "inline", and removing this specifier eliminates the error. At least the kernels now compile (without options).

I haven't checked yet if this really works, I'm waiting for new tasks.

Are the optimized AstroPulse apps open-source, so I could have a look at how the kernels are built? I can only find the sources for the stock application. (http://setiboinc.cvs.sourceforge.net/viewvc/setiboinc/?view=tar) This would be nice if there is really a problem with the build options.
Aaron Puchert

Joined: 28 Mar 08
Posts: 5
Credit: 432,715
RAC: 0
Germany
Message 1777666 - Posted: 10 Apr 2016, 0:40:38 UTC

Never mind, I found the source code: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/.

It's hard to look through the forest of #ifdefs, but it seems that for AMD GPUs we have an option "-fno-bin-amdil", which is not part of the standard. So Mesa doesn't understand it.

This option seems to suppress the generation of some AMD-specific intermediate language code, so it could probably just be omitted.
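If the option is dropped in the source rather than in the binary, one way to do it is to filter the options string before it is handed to clBuildProgram. The following is a minimal sketch under that assumption; `drop_build_option` is a hypothetical helper and does not exist in the sah_v7_opt tree, and the sample options string in the usage note is made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return `options` with every whitespace-separated token
// equal to `unwanted` removed. Remaining tokens are joined by single spaces.
// Note: this does not handle options whose values contain quoted spaces.
std::string drop_build_option(const std::string& options,
                              const std::string& unwanted) {
    std::istringstream in(options);
    std::string token;
    std::string result;
    while (in >> token) {
        if (token == unwanted) continue;  // skip the vendor-specific flag
        if (!result.empty()) result += ' ';
        result += token;
    }
    return result;
}
```

For example, `drop_build_option("-cl-mad-enable -fno-bin-amdil", "-fno-bin-amdil")` leaves only the standard `-cl-mad-enable` option.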
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1778470 - Posted: 12 Apr 2016, 21:01:09 UTC - in response to Message 1777666.  

I was waiting for Paul to test the Multibeam app in anon platform mode before forwarding everything to the developers but he disappeared and I guess I kind of forgot it.

Anyway, you could either blank out the bad compiler option with a hex editor or edit the source code and recompile. If you decide to compile the app yourself, you first need to compile the BOINC API and libs (from the master branch).
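The hex-editor route amounts to overwriting the option's bytes in place without changing the file length. The sketch below illustrates the idea on an in-memory copy of the executable; `blank_out` is a hypothetical helper, and it assumes the option is stored as plain ASCII inside a whitespace-separated options string, so that spaces are a harmless replacement.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical sketch of the hex-editor approach: overwrite every occurrence
// of `needle` in `image` with spaces. The length of `image` never changes, so
// all other offsets in the executable stay valid. Returns the number of
// occurrences that were blanked.
std::size_t blank_out(std::string& image, const std::string& needle) {
    if (needle.empty()) return 0;
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = image.find(needle, pos)) != std::string::npos) {
        std::fill_n(image.begin() + pos, needle.size(), ' ');
        pos += needle.size();
        ++count;
    }
    return count;
}
```

In practice the executable would be read into the string in binary mode, patched, and written to a copy that is then run under anon platform, since BOINC rejects modified downloads.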

To compile the SETI GPU apps, check out the entire sah_v7_opt tree. The tree is not stable, so you'll want to use the exact same revision that the stock apps are built from. Use the configure line from one of the configure_*.txt files.

The currently distributed AstroPulse app appears to have been compiled from the AP_BLANKIT tree, and the Multibeam app is from the AKv8 tree.
Aaron Puchert

Joined: 28 Mar 08
Posts: 5
Credit: 432,715
RAC: 0
Germany
Message 1778805 - Posted: 13 Apr 2016, 20:14:42 UTC

Thank you. I think I'll first try it with the hex editor.

But I don't have any work units at the moment and it seems they are hard to get. Are there maybe dummy work units out there that can be used for testing?
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1778820 - Posted: 13 Apr 2016, 21:01:28 UTC - in response to Message 1778805.  

Lunatics have test material available.

Or you can download one straight from the server. Workunit 2118565406, file.



 