How can I build a working MB8 OpenCL ATi HD5 exe?

BoMbY

Joined: 3 Apr 99
Posts: 8
Credit: 759,919
RAC: 0
Germany
Message 1916714 - Posted: 3 Feb 2018, 19:59:38 UTC

Hi,

I've been trying for hours to build a working OpenCL ATi HD5 version, but everything has failed so far.

I tried several different versions from https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt, which is the only public source repository for the OpenCL version I can find. I also tried Visual Studio 2010 and 2017, and the best I can get is a build of MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3714.exe, which then fails to run with, for example:

Unhandled exception at 0x77989ABA (ntdll.dll) in MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3714.exe: 0xC0000374: A heap has been corrupted (parameters: 0x779C58E8).

This happens while parsing the XML WU headers.

Other builds with older versions fail with errors calling clGetDeviceIDs, if they compile at all - most need some source code fixes.

Is there a stable version I can try, and which version of Visual Studio (or another compiler) does it build with? Or should I use an older AMD APP SDK? Or do I really have to install VS 2003 or VS 2008?

Thanks and Regards,
BoMbY
ID: 1916714
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1916965 - Posted: 5 Feb 2018, 0:06:11 UTC
Last modified: 5 Feb 2018, 0:11:17 UTC

I'm not sure what you are trying to build, but you can download the current r3584 version from
Mike's World http://mikesworld.eu/download.html or Raistmer's cloud storage https://cloud.mail.ru/public/3nxq/SgQBZXcM7/

If you need the current source you could contact Raistmer and see where it is stored.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1916965
BoMbY
Joined: 3 Apr 99
Posts: 8
Credit: 759,919
RAC: 0
Germany
Message 1916985 - Posted: 5 Feb 2018, 1:15:09 UTC
Last modified: 5 Feb 2018, 1:15:28 UTC

Ehm yeah, you know, I wouldn't try to compile it if I didn't have something in mind, like fixing some of the mess it is (sorry, I don't want to offend anyone, but it really ain't pretty).

I'm pretty sure what I found is his source, but I'm not yet sure if anyone can build anything useful with it. Maybe I really need to start searching for 10- or 15-year-old compiler versions, but by now I guess it could be easier to start from scratch with a new implementation.
ID: 1916985
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917035 - Posted: 5 Feb 2018, 10:01:06 UTC - in response to Message 1916985.  
Last modified: 5 Feb 2018, 10:32:29 UTC

Well, not quite that old, maybe 3 years or so. In Linux, the newest GCC I could use was 4.9.2, which comes with Ubuntu 15.04. Looking at the recent changes here, https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt it appears the project is being updated from VS2010 to VS2017. I've never used VS or MS for compiling. I understand you can use GCC in Windows? You might try GCC 4.9 in Windows, if you can, and see how that works. I have recently compiled working AMD GPU Apps in both Linux and MacOS without any problems using 3-year-old compilers. As for the AMD SDK, I'd use one about the same age as the compiler.

You do need the entire sah_v7_opt folder, and since you still get the same old error when trying to download the zip file, https://setisvn.ssl.berkeley.edu/trac/changeset/3732/branches/sah_v7_opt?old_path=%2F&format=zip, you have to be creative. I use the Go feature in MacOS to mount https://setisvn.ssl.berkeley.edu/svn/branches/ and then drag and drop sah_v7_opt to my HD. Usually I have to download it in pieces, as it errors out when trying the complete folder.


Look at that, I was able to download the entire sah_v7_opt folder in one piece. Maybe I'll see if it will compile in MacOS Sierra since I don't have any SETI Work to keep the machine busy.
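For readers without the MacOS mount trick, a command-line Subversion checkout is one alternative way to fetch the whole folder. The sketch below only assembles the standard `svn checkout` command for the repository URL and the r3732 revision mentioned in this thread. The helper names `checkout_command` and `fetch` and the destination folder are made up for illustration, and whether the server still answers is not something this thread can confirm.

```python
import subprocess

# Repository URL as quoted in the thread.
REPO_URL = "https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt"


def checkout_command(url, dest, revision=None):
    """Build the argument list for a plain `svn checkout` (hypothetical helper)."""
    cmd = ["svn", "checkout"]
    if revision is not None:
        # -r pins the working copy to one revision, e.g. 3732
        cmd += ["-r", str(revision)]
    cmd += [url, dest]
    return cmd


def fetch(url=REPO_URL, dest="sah_v7_opt", revision=None):
    """Run the checkout. Needs the svn client on PATH and network access."""
    subprocess.run(checkout_command(url, dest, revision), check=True)


# Show the command without running it.
print(" ".join(checkout_command(REPO_URL, "sah_v7_opt", 3732)))
```

Calling `fetch(revision=3732)` would run the command. Nothing is downloaded unless `fetch` is called explicitly.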
ID: 1917035
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917062 - Posted: 5 Feb 2018, 12:47:40 UTC

Yep, got it to work in Sierra 10.12.6 compiling the ATIHD5 app. Unfortunately I don't have any ATI cards installed at present, but it did work with my NV 1060 in the Benchmark App. It would probably work better on an AMD card, seeing as how Apple broke OpenCL on nVidia cards back with El Capitan 15.4 and the only build that works decently on NV cards is the Intel iGPU build.
But, it did work with a recent compiler, Sierra 10.12.6 with Xcode 8.3.3 using the Terminal.

The only problem was with the compiler finding stdint.h in two files, and I had to go back to boinc-master 7.7.
The biggest problem was getting boinc-master to work.
I suppose I could try the Intel iGPU build and see how well that works with the 1060...seeing as how I Still don't have any WORK.
ID: 1917062
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1917087 - Posted: 5 Feb 2018, 15:19:18 UTC - in response to Message 1916985.  
Last modified: 5 Feb 2018, 15:25:12 UTC

Ehm yeah, you know, I wouldn't try to compile it if I didn't have something in mind, like fixing some of the mess it is (sorry, I don't want to offend anyone, but it really ain't pretty).

Could you be more specific? Any new ideas/optimisations are always welcome.


I'm pretty sure what I found is his source, but I'm not yet sure if anyone can build anything useful with it. Maybe I really need to start searching for 10- or 15-year-old compiler versions, but by now I guess it could be easier to start from scratch with a new implementation.


The last set of builds was made with VS 2010, so the OpenCL configurations should be OK there.
XML parsing is part of setilib, so perhaps you should rebuild that lib.


If you need the current source you could contact Raistmer and see where it is stored.

SETI repo.

but I'm not yet sure if anyone can build anything useful with it.

Well, at least one positive example runs on ATi cards ;) The source is in the repo, maybe just not as straightforward to compile as it could be.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1917087
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917100 - Posted: 5 Feb 2018, 15:52:29 UTC - in response to Message 1917087.  

Well, at least one positive example runs on ATi cards ;) The source is in the repo, maybe just not as straightforward to compile as it could be.
Well, I did get the r3732 Intel build to work on my 1060 once I figured out you can't use -oclfft_tune_gr 256 with this card. However, I think I'll keep using the CUDA App;

Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu

Listing executable(s) in /APPS :
MBv8_8.22r3732_NV_ssse3_x86_64-apple-darwin setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : MBv8_8.22r3732_NV_ssse3_x86_64-apple-darwin -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 10 -device 1
      262.43 real        59.83 user       135.11 sys
Elapsed Time : ……………………………… 262 seconds
Speed compared to default : 1342 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.48%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -nobs -device 1
       81.71 real        68.75 user         8.33 sys
Elapsed Time : ……………………………… 82 seconds
Speed compared to default : 4289 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.

Done with Benchmark run! Removing temporary files!
TomsMacPro:KWSN-OSX-bench-MB_v2.1.07 Tom$
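The "Speed compared to default" figures in the log can be reproduced from the elapsed times. The sketch below assumes the benchmark script divides the reference app's saved time by each app's rounded elapsed time. That is a guess that happens to match the printed numbers, not something read from the script itself.

```python
# Elapsed times in seconds, copied from the benchmark log above.
REFERENCE_SECONDS = 3517  # saved result of the CPU reference app (r3344 sse41)
OPENCL_SECONDS = 262      # MBv8_8.22r3732_NV OpenCL build
CUDA_SECONDS = 82         # x41p_zi3v CUDA 8.0 build


def speed_percent(reference_seconds, app_seconds):
    """Speed relative to the reference app, as a whole-number percentage."""
    return round(100 * reference_seconds / app_seconds)


opencl_speed = speed_percent(REFERENCE_SECONDS, OPENCL_SECONDS)
cuda_speed = speed_percent(REFERENCE_SECONDS, CUDA_SECONDS)
cuda_vs_opencl = OPENCL_SECONDS / CUDA_SECONDS  # how many times faster CUDA finished

print(f"OpenCL: {opencl_speed} %, CUDA: {cuda_speed} %")
print(f"CUDA finished about {cuda_vs_opencl:.1f}x sooner than the OpenCL build")
```

On this single work unit the CUDA app finished roughly three times sooner than the OpenCL build.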
ID: 1917100
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1917108 - Posted: 5 Feb 2018, 16:14:31 UTC - in response to Message 1917100.  

However, I think I'll keep using the CUDA App;

Well, Petri's implementation is the fastest so far for NV hardware; it would be really good to see it ready for a mass-production launch.

Regarding your test, there is an interesting detail:

Running app with command : MBv8_8.22r3732_NV_ssse3_x86_64-apple-darwin -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 10 -device 1
262.43 real 59.83 user 135.11 sys
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -nobs -device 1
81.71 real 68.75 user 8.33 sys

Most of the CPU consumption seems to come from kernel mode. NV's great and shiny implementation of the OpenCL stack for its own hardware?...
User-mode CPU consumption is even a little lower for OpenCL...
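The observation can be put into numbers using the `real`, `user` and `sys` values from the two runs quoted above. This is a small illustrative calculation; the helper name `cpu_breakdown` is made up for this sketch.

```python
def cpu_breakdown(real, user, sys):
    """Split the output of `time` into total CPU seconds and two ratios."""
    cpu = user + sys
    return {
        "cpu_seconds": cpu,
        "cpu_share_of_wall": cpu / real,   # CPU time relative to wall-clock time
        "kernel_share_of_cpu": sys / cpu,  # part of the CPU time spent in kernel mode
    }


# Values copied from the two benchmark runs quoted above.
opencl = cpu_breakdown(real=262.43, user=59.83, sys=135.11)
cuda = cpu_breakdown(real=81.71, user=68.75, sys=8.33)

for name, stats in (("OpenCL", opencl), ("CUDA", cuda)):
    print(
        f"{name}: {stats['cpu_seconds']:.2f} s CPU, "
        f"{stats['cpu_share_of_wall']:.0%} of wall time, "
        f"{stats['kernel_share_of_cpu']:.0%} of it in kernel mode"
    )
```

For these two runs, about 69% of the OpenCL app's CPU time is kernel-mode time, against about 11% for the CUDA app.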
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1917108
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1917113 - Posted: 5 Feb 2018, 16:23:09 UTC

This is a question for you, Raistmer: could Petri's techniques be applied to other platforms (OpenCL, AMD GPU, Intel GPU...)?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1917113
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1917116 - Posted: 5 Feb 2018, 16:26:56 UTC - in response to Message 1917113.  

This is a question for you, Raistmer - Could Petri's techniques be applied to other platforms (OpenCL, AMD GPU, Intel GPU....)

To answer that, I need to read his code first.
If there are some high-level optimizations, then most probably they could be ported or re-implemented on other architectures too.
Low-level optimizations, such as choosing and pairing asm instructions, are hardly portable.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1917116
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917123 - Posted: 5 Feb 2018, 16:50:03 UTC - in response to Message 1917108.  
Last modified: 5 Feb 2018, 17:10:52 UTC

However, I think I'll keep using the CUDA App;

Well, Petri's implementation is the fastest so far for NV hardware; it would be really good to see it ready for a mass-production launch.

Regarding your test, there is an interesting detail:

Running app with command : MBv8_8.22r3732_NV_ssse3_x86_64-apple-darwin -sbs 192 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 10 -device 1
262.43 real 59.83 user 135.11 sys
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -nobs -device 1
81.71 real 68.75 user 8.33 sys

Most of the CPU consumption seems to come from kernel mode. NV's great and shiny implementation of the OpenCL stack for its own hardware?...
User-mode CPU consumption is even a little lower for OpenCL...

The high CPU use is caused by the -nobs cmd. That shaves a few seconds off the time by using a full cpu. The default setting is to use Blocking Sync, which gives the times seen here, https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959&offset=60 I ran a few OpenCL tasks at Beta with the new OpenCL App. While being a little slower than the Windows SoG version, it's a little better than the Macs at Beta, https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=74755&offset=80
So, I guess it's not bad for an Intel iGPU build. You can see the features here, https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=29637526

It's a Shame the OpenCL build is once again Spiking the Idle Wake-ups on the NV cards though.
It's Spiking up to 20000. Apple says 150 is Too high. The CUDA App runs around 20-30, much better than 20000.
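The trade-off behind blocking sync versus `-nobs` can be illustrated with a toy model: a waiting thread either polls a flag in a tight loop, which answers quickly but burns CPU, or blocks until it is signalled, which leaves the CPU idle. The sketch uses a timer as a stand-in for the GPU finishing its work. It is an analogy only and does not show how either app or either driver is implemented.

```python
import threading
import time


def spin_wait(done):
    """Poll the flag in a tight loop; returns (number of polls, CPU seconds used)."""
    start = time.thread_time()
    polls = 0
    while not done.is_set():
        polls += 1
    return polls, time.thread_time() - start


def blocking_wait(done):
    """Sleep until the flag is set; returns CPU seconds used by the waiter."""
    start = time.thread_time()
    done.wait()
    return time.thread_time() - start


def simulate(wait_fn, work_seconds=0.05):
    """Pretend a device needs `work_seconds` and wait for it with `wait_fn`."""
    done = threading.Event()
    threading.Timer(work_seconds, done.set).start()  # stand-in for the GPU
    return wait_fn(done)


spin_polls, spin_cpu = simulate(spin_wait)
block_cpu = simulate(blocking_wait)

print(f"spinning: {spin_polls} polls, {spin_cpu:.4f} s of CPU")
print(f"blocking: {block_cpu:.4f} s of CPU")
```

The spinning waiter uses CPU time for the whole wait, while the blocking waiter uses almost none.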
ID: 1917123
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1917127 - Posted: 5 Feb 2018, 17:19:39 UTC - in response to Message 1917116.  

Thanks, Raistmer. I have a feeling (and I too haven't looked at his code) that it is a combination of both high- and low-level "tricks".
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1917127
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1917142 - Posted: 5 Feb 2018, 18:06:15 UTC - in response to Message 1917123.  

It's a Shame the OpenCL build is once again Spiking the Idle Wake-ups on the NV cards though.
It's Spiking up to 20000. Apple says 150 is Too high. The CUDA App runs around 20-30, much better than 20000.

You have mentioned those "idle wakeups" many times here and on beta, but I still have only a very dim idea of what they are, let alone how to avoid them.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1917142
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917149 - Posted: 5 Feb 2018, 18:21:00 UTC - in response to Message 1917142.  
Last modified: 5 Feb 2018, 18:23:10 UTC

That's about the best I can do for you,
https://www.google.com/search?&q=idle+wake+ups+mac
Interesting what the people in those threads consider High. Much lower than 20000, isn't it.

Number of times a thread caused the system to wake up from idleness to begin executing the thread.
ID: 1917149
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1917154 - Posted: 5 Feb 2018, 18:44:51 UTC - in response to Message 1917149.  

So it's not a GPU-related counter, but a CPU-related one instead?
If the CUDA app runs without sleep (-no_bs or something like that) and the OpenCL app runs with -use_sleep, what changes do you see in that counter?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1917154
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917168 - Posted: 5 Feb 2018, 19:38:20 UTC - in response to Message 1917154.  

If I use -use_sleep with the Intel build the thing crashes in about 20 secs with;
ERROR: OpenCL kernel/call 'clGetEventProfilingInfo' call failed (-7) in file analyzePoT.cpp near line 1428.
Waiting 30 sec before restart...
If I don't use sleep it will run around 11000 to 13000 with higher Spikes.

The CUDA App runs between 15 to 30 Idle Wake Ups with or without the -nobs cmd.
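The `-7` in the error line is a standard OpenCL status code. In the Khronos `cl.h` header it is `CL_PROFILING_INFO_NOT_AVAILABLE`, which `clGetEventProfilingInfo` returns when the command queue was created without `CL_QUEUE_PROFILING_ENABLE`, when the event's command has not completed yet, or when the event is a user event. Which of these applies to this build is not established in the thread. The sketch below decodes such a log line with a small, deliberately incomplete lookup table; the helper name `decode` is made up for illustration.

```python
import re

# A few status codes from the Khronos cl.h header (not the full list).
CL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -7: "CL_PROFILING_INFO_NOT_AVAILABLE",
    -30: "CL_INVALID_VALUE",
    -31: "CL_INVALID_DEVICE_TYPE",
    -32: "CL_INVALID_PLATFORM",
}

# Error line copied from the post above.
ERROR_LINE = (
    "ERROR: OpenCL kernel/call 'clGetEventProfilingInfo' call failed (-7) "
    "in file analyzePoT.cpp near line 1428."
)


def decode(line):
    """Return (call name, numeric code, symbolic name), or None if no match."""
    match = re.search(r"'(\w+)' call failed \((-?\d+)\)", line)
    if match is None:
        return None
    call, code = match.group(1), int(match.group(2))
    return call, code, CL_STATUS_NAMES.get(code, "unknown status")


print(decode(ERROR_LINE))
```

The same table also covers `CL_DEVICE_NOT_FOUND` (-1), a code `clGetDeviceIDs` can return when no matching device exists.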
ID: 1917168
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1917190 - Posted: 5 Feb 2018, 21:11:07 UTC - in response to Message 1917168.  

Well, maybe in the CUDA case it just doesn't go into a sleep state at all?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1917190
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1917210 - Posted: 5 Feb 2018, 22:00:47 UTC - in response to Message 1917190.  

Well, maybe in the CUDA case it just doesn't go into a sleep state at all?
I don't know, I mostly just copy & paste and hit the enter key...mostly.
Why don't you install Ubuntu 15.04 on this machine and try the CUDA App yourself, https://setiathome.berkeley.edu/show_host_detail.php?hostid=8331006
15.04 is the last one I could get to compile from the AKv8 folder and it still allows installing all the development packages, http://old-releases.ubuntu.com/releases/vivid/
The Current Working CUDA version is zi3v and is here, https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha/PetriR_raw3
zi3v works with the CUDA Toolkits from version 6.0 to 9.1...for me.
ID: 1917210
