RX 480 OpenCL Question


Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1866457 - Posted: 10 May 2017, 7:31:47 UTC - in response to Message 1866452.  

I did try cpu_usage set to 1, and BOINC started shutting down CPU tasks to try to run the GPU tasks.

Yep, that's what is supposed to happen. The OpenCL GPU applications work best when they have 1 CPU thread for each GPU WU being processed, and that setting reserves 1 CPU thread solely for the use of each GPU WU.
When the GPU WUs have to fight for CPU resources, their processing run times increase.

Like Wiggo, I use all of my CPU resources, but reserve 1 CPU thread for each GPU WU, so I process 6 WUs on the CPU and 2 on my GPUs (1 WU on each GPU).
Grant
Darwin NT
ID: 1866457 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1866462 - Posted: 10 May 2017, 8:00:59 UTC - in response to Message 1866452.  

I did try cpu_usage set to 1, and BOINC started shutting down CPU tasks to try to run the GPU tasks. If I can get my screen capture to work and then figure out how to show the image in a post, I can show you that, with the preference setting at 51%, gpu_usage set to 0.5 and cpu_usage set to 0.17, Process Explorer shows the three CPU tasks each taking ~16.5% of the CPU (i.e. 1 core each) and the two GPU tasks each taking ~16.5% of the CPU (i.e. 1 core each). (Note: the GPU tasks only use a full core - 16.5% - when starting and ending their run; during the run they are usually under 5%.)
Then it is doing Exactly what you are telling it to do.

51% of 6 is 3.06; since that is less than 4, BOINC uses 3. So you have 3 cores for BOINC.
For the GPU you are reserving 0.17 of a core per task. So even 3 GPU tasks at 0.17 is only 0.51, which is less than 1 core, so it does NOT reserve anything for the GPU.

Since you have now reserved 3.57 (<4) cores for BOINC, it will run 3 CPU tasks (<4) plus whatever you assign for the GPU. At a 0.17 CPU reservation you would have to run 6 GPU tasks before it shut down 1 more core. And your GPU tasks will just run, without limits, on any of the 3 remaining cores.

REMEMBER: these numbers RESERVE cores. They do NOT limit/throttle a task's usage of a core (i.e. to 0.17). So YES, if you reserve more than 0.96 for GPU tasks, it WILL only run 2 CPU tasks.
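A rough Python back-of-envelope version of that arithmetic (just the numbers being discussed, as an illustration - not a reimplementation of the real BOINC scheduler):

# Back-of-envelope numbers behind the settings being discussed; this is an
# illustration of the arithmetic above, not the actual BOINC scheduler logic.
n_threads = 6        # 6-thread CPU
use_pct   = 0.51     # "Use at most 51% of the CPUs" preference
gpu_tasks = 3        # concurrent GPU tasks
cpu_usage = 0.17     # CPU fraction reserved per GPU task

cpu_budget   = n_threads * use_pct    # 3.06 -> BOINC gets 3 cores
gpu_reserved = gpu_tasks * cpu_usage  # 0.51 -> still under a whole core,
                                      #         so no CPU task is displaced yet
print(f"CPU budget {cpu_budget:.2f}, reserved for GPU {gpu_reserved:.2f}")
# The reservation never throttles a running GPU task; it only frees up a
# whole core for the GPU once the combined reservation gets close to 1.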

What exactly are you aiming for? In CPU / GPU usage?
ID: 1866462 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1866464 - Posted: 10 May 2017, 8:14:02 UTC - in response to Message 1866452.  

But this distracts from the problem I was trying to help Karsten with, and that is figuring out what is causing each of his tasks to use all of the available OpenCL memory.
Are you comparing identical apps?

I know there was a discussion about that topic (which app I don't recall) where the app reported different numbers for different cards. One would have to compare the specs of the 2 cards to see if there is something different. Or it could be different BIOSes reporting differently. And also use the same app (i.e. Afterburner or whatever) to see the results of changes made and their impact on the cards. It's just a guess when you are trying to compare apples to oranges.

One other thing: are the desired command line settings showing up correctly in the output? It has come up before that copy/pasting command lines from the forum CAN introduce non-standard characters into the command_line.txt file, resulting in part or all of the string being ignored. These normally show up as spaces, but are not really spaces. A cure is to retype the string, or delete and replace anything that looks like a space. I'm not sure if that's it, but it's worth looking at.
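If you would rather check the file than retype it blind, a quick Python sketch along these lines will flag anything that is not plain printable ASCII (the command_line.txt name is the file mentioned above; the scan itself is generic, so adjust the path to suit):

# Flag anything in the pasted command line that is not plain printable ASCII
# (forum copy/paste often sneaks in non-breaking spaces and the like).
path = "command_line.txt"   # adjust to wherever your copy of the file lives

with open(path, "rb") as f:
    data = f.read()

for offset, byte in enumerate(data):
    if byte > 126 or (byte < 32 and chr(byte) not in "\r\n\t"):
        print(f"suspect byte 0x{byte:02X} at offset {offset}")

Anything it prints is a character worth deleting and retyping by hand.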
ID: 1866464 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Send message
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1866470 - Posted: 10 May 2017, 8:35:01 UTC - in response to Message 1866464.  
Last modified: 10 May 2017, 8:36:44 UTC

Just a short followup.

I didn't get to do much work on this yesterday, and when I was about to, Seti was down for its weekly maintenance.

I hope to have time for it later today.

I will run the GPU with 1 / 2 tasks, with different -sbs settings, and at the same time monitor mem usage, and make notes of the results.
I will also check out the results file on seti, when the WU's are reported, to see the reported -sbs settings.

Then we will see what is happening.

Darrell's card and mine are not the exact same model.

His is an "MSI RX480 8gb Armor OC".
Mine is a "XFX RX480 RS 8gb"

I don't think there is much difference, but the BIOS is most certainly different, and the cooling on his MSI card is better. Perhaps his card has higher clocks; I haven't looked into that.
But basically the cards should work the same, if we have the same drivers installed.
ID: 1866470 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866565 - Posted: 10 May 2017, 16:06:33 UTC - in response to Message 1866462.  


What exactly are you aiming for? In CPU / GPU usage?


What am I aiming for? I now have a six-core (six-thread) processor. I want 3 cores to run non-GPU tasks from other projects, 2 cores to feed two GPU tasks running on the GPU (i.e. 1 core feeds one task on half a GPU), and one core left for the operating system and everything else.

With the preference set to 51%, gpu_usage set to 0.5 and cpu_usage set to 0.17, this is what I get, as shown by Process Explorer and BOINC Manager. As of this moment I am crunching one Asteroids@Home task, one Cosmology@Home task, and one LHC@Home task on the CPU, each using 1 core (16.5%) of the CPU. There is one Collatz GPU task running on the GPU; it uses 16.5% of the CPU when starting and ending, and during the middle of its run, as it is now, it is using 0.5% of the CPU. This is confirmed by the message in the manager window:

Collatz Conjecture Running (0.17 CPU and 1 ATI/AMD GPU) ---- it is set to use the entire GPU because it does not share the GPU well with other projects

Einstein, Milkyway, Seti, and Seti_beta are set to run two tasks on the GPU because they share it well and the Manager messages will be:

[Project] Running(0.17 CPU and 0.5 ATI/AMD GPU)
[Different Project] Running(0.17 CPU and 0.5 ATI/AMD GPU)

Due to the way the GPU scheduler is programmed, you will seldom get two GPU tasks from the same project running on the GPU. You can get two Milkyway tasks running at the same time, as BOINC uses this project, with its short run times, as a filler between starting the other longer-running GPU projects or transitioning to single-project GPU tasks. I pointed out this behavior to Dr. Anderson years ago, and his response was that getting BOINC to run two tasks from the same project at the same time (all the time) would require a major rewrite of the GPU scheduler, which he was unwilling to take on at the time.
ID: 1866565 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866568 - Posted: 10 May 2017, 16:36:49 UTC - in response to Message 1866470.  


Darrell's card and mine are not the exact same model.

His is an "MSI RX480 8gb Armor OC".
Mine is a "XFX RX480 RS 8gb"

I don't think there is much difference, but the BIOS is most certainly different, and the cooling on his MSI card is better. Perhaps his card has higher clocks; I haven't looked into that.
But basically the cards should work the same, if we have the same drivers installed.


The Armor OC comes overclocked with the core at 1291MHz and the memory clock at 2000MHz; I am currently running AMD driver version 17.5.1. I have pushed the GPU faster, but then HWinfo64 starts showing GPU memory errors. These memory error counts are new with the Polaris GPU architecture, but unfortunately there is no way to differentiate between a request to re-transmit data and an actually detected memory failure [they are counted as the same]. Thus I play it safe and leave the card at the factory overclock setting.

P.S. I guess I should note that I am currently running a debug 64-bit version of BOINC that I built. I'm trying to figure out why OpenCL has started reporting my GPU as having 7536MB of memory instead of the 8192MB it had when I got it. All of the other GPU monitoring software reports the GPU as having 8192MB. I'm hoping that OpenCL is showing some sort of software glitch and not a fried VRM from pushing the card too hard.
ID: 1866568 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Send message
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1866571 - Posted: 10 May 2017, 17:04:16 UTC - in response to Message 1866568.  
Last modified: 10 May 2017, 17:32:32 UTC

OK then. There can't be much difference between our cards.
Mine is factory clocked at 1288/2000. Right now it's running at 1300/2000.


I have been running a lot of tests this afternoon. It's hard to get conclusive results, but there are some observations.
Arecibo vs. Guppi WUs make comparisons hard.

I have run a number of tests.

-sbs 256 one WU at a time
-sbs 256 two WU's at a time
-sbs 1024 one WU at a time
-sbs 1024 two WU's at a time
These with my standard optimisations.
Then I also ran these, where I removed every single optimisation.
_no_ optimisations one WU at a time
_no_ optimisations two WU's at a time

During these tests I have monitored GPU memory and checked every single result on the Seti servers to see the memory allocated.

Fresh boot
GPU memory used 135Mb

-sbs 256 x1 WU
GPU memory used 478Mb
Maximum single buffer size set to:256MB
Single buffer allocation size: 256MB
Total device global memory: 3072MB
WU crunch time 234.2 seconds Arecibo WU

-sbs 256 x2 WU's
GPU memory used 1127Mb
Maximum single buffer size set to:256MB
Single buffer allocation size: 256MB
Total device global memory: 3072MB
WU crunch time 2401.8 seconds Guppi WU

-sbs 1024 x1 WU
GPU memory used 1591Mb
Maximum single buffer size set to:256MB
Single buffer allocation size: 1024MB
Total device global memory: 3072MB
WU crunch time 231.4 seconds Arecibo WU

-sbs 1024 x2 WU's
GPU memory used 2863Mb
Maximum single buffer size set to:1024MB
Single buffer allocation size: 1024MB
Total device global memory: 3072MB
WU crunch time 562.0 seconds Arecibo WU

No optimisations x1 WU
GPU memory used 700Mb

Single buffer allocation size: 128MB
Total device global memory: 3072MB
WU crunch time 542.9 seconds Guppi WU

No optimisations x2 WU
GPU memory used 913Mb

Single buffer allocation size: 128MB
Total device global memory: 3072MB
WU crunch time 1175.6 seconds Arecibo WU

This is probably not all that clear, but with all my settings the app has followed the -sbs settings perfectly. Memory usage on the card itself is higher, but nothing to be alarmed about, and well below 3072Mb.
Every time I have enabled more than one WU, times have more than doubled. Guppi WU's seem to bring much worse results, making 2-WU crunching much slower than running only one WU.

I would very much like a way to test this with one or two specific WU's in a more controlled environment. I tried using knabench, but couldn't work it out.

For now I will be crunching one WU at a time. I will be trying these settings:
-sbs 2048 -period_iterations_num 2 -tt 300 -hp -high_prec_timer -high_perf -no_cpu_lock -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

And then I'll see what they do.
ID: 1866571 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866584 - Posted: 10 May 2017, 18:30:56 UTC - in response to Message 1866571.  
Last modified: 10 May 2017, 19:15:51 UTC


For now I will be crunching one WU at a time. I will be trying these settings:
-sbs 2048 -period_iterations_num 2 -tt 300 -hp -high_prec_timer -high_perf -no_cpu_lock -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

And then I'll see what they do.


With -sbs set to 2048, watch your task details closely when the units get uploaded to Seti. From my results, anything above 1472 will cause errors in Raistmer's app. Unfortunately the site has deleted the units, so I have nothing to copy and post for you to watch for, but you'll notice them easily enough.
ID: 1866584 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866588 - Posted: 10 May 2017, 18:41:50 UTC
Last modified: 10 May 2017, 19:05:12 UTC

ID: 1866588 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Avatar

Send message
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1866590 - Posted: 10 May 2017, 18:44:50 UTC

You cannot post an image directly from your computer; you will have to upload it to a photo-sharing website like Flickr or Photobucket and then paste the image link between the [img] tags.
ID: 1866590 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866594 - Posted: 10 May 2017, 19:13:52 UTC - in response to Message 1866590.  

Thank you, that's what I was trying to figure out. In test post 4 I have now got a URL to a screen capture uploaded to OneDrive. Please take a look and tell me what you see about BOINC. This is with the preference set to 51%, gpu_usage set to 0.5 and cpu_usage set to 0.17. One of the CPU tasks is running in a VBox, which is why it shows in the screenshot as < 0.01. If I were to scroll Process Explorer up, you could see VBox running and using 16.5% of the CPU.
ID: 1866594 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866597 - Posted: 10 May 2017, 19:34:56 UTC
Last modified: 10 May 2017, 19:52:47 UTC

https://1drv.ms/i/s!ArIvftV8roEagQgHIDQJNQCP20z6
https://1drv.ms/i/s!ArIvftV8roEagQcpFtoceCyxSA6-
https://1drv.ms/i/s!ArIvftV8roEagQn1W-9RMXDQX1Fx

A couple more screenshots. The first shows Process Explorer focused on BOINC and HWinfo64 on the GPU; the second has Process Explorer scrolled up to show the VBox usage and HWinfo64 scrolled up to show the CPU usages. The third shows this host's daily average, showing what playing with the settings has changed.
ID: 1866597 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Send message
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1866603 - Posted: 10 May 2017, 21:11:15 UTC - in response to Message 1866597.  

Thanks for the warning.

If you check my valid tasks, you will see that most of the tasks have gone through clean. You will also see the app choosing 3072 as the "Single buffer allocation size", which it never did in any of my tests this afternoon.
So there must be some limit that, if it's crossed, makes the app simply allocate all the memory.

But I do have 2 error tasks for this evening, and that must be because of some setting I changed since yesterday.

As I ran the whole of yesterday, and some of the day before, with a setting of 2560 without getting errors, it makes me wonder if "-period_iterations_num", which I changed from 1 to 2, is the culprit.

As a test I will reduce -sbs to 1472. There doesn't seem to be much difference in crunching times anyway.
ID: 1866603 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866610 - Posted: 10 May 2017, 22:26:38 UTC - in response to Message 1866603.  
Last modified: 10 May 2017, 22:28:29 UTC

Looking at your valid tasks with -sbs set to 2048 and -period_iterations_num set to 2:

Task 5726871086:
[indent]OpenCL-kernels filename : MultiBeam_Kernels_r3557.cl[/indent]
[indent]ar=2.003587 NumCfft=99835 NumGauss=0 NumPulse=28098965800 NumTriplet=56368203049[/indent]
[indent]Currently allocated 1545 MB for GPU buffers[/indent]
[indent]Single buffer allocation size: 1472MB[/indent]
[indent]Total device global memory: 3072MB[/indent]

Task 5722976273:
[indent]ar=1.374821 NumCfft=101073 NumGauss=0 NumPulse=56268705866 NumTriplet=56268705866[/indent]
[indent]Currently allocated 3145 MB for GPU buffers[/indent]
[indent]Single buffer allocation size: 3072MB[/indent]
[indent]Total device global memory: 3072MB[/indent]

Task 5722954741:
[indent]ar=0.009869 NumCfft=105899 NumGauss=0 NumPulse=36102529920 NumTriplet=49060162720[/indent]
[indent]Currently allocated 3145 MB for GPU buffers[/indent]
[indent]Single buffer allocation size: 3072MB[/indent]
[indent]Total device global memory: 3072MB[/indent]

These three tasks all appear to be using the same command line parameters and the same type of work unit, but for some reason the first gets allocated 1545MB of GPU buffers and the other two get 3145MB. When the allocated buffers go from 1545MB to 3145MB (or 3173MB, as shown in some other tasks), the single buffer allocation goes from 1472MB to 3072MB. At 1472MB you could easily run two tasks at a time on the GPU, but at 3072MB you can only run one, as that one task is using all of the OpenCL memory. I see nothing in the task results that explains this. So unless there is some setting in the app_info file, it is going to take someone like Raistmer, who wrote the OpenCL code, to figure out what is happening. I will try changing my -sbs to 2048, but I think it will crash his app.

P.S. Sorry the indent code is not working
ID: 1866610 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1866634 - Posted: 11 May 2017, 0:27:06 UTC
Last modified: 11 May 2017, 1:09:15 UTC

Hi Karsten, I tried setting -sbs to 2048 and it failed like I thought it would. I fell asleep watching the evening news and woke up to BOINC postponing the Seti task; once I reset -sbs to 1472, the task crunched to the end. Here are some interesting bits of the result:

ar=0.008590 NumCfft=107143 NumGauss=0 NumPulse=37404169344 NumTriplet=50363768224
Currently allocated 3145 MB for GPU buffers <-----Just like yours

Single buffer allocation size: 3072MB <----- just like yours
Total device global memory: 3072MB

period_iterations_num=1
ERROR: OpenCL kernel/call 'Enqueueing kernel:Transpose4_kernel_cl(pulse)' call failed (-4) in file ..\analyzePoT.cpp near line 3113. <----Not like yours
Waiting 30 sec before restart...

It had been trying to restart and postponing the task for 20 minutes.

Addendum:

I went and checked a couple more of your recent valid tasks (newer than the three I posted about earlier) and noticed something about them, and about the first one I had posted about:

<stderr_txt>
Maximum single buffer size set to:2048MB <------- indicates -sbs set to 2048
Number of period iterations for PulseFind set to:2

oclFFT minimal memory coalesce width set to:64
Maximum single buffer size set to:1472MB <-------- indicates -sbs was reset, by app or by you?
Priority of worker thread raised successfully

ar=1.512955 NumCfft=100889 NumGauss=0 NumPulse=55995320508 NumTriplet=55995320508
Currently allocated 1545 MB for GPU buffers <--- smaller amount

Single buffer allocation size: 1472MB <------ what is needed for running two tasks at a time
Total device global memory: 3072MB


It makes me wonder whether Brent wasn't right and there are some unprintable control characters in your command line text file; as he suggested, you might want to open it up in Notepad, select all, delete, and retype the settings.
ID: 1866634 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Send message
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1866683 - Posted: 11 May 2017, 10:22:43 UTC - in response to Message 1866634.  
Last modified: 11 May 2017, 11:14:52 UTC

Thanks for all your detective work.

There is definitely something funny going on.

I have almost settled with running one WU at a time.

But I also read Brent's post, and will definitely try with a fresh command line, to see if something is going on in that regard.
ID: 1866683 · Report as offensive
Profile Darrell
Volunteer tester
Avatar

Send message
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1867146 - Posted: 13 May 2017, 18:13:27 UTC

Arg, this drives me crazy:

https://1drv.ms/i/s!ArIvftV8roEagQvCbSdMgpMPz4ex

Host's daily average has gone from 380,000 per day to over 500,000 per day, in two days, and I don't know which change or combination of changes to the system have caused it.
ID: 1867146 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1867169 - Posted: 13 May 2017, 22:42:49 UTC - in response to Message 1867146.  
Last modified: 13 May 2017, 22:52:25 UTC

Arg, this drives me crazy:

https://1drv.ms/i/s!ArIvftV8roEagQvCbSdMgpMPz4ex

Host's daily average has gone from 380,000 per day to over 500,000 per day, in two days, and I don't know which change or combination of changes to the system have caused it.

It would appear Collatz Conjecture is paying stupid amounts of Credit for doing very little work.
30,000 Credits for less than 1,000 seconds' work? That's not Credit inflation, that's Credit hyperinflation.

Let's see, if Seti paid like them, just one of my GTX 1070s would give almost 10,000 Credits per WU, instead of its present 80-145.
Edit: unless they have a fixed payment per WU, in which case I'd be getting 90,000 Credits for 1,000 seconds' work.

What a joke.
Grant
Darwin NT
ID: 1867169 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1867171 - Posted: 13 May 2017, 22:57:55 UTC - in response to Message 1867146.  

Arg, this drives me crazy:

https://1drv.ms/i/s!ArIvftV8roEagQvCbSdMgpMPz4ex

Host's daily average has gone from 380,000 per day to over 500,000 per day, in two days, and I don't know which change or combination of changes to the system have caused it.

With my R9 390x fully tweaked it was good for ~5,000,000 RAC on Collatz. I haven't run Collatz in about a year. So that may no longer be accurate.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1867171 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1867174 - Posted: 13 May 2017, 23:17:14 UTC - in response to Message 1867171.  

With my R9 390x fully tweaked it was good for ~5,000,000 RAC on Collatz. I haven't run Collatz in about a year. So that may no longer be accurate.

Worked out extremely roughly using the numbers above, I'd be getting around 7.7 million Credits per day from just one of my GTX 1070s if they're paying out a fixed amount per WU.
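For anyone checking that figure, the arithmetic works out like this (using the ~90,000 Credits and ~1,000 seconds per WU quoted in the edit above; both are rough figures, not measurements):

# Rough check of the "7.7 million a day" figure using the numbers quoted above.
seconds_per_day = 86_400
seconds_per_wu  = 1_000      # roughly one WU per 1,000 seconds
credits_per_wu  = 90_000     # the fixed per-WU payout mentioned in the edit above

wus_per_day = seconds_per_day / seconds_per_wu      # 86.4
print(wus_per_day * credits_per_wu)                 # 7,776,000 -> about 7.7 million per day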
Grant
Darwin NT
ID: 1867174 · Report as offensive