Strange observation of -SBS size on GTX 1060 6GB card

Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,217,413
RAC: 358,686
United States
Message 1892119 - Posted: 28 Sep 2017, 7:22:18 UTC

I've been fiddling on Numbskull Host 8030022. Grant got my interest piqued with how he is running his GTX 1070s, so I decided to try dropping the task count per card down to 1 from 2. I also modified my app_config command line parameters to add -hp and change -SBS 1024 back to -SBS 2048, where I had run them before. The host has two GTX 1070s and a recently added GTX 1060 6GB card. Everything has been running fine.

After a few hours with the new command line parameters, I checked some stderr.txt output to make sure the newly changed parameters took. Everything looks like it is supposed to on the 1070s. However, I have noticed an anomaly with the 1060 that I can't explain. The command line parameters are global across all cards. I see the usual:

Maximum single buffer size set to:2048MB
Currently allocated 2121 MB for GPU buffers
Single buffer allocation size: 2048MB

for the 1070s.

But this is what I am seeing with the 1060:
Maximum single buffer size set to:2048MB
Currently allocated 1609 MB for GPU buffers
Single buffer allocation size: 1536MB

How is it that the 1060 doesn't allocate the same amount of memory for the GPU buffers? The 1070s have 8GB and the 1060 has 6GB, only one task per card is being run on all cards, and there is plenty of memory available for the single task on any card. The 1536MB looks suspiciously like the supposed OpenCL limitation that has been discussed recently. If that is actually what is happening in this case, what makes the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?
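
For reference, the app_config.xml entries behind those switches look roughly like the sketch below. The app name and plan_class are only placeholders (check client_state.xml on your own host for the exact strings), and any other switches I normally run are left out:

<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>   <!-- 1.0 = one task per card, 0.5 = two -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_SoG</plan_class>   <!-- placeholder: use your host's plan class -->
    <cmdline>-SBS 2048 -hp</cmdline>
  </app_version>
</app_config>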
Seti@Home classic workunits:20,676 CPU time:74,226 hours
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 8881
Credit: 115,080,143
RAC: 70,401
Australia
Message 1892121 - Posted: 28 Sep 2017, 7:33:21 UTC - in response to Message 1892119.  

The 1536MB looks suspiciously like the supposed OpenCL limitation that has been discussed recently. If that is actually what is happening in this case, what makes the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?

I can't remember the actual discussion, but the OpenCL buffer limitation is a percentage of the total available memory. Since the 1060 has less RAM than the 1070, its SBS allocation will be less. Cards with more VRAM than the 1070 should have a larger maximum SBS buffer allocated.
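
If you want to see the exact figure the driver reports for a given card, you can query CL_DEVICE_MAX_MEM_ALLOC_SIZE (with CL_DEVICE_GLOBAL_MEM_SIZE for comparison). A quick standalone sketch along these lines, nothing to do with the SETI app itself, prints it for every GPU on the first OpenCL platform (build with something like gcc maxalloc.c -lOpenCL):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint num_devices = 0;

    /* First platform only; adjust if more than one OpenCL platform is installed. */
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices, &num_devices);

    for (cl_uint i = 0; i < num_devices; i++) {
        char name[256];
        cl_ulong global_mem = 0, max_alloc = 0;
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_GLOBAL_MEM_SIZE,
                        sizeof(global_mem), &global_mem, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                        sizeof(max_alloc), &max_alloc, NULL);
        printf("%s: %llu MB total, %llu MB max single allocation\n", name,
               (unsigned long long)(global_mem >> 20),
               (unsigned long long)(max_alloc >> 20));
    }
    return 0;
}

On the 1060 that should come back as the 1536 MB you're seeing.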
Grant
Darwin NT
Profile Zalster
Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 3992
Credit: 208,943,603
RAC: 50,996
United States
Message 1892123 - Posted: 28 Sep 2017, 7:37:18 UTC - in response to Message 1892121.  

The 1536MB looks suspiciously like the supposed OpenCL limitation that has been discussed recently. If that is actually what is happening in this case, what makes the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?

I can't remember the actual discussion, but the OpenCL buffer limitation is a percentage of the total available memory. Since the 1060 has less RAM than the 1070, its SBS allocation will be less. Cards with more VRAM than the 1070 should have a larger maximum SBS buffer allocated.


+1
Profile Darrell
Project Donor
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,371,306
RAC: 413
United States
Message 1892156 - Posted: 28 Sep 2017, 13:19:08 UTC

Looking at your task result:
GTX 1070 max memory allocation ==> 2147483648 / 1024 / 1024 = 2048 MB
GTX 1060 max memory allocation ==> 1610612736 / 1024 / 1024 = 1536 MB

This is the total amount of OpenCL memory the driver provides for all tasks running on the card. When running more than one task at a time on the card, these maximums should be kept in mind: if you exceed them, you increase runtime because the GPU has to swap between the GPU's main memory and the OpenCL memory. These buffers show up in HWiNFO under the heading "GPU D3D Memory Dedicated".
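
In practice all an application can really do is clamp whatever -SBS value it was asked for to that driver-reported limit before creating the buffer, which appears to be why a -SBS 2048 request ends up as a 1536MB buffer on the 1060. Roughly like this (just a sketch of the idea, not the actual SETI app source):

#include <CL/cl.h>

/* Allocate one large OpenCL buffer, clamped to the driver's per-allocation limit. */
static cl_mem alloc_single_buffer(cl_context ctx, cl_device_id dev,
                                  size_t requested_mb, cl_int *err)
{
    cl_ulong max_alloc = 0;
    clGetDeviceInfo(dev, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);

    size_t bytes = requested_mb << 20;      /* MB -> bytes */
    if (bytes > max_alloc)
        bytes = (size_t)max_alloc;          /* e.g. a 2048 MB request becomes 1536 MB on the 1060 */

    return clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, err);
}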
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 5806
Credit: 76,044,315
RAC: 50,967
Russia
Message 1892172 - Posted: 28 Sep 2017, 14:12:25 UTC

And the usual note: bigger doesn't mean better (faster, in our case).
One needs to test whether any speed gain is really achieved with such big -SBS values.
Though the latest builds are clever enough to use only what they really need from that amount...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,217,413
RAC: 358,686
United States
Message 1892221 - Posted: 28 Sep 2017, 18:23:35 UTC - in response to Message 1892121.  

Hi Grant, thanks for jogging my memory. Yes, it is a percentage, I forgot. The OpenCL limit is 25% of available memory: 6144 MB × 0.25 = 1536 MB for the 1060, and 8192 MB × 0.25 = 2048 MB for the 1070s. Mystery solved. Now to finish up a day's worth of testing to see if my production got better/stayed the same/got worse.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,217,413
RAC: 358,686
United States
Message 1892288 - Posted: 29 Sep 2017, 1:40:34 UTC

Well, my 24-hour experiment with single tasks etc. didn't pan out. Throughput is better when running 2 tasks per card, so it's back to the original configuration. At least I know for sure now instead of just wondering "what if".
Seti@Home classic workunits:20,676 CPU time:74,226 hours
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 8881
Credit: 115,080,143
RAC: 70,401
Australia
Message 1892307 - Posted: 29 Sep 2017, 4:40:50 UTC - in response to Message 1892288.  

Well, my 24-hour experiment with single tasks etc. didn't pan out. Throughput is better when running 2 tasks per card, so it's back to the original configuration. At least I know for sure now instead of just wondering "what if".

Yet every time I've tried (with and without various command line values, and even none at all), running 1 at a time gives me almost the same number of WUs per hour as running 2. So for me there is no benefit in running 2 at a time, because when you get 1 Arecibo and 1 GBT task on the same card, the Arecibo task can take as much as 3 times longer than its usual processing time to finish.
Grant
Darwin NT
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,217,413
RAC: 358,686
United States
Message 1892318 - Posted: 29 Sep 2017, 6:20:16 UTC - in response to Message 1892307.  

My issue is that I also work for Einstein and Milkyway. With one task per card, as soon as I put one of those tasks on a card, I am not doing any SETI on it, so I saw about a one-third reduction in SETI tasks per day. With two tasks per card I have at least 5 SETI tasks running instead of 2.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
