Strange observation of -SBS size on GTX 1060 6GB card

Message boards : Number crunching : Strange observation of -SBS size on GTX 1060 6GB card
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1892119 - Posted: 28 Sep 2017, 7:22:18 UTC

I've been fiddling with Numbskull Host 8030022. Grant piqued my interest with how he is running his GTX 1070s. I decided to try dropping the task count per card from 2 down to 1. I also modified my app_config command line parameters to add -hp and change -SBS 1024 back to -SBS 2048, where I had run them before. The host has two GTX 1070s and a recently added GTX 1060 6GB card. Everything has been running fine. After a few hours with the new command line parameters, I checked some stderr.txt output to make sure the newly changed parameters took. Everything looks like it is supposed to on the 1070s. However, I have noticed an anomaly with the 1060 that I can't explain. The command line parameters are global across all cards. I see the usual:

Maximum single buffer size set to:2048MB
Currently allocated 2121 MB for GPU buffers
Single buffer allocation size: 2048MB

for the 1070s.

But this is what I am seeing with the 1060:
Maximum single buffer size set to:2048MB
Currently allocated 1609 MB for GPU buffers
Single buffer allocation size: 1536MB

How is it that the 1060 doesn't allocate the same amount of memory for the GPU buffers? The 1070s have 8GB and the 1060 has 6GB. Only one task per card is being run on all cards. There is plenty of memory available for the single task on any card. The 1536MB looks suspiciously like the supposed limitation of OpenCL that has been discussed recently. If that is actually what is happening here, what accounts for the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1892121 - Posted: 28 Sep 2017, 7:33:21 UTC - in response to Message 1892119.  

The 1536MB looks suspiciously like the supposed limitation of OpenCL that has been discussed recently. If that is actually what is happening here, what accounts for the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?

I can't remember the actual discussion, but the OpenCL buffer limitation is a percentage of the total available memory. So since the 1060 has less RAM than the 1070, its SBS allocation will be smaller. Cards with more VRAM than the 1070 should have a larger maximum SBS buffer allocated.
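Grant's percentage rule can be sanity-checked with quick arithmetic. A minimal sketch, assuming the commonly reported behaviour that NVIDIA's OpenCL driver caps the maximum single allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE) at 1/4 of total VRAM; the OpenCL spec only guarantees at least that quarter as a minimum, so the exact fraction is an assumption here:

```python
# Hypothetical helper: estimate the max single OpenCL buffer allocation
# as a fraction of total VRAM. The 0.25 fraction is an assumption based
# on what NVIDIA's driver appears to report, not a value from the apps.
def max_single_alloc_mb(vram_mb, fraction=0.25):
    return int(vram_mb * fraction)

print(max_single_alloc_mb(8192))  # GTX 1070 (8 GB) -> 2048
print(max_single_alloc_mb(6144))  # GTX 1060 (6 GB) -> 1536
```

Those figures match the "Single buffer allocation size" lines quoted above: 2048MB on the 1070s and 1536MB on the 1060.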
Grant
Darwin NT
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1892123 - Posted: 28 Sep 2017, 7:37:18 UTC - in response to Message 1892121.  

The 1536MB looks suspiciously like the supposed limitation of OpenCL that has been discussed recently. If that is actually what is happening here, what accounts for the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?

I can't remember the actual discussion, but the OpenCL buffer limitation is a percentage of the total available memory. So since the 1060 has less RAM than the 1070, its SBS allocation will be smaller. Cards with more VRAM than the 1070 should have a larger maximum SBS buffer allocated.


+1
Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1892156 - Posted: 28 Sep 2017, 13:19:08 UTC

Looking at your task result:
GTX 1070 max memory allocation ==> 2147483648 / 1024 / 1024 = 2048MB
GTX 1060 max memory allocation ==> 1610612736 / 1024 / 1024 = 1536MB

This is the amount of OpenCL memory the driver makes available for all tasks (i.e. in total) running on the card. When running more than one task at a time on a card, these maximums should be kept in mind: if you exceed them, runtime increases because the GPU has to swap between its main memory and the OpenCL memory. These buffers show up in HWiNFO under the heading of "GPU D3D Memory Dedicated".
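For anyone checking Darrell's arithmetic, the driver-reported byte counts convert to the quoted MB figures like this (a trivial sketch; the byte values are exactly the ones quoted above):

```python
# Convert the driver-reported max allocation sizes from bytes to MB.
def bytes_to_mb(n_bytes):
    return n_bytes // (1024 * 1024)

print(bytes_to_mb(2147483648))  # GTX 1070 -> 2048
print(bytes_to_mb(1610612736))  # GTX 1060 -> 1536
```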
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1892172 - Posted: 28 Sep 2017, 14:12:25 UTC

And the usual note: bigger doesn't mean better (faster, in our case).
One needs to test whether any speed gain is actually achieved with such big -SBS values.
Though the latest builds are clever enough to use only what they really need from that amount...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1892221 - Posted: 28 Sep 2017, 18:23:35 UTC - in response to Message 1892121.  

Hi Grant, thanks for jogging my memory. Yes, it is a percentage; I forgot. The OpenCL limit is 25% of available memory: 6144MB × 0.25 = 1536MB. Mystery solved. Now to finish up a day's worth of testing to see if my production got better, stayed the same, or got worse.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1892288 - Posted: 29 Sep 2017, 1:40:34 UTC

Well, my 24-hour experiment with single tasks etc. didn't pan out. Throughput is better when running 2 tasks per card. So, back to the original configuration. At least now I know for sure instead of just wondering "what if?"
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1892307 - Posted: 29 Sep 2017, 4:40:50 UTC - in response to Message 1892288.  

Well, my 24-hour experiment with single tasks etc. didn't pan out. Throughput is better when running 2 tasks per card. So, back to the original configuration. At least now I know for sure instead of just wondering "what if?"

Yet every time I've tried (with and without various command line values, and even none at all), running 1 at a time gives me almost the same number of WUs per hour as running 2. So for me there's no benefit in running 2 at a time, since when you get 1 Arecibo & 1 GBT task on the same card, the Arecibo task can take as much as 3 times longer than its usual processing time to finish.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1892318 - Posted: 29 Sep 2017, 6:20:16 UTC - in response to Message 1892307.  

My issue is that I also work for Einstein and Milkyway. With one task per card, as soon as I put one of those tasks on a card, I am not doing any SETI. So I saw about a third reduction in tasks per day on SETI. With two tasks per card I have at least 5 SETI tasks running instead of 2.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.