Message boards : Number crunching : Strange observation of -SBS size on GTX 1060 6GB card
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873
I've been fiddling with Numbskull Host 8030022. Grant piqued my interest with how he is running his GTX 1070s, so I decided to try dropping the task count per card from 2 down to 1. I also modified my app_config command line parameters to add -hp and change -SBS 1024 back to -SBS 2048, which I had run before. The host has two GTX 1070s and a recently added GTX 1060 6GB card, and everything has been running fine.

After a few hours with the new command line parameters, I checked some stderr.txt output to make sure the newly changed parameters took. Everything looks like it is supposed to on the 1070s. However, I have noticed an anomaly with the 1060 that I can't explain. The command line parameters are global across all cards. For the 1070s I see the usual:

Maximum single buffer size set to:2048MB
Currently allocated 2121 MB for GPU buffers
Single buffer allocation size: 2048MB

But this is what I am seeing with the 1060:

Maximum single buffer size set to:2048MB
Currently allocated 1609 MB for GPU buffers
Single buffer allocation size: 1536MB

How is it that the 1060 doesn't allocate the same amount of memory for the GPU buffers? The 1070s have 8GB and the 1060 has 6GB, only one task per card is being run on all cards, and there is plenty of memory available for the single task on any card. The 1536MB looks suspiciously like the supposed limitation of OpenCL that has been discussed recently. If that is actually happening in this case, what makes the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
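[Editor's note: the buffer-size lines quoted above come from the app's stderr output. Below is a minimal sketch of how one could pull those lines out of the running slots on a Linux BOINC install instead of checking each result by hand. The slot path is an assumption (it varies by install); the matched phrases are taken verbatim from the stderr excerpts in this post.]

```python
# Sketch: scan BOINC slot stderr.txt files for the buffer-size lines
# quoted in the post above. SLOT_GLOB is an assumed Linux default path.
import glob

SLOT_GLOB = "/var/lib/boinc-client/slots/*/stderr.txt"  # assumption: adjust for your install
PATTERNS = (
    "Maximum single buffer size set to",
    "Currently allocated",
    "Single buffer allocation size",
)

for path in glob.glob(SLOT_GLOB):
    with open(path, errors="replace") as f:
        for line in f:
            if any(p in line for p in PATTERNS):
                print(f"{path}: {line.strip()}")
```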
Grant (SSSF) | Joined: 19 Aug 99 | Posts: 13746 | Credit: 208,696,464 | RAC: 304
"The 1536MB looks suspiciously like the supposed limitation of OpenCL that has been discussed recently. If that is actually happening in this case, what makes the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?"

I can't remember the actual discussion, but the OpenCL buffer limitation is a percentage of the total available memory. So since the 1060 has less RAM than the 1070, its SBS allocation will be less. Cards with more VRAM than the 1070 should have a larger maximum SBS buffer allocated.

Grant
Darwin NT
Zalster | Joined: 27 May 99 | Posts: 5517 | Credit: 528,817,460 | RAC: 242
"The 1536MB looks suspiciously like the supposed limitation of OpenCL that has been discussed recently. If that is actually happening in this case, what makes the difference in the amount of memory allocated between a 1070 and a 1060? Shouldn't it be the same if the OpenCL platform is the limitation?"

+1
Darrell | Joined: 14 Mar 03 | Posts: 267 | Credit: 1,418,681 | RAC: 0
Looking at your task results:

GTX 1070 max memory allocation ==> 2147483648 / 1024 / 1024 = 2048MB
GTX 1060 max memory allocation ==> 1610612736 / 1024 / 1024 = 1536MB

This is the amount of OpenCL memory available for all tasks (i.e. the total) running on the card, as reported by the driver. When running more than one task at a time on a card, these maximums should be kept in mind. If you exceed them, you increase runtime because the GPU has to swap between the GPU's main memory and the OpenCL memory. These buffers show up in HWiNFO under the heading "GPU D3D Memory Dedicated".

... and still I fear, and still I dare not laugh at the Mad Man!
Queen - The Prophet's Song
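[Editor's note: the per-card maximum allocation Darrell is reading out of the task result is the driver's CL_DEVICE_MAX_MEM_ALLOC_SIZE. As a rough sketch, assuming the pyopencl package is installed, it can be queried directly and converted from bytes to MB the same way:]

```python
# Sketch (assumes pyopencl is installed): print each OpenCL device's total
# memory and the driver's maximum single allocation, converted to MB.
import pyopencl as cl

MB = 1024 * 1024
for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(f"{dev.name}: global memory {dev.global_mem_size // MB} MB, "
              f"max single allocation {dev.max_mem_alloc_size // MB} MB")
```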
Raistmer | Joined: 16 Jun 01 | Posts: 6325 | Credit: 106,370,077 | RAC: 121
And the usual note: bigger doesn't mean better (faster, in our case). One needs to test whether any speed gain is really achieved with such big SBS values. Though the latest builds are clever enough to use only what they really need from that amount...

SETI apps news
We're not gonna fight them. We're gonna transcend them.
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873
Hi Grant, thanks for jogging my memory. Yes, it is a percentage; I forgot. The OpenCL limit is 25% of available memory:

6144 MB × 0.25 = 1536 MB

Mystery solved. Now to finish up a day's worth of testing to see if my production got better, stayed the same, or got worse.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
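[Editor's note: a minimal sketch of the arithmetic concluded in this post, assuming the driver caps a single OpenCL allocation at 25% of total VRAM, so a requested -SBS value gets clamped to that figure. The clamping function below is an illustration of the observed behaviour, not the app's actual code.]

```python
# Illustrative only: effective SBS = min(requested -SBS, 25% of the card's VRAM).
def effective_sbs(vram_mb: int, requested_sbs_mb: int) -> int:
    max_alloc_mb = vram_mb // 4          # 25% of total VRAM
    return min(requested_sbs_mb, max_alloc_mb)

print(effective_sbs(8192, 2048))  # GTX 1070 (8 GB): 2048 MB, request honoured
print(effective_sbs(6144, 2048))  # GTX 1060 6GB: capped at 1536 MB
```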
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873
Well, my 24-hour experiment with single tasks etc. didn't pan out. Throughput is better when running 2 tasks per card, so it's back to the original configuration. At least I know for sure now instead of just wondering "what if?"

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) | Joined: 19 Aug 99 | Posts: 13746 | Credit: 208,696,464 | RAC: 304
"Well, my 24-hour experiment with single tasks etc. didn't pan out. Throughput is better when running 2 tasks per card, so it's back to the original configuration. At least I know for sure now instead of just wondering 'what if?'"

Yet every time I've tried (with and without various different command line values, and even none at all), running 1 at a time gives me almost the same number of WUs per hour as running 2. So for me there is no benefit in running 2 at a time, because when you get 1 Arecibo & 1 GBT task on the same card, the Arecibo task can take as much as 3 times longer than its usual processing time to finish.

Grant
Darwin NT
Keith Myers | Joined: 29 Apr 01 | Posts: 13164 | Credit: 1,160,866,277 | RAC: 1,873
My issue is that I also run work for Einstein and Milkyway. With one task per card, as soon as I put one of those tasks on a card, I am not doing any SETI, so I saw about a one-third reduction in tasks per day on SETI. With two tasks per card I have at least 5 SETI tasks running instead of 2.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)