CUDA Versions

Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1553963 - Posted: 8 Aug 2014, 22:18:52 UTC - in response to Message 1553574.  

If you're not sure about app_config.xml, load Lunatics.
Sorry, can't comment on building a config file; others will help there.


Took a hard look at building an appropriate config file but, before starting, decided to compare the samples to the app_info.xml produced by Lunatics. Found many of the same constructs there and, given the time and effort, decided to use Lunatics. Had the new machine online last night with the GPU set for 5 WUs (single 780). Hope to bring the other 780 in tonight, but am a little concerned that my 900 watt UPS can't handle the peaks. Presently at 440 watts, so it might work. Picked up a few 'validation inconclusive' results so far, but that may be due to all the app and driver installs, restarts, etc. Letting it sit at 5 WUs for now, but need to address that UPS issue.
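(As a rough check on that concern: a reference GTX 780 is rated at about 250 W TDP, so the second card could push the ~440 W load toward 700 W at peak. That is inside the 900 W figure on paper, but UPS units are often limited by their VA rating rather than their watt rating, so the real headroom may be thinner than it looks.)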

Thanks to all who have provided advice, suggestions, education, etc. Have learned much.
ID: 1553963
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1553967 - Posted: 8 Aug 2014, 22:23:26 UTC - in response to Message 1549702.  

Hello Everyone,

Just curious: what determines the CUDA version of the WUs one's computer receives?

I have a rig with two GTX 780Ti cards, which I've only ever seen receive CUDA42 WUs, and I also have a PC with two GTX 680 cards, which I've seen receive CUDA32, CUDA42, and CUDA50.

I always keep both machines up to date with the latest WHQL drivers. Both run Win7 x64.

Thank you to anyone who can take the time to help.


Matt, how many simultaneous WUs are you running on the 780s? I'm presently running 5 on my recently installed 780Ti, but would like to hear from others on this.
ID: 1553967
Matt

Joined: 6 Jun 10
Posts: 18
Credit: 1,434,540
RAC: 0
United States
Message 1553979 - Posted: 8 Aug 2014, 22:33:17 UTC - in response to Message 1553967.  

I'm currently running 2x on SETI v7 WUs and 1x on AP with a full HT core dedicated. I'm using Lunatics for both. Just updated drivers and am seeing that AP requires a full HT core per WU now. I also run WCG on the CPU, so that's why I only run 2x on the v7 WUs. Still, Lunatics has increased GPU utilization so that 2 v7 tasks average 95%.
ID: 1553979
Zalster
Volunteer tester

Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1553988 - Posted: 8 Aug 2014, 22:43:57 UTC - in response to Message 1553967.  
Last modified: 8 Aug 2014, 22:53:22 UTC

Bill, what is the average time it takes for each AP on your system? I run 2 APs on each of my 780s and my average time is 55 minutes. When no APs are available I run 3 v7 at a time on each.

edit

I could probably do 3 APs on each, but I don't have enough cores on my chip to support that. Utilization of those 780s is showing only about 88-92%. The other 2 cards in there are topped out at 98% utilization.
ID: 1553988
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556191 - Posted: 13 Aug 2014, 15:11:49 UTC - in response to Message 1553979.  

I'm currently running 2x on SETI v7 WUs and 1x on AP with a full HT core dedicated. I'm using Lunatics for both. Just updated drivers and am seeing that AP requires a full HT core per WU now. I also run WCG on the CPU, so that's why I only run 2x on the v7 WUs. Still, Lunatics has increased GPU utilization so that 2 v7 tasks average 95%.


Matt - on my dual 780Ti machine, I received 3 errors yesterday but had it busy elsewhere. But I note that the RAC for your 780 machine is low, about 5700, and I'm wondering if you just brought it online or if you run part-time. My 780 RAC is now around 25k and still climbing - see

http://setiathome.berkeley.edu/show_host_detail.php?hostid=7354971

for comparison. I'm still running 5 WUs per GPU (all types) with about 1% validation inconclusive. No invalids showing.
ID: 1556191
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556208 - Posted: 13 Aug 2014, 16:01:04 UTC - in response to Message 1553988.  

Bill, what is the average time it takes for each AP on your system? I run 2 APs on each of my 780s and my average time is 55 minutes. When no APs are available I run 3 v7 at a time on each.

edit

I could probably do 3 APs on each, but I don't have enough cores on my chip to support that. Utilization of those 780s is showing only about 88-92%. The other 2 cards in there are topped out at 98% utilization.


Sorry ... been off on other things. Not many APs to average, but the 3 completed thus far average about 70 minutes.

I looked to see the RAC on your 780 but only found 750s. They are showing a remarkable result for the cost. I'm going to be very disappointed, having gone with dual 780s rather than quad 750s, if I don't reach 80k on this machine. It would have been much cheaper having quad 750s riding an AMD 8-banger like you have. But I'm at 25k presently and the rate of increase remains just off vertical.

What are the other 2 cards (98%) you mention? Perhaps the 780 is buried in that configuration and thus not showing. All threads on my machine are showing 100% CPU, but I'm not sure how to view GPU utilization. I noted that the 3.4 GHz CPU is running at 3.6 GHz, so apparently I've engaged overclocking by setting performance to high using the ASUS Dual Intelligent Processors 4 application that came with the motherboard. But tell me how to get to the GPU utilization.

Thanks ...
ID: 1556208
Matt

Joined: 6 Jun 10
Posts: 18
Credit: 1,434,540
RAC: 0
United States
Message 1556225 - Posted: 13 Aug 2014, 16:32:44 UTC - in response to Message 1556191.  

Matt - on my dual 780Ti machine, I received 3 errors yesterday but had it busy elsewhere. But I note that the RAC for your 780 machine is low, about 5700, and I'm wondering if you just brought it online or if you run part-time.


Bill - Both are correct. I just recently started crunching some SETI work after a long absence and I'm only crunching for SETI part-time.

I have modified my app_config so that I'm running 4 MB WUs concurrently on each GPU. This seems to be the limit before I run out of VRAM (I keep my cards in SLI for gaming, so the available memory for crunching is reduced). I've tried AP both at 1x and 2x but have not yet done enough of those to tell if there is a benefit to running multiple instances of AP on each card.
ID: 1556225
juan BFP
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1556317 - Posted: 13 Aug 2014, 18:34:34 UTC
Last modified: 13 Aug 2014, 18:35:13 UTC

On the normal 780 you can run 3 WUs at a time (MB/AP) if you leave enough cores to feed the GPU; maybe the Ti model can handle 4, but that's for others to answer. My point is, it's not just about what the GPU itself can handle: you need to keep cores free so the GPUs stay well fed, or the entire host will slow down, especially on 2-GPU hosts.
ID: 1556317
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556358 - Posted: 13 Aug 2014, 19:43:17 UTC - in response to Message 1556317.  

On the normal 780 you can run 3 WUs at a time (MB/AP) if you leave enough cores to feed the GPU; maybe the Ti model can handle 4, but that's for others to answer. My point is, it's not just about what the GPU itself can handle: you need to keep cores free so the GPUs stay well fed, or the entire host will slow down, especially on 2-GPU hosts.


I'm presently running 5 on the dual Ti version but have noticed display stalls and at least one driver failure (which auto-recovered). Think I'll be dropping back to 4 after seeing where it peaks at 5 (just brought it online). But I also want to evaluate the effects of reserving a core for feeding the GPUs. How do I go about that?

Thanks ...
ID: 1556358
juan BFP
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1556367 - Posted: 13 Aug 2014, 19:52:18 UTC
Last modified: 13 Aug 2014, 20:11:46 UTC

OK, we can try to help you with that, but before we start optimizing the number of free cores, you need to install the new Lunatics code. It will make your AP GPU WUs use a lot less CPU and give us better control of the GPU, so the entire process will be easier.

There is a thread exclusively about the new Lunatics installer; check it.

As soon as you have the builds working, PM me or post another message here, and we can start to find the best settings for your particular host.

One point is important: there is an "urban legend" that says more WUs at a time per GPU = more production. That is true when you go from 1 to 2 WUs at a time, or maybe 3 on most Fermi and later cards, but it is not entirely true as you keep adding WUs, because of the overhead of switching from one task to another. A top-of-the-class GPU like the 780Ti can load up to 10 WUs at a time (the limit is the memory), but when you run more than 4 or maybe 5, everything slows down. And remember, each host is unique; that is why everyone says YMMV, and tests are necessary to find the optimal point on each host.

In the case of AP (which needs to move large amounts of data in and out over the PCIe slot, something we don't see with MB) we have another big bottleneck: PCIe transfer capacity and latency. That is one of the points the devs are working hard to get around, and we all expect news in the future. Be aware, the latency problem grows as your GPU's capacity grows, and it grows a lot more when you run more than one GPU in the same host. But don't worry, finding the optimal point is easy.

Let's wait for your return to continue.

<edit> To avoid any misunderstandings, I focus on AP optimization first because MB optimization follows almost automatically after that.
ID: 1556367
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556382 - Posted: 13 Aug 2014, 20:31:40 UTC - in response to Message 1556367.  

OK, we can try to help you with that, but before we start optimizing the number of free cores, you need to install the new Lunatics code. It will make your AP GPU WUs use a lot less CPU and give us better control of the GPU, so the entire process will be easier.

There is a thread exclusively about the new Lunatics installer; check it.

As soon as you have the builds working, PM me or post another message here, and we can start to find the best settings for your particular host.

One point is important: there is an "urban legend" that says more WUs at a time per GPU = more production. That is true when you go from 1 to 2 WUs at a time, or maybe 3 on most Fermi and later cards, but it is not entirely true as you keep adding WUs, because of the overhead of switching from one task to another. A top-of-the-class GPU like the 780Ti can load up to 10 WUs at a time (the limit is the memory), but when you run more than 4 or maybe 5, everything slows down. And remember, each host is unique; that is why everyone says YMMV, and tests are necessary to find the optimal point on each host.

In the case of AP (which needs to move large amounts of data in and out over the PCIe slot, something we don't see with MB) we have another big bottleneck: PCIe transfer capacity and latency. That is one of the points the devs are working hard to get around, and we all expect news in the future. Be aware, the latency problem grows as your GPU's capacity grows, and it grows a lot more when you run more than one GPU in the same host. But don't worry, finding the optimal point is easy.

Let's wait for your return to continue.

<edit> To avoid any misunderstandings, I focus on AP optimization first because MB optimization follows almost automatically after that.


Good info, some of which I was aware of, but clearly your experience will be useful. Thought I was on the most current Lunatics but see that I was one version behind. Have now upgraded to 0.42 ... and appreciate your pointing that out. Ready to move on to optimization advice, but will take it slowly, awaiting results from each change to be sure I understand what is going on.

Must upgrade my other machines as well.
ID: 1556382
Matt

Joined: 6 Jun 10
Posts: 18
Credit: 1,434,540
RAC: 0
United States
Message 1556383 - Posted: 13 Aug 2014, 20:40:27 UTC - in response to Message 1556382.  
Last modified: 13 Aug 2014, 20:41:32 UTC

Bill- I see that your 780Ti host is running the latest Nvidia driver - 340.52. It seems that with this driver, AP requires a full core per task. This could limit some of the possible advantages Lunatics offers on AP tasks. I immediately ran into WUs erroring out because they did not have enough dedicated CPU after I updated. I updated my app_config accordingly and have not had any issues since then.
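For reference, the change described here amounts to a one-line bump in app_config.xml; a minimal sketch, using the same values as the full file posted a few messages below:

<app>
   <name>astropulse_v6</name>
   <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>   <!-- a full CPU core dedicated per AP task -->
   </gpu_versions>
</app>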
ID: 1556383
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556389 - Posted: 13 Aug 2014, 21:13:09 UTC - in response to Message 1556383.  

Bill- I see that your 780Ti host is running the latest Nvidia driver - 340.52. It seems that with this driver, AP requires a full core per task. This could limit some of the possible advantages Lunatics offers on AP tasks. I immediately ran into WUs erroring out because they did not have enough dedicated CPU after I updated. I updated my app_config accordingly and have not had any issues since then.


I assume from your comments that you are running stock SETI (rather than Lunatics). I've yet to attempt the changes to the stock script that allow setting both the type and number of WUs, but I feel I should on another (480) machine that is producing quite a few invalids. Don't know which driver is being used on that machine, but I will watch for increases in errors on this (780) machine. I note that I've picked up 4 errors today, but all are MP. Only 1 other showing ... on the 7th, the first day online for the 780s. Thanks for the heads up.

If your app_config file isn't too long, would you mind posting it here?
ID: 1556389
juan BFP
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1556397 - Posted: 13 Aug 2014, 21:43:46 UTC
Last modified: 13 Aug 2014, 22:11:25 UTC

OK, let's try one step at a time... and please forgive me for any mistakes... Use these instructions with NV GPUs only.

But first, see Matt's warning and go back to an earlier Nvidia driver; 340.52 was reported to have some problems with the -use_sleep switch we use to optimize AP crunching on some hosts.

I use 337.88, just not sure if it's compatible with the 780 Ti; you need to check that at the Nvidia site.

After you downgrade the Nvidia driver to a more compatible one...

I only crunch on GPU, so I will focus on GPU crunching only; at the end we can try to optimize the CPU too. So when I talk about AP, I am talking about GPU AP.

Let's start with the AP GPU crunching tuning first:

1 - The installer builds a clean config file for AP; you must tune it for your GPU card.

2 - There is a help file that explains each parameter separately; read it carefully. Mike took a long time to collect that information and most people simply ignore it, but it is very important. This is the file: AstroPulse_OpenCL_NV_ReadMe.txt

3 - You need to edit the file ap_cmdline_win_x86_SSE2_OpenCL_NV.txt and add the optimized configuration for your GPU.

This is mine; it is quite conservative but works fine with the 670/690/780. Try it first, and then, when you are sure everything is working, push a little more (your 780Ti could do a lot more), but to begin with just make sure everything works:

-use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1
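(Roughly, per that ReadMe: -use_sleep lets the CPU support thread sleep instead of busy-waiting while the GPU works, which is what cuts the CPU time; -unroll and the two -ffa_block values size the FFA work batches to the card; -tune adjusts a kernel's work-group geometry. Those glosses are approximate; AstroPulse_OpenCL_NV_ReadMe.txt has the exact meanings.)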

A note: there are other ways to load the config files, but I always prefer to totally exit BOINC (not just boincmgr) and restart it, to be sure everything is loaded properly.

4 - You will see your CPU time drop to about 10-20% of the GPU crunching time. That frees up your CPU cores a lot.

5 - With everything working, keep in mind you can try to push a little more; just follow the help file. And don't forget, if you use this in the future on less powerful GPUs, the parameters need to be adjusted too.

6 - The next step is to find the optimal number of WUs per GPU. But first you need to be sure this part is working fine.

PM me when you are ready.
ID: 1556397
Matt

Joined: 6 Jun 10
Posts: 18
Credit: 1,434,540
RAC: 0
United States
Message 1556398 - Posted: 13 Aug 2014, 21:46:38 UTC - in response to Message 1556389.  

I'm currently running Lunatics v0.42. My app_config.xml is as follows:

<app_config>
   <app>
      <name>astropulse_v6</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>   <!-- one AP task per GPU -->
         <cpu_usage>1</cpu_usage>   <!-- a full CPU core reserved per AP task -->
      </gpu_versions>
   </app>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
         <gpu_usage>.25</gpu_usage>   <!-- four MB tasks per GPU -->
         <cpu_usage>.375</cpu_usage>  <!-- CPU share budgeted per MB task -->
      </gpu_versions>
   </app>
</app_config>

Currently, I'm only running one instance of AP per GPU. As far as the MB tasks go, I'm not sure what the "sweet spot" for CPU usage per task is. I've been experimenting with different values along with varying numbers of MB tasks running concurrently.
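(To read those numbers: BOINC runs 1/gpu_usage tasks per card, so .25 means 4 MB tasks per GPU, the same 4x mentioned earlier, and 4 × .375 = 1.5 CPU cores budgeted per GPU to feed them. The AP entry's 1/1 means one task per GPU with a full core reserved, matching the driver issue above.)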
ID: 1556398
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556409 - Posted: 13 Aug 2014, 22:43:51 UTC - in response to Message 1556397.  

OK, let's try one step at a time... and please forgive me for any mistakes... Use these instructions with NV GPUs only.

But first, see Matt's warning and go back to an earlier Nvidia driver; 340.52 was reported to have some problems with the -use_sleep switch we use to optimize AP crunching on some hosts.

I use 337.88, just not sure if it's compatible with the 780 Ti; you need to check that at the Nvidia site.

After you downgrade the Nvidia driver to a more compatible one...

I only crunch on GPU, so I will focus on GPU crunching only; at the end we can try to optimize the CPU too. So when I talk about AP, I am talking about GPU AP.

Let's start with the AP GPU crunching tuning first:

1 - The installer builds a clean config file for AP; you must tune it for your GPU card.

2 - There is a help file that explains each parameter separately; read it carefully. Mike took a long time to collect that information and most people simply ignore it, but it is very important. This is the file: AstroPulse_OpenCL_NV_ReadMe.txt

3 - You need to edit the file ap_cmdline_win_x86_SSE2_OpenCL_NV.txt and add the optimized configuration for your GPU.

This is mine; it is quite conservative but works fine with the 670/690/780. Try it first, and then, when you are sure everything is working, push a little more (your 780Ti could do a lot more), but to begin with just make sure everything works:

-use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1

A note: there are other ways to load the config files, but I always prefer to totally exit BOINC (not just boincmgr) and restart it, to be sure everything is loaded properly.

4 - You will see your CPU time drop to about 10-20% of the GPU crunching time. That frees up your CPU cores a lot.

5 - With everything working, keep in mind you can try to push a little more; just follow the help file. And don't forget, if you use this in the future on less powerful GPUs, the parameters need to be adjusted too.

6 - The next step is to find the optimal number of WUs per GPU. But first you need to be sure this part is working fine.

PM me when you are ready.


Thanks, Juan. 337.50 came with the GPUs, so 337.88 should be fine. The update occurred automatically, but I will look at rolling back after the configuration has stabilized. I want to see where it peaks before making any tuning efforts, including the steps you list above. Probably a few days before that occurs, but I will get back once I get into tuning.

Appreciate your time ...
ID: 1556409
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556411 - Posted: 13 Aug 2014, 22:49:24 UTC - in response to Message 1556398.  

I'm currently running Lunatics v0.42. My app_config.xml is as follows:

<app_config>
   <app>
      <name>astropulse_v6</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>   <!-- one AP task per GPU -->
         <cpu_usage>1</cpu_usage>   <!-- a full CPU core reserved per AP task -->
      </gpu_versions>
   </app>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
         <gpu_usage>.25</gpu_usage>   <!-- four MB tasks per GPU -->
         <cpu_usage>.375</cpu_usage>  <!-- CPU share budgeted per MB task -->
      </gpu_versions>
   </app>
</app_config>

Currently, I'm only running one instance of AP per GPU. As far as the MB tasks go, I'm not sure what the "sweet spot" for CPU usage per task is. I've been experimenting with different values along with varying numbers of MB tasks running concurrently.


Thanks, Matt. An app_config.xml file doesn't exist in my Lunatics configuration, but I think I remember reading that, if it exists, it will override the count parameters in app_info.xml. Must read some more, but at least I have something to work with.
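(The overlap referred to: in app_info.xml the per-task GPU share is the <count> value inside an <app_version>'s <coproc> block, e.g. <coproc><type>CUDA</type><count>0.2</count></coproc> for 5 WUs per GPU, and a <gpu_usage> in app_config.xml takes precedence over it when both are present. That is from memory, so verify against the BOINC documentation.)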
ID: 1556411
juan BFP
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1556413 - Posted: 13 Aug 2014, 22:57:11 UTC - in response to Message 1556409.  
Last modified: 13 Aug 2014, 23:00:50 UTC

Appreciate your time ...

You know where to find me when you are ready to continue.

One last reminder; maybe you already know, but: NEVER let your host automatically update your Nvidia driver. Disable that option and you will avoid a lot of headaches. Always wait some time after a new driver is released, to be sure it works with BOINC and your projects. Nvidia drivers (especially the GeForce ones) target gaming, not crunching, so it is very common to discover some incompatibility. And whenever you install a new driver, download it directly from the Nvidia site and select the clean installation, to be sure nothing from the old driver is left behind.

<edit> Make your changes in app_config.xml only; avoid the old way of editing app_info.xml. That will make your configuration easier, with less possibility of error, and BTW will allow you to keep your settings if you need to update/reinstall Lunatics for any reason.
ID: 1556413
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13958
Credit: 208,696,464
RAC: 304
Australia
Message 1556500 - Posted: 14 Aug 2014, 1:43:02 UTC - in response to Message 1556358.  
Last modified: 14 Aug 2014, 1:44:16 UTC

I'm presently running 5 on the dual Ti version but have noticed display stalls and at least one driver failure (which auto-recovered). Think I'll be dropping back to 4 after seeing where it peaks at 5 (just brought it online).


Keep in mind that there is the law of diminishing returns. Just because you can run 5 (or 6 or more) doesn't mean it's more productive.

With my GTX 750Tis I ran 1, 2 & 3 WUs at a time. The end result was that 2 WUs at a time produced the most work per hour. 1 WU at a time crunches a WU the fastest, but running 2 at a time, while taking longer to process each WU, actually resulted in more work per hour. Running 3 at a time was very close to 2 at a time, but in the end 2 at a time produced more work per hour.
So that's what I went with.

If you wait for RAC to level off for an idea of the effect of any changes, you're looking at 6-8 weeks for things to "stabilise" after you make each change. Keeping track of WU run time allows you to figure things out much more quickly.
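To put illustrative numbers on that (made up, but typical of the shape): if one WU alone takes 20 minutes, that is 3 WUs per hour; if two at a time take 33 minutes each, that is 2 × 60/33 ≈ 3.6 WUs per hour even though each WU finishes more slowly; if three at a time take 52 minutes each, you are back down to about 3.5. The figure to compare is work per hour = (WUs in flight) × 60 / (minutes per WU).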
Grant
Darwin NT
ID: 1556500
Bill Greene
Volunteer tester

Joined: 3 Jul 99
Posts: 80
Credit: 116,047,529
RAC: 61
United States
Message 1556953 - Posted: 14 Aug 2014, 20:52:28 UTC - in response to Message 1556413.  

Appreciate your time ...

You know where to find me when you are ready to continue.

One last reminder; maybe you already know, but: NEVER let your host automatically update your Nvidia driver. Disable that option and you will avoid a lot of headaches. Always wait some time after a new driver is released, to be sure it works with BOINC and your projects. Nvidia drivers (especially the GeForce ones) target gaming, not crunching, so it is very common to discover some incompatibility. And whenever you install a new driver, download it directly from the Nvidia site and select the clean installation, to be sure nothing from the old driver is left behind.

<edit> Make your changes in app_config.xml only; avoid the old way of editing app_info.xml. That will make your configuration easier, with less possibility of error, and BTW will allow you to keep your settings if you need to update/reinstall Lunatics for any reason.


Really haven't devoted much time to GPU activity, so your advice is well taken. Having just brought this dual 780Ti machine online is cause to pay more attention to detail. Hence my effort to seek out the experience of others like yourself. Much thanks.
ID: 1556953