Message boards : Number crunching : CUDA Versions
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
If you're not sure about app_config.xml, load Lunatics. Took a hard look at building an appropriate .cfg file but, before starting, decided to compare the samples to the app_info.xml produced by Lunatics. Found many of the same constructs there and, given the time and effort, decided to use Lunatics. Had the new machine online last night with the GPU set for 5 WUs (single 780). Hope to bring the other 780 in tonight but am a little concerned that my 900 watt UPS can't handle the peaks. Presently at 440 watts, so it might work. Picked up a few 'validation inconclusive' results so far, but they may be due to all the app and driver installs, restarts, etc. Letting it sit at 5 WUs for now but need to address that UPS issue. Thanks to all who have provided advice, suggestions, education, etc. Have learned much.
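For anyone who does want to go the app_config.xml route, a minimal sketch along these lines would produce the same 5-per-GPU arrangement. The app name setiathome_v7 is the one used later in this thread; the cpu_usage figure is only an assumed placeholder to tune for your own host:

```xml
<!-- Sketch only: 5 concurrent SETI v7 (MB) tasks per GPU.
     gpu_usage 0.2 = 1/5 of a GPU per task.
     cpu_usage 0.2 is an assumed placeholder; tune for your host. -->
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.2</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

BOINC reads this file from the project's directory; restarting BOINC (or using "Read config files" in the manager) picks it up.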
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
Hello everyone. Matt, how many simultaneous WUs are you running on the 780s? Am presently running 5 on my recently installed 780Ti but would like to hear from others on this.
Matt · Joined: 6 Jun 10 · Posts: 18 · Credit: 1,434,540 · RAC: 0
I'm currently running 2x on SETI v7 WUs and 1x on AP w/ a full HT core dedicated. I'm using the Lunatics apps for both. Just updated drivers and am seeing that AP now requires a full HT core per WU. I also run WCG on the CPU, so that's why I only run 2x on the v7 WUs. Still, Lunatics has increased GPU utilization so that 2 v7 tasks average 95%.
Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
Bill, what is the average time it takes for each AP on your system? I run 2 APs on each of my 780s and my average time is 55 minutes. When no APs are available I run 3 v7 at a time on each. Edit: I could probably do 3 APs on each, but I don't have enough cores on my chip to support that. Utilization of those 780s is showing only about 88-92%. The other 2 cards in there are topped out at 98% utilization.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> I'm currently running 2x on SETI v7 WUs and 1x on AP w/ a full HT core dedicated. I'm using the Lunatics apps for both. Just updated drivers and am seeing that AP now requires a full HT core per WU. I also run WCG on the CPU, so that's why I only run 2x on the v7 WUs. Still, Lunatics has increased GPU utilization so that 2 v7 tasks average 95%.

Matt - on my dual 780Ti machine, I received 3 errors yesterday, but I had it busy elsewhere. I note that the RAC for your 780 machine is low, about 5700, and I'm wondering if you just brought it online or if you run part time. My 780 RAC is now around 25k and still climbing - see http://setiathome.berkeley.edu/show_host_detail.php?hostid=7354971 for comparison. I'm still running 5 WUs per GPU (all types) with about 1% validation inconclusive. No invalids showing.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> Bill, what is the average time it takes for each AP on your system? I run 2 APs on each of my 780s and my average time is 55 minutes. When no APs are available I run 3 v7 at a time on each.

Sorry ... been off on other things. Not many APs to average, but the 3 completed thus far average about 70 minutes. I looked to see the RAC on your 780s but only found 750s. They are showing a remarkable result for the cost. I'm going to be very disappointed, having gone with dual 780s rather than quad 750s, if I don't reach 80k on this machine. It would have been much cheaper having quad 750s riding an AMD 8 banger like you have. But I'm at 25k presently and the rate of increase remains just off vertical. What are the other 2 cards (98%) you mention? Perhaps the 780 is buried in that configuration and thus not showing. All threads on my machine are showing 100% CPU, but I'm not sure how to view GPU utilization. I noted that the 3.4 GHz CPU is running at 3.6 GHz, so apparently I've engaged overclocking by setting performance to high in the ASUS Dual Intelligent Processors 4 application that came with the motherboard. But tell me how to get to the GPU utilization. Thanks ...
Matt · Joined: 6 Jun 10 · Posts: 18 · Credit: 1,434,540 · RAC: 0
> Matt - on my dual 780Ti machine, I received 3 errors yesterday, but I had it busy elsewhere. I note that the RAC for your 780 machine is low, about 5700, and I'm wondering if you just brought it online or if you run part time.

Bill - Both are correct. I just recently started crunching some SETI work after a long absence, and I'm only crunching for SETI part-time. I have modified my app_config so that I'm running 4 MB WUs concurrently on each GPU. This seems to be the limit before I run out of VRAM (I keep my cards in SLI for gaming, so the memory available for crunching is reduced). I've tried AP both at 1x and 2x but have not yet done enough of those to tell if there is a benefit to running multiple instances of AP on each card.
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
On a normal 780 you could run 3 WUs at a time (MB/AP) if you leave enough cores to feed the GPU; maybe the Ti model could handle 4, but that's for others to answer. My point is, it's not just about what the GPU itself can handle: you need to keep cores free so the GPUs stay well fed, or the entire host will slow down, especially on hosts with 2 GPUs.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> On a normal 780 you could run 3 WUs at a time (MB/AP) if you leave enough cores to feed the GPU; maybe the Ti model could handle 4, but that's for others to answer. My point is, it's not just about what the GPU itself can handle: you need to keep cores free so the GPUs stay well fed, or the entire host will slow down, especially on hosts with 2 GPUs.

I'm presently running 5 on the dual Ti version but have noticed display stalls and at least one driver failure (that auto-recovered). Think I'll be dropping back to 4 after seeing where it peaks at 5 (just brought it online). But I also want to evaluate the effects of reserving a core for feeding the GPUs. How do I go about that? Thanks ...
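Two common ways to reserve a core, sketched here for reference rather than as tuned advice: either lower "Use at most X% of the CPUs" in BOINC's computing preferences (87.5% on an 8-thread CPU leaves exactly one thread free), or budget a full core per GPU AP task in app_config.xml, along these illustrative lines:

```xml
<!-- Illustrative sketch: budget one full CPU core per GPU AP task
     so the thread feeding the GPU is never starved.
     Values are examples, not tuned for a 780Ti. -->
<app_config>
  <app>
    <name>astropulse_v6</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Note that cpu_usage only changes what BOINC budgets in its scheduler; the app itself uses whatever CPU it needs, which is why leaving a whole thread genuinely free matters on heavily loaded hosts.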
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
OK, we can try to help you with that, but before we start trying to optimize the number of free cores, you need to install the new Lunatics code. It will make your AP GPU WUs use a lot less CPU, give us better control of the GPU, and make the whole process easier. There is a thread exclusively about the new Lunatics installer; check it. As soon as you have the builds working, PM me or post another message here and we can start finding the best settings for your particular host.

One point is important: there is an "urban legend" that says more WUs at a time per GPU = more production. That is true when you go from 1 to 2 WUs at a time, or maybe 3 on most Fermi and later cards, but it's not entirely true as you keep adding WUs, because of the overhead added by the code switching from one task to another. A top-of-the-class GPU like the 780Ti can load up to 10 WUs at a time (the limit is memory), but above 4 or maybe 5, everything slows down. And remember, each host is unique; that's why everyone says YMMV, and tests are necessary to find the optimal point on each host.

In the case of AP (which needs to move large amounts of data in and out through the PCIe slot, something we don't see with MB), we have another big bottleneck: PCIe transfer capacity and latency. That's one of the points the devs are working hard to bypass, and we all expect news in the future. Be aware, the latency problem grows as your GPU's capacity grows, and it grows a lot more when you run more than one GPU on the same host. But don't worry, finding the optimal point is easy. Let's wait for your return to continue.

<edit> To avoid any misunderstandings: I focus on AP optimization first because MB optimization comes almost automatically after that.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> OK, we can try to help you with that, but before we start trying to optimize the number of free cores, you need to install the new Lunatics code. It will make your AP GPU WUs use a lot less CPU, give us better control of the GPU, and make the whole process easier.

Good info, some of which I was aware of, but clearly your experience will be useful. Thought I was on the most current Lunatics but see that I was one version behind. Have now upgraded to 0.42 ... and appreciate your pointing that out. Ready to move on to optimization advice, but will take it slowly to await results from each change and be sure I understand what is going on. Must also upgrade my other machines as well.
Matt · Joined: 6 Jun 10 · Posts: 18 · Credit: 1,434,540 · RAC: 0
Bill - I see that your 780Ti host is running the latest Nvidia driver, 340.52. It seems that with this driver, AP requires a full core per task. This could limit some of the possible advantages Lunatics offers on AP tasks. After I updated, I immediately ran into WUs erroring out because they did not have enough dedicated CPU. I updated my app_config accordingly and have not had any issues since then.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> Bill - I see that your 780Ti host is running the latest Nvidia driver, 340.52. It seems that with this driver, AP requires a full core per task. This could limit some of the possible advantages Lunatics offers on AP tasks. After I updated, I immediately ran into WUs erroring out because they did not have enough dedicated CPU. I updated my app_config accordingly and have not had any issues since then.

I assume from your comments that you are running stock SETI (rather than Lunatics). I've yet to attempt the changes to the stock script that allow setting both the type and number of WUs, but feel I should on another (480) machine that is producing quite a few invalids. Don't know which driver is being used on that machine, but I will watch for increases in errors on this (780) machine. I note that I've picked up 4 errors today, but all are AP. Only 1 other showing ... on the 7th, the first day online for the 780s. Thanks for the heads up. If your app_config file isn't too long, would you mind posting it here?
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
OK, let's try one step at a time ... and please forgive me for any mistakes ... Use these instructions with NV GPUs only.

But first, see Matt's warning and go back to an earlier Nvidia driver: 340.52 was reported to have some problems with the -use_sleep switch we use to optimize AP crunching on some hosts. I use 337.88; I'm just not sure if it's compatible with the 780Ti, so you need to check that at the Nvidia site. After you downgrade the Nvidia driver to a more compatible one ...

I only crunch on GPU, so I will focus on GPU crunching only; at the end we can try to optimize the CPU too. So when I talk about AP, I'm talking about GPU AP. Let's start with tuning the AP GPU crunching first:

1 - The installer builds a clean config file for AP; you must tune it for your GPU card.

2 - There is a help file that explains each parameter separately; read it carefully. Mike took a long time to collect that information and most people simply ignore it, but it is very important. This is the file: AstroPulse_OpenCL_NV_ReadMe.txt

3 - You need to edit the file ap_cmdline_win_x86_SSE2_OpenCL_NV.txt and add the optimized configuration for your GPU. This is mine; it's quite conservative but works fine with the 670/690/780. Try it first, and then, once you are sure everything is working, push a little more (your 780Ti could do a lot more), but for the beginning just make sure everything works:

    -use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1

A note: there are other ways to load the config files, but I always prefer to totally exit BOINC (not just boincmgr) and restart it again to be sure everything is loaded properly.

4 - You will see your CPU time drop to about 10-20% of the GPU crunching time. When you do that, you free up your CPU cores a lot.

5 - With everything working, keep in mind you can try to push a little more; just follow the help file. And don't forget, if you are going to use this in the future on less powerful GPUs, the parameters need to be adjusted too.

6 - The next step is to find the optimal number of WUs on your GPU. But for that you first need to be sure the earlier part is working fine. PM me when you are ready.
Matt · Joined: 6 Jun 10 · Posts: 18 · Credit: 1,434,540 · RAC: 0
I'm currently running Lunatics v0.42. My app_config.xml is as follows:

    <app_config>
      <app>
        <name>astropulse_v6</name>
        <gpu_versions>
          <gpu_usage>1</gpu_usage>
          <cpu_usage>1</cpu_usage>
        </gpu_versions>
      </app>
      <app>
        <name>setiathome_v7</name>
        <gpu_versions>
          <gpu_usage>.25</gpu_usage>
          <cpu_usage>.375</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Currently, I'm only running one instance of AP per GPU. As far as the MB tasks go, I'm not sure what the "sweet spot" for CPU usage per task is. I've been experimenting with different values along with varying numbers of MB tasks running concurrently.
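To spell out the arithmetic in that file: a gpu_usage of .25 tells BOINC each task claims a quarter of a GPU, so it schedules 1/0.25 = 4 MB tasks per GPU, matching Matt's earlier post, and at .375 CPU apiece those 4 tasks budget 4 × 0.375 = 1.5 cores per GPU. Trying AP at 2x would simply mean dropping its gpu_usage from 1 to .5.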
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> OK, let's try one step at a time ... and please forgive me for any mistakes ... Use these instructions with NV GPUs only.

Thanks, Juan. 337.50 came with the GPUs, so 337.88 should be fine. The update occurred automatically, but I will look at rolling back after the configuration has stabilized. I want to see where it peaks before making any tuning efforts, including the steps you list above. Probably a few days before that occurs, but I will get back once I get into tuning. Appreciate your time ...
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> I'm currently running Lunatics v0.42. My app_config.xml is as follows:

Thanks, Matt. There is no app_config.xml file in my Lunatics configuration, but I think I remember reading that, if it exists, it will override the count parameters in app_info.xml. Must read some more, but at least I have something to work with.
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
> Appreciate your time ...

You know where to find me when you are ready to continue. One last reminder, maybe you already know this, but: NEVER let your host automatically update your Nvidia driver. Disable that option and you will avoid a lot of headaches. Always wait some time after a new driver is released, to be sure it works with BOINC and your projects. Nvidia drivers (especially the GeForce ones) target gaming, not crunching, so it's very common to discover some incompatibility. And whenever you install a new driver, DL it directly from the Nvidia site and select the clean installation, to be sure nothing from the old driver is left behind.

<edit> Make your changes in app_config.xml only; avoid the old way of editing app_info.xml. That will make your configuration easier, with less possibility of error, and BTW it will allow you to keep your settings if you need to update/reinstall Lunatics for any reason.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13958 · Credit: 208,696,464 · RAC: 304
> I'm presently running 5 on the dual Ti version but have noticed display stalls and at least one driver failure (that auto-recovered). Think I'll be dropping back to 4 after seeing where it peaks at 5 (just brought it online).

Keep in mind that there is a law of diminishing returns. Just because you can run 5 (or 6 or more) doesn't mean it's more productive. With my GTX750Tis I ran 1, 2 & 3 WUs at a time. The end result was that 2 WUs at a time produced the most work per hour. 1 WU at a time crunches a WU the fastest, but running 2 at a time, while taking longer to process each WU, actually resulted in more work per hour. Running 3 at a time was very close to 2 at a time, but in the end 2 at a time produced more work per hour. So that's what I went with.

If you wait for RAC to level off for an idea of the effect of any changes, you're looking at 6-8 weeks for things to "stabilise" after each change. Keeping track of WU run times lets you figure things out much more quickly.

Grant
Darwin NT
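To put made-up numbers on Grant's point: if a card finishes one WU in 20 minutes at 1x, that is 3 WUs per hour; if at 2x each WU stretches to 35 minutes, the pair still completes in 35 minutes, about 2 × 60/35 ≈ 3.4 WUs per hour, so 2-at-a-time wins even though each individual WU runs slower. Comparing run times this way shows the effect of a change within a day or two instead of the 6-8 weeks RAC takes to settle.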
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
> Appreciate your time ...

Really haven't devoted much time to GPU activity, so your advice is well taken. Having just brought this dual 780Ti machine online is cause to pay more attention to detail. Therein lies my effort to seek out the experience of others like yourself. Much thanks.