Message boards :
Number crunching :
CUDA Versions
Bill Greene Send message Joined: 3 Jul 99 Posts: 80 Credit: 116,047,529 RAC: 61 |
Anyone know of a way to put on-board GPUs to work when there are bus-installed GPUs present? Eligible on-board GPUs are recognized, and it seems a waste that they are sitting there idle. Believe so, Juan — continuing, that is. However, now that Precision is on the 480 machine and I'm seeing the high temps there, I'll be concentrating on getting those temps down. The fans are at about 90%, so I suspect they won't last long. Looking at coolers, namely the ARCTIC Accelero Xtreme PLUS. Any suggestions? In the meantime, I'm ready for the next step on the 780's. What I learn from this exercise will help with the 480's as well. |
Bill Greene Send message Joined: 3 Jul 99 Posts: 80 Credit: 116,047,529 RAC: 61 |
Additionally, I'm unsure how the command switch construct above was inserted. I altered a command switch parameter on my 480 machine without restarting BOINC and saw that it was picked up in line. Thanks. |
Bill Greene Send message Joined: 3 Jul 99 Posts: 80 Credit: 116,047,529 RAC: 61 |
At the bottom of Precision X you will see the Precision log with 3 displays. If you double-click anywhere on it, it will open up a new window with multiple stats to view. You can run the built-in GPU on your chip, but from what I've heard it's not worth it if you have GPUs via PCIe. Using it tends to slow down the dedicated GPUs, from what I've read, and most that have dedicated GPUs don't tend to use it. If you search the threads you will find what I'm talking about. If that's the only GPU you have, then it's fine to use it; otherwise I'd suggest you concentrate on the dedicated GPUs. My 2 cents. Thanks. Should have tried that. And your advice about the iGPU, along with advice from others, is cause enough for me to shy away from any effort there; as I'm told, "Just not worth the effort". |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Believe so, Juan, continuing that is. However, now that with Precision on the 480 machine and seeing the high temps there, I'll be concentrating on getting those temps down. The fans are about 90% so suspect they won't last long. Looking at coolers, namely, the ARCTIC Accelero Xtreme PLUS. Any suggestions? Besides the usual checks: check your thermal compound (old compound loses its capacity to transfer heat), clean all your air paths of dust, and check your fan speeds — older fans sometimes stop running at high speed. My only other suggestion is to add a few 12-15 cm fans directly over the GPUs to increase the volume of cold air; that normally brings the temp down by 5-10C. Remember the 480 is a very hot GPU; no matter what you do with air cooling, it's hard to keep them running cool when you put them to crunch. Steve has 2 of them on his host; you could ask him for any additional suggestions and clues about the operating temperatures. He is a nice friend/mate and one of my drinking companions here. He has a lot of technical knowledge on how to optimize the host/GPU. You could find him at: http://setiathome.berkeley.edu/hosts_user.php?userid=202207 I'm in the middle of my shift, which ends at 20:00 hrs (UTC -3), and I can't write too much now; I will post you the instructions for the second phase ASAP, but don't expect them before that. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
At the bottom of Precision X you will see the Precision log with 3 displays. If you double-click anywhere on it, it will open up a new window with multiple stats to view. You can run the built-in GPU on your chip, but from what I've heard it's not worth it if you have GPUs via PCIe. Using it tends to slow down the dedicated GPUs, from what I've read, and most that have dedicated GPUs don't tend to use it. If you search the threads you will find what I'm talking about. If that's the only GPU you have, then it's fine to use it; otherwise I'd suggest you concentrate on the dedicated GPUs. My 2 cents The iGPU is fast enough, but the testing I have done with Haswell so far has shown that running it with CPU work about doubles the run time of the CPU tasks. I have not run "iGPU + PCIe GPU w/o CPU" as a test as of yet. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
The iGPU is fast enough, but the testing I have done with Haswell so far has shown that running it with CPU work about doubles the run time of the CPU tasks. It will be interesting to see if DX12 (on hardware that can make use of it) makes a difference for crunching. One of the features of DX12 is that it is meant to significantly lower CPU usage while providing even better graphics performance. For integrated GPUs, where the entire CPU/GPU package has a thermal limit, reducing the CPU usage will allow greater iGPU clock speeds. DX12 CPU/iGPU power usage: click on the DX11/12 buttons to compare performance & power usage. In the demo displayed, DX12 gave a 74% boost in video performance compared to DX11 just from reducing the CPU load, allowing the iGPU to ramp up further. Grant Darwin NT |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
The iGPU is fast enough, but the testing I have done with Haswell so far has shown that running it with CPU work about doubles the run time of the CPU tasks. The iGPU wasn't really CPU-heavy while processing. It is probably just a limitation of the design, which uses shared cache between the CPU & GPU cores; Bay Trail doesn't have the issue due to a different cache design. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
In the meantime, I'm ready for the next step on the 780's. What I learn from this exercise will help with the 480's as well. Everyone has their own way of trying to optimize their GPUs; this is how I do mine.

- First of all, I stop all CPU crunching, so the CPU load will not interfere with the test — and of course I assume you are using a well-configured r2399 or the right CUDA version for your GPU.
- With the EVGA Precision program, put up the graph that shows you the GPU usage % (you could use GPU-Z or any other similar program).
- Set BOINC to run the CPU & GPU always. It's easier to do the test running MB than AP, since the % blanking affects the measurements.
- Now go to your app_config.xml and allow 2 WUs at a time on each GPU: <gpu_usage>0.5</gpu_usage> <cpu_usage>0.1</cpu_usage>
- Reload the config files, wait some seconds to allow the program to stabilize, and watch the GPU usage.
- If the GPU usage doesn't reach 98% (+/-), restart the test, now running 3 WUs at a time: <gpu_usage>0.33</gpu_usage>
- If it's still not close to 98%, try with 4. Retest.
- The best point is when you reach +/- 98% for the first time. There is a point where more WUs will not increase the GPU usage; if that happens, you have already passed the optimal point.
- On most cards the optimal point is 2 (670/690, for example), and a few run fine with 3 at a time (780). None of my GPUs runs 4 faster than 3 — maybe because I use slow i5s to power them, who knows? I really don't care. Of course YMMV, but I believe the optimal point is 3 on the 780 Ti and 2 on the 480.
- After you find the optimal point for the GPUs, leave them crunching and let's try to find the best number of CPU WUs for the host.
- Keep the GPU usage % displayed and start 1 (one) CPU task at a time, keeping an eye on the %.
- There is a point where you start a new CPU WU and the GPU usage will drop; then go back one step, and that is the optimal number of GPU/CPU WUs that can run on this particular host.
- You could repeat the test now with AP WUs, but since you are running r2399 you will see the numbers are very close if not the same (of course not with high-blanking AP WUs).
- Save the configuration for future use on this host.
- You must repeat the test on each host to be sure. It's a fact that even "twin hosts" (same MB/CPU/GPU/etc.) don't always show the same point — I never fully understood why.
- Most of the tests will show that on fast-CPU hosts you can run several CPU WUs at the same time while crunching on your GPUs with almost no decrease in performance. On the other hand, on slower hosts, and especially those with multiple fast GPUs running 2 or more instances each, the test will sometimes show it's better not to run even a single CPU WU. (I have several hosts in this situation.)

Simple, no? Once you are familiar with it, the test takes only a few minutes to complete and works with almost any combination of CPUs/GPUs. But be aware that in the case of NV you need a Fermi or newer GPU to run more than 1 WU at a time; trying to run more than 1 WU at a time on older cards simply wastes resources and time. Don't forget: more WUs at a time = more heat, so keep an eye on the GPU/CPU temperatures. I'm around if you need anything else. |
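For readers following the steps above: the <gpu_usage>/<cpu_usage> lines belong inside BOINC's app_config.xml in the project directory. A minimal sketch for running 2 WUs per GPU — note the <name> value here (setiathome_v7) is an assumption and must match the short application name your own client reports:

```xml
<!-- Sketch of projects/setiathome.berkeley.edu/app_config.xml.
     The app name below is an assumption; check client_state.xml
     for the actual short name on your host. -->
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <!-- 0.5 GPUs per task: two tasks share each GPU.
           Use 0.33 for three at a time, 0.25 for four. -->
      <gpu_usage>0.5</gpu_usage>
      <!-- fraction of a CPU core reserved to feed each GPU task -->
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After editing, re-reading the config files from the BOINC Manager applies the change without restarting the client.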
Bill Greene Send message Joined: 3 Jul 99 Posts: 80 Credit: 116,047,529 RAC: 61 |
In the meantime, I'm ready for the next step on the 780's. What I learn from this exercise will help with the 480's as well. Pretty clear that you have experimented with this procedure enough that you have a good grasp of expectations. On a first reading the logic surely seems sound, but I need to study it more and compose a step-by-step process where I document the results each time, i.e., ensure consistency from one WU configuration to the next. I have Precision installed on all machines now, so GPU use, temp, etc., are easily obtained. For your information, I'm presently running 4 WUs on each of the 780's, which are at +/-97% utilization but at +/-87% power. However, the temps are 78C and 71C, a little higher than I would like. To start with, I'm going to drop back 1 WU (3 each) and watch the % use and temps. At least that will give me a starting point. Is there an easier way to turn off CPU workload without re-installing Lunatics? Had a productive dialog with Steve, who gave me both some target temps to aim for on the 480's and ideas on how to get there. Gave them both a good cleaning today and found the fins on one blocked. At least they are both at the same temp now, though still 15-20C too high. Will probably put some coolers on them if space allows. As always, thanks for the advice on all these matters. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Is there an easier way to turn off CPU workload without re-installing Lunatics? There are a few ways, but for me the easy way — and the way that lets me easily test what I'm looking for — is: set BOINC to NNT (No New Tasks), then manually suspend all the CPU WUs. Do all the tests you need; it's easy to start one WU at a time with this method and find the optimal number of CPU WUs. Just don't forget to restart the work flow and the WU activities after that. About the 780 usage, you should test with 3 — I believe 3 is the best point, especially when you crunch AP. I actually use 3 here, but my 780 is the FTW model, not the Ti you have. <LOL ON> Don't tell the others how I do it — I would certainly be hanged for that, since this is a totally out-of-the-rules way. <LOL OFF> Nice that I was able to help you a little. I know sometimes it's hard to understand my poor English. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Is there an easier way to turn off CPU workload without re-installing Lunatics? In the Manager: Tools, Computing preferences, Processor usage tab; down at the bottom is "On multiprocessor systems use at most xxx% of the processors". Set it to 50% to only use half of your processors (on a 4-core with HT on, that means 4 processes running; with HT off it would be 2 processes). Grant Darwin NT |
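The same processor limit can also be set outside the GUI. A sketch, assuming a BOINC client of that era that honors a local global_prefs_override.xml in the BOINC data directory (element name per the standard BOINC preferences schema):

```xml
<!-- global_prefs_override.xml in the BOINC data directory.
     Overrides the web preferences; 50 = use half the logical CPUs. -->
<global_preferences>
  <max_ncpus_pct>50.0</max_ncpus_pct>
</global_preferences>
```

The Manager can re-read local preferences without restarting the client, which makes this convenient for quick before/after tests.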
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Maybe I need to go back and re-read this thread? Did I miss something? How does using the sleep command in the astro_pulse cmdline file affect the GPU processing of the CUDA files? I bet you're running a bunch more AstroPulse than I am. Tom A proud member of the OFA (Old Farts Association). |
FalconFly Send message Joined: 5 Oct 99 Posts: 394 Credit: 18,053,892 RAC: 0 |
From my understanding, that switch removes a significant chunk of CPU load (which does effectively nothing for the processing but wait, blocking the CPU as a resource for other loads). It doesn't work as expected on the latest 340.xx drivers, so to use that switch it's highly recommended to use the previous (337.88) driver instead. Once it runs, it frees up a lot of CPU time; in most standard configurations that will also remove the competition for CPU time among the AP tasks in general, as they're very CPU-intensive in keeping the GPUs loaded. In effect, I think that will speed up overall performance, especially when running multiple AP tasks per GPU, in multi-GPU configurations, or anytime running AP workunits with heavy blanking in general (which are the most extreme CPU/GPU hogs and take by far the longest time to complete). |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Yes, that makes a lot of difference, but only on NV cards (ATI/iGPU don't need it). |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Yes, that makes a lot of difference, but only on NV cards (ATI/iGPU don't need it). And it only affects the OpenCL NVidia applications - i.e. those for AP tasks - which seem to have taken over this thread, and rather distracted everyone from the CUDA question which started it. So, in answer to Tom: How does using the sleep command in the astro_pulse cmdline file affect the GPU processing of the CUDA files? Not at all - they are separate and distinct applications, and twiddling with one shouldn't affect the other. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Ok, that means I wasn't completely confused. What I am going to do is add the -sleep_time switch to my ap_cmdline_etc.txt file on general principles. It sounds like it will reduce the impact on the whole system when I get the AstroPulse work that fits it. Tom A proud member of the OFA (Old Farts Association). |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
What I am going to do as add the There is no such switch! From ReadMe_AstroPulse_OpenCL_NV.txt: -use_sleep : Results in additional Sleep() calls to yield CPU to other processes. Can affect performance. Experimentation required. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
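For anyone following along: the switch goes on the single line of the AstroPulse command-line file in the project directory. The exact file name depends on the Lunatics install (Tom's "ap_cmdline_etc.txt" above is a placeholder; the real files follow a pattern like ap_cmdline_*.txt). The file contents are just the switches themselves:

```
-use_sleep
```

Any other AP switches already in the file stay on the same line, separated by spaces; the switch takes effect when the next AP task starts.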
Bill Greene Send message Joined: 3 Jul 99 Posts: 80 Credit: 116,047,529 RAC: 61 |
Now that you have pointed that parm out, there is also the % CPU time setting, which could be used to throttle down the CPUs as well, I think. Had seen those before, but it never occurred to me that I would ever drop them below 100%. Thanks for tuning me in. |
Bill Greene Send message Joined: 3 Jul 99 Posts: 80 Credit: 116,047,529 RAC: 61 |
Is there an easier way to turn off CPU workload without re-installing Lunatics? I'll look into this further, Juan. Haven't gotten around to the test profile yet but will get there. Plagued with local power outages and the SETI shutdown today, so not a good day. But your help has been more than a little; just glad to run into SETI supporters like you. And your English is not a problem. I'll get back with conclusions about WU numbers when I've pulled it all together. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
What is really interesting is I got it right in my parameter file and screwed up the post. Thank you for the correction! Tom |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.