NV GPU - AP bench test run (e.g. @ GT730)

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1681955, posted 21 May 2015, 0:57:02 UTC (last modified 21 May 2015, 1:03:20 UTC)

Small progress report...

Fastest settings on my PC (J1900 + GT730) so far:

  -v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -tune 1 16 4 1 -use_sleep -cpu_lock -instances_per_device 2 -hp

[CUDA count 0.5 in the app_info.xml file, i.e. 2 tasks per GPU]

I tested the following: -tune 1 8 4 1, -tune 1 16 2 1, -tune 1 16 4 1, -tune 1 32 2 1, -tune 1 32 4 1, -tune 1 64 2 1 and -tune 1 64 4 1. As mentioned above, -tune 1 16 4 1 is the fastest. (Are there other values I should test?)

I also tested: -tune 2 8 1 1, -tune 2 16 1 1, -tune 2 32 1 1, -tune 2 64 1 1, -tune 2 128 1 1 and -tune 2 256 1 1. None of these decreased the calculation times, so, as mentioned above, I use no -tune 2 N N N setting.

What default settings does the AP v7.10 (r2887) GPU app use automatically? (Are there other values I should test?)

Thanks.
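The sweep above tries a handful of -tune 1 X Y 1 and -tune 2 X 1 1 values on top of a fixed base command line. A minimal Python sketch that enumerates such candidates as full command lines is shown below. The helper names and the idea of a full X × Y grid are illustrative assumptions, and the thread does not say what each -tune field means, so the numbers are treated as opaque. Note that a full grid over the tested values would also contain -tune 1 8 2 1, which is not in the list above.

```python
from itertools import product

# Base options taken from the post; only the -tune argument is varied.
BASE = ("-v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 "
        "-use_sleep -cpu_lock -instances_per_device 2 -hp")


def tune1_candidates(x_values, y_values):
    """One full command line per (X, Y) pair, as '-tune 1 X Y 1'."""
    return [f"{BASE} -tune 1 {x} {y} 1" for x, y in product(x_values, y_values)]


def tune2_candidates(x_values):
    """One full command line per X, as '-tune 2 X 1 1'."""
    return [f"{BASE} -tune 2 {x} 1 1" for x in x_values]


if __name__ == "__main__":
    # The X/Y values that appear in the post, combined as a full grid.
    for line in tune1_candidates([8, 16, 32, 64], [2, 4]):
        print(line)
    for line in tune2_candidates([8, 16, 32, 64, 128, 256]):
        print(line)
```

Each printed line would then be benchmarked one at a time, keeping the fastest.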

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1687197, posted 3 Jun 2015, 5:32:02 UTC (last modified 3 Jun 2015, 5:35:52 UTC)

OK, on my Intel Celeron J1900 CPU with an NVIDIA GeForce GT 730 (PCIe 2.0 x1 slot and speed):

app: astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe (r2887)

Starting point, default settings (bench test run with 2 AP WUs/GPU):
  -v 0 -unroll 2 -ffa_block 512 -ffa_block_fetch 256 -use_sleep -cpu_lock -instances_per_device 2 -hp
  Elapsed 565.38 secs, CPU 11.97 secs

Final fastest settings (bench test run with 2 AP WUs/GPU):
  -v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -use_sleep -cpu_lock -instances_per_device 2 -hp -tune 1 16 4 1 -oclFFT_plan 256 32 64
  Elapsed 383.53 secs, CPU 8.37 secs

'Live', this means 2 AP WUs/GPU now take 6 hrs (fastest settings) instead of 7.5 hrs (default).

- - - - - - - - - -

Suppose I want to find the fastest settings for 1 AP WU/GPU (with and without -use_sleep). I start the bench test runs without -use_sleep, testing -unroll, -ffa_* and so on, until I have found the fastest settings without -use_sleep. If I then test those settings with -use_sleep at the end, have I also found the fastest settings with -use_sleep? Or do I need to start again from scratch with -use_sleep and test all possible settings and values again?

- - - - - - - - - -

In short: to find the fastest settings for 1 AP WU/GPU (with and without -use_sleep), do I need just 1 complete series of bench test runs (adding -use_sleep only at the end) or 2 (one without and one with -use_sleep from the beginning)?

Thanks.
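To put the timings above in relative terms, here is a small Python calculation (the helper names are illustrative). The bench numbers work out to roughly 32% less elapsed time (about 1.47x throughput) and about 30% less CPU time; the 'live' change from 7.5 hrs to 6 hrs per pair of AP WUs is a 20% reduction, i.e. from 6.4 to 8 AP WUs per day on this GPU. The bench and live gains differ, and the thread does not say why.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a duration shrank from `before` to `after`."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """Throughput ratio: how many times faster the new setting is."""
    return before / after


def wus_per_day(wus_per_batch: int, hours_per_batch: float) -> float:
    """Work units finished per 24 h when `wus_per_batch` WUs run together."""
    return wus_per_batch * 24.0 / hours_per_batch


if __name__ == "__main__":
    bench_elapsed = (565.38, 383.53)  # seconds, default vs. tuned, 2 AP WUs/GPU
    bench_cpu = (11.97, 8.37)         # CPU seconds, same two runs
    live_hours = (7.5, 6.0)           # hours per pair of AP WUs, 'live'

    print(f"bench elapsed: -{percent_reduction(*bench_elapsed):.1f}% "
          f"({speedup(*bench_elapsed):.2f}x)")
    print(f"bench CPU:     -{percent_reduction(*bench_cpu):.1f}%")
    print(f"live:          -{percent_reduction(*live_hours):.1f}%, "
          f"{wus_per_day(2, live_hours[0]):.1f} -> "
          f"{wus_per_day(2, live_hours[1]):.1f} AP WUs/day")
```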

Raistmer (Volunteer developer, Volunteer tester; joined 16 Jun 01; posts: 6325; credit: 106,370,077; RAC: 121)
Message 1687244, posted 3 Jun 2015, 8:18:46 UTC, in response to Message 1687197

-use_sleep favors bigger kernel sizes, to reduce overhead. So it is quite possible that -use_sleep would shift the best point in parameter space.

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1687538, posted 4 Jun 2015, 1:34:57 UTC (last modified 4 Jun 2015, 1:37:32 UTC), in response to Message 1687244

Unfortunately I don't understand... To find the fastest settings I go through:

1. -unroll N
2. -ffa_block N & -ffa_block_fetch N
3. -tune 1 N N N
4. -tune 2 N N N
5. -oclFFT_plan N N N

After which test should I split into 2 separate test runs (1 with and 1 without -use_sleep)? After 2., i.e. continuing with 2 separate test runs from point 3?

In other words: first I test 1. to 5. without -use_sleep. Where should I start the 2nd run, the one with -use_sleep? At point 3?

Thanks.

Mike (Volunteer tester; joined 17 Feb 01; posts: 34258; credit: 79,922,639; RAC: 80)
Message 1687644, posted 4 Jun 2015, 8:23:01 UTC, in response to Message 1687538

Quote (Sutaru Tsureku): "(...) first I test 1. to 5. without -use_sleep. Where should I start the 2nd run, the one with -use_sleep? At point 3?"

No, at point 2. You get bigger kernels by increasing -ffa_block and -ffa_block_fetch. The 730 has only 2 compute units, which means -unroll between 4 and 6 at most. If you had read the ReadMe, you would have found my tuning tips:

  -use_sleep -unroll 4 -ffa_block 2048 -ffa_block_fetch 1024

plus additional tune params.

With each crime and every kindness we birth our future.
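Raistmer's point is that -use_sleep can move the optimum, and Mike puts the split at step 2, where the kernel size is set. One way to respect both is to keep -use_sleep in every run from the start and tune stage by stage. The Python sketch below shows such a greedy, one-stage-at-a-time search. `staged_search` and `toy_bench` are hypothetical names; `toy_bench` is a made-up cost function standing in for an actual bench test run, not a model of real GT730 timings. Because each stage is fixed before the next one is tried, this kind of search can miss interactions between stages, which is why settings found without -use_sleep may not carry over.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Options = Dict[str, str]


def to_cmdline(opts: Options, flags: Sequence[str]) -> str:
    """Render 'name value' pairs followed by bare flags as one command line."""
    return " ".join([f"{name} {value}" for name, value in opts.items()] + list(flags))


def staged_search(
    stages: List[Dict[str, List[str]]],
    start: Options,
    flags: Sequence[str],
    run_bench: Callable[[str], float],
) -> Tuple[Options, float]:
    """Greedy stage-by-stage tuning: keep the fastest candidate of each stage.

    Options listed in the same stage are varied in lockstep, e.g. the
    i-th -ffa_block value is paired with the i-th -ffa_block_fetch value.
    """
    best = dict(start)
    best_time = run_bench(to_cmdline(best, flags))
    for stage in stages:
        names = list(stage)
        for values in zip(*(stage[name] for name in names)):
            trial = {**best, **dict(zip(names, values))}
            elapsed = run_bench(to_cmdline(trial, flags))
            if elapsed < best_time:
                best, best_time = trial, elapsed
    return best, best_time


def toy_bench(cmdline: str) -> float:
    """Hypothetical stand-in for a bench run: a made-up cost, NOT real timings."""
    tokens = cmdline.split()
    unroll = int(tokens[tokens.index("-unroll") + 1])
    block = int(tokens[tokens.index("-ffa_block") + 1])
    return abs(unroll - 4) * 10 + abs(block - 2048) / 100


if __name__ == "__main__":
    # -use_sleep stays in every run, since it may move the optimum.
    flags = ["-use_sleep", "-cpu_lock", "-hp"]
    start = {"-v": "0", "-instances_per_device": "1", "-unroll": "2",
             "-ffa_block": "512", "-ffa_block_fetch": "256"}
    stages = [
        {"-unroll": ["4", "5", "6"]},
        {"-ffa_block": ["512", "1024", "2048"],
         "-ffa_block_fetch": ["256", "512", "1024"]},
        # Later stages (-tune 1, -tune 2, -oclFFT_plan) would follow the same pattern.
    ]
    best, elapsed = staged_search(stages, start, flags, toy_bench)
    print(to_cmdline(best, flags), elapsed)
```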

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1688048, posted 5 Jun 2015, 10:24:23 UTC

Does your GT730 also run only cuda50 WUs?

non-profit org. Play4Life in Zagreb, Croatia, EU

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1696032, posted 26 Jun 2015, 20:15:02 UTC

Also, don't put the v353.30 drivers on a GT730... mine throws an error from time to time when I wake the GPU up after a whole day of crunching (e.g. when I get back from work or wake up in the morning)... ;)

Zalster (Volunteer tester; joined 27 May 99; posts: 5517; credit: 528,817,460; RAC: 242)
Message 1696034, posted 26 Jun 2015, 20:19:04 UTC, in response to Message 1696032

What kind of errors were you seeing with driver 353.30? Can you go into more detail about what it does and what kind of errors?

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1696059, posted 26 Jun 2015, 22:35:19 UTC, in response to Message 1696034

There was something about a memory error, shown in BOINC tasks! The task was restarted and finished OK, but there is nothing in the log?! Strange!

Zalster (Volunteer tester; joined 27 May 99; posts: 5517; credit: 528,817,460; RAC: 242)
Message 1696065, posted 26 Jun 2015, 23:00:36 UTC, in response to Message 1696059

KLiK, have you told your computer not to put the hard drive to sleep? Also, next time you see the error, copy it so we can know more specifically what it refers to.

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1696074, posted 26 Jun 2015, 23:27:48 UTC, in response to Message 1696065

The HDD never goes to sleep on any of my computers... that is just another way to kill them PROMPTLY! I'll take a screenshot (PrtScr)... no worries!

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1696205, posted 27 Jun 2015, 11:33:30 UTC (last modified 27 Jun 2015, 11:41:50 UTC)

Because of the last messages/questions... I made a quick test with Fred's SetiPerformance (I manually added the cuda42 app), count 1.0:

  x41zc_cuda50: 755 seconds (12 minutes, 35 seconds)
  x41zc_cuda42: 784 seconds (13 minutes, 4 seconds)

So the cuda50 app looks faster (at least on my system).

GPU-Z shows a GPU Load of just ~80% with count 1.0, so I run count 0.5 (2 SETI (MB) project tasks simultaneously) on my NV GT730 (PCIe 2.0 x1 plug & slot). (After a quick test with Fred's SetiPerformance, 2 simultaneous SETI (MB) tasks on my GT730 seem to give the highest overall (CPU & NV GPU) RAC on my system. 3 simultaneous SETI (MB) tasks are also possible, but give just +1% RAC for the NV GPU, while the CPU RAC drops by 1.9% because the 3rd GPU app instance uses more CPU time.)

@ KLiK
I see you run just the stock GPU apps. If you don't want to install the optimized apps of the Lunatics crew, you should at least run 2 SETI WUs per NV GT730 as well. Adapting my Milkyway app_config.xml file (which works very well there), the SETI app_config.xml entries could/should be:

  <app_config>
    <app>
      <name>setiathome_v7</name>
      <gpu_versions>
        <gpu_usage>0.5</gpu_usage>
        <cpu_usage>0.1</cpu_usage>
      </gpu_versions>
    </app>
    <app>
      <name>astropulse_v7</name>
      <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>0.006</cpu_usage>
      </gpu_versions>
    </app>
  </app_config>

(The cpu_usage values are from my PC; maybe you should take 0.04, as in the Lunatics app_info.xml files.)

This means: 2 SETI WUs/GPU and 1 AP WU/GPU. (Because of CreditNew I run 2 SETI WUs/GPU but only 1 AP WU/GPU: if 1 SETI and 1 AP WU run simultaneously, the AP WU is ~50% faster, and this would skew the 'average processing rate x.x GFLOPS'.)
I made bench test runs on my NV GT730, and the fastest settings are:

  2 AP WUs/GPU: -v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -tune 1 16 4 1 -oclFFT_plan 256 32 64 -use_sleep -cpu_lock -instances_per_device 2 -hp
  1 AP WU/GPU: -v 0 -unroll 11 -ffa_block 1792 -ffa_block_fetch 448 -tune 1 16 4 1 -tune 2 16 1 1 -oclFFT_plan 256 16 256 -use_sleep -cpu_lock -instances_per_device 1 -hp

I have the optimized apps installed (app_info.xml file usage), so these command-line entries are in the ap_cmdline_win_x86_SSE2_OpenCL_NV.txt file on my PC. I guess with stock apps you should also find this file in the setiathome.berkeley.edu project folder. Open it with Notepad, insert the entries and save.

The setiathome.berkeley.edu project folder also contains an mbcuda.cfg file. Open it with Notepad, delete the ; at the beginning of the line

  processpriority = abovenormal

and save. Then the SETI CUDA app gets better CPU support (and maybe faster calculation on the GPU).

I can't install the NV GPU driver with the NV driver .exe. If I run this install tool, it doesn't find the GT730 (it says: 'no NVidia VGA card found'). My screen is connected to the Intel iGPU output; the GT730 is the secondary VGA card with nothing connected, not even a VGA dummy plug. I guess this is why the NV driver .exe doesn't find an NV VGA card.

I guess Windows 8.1 installed a 'Windows certified' driver (347.52) or something similar. I have no idea where this driver came from. I didn't download it, and Windows had just ~10 seconds to search and download (100+ MB is not possible with my DSL2000 in such a short time). So I have no idea what kind of driver is installed now (Windows shows no size in MB; maybe a 'shortened' driver?). So I can install a new NV driver only if Windows says 'a new driver is available' (I don't know whether Windows has this function), or if I go to Device Manager and let it search for a new driver via Properties.
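For the mbcuda.cfg change described in this post (removing the ; in front of processpriority = abovenormal), a small Python sketch can make the same edit without touching other lines. The script name and the `uncomment_setting`/`enable_setting` helpers are hypothetical; the sketch only assumes the line looks like "; processpriority = ..." and writes a .bak copy of the original file before changing it.

```python
import re
import sys
from pathlib import Path


def uncomment_setting(text: str, key: str) -> str:
    """Drop a leading ';' from lines of the form '; key = value'."""
    pattern = re.compile(rf"^[ \t]*;[ \t]*({re.escape(key)}[ \t]*=)", re.MULTILINE)
    return pattern.sub(r"\1", text)


def enable_setting(cfg_path: Path, key: str = "processpriority") -> bool:
    """Uncomment `key` in the file in place; return True if anything changed."""
    old = cfg_path.read_text()
    new = uncomment_setting(old, key)
    if new != old:
        cfg_path.with_name(cfg_path.name + ".bak").write_text(old)  # keep a copy
        cfg_path.write_text(new)
    return new != old


if __name__ == "__main__":
    # Hypothetical usage: python enable_priority.py <path to mbcuda.cfg>
    if len(sys.argv) != 2:
        print("usage: enable_priority.py <path to mbcuda.cfg>")
    else:
        changed = enable_setting(Path(sys.argv[1]))
        print("updated" if changed else "nothing to change")
```

Running it twice is harmless: once the ; is gone, the pattern no longer matches.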

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1696715, posted 29 Jun 2015, 12:25:23 UTC

How do I manually add CUDA42 or CUDA50?

Nope, I get about 80-90% GPU load on the Quadro 2000 or the GT730 (PCIe x8)... so all graphics work just fine for me with that extra headroom, and the GPU still does the job! If it's less than 80% with a single SETI@home WU, then I'll use app_config.xml to make a dual-WU cruncher out of my GT 730!

Right now I'm looking for a way to upgrade another computer so it can crunch on PCIe x16... that card would need some tweaking! Preferably it would be a GTX 750 Ti from EVGA... ;)

Zalster (Volunteer tester; joined 27 May 99; posts: 5517; credit: 528,817,460; RAC: 242)
Message 1696783, posted 29 Jun 2015, 17:19:52 UTC, in response to Message 1696715

Have you ever used Lunatics? It lets you choose which CUDA version to crunch your work with. If you don't want to install all of the apps, you could go to Mike's World and download only the one cuda50 app. I would link you, but I only have the old link; someone else can provide Mike's. If you want Lunatics, I believe there is a link at the top of Number Crunching. I believe the latest version is 43a.

Rasputin42 (Volunteer tester; joined 25 Jul 08; posts: 412; credit: 5,834,661; RAC: 0)
Message 1696787, posted 29 Jun 2015, 17:53:53 UTC (last modified 29 Jun 2015, 17:54:13 UTC)

http://lunatics.kwsn.info/index.php?module=Downloads

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1697368, posted 1 Jul 2015, 16:07:33 UTC, in response to Message 1696065

Quote (Zalster): "Also, next time you see the error, copy it so we can know more specifically what it is referring to."

Here is a link to the picture: https://www.dropbox.com/s/dpen9mun7rxe4z3/screenshot%202015-06-30.jpg?dl=0

KLiK (Volunteer tester; joined 31 Mar 14; posts: 1304; credit: 22,994,597; RAC: 60)
Message 1697639, posted 2 Jul 2015, 6:55:24 UTC (last modified 2 Jul 2015, 6:55:41 UTC), in response to Message 1696205

[Quoting Sutaru Tsureku's suggested app_config.xml from Message 1696205: 2 SETI WUs/GPU, 1 AP WU/GPU.]

I just put this in app_config.xml and now crunch on a work computer with 2x Intel HD 2500 and 2x NVIDIA Quadro 2000... the same will go on the computer at home with the GT730!

But can someone tell me how to include CUDA42, CUDA50 and CUDA32 for my cards... without using 3rd-party apps?

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1697679, posted 2 Jul 2015, 9:52:52 UTC (last modified 2 Jul 2015, 10:08:52 UTC), in response to Message 1697639

As I wrote before, I manually added the cuda42 app to Fred's tool »SetiPerformance«, just to see whether cuda42 or cuda50 is faster on my GT730 (count 1.0: x41zc_cuda50 755 seconds, x41zc_cuda42 784 seconds). [With this tool you can test which number of simultaneous WUs gives the highest performance (RAC) on your GPU.] cuda50 is the fastest on my GT730 (PC system).

If you add an app_config.xml file to the setiathome.berkeley.edu folder with the entries mentioned above (0.04 for <cpu_usage>), BOINC will run the CUDA app that is fastest on your GPU (PC system), i.e. the app with the highest 'Host/Application details/Average processing rate' GFLOPS value.

But if you also run WUs on your Intel HD Graphics (4 compute units) and Intel HD2500 (6 compute units), you should run (test first) just 1 WU/GPU there. Check the 'GPU Load' values in GPU-Z. On my Intel HD Graphics I run just 1 SETI or AP WU (95-100% GPU Load).

  <app_config>
    <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda50</plan_class>
      <avg_ncpus>0.04</avg_ncpus>
      <ngpus>0.5</ngpus>
    </app_version>
  </app_config>

This app_config.xml file makes 2 SETI WUs run simultaneously per NVIDIA GPU. AFAIK, 1 AP WU/NV GPU and 1 SETI or 1 AP WU on the Intel iGPU then follow automatically.

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1697683, posted 2 Jul 2015, 10:08:04 UTC (last modified 2 Jul 2015, 10:17:24 UTC), in response to Message 1697679

If you run everything stock, maybe you should add all the plan classes (cuda22, cuda23, cuda32, cuda42 and cuda50) to your app_config.xml file. With only the entry above, any 'cuda22 up to cuda42 WUs' SETI sends you run just 1 WU/GPU, so they finish much faster and their 'Average processing rate' ends up much higher.

To keep all apps' 'Average processing rates' comparable, your app_config.xml file could/should look like this (AFAIK):

  <app_config>
    <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda50</plan_class>
      <avg_ncpus>0.04</avg_ncpus>
      <ngpus>0.5</ngpus>
    </app_version>
    <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda42</plan_class>
      <avg_ncpus>0.04</avg_ncpus>
      <ngpus>0.5</ngpus>
    </app_version>
    <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda32</plan_class>
      <avg_ncpus>0.04</avg_ncpus>
      <ngpus>0.5</ngpus>
    </app_version>
    <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda23</plan_class>
      <avg_ncpus>0.04</avg_ncpus>
      <ngpus>0.5</ngpus>
    </app_version>
    <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda22</plan_class>
      <avg_ncpus>0.04</avg_ncpus>
      <ngpus>0.5</ngpus>
    </app_version>
  </app_config>
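Writing five near-identical <app_version> blocks by hand is error-prone. A small Python sketch using the standard library's xml.etree.ElementTree can generate the same file from a list of plan classes. `build_app_config` is a hypothetical helper; the values (0.04 for avg_ncpus, 0.5 for ngpus) and the plan-class list are the ones from the post, and the output would be saved as app_config.xml in the setiathome.berkeley.edu project folder.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_app_config(app_name: str, plan_classes: Iterable[str],
                     avg_ncpus: float, ngpus: float) -> str:
    """Return app_config.xml text with one <app_version> per plan class."""
    root = ET.Element("app_config")
    for plan in plan_classes:
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan
        ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
        ET.SubElement(version, "ngpus").text = str(ngpus)
    ET.indent(root, space="  ")  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values from the post: 0.04 CPU and 0.5 GPU (2 WUs per GPU) for every plan class.
    print(build_app_config("setiathome_v7",
                           ["cuda50", "cuda42", "cuda32", "cuda23", "cuda22"],
                           avg_ncpus=0.04, ngpus=0.5))
```

Redirecting the printed output to app_config.xml gives the same structure as the file above; changing the list or the ngpus value regenerates every entry consistently.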

Sutaru Tsureku (Volunteer tester; joined 6 Apr 07; posts: 7105; credit: 147,663,825; RAC: 5)
Message 1697684, posted 2 Jul 2015, 10:15:51 UTC (last modified 2 Jul 2015, 10:16:36 UTC), in response to Message 1697683

Also, you could add

  -use_sleep -hp

to your ap_cmdline_win_x86_SSE2_OpenCL_NV.txt file(s) (or similarly named files containing ap_cmdline and NV). Then the NV AP GPU app doesn't use a whole CPU thread and gets a higher priority.
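If there are several such files, a small Python sketch can append the two flags only where they are missing, so existing settings are left alone. The script name, the `add_flags`/`update_folder` helpers and the `ap_cmdline*NV*.txt` glob (based on the naming described above) are assumptions, not part of any official tool.

```python
import sys
from pathlib import Path
from typing import Iterable


def add_flags(cmdline: str, flags: Iterable[str]) -> str:
    """Append each flag that is not already present as a token."""
    tokens = cmdline.split()
    for flag in flags:
        if flag not in tokens:
            tokens.append(flag)
    return " ".join(tokens)


def update_folder(folder: Path, flags: Iterable[str] = ("-use_sleep", "-hp")) -> None:
    """Apply add_flags to every ap_cmdline*NV*.txt file in `folder`."""
    flags = list(flags)
    for path in sorted(folder.glob("ap_cmdline*NV*.txt")):
        old = path.read_text().strip()
        new = add_flags(old, flags)
        if new != old:
            path.write_text(new)
            print(f"{path.name}: {new}")


if __name__ == "__main__":
    # Hypothetical usage: python add_ap_flags.py <setiathome.berkeley.edu project folder>
    if len(sys.argv) == 2:
        update_folder(Path(sys.argv[1]))
    else:
        print("usage: add_ap_flags.py <setiathome.berkeley.edu project folder>")
```

For example, a file containing "-v 0 -unroll 4" would become "-v 0 -unroll 4 -use_sleep -hp", while a file that already has both flags is not rewritten.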

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.