NV GPU - AP bench test run (e.g. @ GT730)

Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1681955 - Posted: 21 May 2015, 0:57:02 UTC
Last modified: 21 May 2015, 1:03:20 UTC

Small progress report ...
Fastest settings on my PC (J1900 + GT730) so far:
-v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -tune 1 16 4 1 -use_sleep -cpu_lock -instances_per_device 2 -hp
[CUDA/0.5 in app_info.xml file]

I tested the following:
-tune 1 8 4 1
-tune 1 16 2 1
-tune 1 16 4 1
-tune 1 32 2 1
-tune 1 32 4 1
-tune 1 64 2 1
-tune 1 64 4 1
... and, as mentioned above, '-tune 1 16 4 1' is the fastest.
(Are there other values I should test?)

I tested the following:
-tune 2 8 1 1
-tune 2 16 1 1
-tune 2 32 1 1
-tune 2 64 1 1
-tune 2 128 1 1
-tune 2 256 1 1
... and none of these settings reduced the calculation times.
As mentioned above, I don't use any '-tune 2 N N N' setting. What are the default values that the AP v7.10 (r2887) GPU app uses automatically?
(Are there other values I should test?)

Thanks.
ID: 1681955 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1687197 - Posted: 3 Jun 2015, 5:32:02 UTC
Last modified: 3 Jun 2015, 5:35:52 UTC

OK, on my Intel Celeron J1900 CPU with an NVIDIA GeForce GT 730 (PCIe 2.0 x1 connector and x1 speed):

app: astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe (r2887)

The starting point, with the default settings (bench test run with 2 AP WUs/GPU):
-v 0 -unroll 2 -ffa_block 512 -ffa_block_fetch 256 -use_sleep -cpu_lock -instances_per_device 2 -hp
Elapsed 565.38 secs CPU 11.97 secs

The fastest settings found in the end (bench test run with 2 AP WUs/GPU):
-v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -use_sleep -cpu_lock -instances_per_device 2 -hp -tune 1 16 4 1 -oclFFT_plan 256 32 64
Elapsed 383.53 secs CPU 8.37 secs

In live operation this means 2 AP WUs/GPU go from 7.5 hrs (default) down to 6 hrs (fastest settings).
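
The two comparisons above can be put into numbers with a small calculation. This is only a sketch of the arithmetic; the helper name `percent_reduction` is made up for illustration, and the input values are the ones reported in this post.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much shorter `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


# Elapsed times of the two bench test runs (seconds, 2 AP WUs/GPU).
bench_default = 565.38
bench_tuned = 383.53

# Live run times reported in the post (hours, 2 AP WUs/GPU).
live_default = 7.5
live_tuned = 6.0

print(f"bench: {percent_reduction(bench_default, bench_tuned):.1f}% less elapsed time")
print(f"live:  {percent_reduction(live_default, live_tuned):.1f}% less elapsed time")
# bench: 32.2% less elapsed time
# live:  20.0% less elapsed time
```

The bench run shows a larger relative gain than the live run, so the bench result is best read as a ranking of settings rather than as a prediction of live run times.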

- - - - - - - - - -
If I would like to know the fastest settings for 1 AP WU/GPU (with and without -use_sleep) ...

I start the bench test run (without -use_sleep) with -unroll, -ffa_* ... and so on ...
Then I have found the fastest settings without -use_sleep.

If at the end I test these fastest settings with -use_sleep added, have I then also found the fastest settings with -use_sleep?

Or do I need to start again from scratch, with -use_sleep from the beginning, and test all possible settings & values again?
- - - - - - - - - -
In short ...
To find the fastest settings for 1 AP WU/GPU (with and without -use_sleep), do I need just 1 whole bench test run (adding -use_sleep only at the end) or 2 (1 without and 1 with -use_sleep from the beginning)?

Thanks.
ID: 1687197 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1687244 - Posted: 3 Jun 2015, 8:18:46 UTC - in response to Message 1687197.  

-use_sleep favors bigger kernel sizes to reduce overhead. So it is quite possible that -use_sleep would shift the best point in parameter space.
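
A toy illustration of what "shifting the best point" means for a one-parameter-at-a-time search. Everything here is an assumption made for the sketch: `staged_search` and `toy_bench` are hypothetical names, and the timing formula is invented (it only mimics the idea that the optimum moves to a bigger -ffa_block when -use_sleep is on). It is not the real app or a real benchmark.

```python
from typing import Callable, Dict, List, Tuple

Settings = Dict[str, int]


def staged_search(stages: List[Tuple[str, List[int]]],
                  bench: Callable[[Settings], float],
                  start: Settings) -> Tuple[Settings, float]:
    """Tune one parameter at a time, keeping the best value found so far."""
    best = dict(start)
    best_time = bench(best)
    for name, candidates in stages:
        for value in candidates:
            trial = dict(best, **{name: value})
            elapsed = bench(trial)
            if elapsed < best_time:
                best, best_time = trial, elapsed
    return best, best_time


def toy_bench(use_sleep: bool) -> Callable[[Settings], float]:
    """Hypothetical stand-in for a bench run; the numbers are invented."""
    # Assumption: with -use_sleep the optimum sits at a larger ffa_block.
    best_block = 2048 if use_sleep else 896

    def bench(s: Settings) -> float:
        return (300.0
                + abs(s["ffa_block"] - best_block) / 10.0
                + abs(s["unroll"] - 5) * 4.0)

    return bench


stages = [("unroll", [2, 4, 5, 6, 9]),
          ("ffa_block", [512, 896, 1792, 2048])]
start = {"unroll": 2, "ffa_block": 512}

for use_sleep in (False, True):
    settings, elapsed = staged_search(stages, toy_bench(use_sleep), start)
    label = "with -use_sleep   " if use_sleep else "without -use_sleep"
    print(label, settings, elapsed)
```

In this toy model the two searches end at different -ffa_block values, which is why a result found without -use_sleep cannot simply be reused with it.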
ID: 1687244 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1687538 - Posted: 4 Jun 2015, 1:34:57 UTC - in response to Message 1687244.  
Last modified: 4 Jun 2015, 1:37:32 UTC

Unfortunately, I don't understand ...

I search for the fastest settings in this order:
1. -unroll N
2. -ffa_block N & -ffa_block_fetch N
3. -tune 1 N N N
4. -tune 2 N N N
5. -oclFFT_plan N N N

After which test should I continue with 2 separate test runs (1 with -use_sleep, 1 without -use_sleep)?
After point 2, so that from point 3 on I continue with 2 separate test runs?

In other words ...
First I test points 1 to 5 without -use_sleep.
Where should I start the 2nd run, with -use_sleep? At point 3?

Thanks.
ID: 1687538 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1687644 - Posted: 4 Jun 2015, 8:23:01 UTC - in response to Message 1687538.  

No point 2.

You get bigger kernels by increasing ffa_block and ffa_block_fetch.

The 730 only has 2 compute units, which means -unroll between 4 and 6 at most.

If you had read the readme, you would have found my tuning tips.

-use_sleep -unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 plus additional tune params.


With each crime and every kindness we birth our future.
ID: 1687644 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1688048 - Posted: 5 Jun 2015, 10:24:23 UTC

does your GT730 also do only cuda50 WUs?


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1688048 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1696032 - Posted: 26 Jun 2015, 20:15:02 UTC

also, don't put v353.30 drivers on GT730...mine throws an error from time to time when I wake up the GPU after a whole day of crunching (when I arrive from work or wake up, for example)... ;)


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1696032 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1696034 - Posted: 26 Jun 2015, 20:19:04 UTC - in response to Message 1696032.  

What kind of errors were you seeing with driver 353.30? Can you go into more detail as to what it does and what kind of errors?
ID: 1696034 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1696059 - Posted: 26 Jun 2015, 22:35:19 UTC - in response to Message 1696034.  

there was something about a memory error...written in BOINC tasks!
the task was restarted & finished OK.

but nothing is in Log?! strange!


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1696059 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1696065 - Posted: 26 Jun 2015, 23:00:36 UTC - in response to Message 1696059.  

KLiK, have you told your computer not to put the hard drive to sleep?

Also, next time you see the error, copy it so we can know more specifically what it is referring to
ID: 1696065 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1696074 - Posted: 26 Jun 2015, 23:27:48 UTC - in response to Message 1696065.  

HDD never goes to sleep on any of my computers...that is just another way to kill them PROMPTLY!

I'll make a PrtScr...no worry!


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1696074 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1696205 - Posted: 27 Jun 2015, 11:33:30 UTC
Last modified: 27 Jun 2015, 11:41:50 UTC

Because of the last messages/questions...

I ran a quick test with Fred's SetiPerformance (I manually added the cuda42 app) (count 1.0):
x41zc_cuda50: 755 Seconds (12 Minutes, 35 Seconds)
x41zc_cuda42: 784 Seconds (13 Minutes, 4 Seconds)

So it looks like the cuda50 app is faster (at least on my system).

GPU-Z shows a GPU Load of just ~80% with count 1.0.
So I run with count 0.5 (2 SETI (MB) project tasks simultaneously) on my NV GT730 (PCIe 2.0 x1 plug & slot).
(After a quick test with Fred's SetiPerformance, it looks like 2 simultaneous SETI (MB) tasks on my GT730 give the highest overall (CPU & NV GPU) RAC on my system.
3 simultaneous SETI (MB) tasks are also possible, but give just +1% RAC for the NV GPU and at the same time -1.9% CPU RAC, because the 3rd GPU app instance uses more CPU time.)

@ KLiK
I see you run only the stock GPU apps.
If you don't want to install the optimized apps of the Lunatics crew, you should at least run 2 SETI WUs on the NV GT730 as well.
Adapting my Milkyway app_config.xml file (which works very well there), the SETI app_config.xml entries could/should look like this:
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.006</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

(The cpu_usage values are from my PC; you could use 0.04 instead, as in the Lunatics app_info.xml files.)

This means:
2 SETI WUs/GPU
1 AP WU/GPU
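
How the gpu_usage values translate into task counts can be sketched as below. The assumption (stated here, not taken from the BOINC source) is that the client keeps starting tasks of one app on a GPU as long as their gpu_usage shares add up to at most 1.0; `tasks_per_gpu` is a made-up helper name.

```python
import math


def tasks_per_gpu(gpu_usage: float) -> int:
    """Number of tasks of one app that fit on a GPU for a given gpu_usage.

    Assumes tasks are started while their shares sum to <= 1.0.
    The small epsilon guards against float rounding (e.g. 1 / 0.2).
    """
    if not 0.0 < gpu_usage <= 1.0:
        raise ValueError("gpu_usage must be in (0, 1]")
    return math.floor(1.0 / gpu_usage + 1e-9)


for app, usage in (("setiathome_v7", 0.5), ("astropulse_v7", 1.0)):
    print(f"{app}: gpu_usage {usage} -> {tasks_per_gpu(usage)} WU(s)/GPU")
```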

(Because of CreditNew I run 2 SETI WUs/GPU and 1 AP WU/GPU: if 1 SETI and 1 AP WU ran simultaneously, the AP WU would be ~50% faster, and this would screw up the 'average processing rate x.x GFLOPS'.)

I made bench test runs on my NV GT730 and the fastest settings are:
2 AP WUs/GPU: -v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -tune 1 16 4 1 -oclFFT_plan 256 32 64 -use_sleep -cpu_lock -instances_per_device 2 -hp
1 AP WU/GPU: -v 0 -unroll 11 -ffa_block 1792 -ffa_block_fetch 448 -tune 1 16 4 1 -tune 2 16 1 1 -oclFFT_plan 256 16 256 -use_sleep -cpu_lock -instances_per_device 1 -hp
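
To see at a glance which options differ between the two command lines, they can be split into (flag, arguments) groups and compared. This is just a helper sketch; `parse_cmdline` is a hypothetical name and the parsing rule (a token starting with "-" opens a new option, everything else is an argument of the previous one) is an assumption that happens to fit these two lines.

```python
def parse_cmdline(cmdline):
    """Split a command line into (flag, args) tuples."""
    options = []
    for token in cmdline.split():
        if token.startswith("-"):
            options.append((token, []))
        elif options:
            options[-1][1].append(token)
        else:
            raise ValueError(f"argument {token!r} has no preceding flag")
    return [(flag, tuple(args)) for flag, args in options]


two_wu = ("-v 0 -unroll 9 -ffa_block 896 -ffa_block_fetch 448 -tune 1 16 4 1 "
          "-oclFFT_plan 256 32 64 -use_sleep -cpu_lock -instances_per_device 2 -hp")
one_wu = ("-v 0 -unroll 11 -ffa_block 1792 -ffa_block_fetch 448 -tune 1 16 4 1 "
          "-tune 2 16 1 1 -oclFFT_plan 256 16 256 -use_sleep -cpu_lock "
          "-instances_per_device 1 -hp")

a, b = parse_cmdline(two_wu), parse_cmdline(one_wu)
only_two_wu = [opt for opt in a if opt not in b]
only_one_wu = [opt for opt in b if opt not in a]

print("only with 2 AP WUs/GPU:", only_two_wu)
print("only with 1 AP WU/GPU: ", only_one_wu)
```

Options such as -ffa_block_fetch 448 and -tune 1 16 4 1 appear in both lines and are therefore not listed.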

I have the optimized apps installed (using an app_info.xml file), so these cmdline entries are in the ap_cmdline_win_x86_SSE2_OpenCL_NV.txt file on my PC.
I guess with stock apps you should also find this file in the setiathome.berkeley.edu project folder. Open it with Notepad, insert the entries and save.

There is also a mbcuda.cfg file in the setiathome.berkeley.edu project folder.
Open it with Notepad and delete the ; at the beginning of the line:
processpriority = abovenormal
...and save.
Then the SETI CUDA app gets better CPU support (maybe faster calculation on the GPU).

I can't install the NV GPU driver using the NV driver .exe.
When I run this installer, it doesn't find the GT730 (it says: 'no NVidia VGA card found').
My screen is connected to the Intel iGPU output.
The GT730 is the secondary VGA card with nothing connected to it, not even a VGA dummy plug.
I guess this is why the NV driver .exe doesn't find an NV VGA card.
I guess Windows 8.1 installed a 'Windows certified' driver (347.52) or something similar.
I have no idea where this driver came from.
I did not download it, and Windows had just ~10 seconds to search and download (100+ MB is not possible with my DSL2000 in such a short time).
So I have no idea what kind of driver is installed now (Windows shows no size in MB for it; maybe a 'shortened' driver?).
So I can install a new NV driver only if Windows says 'a new driver is available' (I don't know whether Windows has this function or whether that could happen), or by going to the Device Manager and searching for a new driver via the properties.
ID: 1696205 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1696715 - Posted: 29 Jun 2015, 12:25:23 UTC

How do I manually add CUDA42 or CUDA50?

Nope, I get about 80-90% load on the Quadro2000 or GT730 PCIe x8...so all graphics still work just fine for me with that extra...& the GPU still does the job!
If it's less than 80% on a single SETI@home WU - then I'll work with app_config.xml to make a dual WU cruncher out of my GT 730!

Right now I'm finding a way to upgrade another computer, so it can crunch on PCIe x16...that card would need some tweaking!
Preferably, it would be GTX 750 Ti from EGA...
;)


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1696715 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1696783 - Posted: 29 Jun 2015, 17:19:52 UTC - in response to Message 1696715.  

Have you ever used Lunatics? You can choose which version of CUDA to use to crunch your work. If you don't want to install all of the apps, you could go to Mike's World and download only that 1 app for cuda50.

I would link you, but I have the old link. Someone else can provide you with Mike's. If you want Lunatics, I believe there is a link at the top of Number Crunching. I believe the last version is 43a
ID: 1696783 · Report as offensive
Rasputin42
Volunteer tester

Send message
Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1696787 - Posted: 29 Jun 2015, 17:53:53 UTC
Last modified: 29 Jun 2015, 17:54:13 UTC

ID: 1696787 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1697368 - Posted: 1 Jul 2015, 16:07:33 UTC - in response to Message 1696065.  

here is a link to the web pic:
https://www.dropbox.com/s/dpen9mun7rxe4z3/screenshot%202015-06-30.jpg?dl=0


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1697368 · Report as offensive
KLiK
Volunteer tester

Send message
Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1697639 - Posted: 2 Jul 2015, 6:55:24 UTC - in response to Message 1696205.  
Last modified: 2 Jul 2015, 6:55:41 UTC


just put this in
app_config.xml
& crunch now
2x intel HD2500
with
2x nVidia Quadro 2000
on a work computer...

same will go on a computer at home with GT730!

but, can someone tell me how to include CUDA42 & CUDA50 & CUDA32 for my cards...without using 3rd party apps?


non-profit org. Play4Life in Zagreb, Croatia, EU
ID: 1697639 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1697679 - Posted: 2 Jul 2015, 9:52:52 UTC - in response to Message 1697639.  
Last modified: 2 Jul 2015, 10:08:52 UTC

I myself wrote:
(...)
I made a quick test with Fred's SetiPerformance (I added manually the cuda42 app) (count 1.0):
x41zc_cuda50: 755 Seconds (12 Minutes, 35 Seconds)
x41zc_cuda42: 784 Seconds (13 Minutes, 4 Seconds)

So it looks like the cuda50 app is faster (at least on my system).
(...)


As I wrote, I added the cuda42 app to Fred's tool »SetiPerformance«*,
just to see whether cuda42 or cuda50 is faster on my GT730.
[* With this tool you can test how many simultaneous WUs give the highest performance (RAC) on your GPU.]

cuda50 is the fastest on my GT730 (PC system).

If you add an app_config.xml file with the entries mentioned above (0.04 for <cpu_usage>) to the setiathome.berkeley.edu folder, BOINC will run the cuda app that is fastest on your GPU (PC system), i.e. the app with the highest 'Host/Application details/Average processing rate' GFLOPS value.

But if you also run WUs on your Intel HD Graphics (4 compute units) and Intel HD2500 (6 compute units), you should run just 1 WU/GPU there (test first).
Check the 'GPU Load' values in GPU-Z.
On my Intel HD Graphics I run just 1 SETI or AP WU (95-100% GPU Load).

<app_config>
   <app_version>
       <app_name>setiathome_v7</app_name>
       <plan_class>cuda50</plan_class>
       <avg_ncpus>0.04</avg_ncpus>
       <ngpus>0.5</ngpus>
   </app_version>
</app_config>


This app_config.xml file makes 2 SETI WUs run simultaneously on each NVIDIA GPU.
AFAIK, 1 AP WU then runs per NV GPU, and 1 SETI or 1 AP WU on the Intel iGPU, automatically.
ID: 1697679 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1697683 - Posted: 2 Jul 2015, 10:08:04 UTC - in response to Message 1697679.  
Last modified: 2 Jul 2015, 10:17:24 UTC

If you run everything stock, maybe you should add all plan classes (cuda22, cuda23, cuda32, cuda42 and cuda50) to your app_config.xml file.
Otherwise, if SETI sends you cuda22 to cuda42 WUs (with the entries mentioned above), they run much faster (because just 1 WU/GPU runs) and their 'Average processing rate' ends up much higher.

To adjust the 'Average processing rates' of all apps in the same way, your app_config.xml file could/should look like this (AFAIK):

<app_config>
   <app_version>
       <app_name>setiathome_v7</app_name>
       <plan_class>cuda50</plan_class>
       <avg_ncpus>0.04</avg_ncpus>
       <ngpus>0.5</ngpus>
   </app_version>
   <app_version>
       <app_name>setiathome_v7</app_name>
       <plan_class>cuda42</plan_class>
       <avg_ncpus>0.04</avg_ncpus>
       <ngpus>0.5</ngpus>
   </app_version>
   <app_version>
       <app_name>setiathome_v7</app_name>
       <plan_class>cuda32</plan_class>
       <avg_ncpus>0.04</avg_ncpus>
       <ngpus>0.5</ngpus>
   </app_version>
   <app_version>
       <app_name>setiathome_v7</app_name>
       <plan_class>cuda23</plan_class>
       <avg_ncpus>0.04</avg_ncpus>
       <ngpus>0.5</ngpus>
   </app_version>
   <app_version>
       <app_name>setiathome_v7</app_name>
       <plan_class>cuda22</plan_class>
       <avg_ncpus>0.04</avg_ncpus>
       <ngpus>0.5</ngpus>
   </app_version>
</app_config>
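
Since the five <app_version> blocks differ only in the plan class, a file like this can also be generated instead of typed by hand. A minimal sketch using Python's standard xml.etree.ElementTree module; the function name `build_app_config` is made up, and the element names and values are simply the ones from the file above (nothing is checked against BOINC itself).

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_classes, avg_ncpus, ngpus):
    """Return app_config.xml text with one <app_version> per plan class."""
    root = ET.Element("app_config")
    for plan_class in plan_classes:
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan_class
        ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
        ET.SubElement(version, "ngpus").text = str(ngpus)
    ET.indent(root, space="   ")  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


plan_classes = ["cuda50", "cuda42", "cuda32", "cuda23", "cuda22"]
xml_text = build_app_config("setiathome_v7", plan_classes, 0.04, 0.5)
print(xml_text)
```

The printed text can be saved as app_config.xml in the project folder; changing the ngpus value in one place then changes it for all plan classes.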
ID: 1697683 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1697684 - Posted: 2 Jul 2015, 10:15:51 UTC - in response to Message 1697683.  
Last modified: 2 Jul 2015, 10:16:36 UTC

You could also add:
-use_sleep -hp
...to your ap_cmdline_win_x86_SSE2_OpenCL_NV.txt file (or a similarly named one, with ap_cmdline and NV in the name).
Then the NV AP GPU app doesn't use a whole CPU thread and gets higher priority.
ID: 1697684 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.