Can I further optimize multiple GPU calculations?

Message boards : Number crunching : Can I further optimize multiple GPU calculations?
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1774494 - Posted: 27 Mar 2016, 18:13:11 UTC - in response to Message 1774454.  

Thanks, Zalster. Yeah, all the systems I am building right now are going to have only 1 GPU in them, so I just need to tweak the file to tell it to run 2 tasks at a time. I changed <count>1</count> to <count>2</count> but it didn't change anything, so I think I am editing the wrong file. I'll have to do more digging to find the right things to change.

ID: 1774494
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1774501 - Posted: 27 Mar 2016, 18:23:39 UTC - in response to Message 1774494.  
Last modified: 27 Mar 2016, 18:24:02 UTC

Thanks, Zalster. Yeah, all the systems I am building right now are going to have only 1 GPU in them, so I just need to tweak the file to tell it to run 2 tasks at a time. I changed <count>1</count> to <count>2</count> but it didn't change anything, so I think I am editing the wrong file. I'll have to do more digging to find the right things to change.



edit as above
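
For reference, a minimal sketch of the kind of change being discussed, assuming an anonymous-platform app_info.xml in which <count> sits inside the <coproc> block of the GPU app version. In BOINC, <count> is the number of GPUs that one task uses, so <count>2</count> asks for two GPUs per task rather than two tasks per GPU; a fractional value is what lets two tasks share one card. The surrounding elements are omitted here and should stay as they are in your own file.

```xml
<!-- Fragment only: belongs inside the existing <app_version> entry for the GPU app. -->
<coproc>
    <!-- keep the <type> line that is already in your file -->
    <count>0.5</count>  <!-- each task uses half a GPU, so two tasks share one card -->
</coproc>
```

The client only picks up a change like this after it re-reads the file, so restart BOINC after saving.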
ID: 1774501
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1774554 - Posted: 27 Mar 2016, 21:24:55 UTC

Got it, thanks! Now just need to change it to leave one of the 4 cores free, and I'm golden.
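
One way to keep a core free, sketched here as an assumption rather than as what was actually done in this thread, is the computing preference that limits the share of CPUs BOINC may use. On a 4-core machine, 75% leaves one core idle. The same setting can be made in the BOINC Manager computing preferences; the file form below assumes a global_prefs_override.xml in the BOINC data directory.

```xml
<global_preferences>
    <!-- use at most 75% of the CPUs: 3 of 4 cores on a quad-core machine -->
    <max_ncpus_pct>75.0</max_ncpus_pct>
</global_preferences>
```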

ID: 1774554
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1774596 - Posted: 28 Mar 2016, 0:18:12 UTC

I have been having problems getting over 55% GPU loading on my latest system build. My other 2 systems have good loading, but each has only 1 GPU card. The latest system has 3 GPUs. System details (CPD = credit per day):
* i7-3930K + Fury X has an average of 82% GPU loading, CPD ~36K
* A10-7870K + R9 290X has an average of 74% GPU loading, CPD ~20K
* FX-8370 + 2 Fury Nanos has an average of 55% GPU loading, CPD ~29K

For the FX-8370 system, I actually started with an additional R7870 GPU, but I removed it to work out the loading issues. After removing it, the credit per day of this system did not go down. My only entry in the app_config files is as follows:
<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
I have tried cpu_usage values from 0.001 to 2 and GPU loading goes from about 50% to a max of about 55% with cpu_usage of 1 or 2. I have also tried gpu_usage values of 0.5, but I get sporadic compute errors.

I have also tried moving a Nano card to the i7 system paired with the Fury X, and found the same issue where GPU usage of both cards went down to ~55%.

Currently the FX-8370 with 2-3 GPUs has a lower credit per day than the i7 with 1 GPU. All systems are running Crimson 15.12 and Windows 10. I post videos with system details and troubleshooting here: https://www.youtube.com/c/ricknetwork
ID: 1774596
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1774698 - Posted: 28 Mar 2016, 9:22:49 UTC
Last modified: 28 Mar 2016, 9:27:44 UTC

Make sure CPU utilisation doesn't exceed 50% - 60%.

Search your SETI data folder for a file with cmdline in its name (***cmdline***.txt), add the following line to it, and save it.

-sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

The optimized values will be used from the next task on.


With each crime and every kindness we birth our future.
ID: 1774698
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1774704 - Posted: 28 Mar 2016, 10:00:03 UTC - in response to Message 1774596.  

I would also try:

<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.5</cpu_usage>

That will run 2 tasks per card.
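
Placed in a complete app_config.xml, the two lines above would look roughly like this. The app name is taken from RueiKe's file earlier in the thread; the rest is the usual app_config.xml layout. A gpu_usage of 0.5 means each task claims half a GPU, so BOINC starts two per card, and a cpu_usage of 0.5 budgets half a CPU core for each of them.

```xml
<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <!-- half a GPU per task: two tasks run on each card -->
            <gpu_usage>0.5</gpu_usage>
            <!-- half a CPU core budgeted per GPU task -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

After saving, have the client re-read its config files or restart BOINC so the new values take effect.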
ID: 1774704
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1774708 - Posted: 28 Mar 2016, 10:58:57 UTC - in response to Message 1774704.  

I would also try:

<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.5</cpu_usage>

That will run 2 tasks per card.


Nope, doesn't work on Fiji.


With each crime and every kindness we birth our future.
ID: 1774708
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1774710 - Posted: 28 Mar 2016, 11:06:19 UTC - in response to Message 1774708.  
Last modified: 28 Mar 2016, 11:06:51 UTC

I have 2 WU per GPU working fine on my i7-3930K single Fury X system, but get computation errors on my FX-8370 system with 2 Fury Nanos. Could it be related to the i7 having 2 threads per core?
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1774710
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1774711 - Posted: 28 Mar 2016, 11:11:14 UTC - in response to Message 1774710.  

I have 2 WU per GPU working fine on my i7-3930K single Fury X system, but get computation errors on my FX-8370 system with 2 Fury Nanos. Could it be related to the i7 having 2 threads per core?


I can see 19 bad results.
So it's not running fine either.

It's broken buffers, which seems to be driver related.
With 300 valid tasks, 19 bad results is about 6%, which is above the 5% margin.


With each crime and every kindness we birth our future.
ID: 1774711
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1774712 - Posted: 28 Mar 2016, 11:21:07 UTC - in response to Message 1774711.  

I can see 19 bad results.
So it's not running fine either.

It's broken buffers, which seems to be driver related.
With 300 valid tasks, 19 bad results is about 6%, which is above the 5% margin.


Those might have occurred when I was experimenting with adding a Nano to the system. I am out of GPU tasks, seems like something is wrong at SETI, since I have not received new tasks for most of the day.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1774712
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1774714 - Posted: 28 Mar 2016, 11:27:20 UTC - in response to Message 1774712.  

Those might have occurred when I was experimenting with adding a Nano to the system. I am out of GPU tasks, seems like something is wrong at SETI, since I have not received new tasks for most of the day.


That's interesting.
Which brand is the Fury X?


With each crime and every kindness we birth our future.
ID: 1774714
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1774715 - Posted: 28 Mar 2016, 11:29:56 UTC - in response to Message 1774714.  

That's interesting.
Which brand is the Fury X?


Asus, with an EK Waterblock.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1774715
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1774717 - Posted: 28 Mar 2016, 11:48:14 UTC - in response to Message 1774698.  
Last modified: 28 Mar 2016, 11:48:39 UTC

Make sure CPU utilisation doesn't exceed 50% - 60%.

Search your SETI data folder for a file with cmdline in its name (***cmdline***.txt), add the following line to it, and save it.

-sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

The optimized values will be used from the next task on.


I have tried what you suggested (increased CPUs to 2 for low loading and made cmdline file edit), but no change in the GPU utilization. It is still around 55%. I also tried POEM and found it can load both at about 77%.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1774717
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1774733 - Posted: 28 Mar 2016, 13:47:30 UTC
Last modified: 28 Mar 2016, 14:42:03 UTC

I have tried what you suggested (increased CPUs to 2 for low loading and made cmdline file edit), but no change in the GPU utilization. It is still around 55%. I also tried POEM and found it can load both at about 77%.

I don't have experience with AMD, but...
Now that I have learned that GPU and CPU applications use whatever they want of the threads allocated by the operating system, I tried to maximize GPU utilization. This time I ran 3 WU per GPU and looked at the effect of freeing more CPU cores. I collected data for the past 5 days, until the server went down.
On my systems, these settings seem to work fastest:
i7-3930K with 3 x GTX 670: <gpu_usage>.33, <cpu_usage>.9
i7-950 with 1 x GTX 680: <gpu_usage>.33, <cpu_usage>1
Q6600 with 1 x GTX 950: <gpu_usage>.33, <cpu_usage>1
With these settings, they all reach 92% (i7-950) to 97% (i7-3930K) GPU usage with very little fluctuation most of the time, and they process enough extra GPU WU to more than compensate for the reduced CPU WU. GPU temperatures stay between 62C (GTX 670) and 67C (GTX 680) with no errors. All 3 systems now process more total WU per day than ever, even though CPU usage stays between 60 and 70% (enough processing power is left over for whatever Windows wants to do, so it stays happy).
Too bad I can't download more WU now... I hope the system recovers soon.
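
As a sketch, the i7-3930K settings listed above would translate into an app_config.xml along these lines. The app name setiathome_v8 is an assumption borrowed from RueiKe's file earlier in the thread; use the name of whichever application you actually run. The tag names are written in lowercase, as in the BOINC documentation.

```xml
<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <!-- about a third of a GPU per task: three tasks run on each card -->
            <gpu_usage>0.33</gpu_usage>
            <!-- budget 0.9 of a CPU core for each GPU task -->
            <cpu_usage>0.9</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

With three cards running three tasks each, this budgets 9 x 0.9 = 8.1 CPU threads for GPU tasks on paper, which fits the description of giving up some CPU WU in exchange for higher GPU throughput.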
ID: 1774733
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1775035 - Posted: 30 Mar 2016, 4:59:29 UTC - in response to Message 1774733.  

I don't have experience with AMD, but...
Now that I have learned that GPU and CPU applications use whatever they want of the threads allocated by the operating system, I tried to maximize GPU utilization. This time I ran 3 WU per GPU and looked at the effect of freeing more CPU cores. I collected data for the past 5 days, until the server went down.
On my systems, these settings seem to work fastest:
i7-3930K with 3 x GTX 670: <gpu_usage>.33, <cpu_usage>.9
i7-950 with 1 x GTX 680: <gpu_usage>.33, <cpu_usage>1
Q6600 with 1 x GTX 950: <gpu_usage>.33, <cpu_usage>1
With these settings, they all reach 92% (i7-950) to 97% (i7-3930K) GPU usage with very little fluctuation most of the time, and they process enough extra GPU WU to more than compensate for the reduced CPU WU. GPU temperatures stay between 62C (GTX 670) and 67C (GTX 680) with no errors. All 3 systems now process more total WU per day than ever, even though CPU usage stays between 60 and 70% (enough processing power is left over for whatever Windows wants to do, so it stays happy).
Too bad I can't download more WU now... I hope the system recovers soon.


Looks like you have a really solid setup. I am still having trouble getting multiple WU per GPU on the Fury...
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1775035
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1775047 - Posted: 30 Mar 2016, 7:38:33 UTC - in response to Message 1773053.  

so CPU usage may go up for apps even though they are doing the same thing as before when the system was not as heavily loaded.

For me, it is particularly noticeable when running 2 or 3 APs on my GPUs.


There is another explanation that is especially valid for APs (AstroPulse tasks).
With a bigger OS load, CPU cache pollution increases. So the app really does start to take more CPU time, simply because the CPU spends many more cycles stalled waiting for data.
AP is cache-hungry, so cache pollution does especially heavy damage to its performance.
ID: 1775047
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1783472 - Posted: 29 Apr 2016, 13:50:20 UTC - in response to Message 1775047.  

Raistmer, guys, after a few years of having it sit on the shelf while waiting for the prices of the processors I wanted to become more 'reasonable', I finally got the $ together and bought the ones I had planned to use since the beginning. It is an EVGA SR-2 with a pair of X5690 CPUs, (currently, for testing purposes) 3 GTX 950s and a GTX 750, and 48 GB of RAM, running on a pair of Intel 480 GB SSDs in a RAID 1 config. The system is ID: 7990258 - CrunchMonster.

I spent much of yesterday building, loading, configuring and updating it, and finally early this morning turned on the spigot and let the WUs flow! Just before that I spent a bit of time reading the current posts on how to configure this to provide the most efficient crunching and utilize the hardware optimally, including "Can I further optimize multiple GPU calculations?" and my post "Getting back into it, advice appreciated", as well as this one.

I have set <count>0.5</count>, and set <avg_ncpus>0.35</avg_ncpus> and <max_ncpus>0.35</max_ncpus>. I was reading about polling as well, though I haven't implemented it yet; I don't want to change too much at once before consulting the Oracle (the brain trust here on the forum!).

As currently configured, it is running 24 CPU tasks, which I am sad to say seem to be averaging over 3 hours each, which if I remember correctly is Much, Much longer than on the CAD system I just built (ID: 7949285 Zoom-PC). Taking a quick look at that one, its GPU tasks have a Run time of 750-1,500 sec and a CPU time of 175-320 sec, and its CPU tasks have a Run time of 5,400-5,700 sec and a CPU time of 5,100-5,500 sec (with a few in the 7,500 range).

For the new system (so far, as only 21 have validated as I write this), GPU tasks average a Run time of 1,500-2,500 sec and a CPU time of 460-680 sec, and CPU tasks a Run time of 11,000-12,500 sec and a CPU time of 9,700-10,300 sec! I know there is a few generations' difference between the CPUs, and also between the 980s/970 in the former and the 950s/750 in the latter, but it seems that there might be something more at play here? Could someone with better insight than I take a look at them and see if anything stands out as a configuration gotcha that I can adjust to get a little better performance out of this new one? The video cards are slightly O/C'd (around 10%); otherwise everything is currently running stock.

Thanks again, guys!

ID: 1783472
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1783474 - Posted: 29 Apr 2016, 14:03:26 UTC - in response to Message 1783472.  
Last modified: 29 Apr 2016, 14:03:59 UTC

ID: 1783474
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1783485 - Posted: 29 Apr 2016, 14:34:31 UTC - in response to Message 1783474.  

Read this: http://setiathome.berkeley.edu/forum_thread.php?id=76063&postid=1783004

Brent, I click on that link and it comes back with "This forum is not visible to you." I've never seen that before, is it in a private section of the forum possibly?

ID: 1783485
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1783495 - Posted: 29 Apr 2016, 14:53:45 UTC - in response to Message 1783485.  

Oops, that is in the GPU User Group.

Here:
Mark, I don't see you making use of the mbcuda.cfg file. Someone correct me if I'm wrong, but I think you can boost performance there for both your 780 and 980; they are running defaults.


For my 750 Ti I have:
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
mbcuda.cfg, processpriority key detected
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 16
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 400
Priority of process set to ABOVE_NORMAL successfully


You have:
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully


My times seem to be similar to yours, but I think you should be outperforming me noticeably.

ID: 1783495


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.