SETI applications for NVIDIA GPU improvement - how you can help

Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803690 - Posted: 20 Jul 2016, 14:46:47 UTC - in response to Message 1803627.  
Last modified: 20 Jul 2016, 14:47:11 UTC

Came across this while checking r3486 against r3430

It looks like OpenCL Guppi work units slow down, just as CUDA work units do, when paired with an AP on the same device. In this case there were 2 APs and 1 Guppi on the same device. Thought you might want this information.

AP
http://setiathome.berkeley.edu/result.php?resultid=5051041254
Running on device number: 0

Received 20 Jul 2016, 13:45:57 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 7565600
Run time 30 min
CPU time 29 min 50 sec



AP
http://setiathome.berkeley.edu/result.php?resultid=5051041221
Running on device number: 0


Received 20 Jul 2016, 13:45:57 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 7565600
Run time 30 min 8 sec
CPU time 29 min 56 sec


Guppi
http://setiathome.berkeley.edu/result.php?resultid=5050524994
Running on device number: 0

Received 20 Jul 2016, 13:51:07 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 7565600
Run time 28 min 41 sec
CPU time 24 min 22 sec

As an aside, r3430 is still about 60-90 sec faster than r3486 when doing 3 at a time.
ID: 1803690
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803691 - Posted: 20 Jul 2016, 14:48:51 UTC - in response to Message 1803690.  

As an aside, r3430 is still about 60-90 sec faster than r3486 when doing 3 at a time.

What tuning lines were used in this comparison?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1803691
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803693 - Posted: 20 Jul 2016, 14:55:44 UTC - in response to Message 1803691.  

Yes, was using my commandlines on both machines for this comparison
ID: 1803693
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1803729 - Posted: 20 Jul 2016, 16:44:48 UTC - in response to Message 1803693.  

Yes, was using my commandlines on both machines for this comparison

Which command lines? Please list them.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1803729
Stephen "Heretic" Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1804802 - Posted: 25 Jul 2016, 11:58:15 UTC - in response to Message 1803533.  

Please test in single task per GPU mode with and w/o -use_sleep for comparison:
https://cloud.mail.ru/public/6wp7/cgfuAXmnc


Is this a new version or still part of the older r3486Final?


revision the same but binary is new.



. . I am working up the courage to try and implement this version. Am I right to presume that it runs the standard MB8.12 WUs?
ID: 1804802
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1805166 - Posted: 27 Jul 2016, 18:37:35 UTC - in response to Message 1803476.  

Another test to do by volunteers:

On beta (or in anonymous platform mode on main), take the 8.16 OpenCL app or later and add the -tt F parameter to the command line.
Increase the F value (the default is 15, which means a 15 ms target time for the partial PulseFind kernel) and watch host usability (that is, GUI lags, missing letters when typing and so on). At what F value do lags appear?
From a performance point of view it's better to have longer kernels, but this can result in GUI lags. This testing is needed to establish the best possible default value for unattended running.

EDIT: describe your host config (preferably with link to host on beta) along with report, please.
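
One way to organise this test is to step through a fixed list of -tt values and note the first one at which lag shows up. Below is a minimal sketch in Python. Only the -tt flag and its default of 15 ms come from the post; the helper name and the candidate values are illustrative assumptions, not recommended settings.

```python
# Hypothetical helper: builds the command lines to try one by one.
# Only "-tt" and its default of 15 ms come from the thread; the candidate
# values passed in below are arbitrary examples.
DEFAULT_TT_MS = 15

def tt_sweep(candidates_ms, base_cmdline=""):
    """Return (value, command line) pairs in ascending -tt order."""
    lines = []
    for tt in sorted(set(candidates_ms)):
        if tt < DEFAULT_TT_MS:
            continue  # the test asks to increase F starting from the default
        cmd = f"{base_cmdline} -tt {tt}".strip()
        lines.append((tt, cmd))
    return lines

sweep = tt_sweep([15, 30, 60, 120, 240])
for tt, cmd in sweep:
    print(f"{tt:>4} ms  ->  {cmd}")
```

Run each line for a while during normal desktop use and report the first value at which the GUI starts to lag, together with the host configuration.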


I would like to remind everyone that without feedback about usability no corrections will be made, and the chance to get the app working nicely out of the box for your particular host will be missed.
Please check the usability of 8.16+ builds on your particular host and report back.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1805166
Harri Liljeroos
Joined: 29 May 99
Posts: 4648
Credit: 85,281,665
RAC: 126
Finland
Message 1805206 - Posted: 27 Jul 2016, 21:46:08 UTC - in response to Message 1805166.  

Well, my experience without the -use_sleep parameter was a driver restart within 5 minutes of starting the application, so I was forced to go back to -use_sleep. The application does use a lot less CPU compared to previous versions. I am currently running with -period_iterations_num 5 with no lag on this host: http://setiathome.berkeley.edu/results.php?hostid=7043787
Decreasing it to 3 makes lags appear. The host has two GPUs, a GTX650Ti and a GTX970, both running one SETI task at a time, with three CPU cores reserved for GPU feeding. The host's GPUs also do Einstein WUs, one or two at a time depending on how BOINC schedules them (the SETI share is 45 and Einstein is 25).
ID: 1805206
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1805214 - Posted: 27 Jul 2016, 22:31:07 UTC - in response to Message 1805206.  

And what about running with defaults?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1805214
Harri Liljeroos
Joined: 29 May 99
Posts: 4648
Credit: 85,281,665
RAC: 126
Finland
Message 1805323 - Posted: 28 Jul 2016, 12:37:43 UTC - in response to Message 1805214.  
Last modified: 28 Jul 2016, 12:41:18 UTC

And what about running with defaults?

Today I started running with default settings, so no command line and, more recently, no app_config either. I am positively surprised how well the host is behaving. The load on the GPUs of course fluctuates, as there are no free CPU cores to feed them constantly, but I have had no big lags or driver restarts. The CPU is running 6 setiathome_v8 tasks and 2 CPDN wah2_eu25 tasks, so CPU load is constantly at 100%.

I can also use the computer for email, surfing the net and I could view the new Seti video from Youtube without a problem (Firefox HW-acceleration is turned off).

I am using the r3486 release with date 19.7.2016 on the exe.

Here are a few WUs I have crunched so far with default settings:
http://setiathome.berkeley.edu/result.php?resultid=5063178279 a Guppi VLAR on GTX970.
http://setiathome.berkeley.edu/result.php?resultid=5063162557 normal Arecibo WU on GTX970.
http://setiathome.berkeley.edu/result.php?resultid=5063120898 normal Arecibo WU on GTX650Ti.
http://setiathome.berkeley.edu/result.php?resultid=5063178278 a Guppi VLAR on GTX650Ti.

So good job Raistmer on this application.

(edit) Next I will try freeing one CPU core and see what that does. (/edit)
ID: 1805323
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1805431 - Posted: 28 Jul 2016, 21:53:48 UTC - in response to Message 1805323.  

Thanks for the report.
Good that there are no usability problems at defaults.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1805431
Harri Liljeroos
Joined: 29 May 99
Posts: 4648
Credit: 85,281,665
RAC: 126
Finland
Message 1805554 - Posted: 29 Jul 2016, 9:26:10 UTC

The test with freeing a CPU core for feeding the GPUs led to the already known conclusion: one free CPU core is not enough for two GPUs to maximize the GPU load. For best results you should free one CPU core for each GPU (while running one WU per GPU at a time). This is for my host, but YMMV. Now I'm back to using my old app_config and command-line parameters.
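
The rule of thumb above (one free core per GPU task) can be written down as simple arithmetic. The sketch below is only an illustration: the function name is made up, and the total of 8 CPU cores is an assumption inferred from the 6 + 2 CPU tasks reported earlier for this host, not a stated figure.

```python
def cpu_tasks_to_run(total_cores, gpus, tasks_per_gpu=1, cores_per_gpu_task=1.0):
    """CPU cores left for CPU tasks after reserving feeder cores for GPU tasks."""
    reserved = gpus * tasks_per_gpu * cores_per_gpu_task
    return max(0, int(total_cores - reserved))

# Assumed host: 8 CPU cores, two GPUs, one work unit per GPU.
free = cpu_tasks_to_run(total_cores=8, gpus=2)
print(free)  # 6 cores left for CPU work
```

Other hosts may need a different reservation, which is the point of the YMMV caveat.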
ID: 1805554
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1805560 - Posted: 29 Jul 2016, 10:08:24 UTC - in response to Message 1805554.  

The test with freeing a CPU core for feeding the GPUs led to the already known conclusion: one free CPU core is not enough for two GPUs to maximize the GPU load. For best results you should free one CPU core for each GPU (while running one WU per GPU at a time). This is for my host, but YMMV. Now I'm back to using my old app_config and command-line parameters.


I would also recommend adding -hp -cpu_lock to your tuning line to improve GPU load when the CPU is busy.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1805560
Harri Liljeroos
Joined: 29 May 99
Posts: 4648
Credit: 85,281,665
RAC: 126
Finland
Message 1805568 - Posted: 29 Jul 2016, 11:24:22 UTC - in response to Message 1805560.  
Last modified: 29 Jul 2016, 11:24:52 UTC

The test with freeing a CPU core for feeding the GPUs led to the already known conclusion: one free CPU core is not enough for two GPUs to maximize the GPU load. For best results you should free one CPU core for each GPU (while running one WU per GPU at a time). This is for my host, but YMMV. Now I'm back to using my old app_config and command-line parameters.


I would also recommend adding -hp -cpu_lock to your tuning line to improve GPU load when the CPU is busy.


The -hp was there already but I'm adding the -cpu_lock now. So new commandline is:
-sbs 1024 -use_sleep -hp -period_iterations_num 5 -instances_per_device 1 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32 -tt 75 -cpu_lock
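
Long tuning lines like this one are easier to compare when split into flags and values. The Python sketch below is an illustration only: it assumes that a token starting with "-" followed by a letter is a flag and that everything else is a value of the preceding flag. That convention fits the line above but is not taken from the application's documentation, and both function names are hypothetical.

```python
def parse_tuning_line(cmdline):
    """Split a tuning line into {flag: [values]}.

    A token counts as a flag when it starts with '-' followed by a letter;
    any other token is treated as a value of the preceding flag.
    """
    options = {}
    current = None
    for token in cmdline.split():
        if len(token) > 1 and token[0] == "-" and token[1].isalpha():
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options

def diff_tuning(a, b):
    """Return the flags whose presence or values differ between two lines."""
    pa, pb = parse_tuning_line(a), parse_tuning_line(b)
    return {f: (pa.get(f), pb.get(f))
            for f in sorted(set(pa) | set(pb)) if pa.get(f) != pb.get(f)}

line = ("-sbs 1024 -use_sleep -hp -period_iterations_num 5 "
        "-instances_per_device 1 -spike_fft_thresh 4096 -tune 1 64 1 4 "
        "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
        "-oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32 "
        "-tt 75 -cpu_lock")
opts = parse_tuning_line(line)
for flag, values in opts.items():
    print(flag, " ".join(values))
```

Calling diff_tuning on the tuning lines from two hosts lists only the switches that differ, which makes a timing comparison easier to interpret.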


The -sbs 1024 applies to the GTX970 only, as the application seems to limit that value to 25% of the GPU's memory. So the GTX 650Ti only uses -sbs 256.
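
The two observed values are consistent with a cap of one quarter of device memory, if the GTX 970 is taken as a 4096 MB card and the GTX 650 Ti as a 1024 MB card. Both memory sizes are assumptions here, since the post does not state them. A rough sketch of that arithmetic, not the application's actual code:

```python
def effective_sbs(requested_mb, gpu_memory_mb, cap_fraction=0.25):
    """Clamp the requested -sbs value to a fraction of GPU memory (assumed rule)."""
    cap = int(gpu_memory_mb * cap_fraction)
    return min(requested_mb, cap)

print(effective_sbs(1024, 4096))  # GTX 970, assumed 4096 MB
print(effective_sbs(1024, 1024))  # GTX 650 Ti, assumed 1024 MB
```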
ID: 1805568
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1805601 - Posted: 29 Jul 2016, 15:33:40 UTC

New set of builds (r3500) available here: https://cloud.mail.ru/public/LJ8s/c3WyRR8ip
Please test
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1805601
Harri Liljeroos
Joined: 29 May 99
Posts: 4648
Credit: 85,281,665
RAC: 126
Finland
Message 1805930 - Posted: 30 Jul 2016, 21:59:39 UTC - in response to Message 1805601.  

New set of builds (r3500) available here: https://cloud.mail.ru/public/LJ8s/c3WyRR8ip
Please test


I don't see much difference from the previous version. Slightly more lag, but not too bad. Running with defaults gives the same result. When I freed two CPU cores and tried a minimal command line:
-sbs 1024 -hp -instances_per_device 1 -cpu_lock

I got a driver restart. Here's the WU where it happened: http://setiathome.berkeley.edu/result.php?resultid=5065787291
ID: 1805930
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1805990 - Posted: 31 Jul 2016, 4:35:52 UTC

Greetings Raistmer

Have brought two crunchers back online

Cruncher 1 - LGA775 Q6600 Win 7 x64 Boinc 7.6.22 with HD7870 and GTX 760

Cruncher 2 - LGA775 Q9300 Win 7 x64 Boinc 7.6.22 with GTX 770

Running build r3500 with defaults (no cmd line switches)

Will run this for a while; noticing no screen lag on either cruncher so far.

By the way only doing GPU crunching on these machines.

When I get time, I will try and post work unit details.

Regards
ID: 1805990
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1806195 - Posted: 31 Jul 2016, 23:16:11 UTC

r3500

A. CPU consumption

1. No Sleep command

WU true angle range is : 1.477986
http://setiathome.berkeley.edu/result.php?resultid=5070945053
Run time 1 min 6 sec
CPU time 10 sec

WU true angle range is : 0.415260
http://setiathome.berkeley.edu/result.php?resultid=5070939131
Run time 4 min 28 sec
CPU time 4 min 24 sec

WU true angle range is : 0.024646
http://setiathome.berkeley.edu/result.php?resultid=5070945029
Run time 8 min 3 sec
CPU time 7 min 51 sec

WU true angle range is : 0.009960 (GUPPI)
http://setiathome.berkeley.edu/result.php?resultid=5070939038
Run time 7 min 40 sec
CPU time 7 min 36 sec


2. No sleep command but -tt 1500

WU true angle range is : 1.477986
http://setiathome.berkeley.edu/result.php?resultid=5070945080
Run time 2 min 30 sec
CPU time 23 sec

WU true angle range is : 0.415318
http://setiathome.berkeley.edu/result.php?resultid=5070945010
Run time 4 min 26 sec
CPU time 4 min 4 sec

WU true angle range is : 0.011214 (GUPPI)
http://setiathome.berkeley.edu/result.php?resultid=5070952849
Run time 7 min 41 sec
CPU time 7 min 38 sec


3. -use_sleep

WU true angle range is : 1.477986
http://setiathome.berkeley.edu/result.php?resultid=5070945148
Run time 2 min 34 sec
CPU time 9 sec

WU true angle range is : 0.415260
http://setiathome.berkeley.edu/result.php?resultid=5070939117
Run time 6 min 19 sec
CPU time 1 min 1 sec


WU true angle range is : 0.330733
http://setiathome.berkeley.edu/result.php?resultid=5070938858
Run time 7 min 31 sec
CPU time 1 min 9 sec


WU true angle range is : 0.011214 (GUPPI)
http://setiathome.berkeley.edu/result.php?resultid=5070939149
Run time 11 min 35 sec
CPU time 2 min 19 sec


4. -use_sleep -high_prec_timer

WU true angle range is : 1.484248
http://setiathome.berkeley.edu/result.php?resultid=5070938888
Run time 2 min 33 sec
CPU time 10 sec

WU true angle range is : 1.484248
http://setiathome.berkeley.edu/result.php?resultid=5070938911
Run time 2 min 32 sec
CPU time 25 sec


WU true angle range is : 0.415260
http://setiathome.berkeley.edu/result.php?resultid=5070939125
Run time 4 min 35 sec
CPU time 1 min 49 sec

WU true angle range is : 0.012603 (GUPPI)
http://setiathome.berkeley.edu/result.php?resultid=5070939088
Run time 7 min 53 sec
CPU time 3 min 8 sec


5. -use_sleep -tt 1500


WU true angle range is : 8.804756
http://setiathome.berkeley.edu/result.php?resultid=5070939147
Run time 2 min 26 sec
CPU time 7 sec


WU true angle range is : 1.484248
http://setiathome.berkeley.edu/result.php?resultid=5070938887
Run time 2 min 35 sec
CPU time 9 sec



WU true angle range is : 0.330733
http://setiathome.berkeley.edu/result.php?resultid=5070939071
Run time 5 min 4 sec
CPU time 4 min 17 sec

WU true angle range is : 0.330733
http://setiathome.berkeley.edu/result.php?resultid=5070938786
Run time 7 min 31 sec
CPU time 1 min 7 sec


WU true angle range is : 0.032335 (GUPPI)
http://setiathome.berkeley.edu/result.php?resultid=5070945042
Run time 10 min 59 sec
CPU time 2 min


Aside

Oddball blc work unit: 3 prior attempts ended in error. This one completed in under 12 sec.


WU true angle range is : 0.027435
http://setiathome.berkeley.edu/result.php?resultid=5070939086
Run time 11 sec
CPU time 8 sec
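
To compare the configurations, it helps to convert the times to seconds and look at the CPU time / run time ratio. The sketch below does that in Python for the three results with angle range 0.415260 listed above. The helper name is made up, and a single result per configuration is too small a sample to draw firm conclusions.

```python
import re

def to_seconds(text):
    """Convert strings such as '4 min 28 sec', '2 min' or '10 sec' to seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total

# (configuration, run time, CPU time) for angle range 0.415260,
# copied from the result lists above.
results = [
    ("no sleep",                    "4 min 28 sec", "4 min 24 sec"),
    ("-use_sleep",                  "6 min 19 sec", "1 min 1 sec"),
    ("-use_sleep -high_prec_timer", "4 min 35 sec", "1 min 49 sec"),
]

for config, run, cpu in results:
    run_s, cpu_s = to_seconds(run), to_seconds(cpu)
    print(f"{config:<28} run {run_s:>4} s  cpu {cpu_s:>4} s  cpu/run {cpu_s / run_s:.2f}")
```

For this angle range, -use_sleep on its own gives the lowest CPU share but the longest run time, while adding -high_prec_timer brings the run time back close to the no-sleep figure at a moderate CPU cost.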
ID: 1806195
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1806197 - Posted: 31 Jul 2016, 23:22:03 UTC - in response to Message 1806195.  

Aside

Oddball blc work unit: 3 prior attempts ended in error. This one completed in under 12 sec.

The three prior attempts were all with the stock application with over-zealous sanity checks on the 'found' signals. That's the one we're desperately trying to replace (arguably a premature release) with an application version similar to the one you're testing now.
ID: 1806197
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1806231 - Posted: 1 Aug 2016, 1:11:22 UTC - in response to Message 1806195.  

Raistmer, what is the difference between -tt 1500 and -high_prec_timer?

It looks to me like the -high_prec_timer might be faster
ID: 1806231
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806457 - Posted: 1 Aug 2016, 20:41:05 UTC - in response to Message 1805930.  


I got a driver restart. Here's the WU where it happened: http://setiathome.berkeley.edu/result.php?resultid=5065787291

Unfortunately the result is already out of reach.
It is worth copying stderr, or doing some of this work on beta.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1806457


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.