SETI applications for NVIDIA GPU improvement - how you can help

Message boards : Number crunching : SETI applications for NVIDIA GPU improvement - how you can help

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806458 - Posted: 1 Aug 2016, 20:42:27 UTC - in response to Message 1805990.  



> Will run this for a while; noticing no screen lag on either cruncher.
>
> Regards


Good to hear that, thanks.
Please try something like -tt 30 and -tt 60: do lags appear, and at what value?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1806458
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806473 - Posted: 1 Aug 2016, 22:21:19 UTC - in response to Message 1806231.  
Last modified: 1 Aug 2016, 22:22:33 UTC

> Raistmer, what is the difference between -tt 1500 and -high_prec_timer?
>
> It looks to me like the -high_prec_timer might be faster

-high_prec_timer sets the system's multimedia timer resolution to 1 ms.
This allows a real 1 ms sleep when Sleep(1) is specified (instead of something like 15 ms on some hosts). So -use_sleep will work more precisely and cause less performance degradation while keeping CPU consumption low.
But beware: such a system-wide change could have some side effects. Testing is needed.

And the -tt F option just sets the desired partial PulseFind kernel length in ms.
So it's quite strange that you see a big difference with and without -tt 1500 on a VHAR task.
I think some statistics collection will be required here.
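A rough way to see why timer resolution matters for -use_sleep: suppose the host thread sleeps in whole timer ticks until a kernel finishes and only then queues the next one. The GPU then idles for the rest of the last tick. This is a simplified model under exactly that assumption, not the app's actual scheduling code; 15.625 ms is the commonly cited default Windows tick, and 1 ms is what -high_prec_timer requests.

```python
import math


def idle_fraction(kernel_ms: float, tick_ms: float) -> float:
    """Fraction of wall time the GPU idles per kernel in a simplified model.

    Assumption: the CPU thread wakes only on timer ticks of length tick_ms,
    notices completion on the first tick at or after the kernel ends, and
    only then submits the next kernel (no queued work hides the gap).
    """
    ticks = math.ceil(kernel_ms / tick_ms)  # ticks until completion is noticed
    wall_ms = ticks * tick_ms               # time from one submit to the next
    return (wall_ms - kernel_ms) / wall_ms  # share of that time the GPU idles


if __name__ == "__main__":
    for tick in (15.625, 1.0):              # default tick vs 1 ms timer
        for kernel in (10.0, 50.0, 100.0):
            print(f"tick={tick:>6} ms kernel={kernel:>5} ms "
                  f"idle={idle_fraction(kernel, tick):.1%}")
```

Under this model a 1 ms tick caps the overshoot at 1 ms per kernel, and longer kernels (a higher -tt) dilute whatever overshoot remains, which is one way -tt and -use_sleep could interact.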
ID: 1806473
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1806604 - Posted: 2 Aug 2016, 10:42:16 UTC - in response to Message 1806473.  

Greetings Raistmer

Apologies, but my crunchers are dual-GPU setups: an AMD card drives the display, and the NVIDIA GPUs are in the second slot of every crunching PC.

I have 3 AMD cards, but only 2 are crunching, and they are running the default r3500 with no command-line options.

I should have mentioned this in my previous email.

Regards
ID: 1806604
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1806783 - Posted: 3 Aug 2016, 13:10:32 UTC - in response to Message 1805601.  

> New set of builds (r3500) available here: https://cloud.mail.ru/public/LJ8s/c3WyRR8ip
> Please test

I know you would like varied ARs; however, all WUs originate from Beta and are all Guppi VLARs. Let me know if my testing needs improving.

>  w/o -use_sleep
    WU true angle range : 0.008175
    Task 24493997
    Run time : 9s
    CPU time : 7s

    WU true angle range : 0.008175
    Task 24494050
    Run time : 18m 24s
    CPU time : 18m 19s

>  -tt 1000

    WU true angle range : 0.008175
    Task 24494101
    Run time : 17m 32s
    CPU time : 17m 25s

    WU true angle range : 0.008175
    Task 24494145
    Run time : 8m 45s
    CPU time : 8m 43s

    WU true angle range : 0.008175
    Task 24494245
    Run time : 9s
    CPU time : 7s

>  -use_sleep -tt 1000

    WU true angle range : 0.008604
    Task 24495067
    Run time : 20m 40s
    CPU time : 3m 16s

    WU true angle range : 0.008168
    Task 24495176
    Run time : 9s
    CPU time : 7s

    WU true angle range : 0.008175
    Task 24494294
    Run time : 6m
    CPU time : 1m 51s

>  -use_sleep with -high_prec_timer

    WU true angle range : 0.008175
    Task 24505601
    Run time : 18m 45s
    CPU time : 6m 7s

    WU true angle range : 0.008175
    Task 24505832
    Run time : 18m 46s
    CPU time : 6m 15s

ID: 1806783
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806845 - Posted: 3 Aug 2016, 18:49:52 UTC - in response to Message 1806783.  
Last modified: 3 Aug 2016, 19:09:10 UTC

> Let me know if my testing needs improving.

Thanks for the report.
A lot of the listed tasks are overflows. They are not representative, so they can be left out of future reports entirely.


> >  -tt 1000
> WU true angle range : 0.008175
> Task 24494101
> Run time : 17m 32s
> CPU time : 17m 25s
>
> WU true angle range : 0.008175
> Task 24494145
> Run time : 8m 45s
> CPU time : 8m 43s

These two ran in identical environments, right? They still show a very large divergence in timings.
I would like to see some statistics for each option set then (let's say 5 non-overflowed results for each set of settings).
Also, did you run a single task per GPU in this test, or more?

P.S. From this app's point of view your device can be considered a high-end one, so increasing the -sbs N value (default is 128), switching to the high-perf path and using other options for high-end GPUs may improve performance, perhaps without GUI lags.
The longest partial PulseFind was under 100 ms (that is, there is no difference between -tt 100 and -tt 1000 for this device).
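Why -tt 100 and -tt 1000 can behave identically: if -tt only caps the length of each partial PulseFind kernel, then a PulseFind that already finishes under the target is never split, and raising the cap changes nothing. A minimal sketch, assuming the work is split into equal slices (the app's real partitioning is not described here) and using a hypothetical 80 ms full PulseFind:

```python
import math


def plan_slices(full_ms: float, tt_ms: float) -> tuple[int, float]:
    """Split one PulseFind of full_ms into equal partial kernels of at most tt_ms.

    Hypothetical model: returns (number_of_kernels, length_of_each_kernel_ms).
    """
    if full_ms <= tt_ms:
        return 1, full_ms               # fits in one kernel; -tt has no effect
    n = math.ceil(full_ms / tt_ms)      # fewest slices that respect the cap
    return n, full_ms / n


if __name__ == "__main__":
    full = 80.0  # hypothetical: longest PulseFind on a fast GPU, under 100 ms
    for tt in (30, 60, 100, 1000):
        n, length = plan_slices(full, tt)
        print(f"-tt {tt:>4}: {n} kernel(s) of {length:.1f} ms")
```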

P.P.S.
Did you experience GUI lags with -tt 1000 and without -use_sleep?
ID: 1806845
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1806875 - Posted: 3 Aug 2016, 21:21:39 UTC - in response to Message 1806845.  

> A lot of the listed tasks are overflows. They are not representative, so they can be left out of future reports entirely.
> Also, did you run a single task per GPU in this test, or more?

Single, sorry; I should have mentioned that.

> P.S. From this app's point of view your device can be considered a high-end one, so increasing the -sbs N value (default is 128), switching to the high-perf path and using other options for high-end GPUs may improve performance, perhaps without GUI lags.
> The longest partial PulseFind was under 100 ms (that is, there is no difference between -tt 100 and -tt 1000 for this device).
>
> P.P.S.
> Did you experience GUI lags with -tt 1000 and without -use_sleep?

Sporadic, only minor.  I'll add -high_perf to all subsequent testing.
ID: 1806875
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1806910 - Posted: 3 Aug 2016, 23:40:22 UTC

r3500

Here are the latest, but I have a bad feeling that they are all overflows. What in the Stderr output indicates an overflow? Are overflows preventable on my end? The WUs are running upwards of 4 minutes longer than with the stock r3486. Additionally, my GPU is not OC'd; it's strictly stock with default settings.

>  -high_perf -use_sleep
    WU true angle range : 0.008175
    Task 24508400
    Run time : 20m 26s
    CPU time : 4m 8s

    WU true angle range : 0.008175
    Task 24508677
    Run time : 20m 7s
    CPU time : 5m 22s

    WU true angle range : 0.008175
    Task 24508496
    Run time : 18m 59s
    CPU time : 4m 48s

    WU true angle range : 0.008175
    Task 24508572
    Run time : 20m 42s
    CPU time : 5m 9s

    WU true angle range : 0.008175
    Task 24508494
    Run time : 20m 42s
    CPU time : 3m 45s

ID: 1806910
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1806973 - Posted: 4 Aug 2016, 7:15:36 UTC - in response to Message 1806910.  

An overflow is indicated by the following lines in stderr:
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

In a properly working app, overflow is entirely data-driven and can't be prevented (the task simply contains noisy data).

Please provide a similar set of results with -high_perf and without -use_sleep.
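Since overflows should be left out of timing reports, a report helper can drop them by searching stderr for the `result_overflow` marker shown above. A minimal sketch; the `Result` record, its fields and the sample stderr strings are illustrative, and only the marker text comes from the app's output:

```python
from dataclasses import dataclass

# Marker taken from "SETI@Home Informational message -9 result_overflow"
OVERFLOW_MARKER = "result_overflow"


@dataclass
class Result:
    task_id: int
    run_s: int    # wall-clock run time in seconds
    stderr: str   # full stderr text of the task


def is_overflow(stderr: str) -> bool:
    """True if the task's stderr reports a result overflow."""
    return OVERFLOW_MARKER in stderr


def representative(results: list[Result]) -> list[Result]:
    """Keep only non-overflowed results, which are the ones worth timing."""
    return [r for r in results if not is_overflow(r.stderr)]


if __name__ == "__main__":
    sample = [
        Result(1, 9, "SETI@Home Informational message -9 result_overflow\n"),
        Result(2, 1104, "...normal completion...\n"),
    ]
    print([r.task_id for r in representative(sample)])
```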
ID: 1806973
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1807083 - Posted: 4 Aug 2016, 18:01:34 UTC - in response to Message 1806973.  

> An overflow is indicated by the following lines in stderr:
> SETI@Home Informational message -9 result_overflow
> NOTE: The number of results detected equals the storage space allocated.
>
> In a properly working app, overflow is entirely data-driven and can't be prevented (the task simply contains noisy data).
>
> Please provide a similar set of results with -high_perf and without -use_sleep.

Thanks for the clarification.  The WUs below were the fastest yet.

>  -high_perf
    WU true angle range : 0.008175
    Task 24508683
    Run time : 18m 02s
    CPU time : 18m

    WU true angle range : 0.008175
    Task 24508590
    Run time : 17m 59s
    CPU time : 17m 56s

    WU true angle range : 0.008175
    Task 24508506
    Run time : 18m 03s
    CPU time : 18m 01s

    WU true angle range : 0.008175
    Task 24508481
    Run time : 18m 10s
    CPU time : 18m 05s

    WU true angle range : 0.008168
    Task 24508042
    Run time : 18m 03s
    CPU time : 18m

ID: 1807083
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1807100 - Posted: 4 Aug 2016, 19:03:37 UTC - in response to Message 1807083.  

Thanks.
It seems that for your GPU/CPU combo, enabling sleep costs only about 2/18 of GPU performance.
I wonder how much of a GUPPI task (less than one, of course, but roughly what percentage) your CPU could complete in 14 minutes?
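The 2/18 estimate and the roughly 14 minutes of freed CPU time can be checked against the two five-task sets AMDave posted above (-high_perf -use_sleep vs -high_perf alone). A small sketch that averages those reported times:

```python
from statistics import mean


def to_seconds(t: str) -> int:
    """Convert a BOINC-style duration like '20m 26s', '18m' or '9s' to seconds."""
    total = 0
    for part in t.split():
        if part.endswith("m"):
            total += int(part[:-1]) * 60
        elif part.endswith("s"):
            total += int(part[:-1])
    return total


# (run time, CPU time) per task, copied from the reports in this thread
with_sleep = [("20m 26s", "4m 8s"), ("20m 7s", "5m 22s"), ("18m 59s", "4m 48s"),
              ("20m 42s", "5m 9s"), ("20m 42s", "3m 45s")]
no_sleep = [("18m 02s", "18m"), ("17m 59s", "17m 56s"), ("18m 03s", "18m 01s"),
            ("18m 10s", "18m 05s"), ("18m 03s", "18m")]


def avg(rows, idx):
    """Mean of column idx (0 = run time, 1 = CPU time) in seconds."""
    return mean(to_seconds(r[idx]) for r in rows)


run_sleep, run_plain = avg(with_sleep, 0), avg(no_sleep, 0)
cpu_sleep, cpu_plain = avg(with_sleep, 1), avg(no_sleep, 1)
print(f"GPU slowdown: {(run_sleep - run_plain) / run_plain:.1%}")
print(f"CPU time freed per task: {(cpu_plain - cpu_sleep) / 60:.1f} min")
```

With these numbers the run time grows by about 11.8% (2/18 is about 11.1%) and about 13.4 minutes of CPU time are freed per task.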
ID: 1807100
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1807320 - Posted: 5 Aug 2016, 18:46:37 UTC - in response to Message 1807100.  

r3500

These were the next fastest WUs.  They were 30s - 60s longer than -high_perf

> -high_perf -sbs 384 -high_prec_timer
    WU true angle range : 0.008175
    Task 24508711
    Run time : 18m 10s
    CPU time : 18m 03s

    WU true angle range : 0.008175
    Task 24508487
    Run time : 18m 03s
    CPU time : 17m 58s

    WU true angle range : 0.008175
    Task 24508354
    Run time : 18m 09s
    CPU time : 18m 05s

    WU true angle range : 0.008175
    Task 24508413
    Run time : 18m 11s
    CPU time : 18m 05s

    WU true angle range : 0.008604
    Task 24513083
    Run time : 18m 04s
    CPU time : 18m

------------------------------------------

As you can see, these were much slower.

>  -high_perf -use_sleep -sbs 256 -hp

    WU true angle range : 0.008175
    Task 24508474
    Run time : 20m 29s
    CPU time : 3m 55s

    WU true angle range : 0.008175
    Task 24508392
    Run time : 20m 02s
    CPU time : 4m 32s

    WU true angle range : 0.008604
    Task 24513116
    Run time : 21m 12s
    CPU time : 3m 35s

    WU true angle range : 0.008604
    Task 24513081
    Run time : 19m 30s
    CPU time : 4m 36s

    WU true angle range : 0.008168
    Task 24513798
    Run time : 20m 11s
    CPU time : 4m 10s


Let me know if you want any other combination of parameters in the cmdline. I'm going to test some that are listed in "ReadMe_MultiBeam_OpenCL.txt" under the "NV specific info" section. Please check your PM inbox.

ID: 1807320
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809598 - Posted: 16 Aug 2016, 10:49:29 UTC - in response to Message 1807320.  

> Let me know if you want any other combination of parameters in the cmdline. I'm going to test some that are listed in "ReadMe_MultiBeam_OpenCL.txt" under the "NV specific info" section. Please check your PM inbox.

Please remind me of your GPU model.
And please try a completely "stock" config with only a single option added:
-tt F
Values to try for F: 30, 60, 90, 120, 180

When crunching BLC* tasks, at what value do GUI lags begin, and how strong are they?

P.S.
I would like to receive similar feedback from others too, especially from owners of low-end and midrange GPU models. High-end ones can't be saturated by PulseFind, so they give distorted results for this test.

P.P.S. It would also be good to repeat exactly the same test with another option added: -sbs 256
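The requested sweep is five -tt values on a stock config, then the same five with -sbs 256 added. A tiny helper that lists the option strings to try one at a time; how they reach the app (for example its command-line file) is left to each tester's setup and is not shown here:

```python
TT_VALUES = [30, 60, 90, 120, 180]  # values requested in this post


def sweep(extra: str = "") -> list[str]:
    """Option strings for the -tt sweep, optionally with one extra option."""
    return [f"-tt {f} {extra}".strip() for f in TT_VALUES]


if __name__ == "__main__":
    for opts in sweep() + sweep("-sbs 256"):
        print(opts)
```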
ID: 1809598
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1809601 - Posted: 16 Aug 2016, 10:55:36 UTC

A question: one of my crunchers has two different GPU models, a GTX1080 and two GTX970s.
(BOINC reports this cruncher as having three GTX1080s; if only, if only...)

When it comes to setting up the "command line", should I use the settings for the GTX970, the GTX1080, or something in between?
Or maybe I should try running two copies of BOINC, one for each GPU type?

I'm not near my computers just now, and won't be able to access them for a few hours.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1809601
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809608 - Posted: 16 Aug 2016, 11:26:45 UTC - in response to Message 1809601.  

> A question: one of my crunchers has two different GPU models, a GTX1080 and two GTX970s.
> (BOINC reports this cruncher as having three GTX1080s; if only, if only...)
>
> When it comes to setting up the "command line", should I use the settings for the GTX970, the GTX1080, or something in between?
> Or maybe I should try running two copies of BOINC, one for each GPU type?
>
> I'm not near my computers just now, and won't be able to access them for a few hours.

I'm afraid both the GTX970 and GTX1080 are too fast to provide reliable results for usability testing with the -tt F option. Try -tt 1500: do you see any lags on BLC VLARs?
And unless one uses a special config file for the MB app, an option will be applied to all installed GPUs of the same vendor.
ID: 1809608
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34380
Credit: 79,922,639
RAC: 80
Germany
Message 1809622 - Posted: 16 Aug 2016, 12:12:11 UTC - in response to Message 1809601.  

> A question: one of my crunchers has two different GPU models, a GTX1080 and two GTX970s.
> (BOINC reports this cruncher as having three GTX1080s; if only, if only...)
>
> When it comes to setting up the "command line", should I use the settings for the GTX970, the GTX1080, or something in between?
> Or maybe I should try running two copies of BOINC, one for each GPU type?
>
> I'm not near my computers just now, and won't be able to access them for a few hours.


I can provide a device-specific command line file if you are interested.

PM me for details.


With each crime and every kindness we birth our future.
ID: 1809622
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1809790 - Posted: 17 Aug 2016, 1:56:40 UTC
Last modified: 17 Aug 2016, 1:59:07 UTC

. . Hi Raistmer,

. . I have added the command line for r3500 as follows:

-use_sleep -high_prec_timer -sbs 512 -tt 1500

. . I think they are the settings you recommended for GTX970 cards.

. . Runtimes are now about 8-9 min for non-VLARs and 14-16 min for Guppies, but these are BLC5 Guppies, the slowest and meanest of all the 2-bit Guppies. The good news is that I am experiencing NO perceptible mouse or keyboard lag. Now my typos are all just due to my thick, clumsy fingers, not lag.
ID: 1809790
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809834 - Posted: 17 Aug 2016, 7:13:44 UTC - in response to Message 1809790.  


> . . I have [...] The good news is that I am experiencing NO perceptible mouse or keyboard lag.

That means the GTX970 indeed can't be saturated by partial PulseFind; otherwise you would experience quite noticeable lags. -tt 1500 means 1.5 s (!) kernels.
Obviously, an individual PulseFind just can't provide enough data for a GTX970 to allow such a long-running kernel.

That's why I'm asking owners of low-end and midrange GPUs to provide data about the -tt F option and GUI usability.
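To put a 1.5 s kernel in perspective: if the GPU driving the display cannot switch away from a running kernel (an assumption; preemption behaviour depends on GPU generation and driver), the desktop can't refresh until that kernel ends. A quick calculation of how many 60 Hz frames a kernel of a given -tt length could block, assuming the kernel actually reaches that length:

```python
def frames_blocked(kernel_ms: float, refresh_hz: float = 60.0) -> int:
    """Whole display frames that fit inside one non-preemptible kernel."""
    # kernel_ms * refresh_hz / 1000 avoids rounding error from 1000/60 ms frames
    return int(kernel_ms * refresh_hz // 1000)


if __name__ == "__main__":
    for tt in (30, 60, 180, 1500):
        print(f"-tt {tt:>4}: up to {frames_blocked(tt)} frames at 60 Hz")
```

A full 1.5 s kernel would correspond to about 90 missed frames, which is why no visible lag at -tt 1500 suggests the kernels never get that long on this card.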
ID: 1809834
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1809839 - Posted: 17 Aug 2016, 8:20:34 UTC - in response to Message 1809834.  
Last modified: 17 Aug 2016, 8:22:48 UTC


> > . . I have [...] The good news is that I am experiencing NO perceptible mouse or keyboard lag.
>
> That means the GTX970 indeed can't be saturated by partial PulseFind; otherwise you would experience quite noticeable lags. -tt 1500 means 1.5 s (!) kernels.
> Obviously, an individual PulseFind just can't provide enough data for a GTX970 to allow such a long-running kernel.
>
> That's why I'm asking owners of low-end and midrange GPUs to provide data about the -tt F option and GUI usability.


. . OK, after the WOW event is over I will try it on the GT730 rig. I like to think of it as the high end of the low range, with CC 3.5, but it's definitely low range with only 2 CUs. So in 2 weeks I will let you know what effects I observe on that rig.

. . As for the 970s, there have been 4 lockups since installing r3500 this morning, but I am still convinced that is more a system management problem between Win10 and the motherboard. But usage is less than optimal when running one WU. Should I try the tuning command line Richard listed in the Lunatics 0.45 Beta thread? He left out the 970, but I am guessing the 980 string might work.

. . There is a definite utilisation pattern. For the first half of the WU, usage is high, about 80-90%, but then drops to about 40-50% on non-VLARs. It remains more constant on Guppies, but there is a noticeable drop of about 10-15% after the halfway mark.

. . I am not sure whether to try multiple instances or not at this point.

P.S. If 1500 is over the top, should I reduce -tt to a lower value?
ID: 1809839
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809847 - Posted: 17 Aug 2016, 8:41:48 UTC - in response to Message 1809839.  

> But usage is less than optimal when running one WU. Should I try the tuning command line Richard listed in the Lunatics 0.45 Beta thread? He left out the 970, but I am guessing the 980 string might work.

It should be clearly understood that the requests I make in this thread are in no way connected with optimizing a particular host's setup.
Such efforts belong elsewhere.
My goal is to improve the app for everyone.


> . . There is a definite utilisation pattern. For the first half of the WU, usage is high, about 80-90%, but then drops to about 40-50% on non-VLARs. It remains more constant on Guppies, but there is a noticeable drop of about 10-15% after the halfway mark.

This info can be translated into search types and their relative performance:
task processing is arranged from zero to the highest possible chirp values (that is, accounting for Doppler shift). As the chirp value increases, the algorithm switches to progressively coarser searches, keeping only some FFT sizes and turning off Gaussian search and even PulseFind.
The remaining searches are less compute-intensive, so GPU usage drops. Hence it's generally recommended to run at least 2 tasks per GPU (provided the context-switch penalty is low enough; for example, never run 2 tasks on a pre-Fermi GPU).
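Why a second task can help is easy to illustrate with a toy model: take one task's GPU usage as about 85% in its first half and 45% in its second (rough midpoints of the 80/90% and 40/50% figures reported above), start a second task half a task later, and cap combined demand at 100%. The step profile is an assumption, and the model deliberately ignores context-switch overhead, which is exactly the cost that rules out two tasks on pre-Fermi GPUs.

```python
def usage(phase: float) -> float:
    """Toy per-task GPU usage at a point in the task (0.0 = start, 1.0 = end).

    Assumed step profile: heavier searches early, lighter searches after
    roughly the halfway mark as the chirp range grows.
    """
    return 0.85 if phase < 0.5 else 0.45


def average_usage(tasks: int, steps: int = 1000) -> float:
    """Average GPU usage over one task length with tasks staggered evenly."""
    total = 0.0
    for i in range(steps):
        t = i / steps
        demand = sum(usage((t + k / tasks) % 1.0) for k in range(tasks))
        total += min(1.0, demand)  # the GPU cannot exceed 100% busy
    return total / steps


if __name__ == "__main__":
    print(f"1 task : {average_usage(1):.0%}")
    print(f"2 tasks: {average_usage(2):.0%}")
```

In practice the gain would be smaller once switching costs are counted, but the shape of the argument is the same: the light phase of one task overlaps the heavy phase of the other.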

> P.S. If 1500 is over the top, should I reduce -tt to a lower value?

Windows TDR delay is 2 seconds, so 1.5 s should work without driver restarts. Actually, I used -tt 1500 on a GT720 when doing some other profiling, to reduce the number of partial PulseFind kernels to a minimum. Same on a C-60. With heavy GUI lags, of course, but without driver restarts.
What is the "lockup" you mention? A driver restart, or a complete GUI freeze requiring a cold reset? If the latter, then it's probably a driver issue.
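The TDR reasoning is a simple bound: a kernel must finish before Windows' timeout detection fires (2 s by default, adjustable through the TdrDelay registry value) or the driver gets reset. A sketch of that check; the 250 ms safety margin is a hypothetical rule of thumb, since a kernel's real duration may overshoot the -tt target:

```python
DEFAULT_TDR_S = 2.0  # Windows default TdrDelay, in seconds


def tdr_headroom_ms(tt_ms: float, tdr_s: float = DEFAULT_TDR_S) -> float:
    """Milliseconds left before the TDR limit if a kernel runs exactly tt_ms."""
    return tdr_s * 1000.0 - tt_ms


def tt_is_safe(tt_ms: float, margin_ms: float = 250.0,
               tdr_s: float = DEFAULT_TDR_S) -> bool:
    """Hypothetical rule of thumb: keep at least margin_ms below the TDR limit."""
    return tdr_headroom_ms(tt_ms, tdr_s) >= margin_ms


if __name__ == "__main__":
    for tt in (60, 1500, 1900):
        print(f"-tt {tt}: headroom {tdr_headroom_ms(tt):.0f} ms, "
              f"safe={tt_is_safe(tt)}")
```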
ID: 1809847
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1810033 - Posted: 17 Aug 2016, 23:07:30 UTC - in response to Message 1809847.  


> It should be clearly understood that the requests I make in this thread are in no way connected with optimizing a particular host's setup.
> Such efforts belong elsewhere.
> My goal is to improve the app for everyone.
>
> Windows TDR delay is 2 seconds, so 1.5 s should work without driver restarts. Actually, I used -tt 1500 on a GT720 when doing some other profiling, to reduce the number of partial PulseFind kernels to a minimum. Same on a C-60. With heavy GUI lags, of course, but without driver restarts.
> What is the "lockup" you mention? A driver restart, or a complete GUI freeze requiring a cold reset? If the latter, then it's probably a driver issue.


. . It is the latter case. I am pretty sure it is a Win10 driver issue, as it did not happen at all when running Win7. I am more and more convinced I need to return to Win7 to solve it completely. But for now, minimising the effect is my short-term objective.

. . I will take the tuning issue to another thread.
ID: 1810033
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.