Best tuning for 1080ti and Process Lasso use.

Message boards : Number crunching : Best tuning for 1080ti and Process Lasso use.

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1920688 - Posted: 23 Feb 2018, 17:44:16 UTC - in response to Message 1920647.  

KM: Why do you say use_sleep causes less production? I set mine that way some time ago because the GPU tasks were hogging CPU threads (IIRC?). When I made that change, things seemed to get a lot better on that score. Did I misread what was happening?

Raistmer is the one to ask, since he wrote the application. Based on his Lunatics post "Re: Some considerations regarding OpenCL MultiBeam app tuning from algorithm view", which explains how the sleep option works, I believe CPU affinity has nothing to do with -use_sleep. The option has more to do with the app spin-waiting in the video driver.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1920700 - Posted: 23 Feb 2018, 19:09:49 UTC - in response to Message 1920688.  

I believe Cruncher American was just asking about the use_sleep option and my contention that it slowed down crunching. I don't think he was connecting that with the affinity assertion we are playing with.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Keith Myers
Message 1920707 - Posted: 23 Feb 2018, 19:36:50 UTC - in response to Message 1920700.  

Gotcha. Raistmer's explanation covers what use_sleep does. It HAS to be slower than not using it, simply because it adds extra waiting time on the CPU side so that the GPU kernels have enough time to accumulate enough data to make it worthwhile for the CPU to service the GPU's data transfer request.
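A toy model can make that trade-off concrete. This is not Raistmer's code: it simply assumes that with spin-waiting the CPU notices a finished GPU kernel immediately, while with sleep-based polling it only notices at its next wake-up, so the result can arrive up to one sleep interval late. The function names and the 7.3 ms kernel time are hypothetical.

```python
import math


def spin_wait_detect(done_at_ms: float) -> float:
    """Spin-waiting: the CPU polls continuously, so in this toy model it
    notices the finished kernel at the moment it completes."""
    return done_at_ms


def sleep_poll_detect(done_at_ms: float, sleep_ms: float) -> float:
    """Sleep polling: the CPU wakes every sleep_ms to check, so it notices
    completion at the first wake-up at or after done_at_ms."""
    if sleep_ms <= 0:
        raise ValueError("sleep_ms must be positive")
    wakeups = math.ceil(done_at_ms / sleep_ms)
    return wakeups * sleep_ms


if __name__ == "__main__":
    done = 7.3  # hypothetical kernel completion time in ms
    for sleep in (1.0, 2.0, 5.0):
        late = sleep_poll_detect(done, sleep) - spin_wait_detect(done)
        print(f"sleep {sleep} ms: result noticed {late:.1f} ms late")
```

The other side of the trade is CPU time: the spin-wait loop keeps a CPU thread busy for the whole 7.3 ms, while the sleeping version only wakes a handful of times. That is consistent with use_sleep freeing up CPU threads at some cost in GPU throughput.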

kittyman
Message 1920709 - Posted: 23 Feb 2018, 19:41:47 UTC

I used Process Lasso to take the affinity usage one step further. I assigned my browser to use the even numbered cores which are running the CPU tasks, leaving the odd numbered cores to handle the more 'needy' GPU SOG app.
Seems to have helped reduce any user lag just a little bit more.
I am quite pleased with the result so far.
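For anyone wanting to reproduce the even/odd split outside Process Lasso, the underlying idea is a CPU affinity bitmask with one bit per logical CPU. The sketch below builds the two masks. It assumes an 8-thread CPU and that the OS numbers each core's two hardware threads as adjacent logical CPUs (0/1, 2/3, ...), which is common on Windows but worth checking on your own machine. The helper names are hypothetical.

```python
def affinity_mask(cpus) -> int:
    """Return a bitmask with bit i set for every logical CPU i in cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def even_odd_masks(logical_cpus: int) -> tuple[int, int]:
    """Masks for the even-numbered and the odd-numbered logical CPUs."""
    even = affinity_mask(range(0, logical_cpus, 2))
    odd = affinity_mask(range(1, logical_cpus, 2))
    return even, odd


if __name__ == "__main__":
    even, odd = even_odd_masks(8)
    print(f"even (CPU tasks, browser): {even:#x}")  # 0x55
    print(f"odd  (GPU app threads):    {odd:#x}")   # 0xaa
```

The same hex values can be given to tools that take a mask, for example `start /affinity 55 program.exe` at the Windows command prompt or `taskset 0x55 program` on Linux; Process Lasso does the equivalent through its GUI.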

Keith Myers
Message 1920712 - Posted: 23 Feb 2018, 20:11:37 UTC

That is how I use Process Lasso, and I have done so for the longest time. I started doing it with my FX-processor crunchers, since they are handicapped by having to share the FPU registers between each module's physical and virtual cores. That way the physical cores get exclusive use of the FPU registers, which is where the CPU task does most of its calculations. The GPU tasks use almost none of the FPU registers, because all of their math takes place on the GPU; the virtual cores simply have to shovel data to and from each GPU task. It seems to me to be the most efficient use of any CPU.

I'm surprised that the browser needs that much CPU support. Do you have the "use hardware acceleration" option turned on in the browser? I guess if it further reduces the noticeable system lag, no harm, no foul.

Al Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1920718 - Posted: 23 Feb 2018, 20:42:54 UTC

Could someone try to give a Reader's Digest version of virtual vs. physical cores? The way I thought it worked (which, reading your last post, Keith, is obviously wrong) was that with HT enabled, each "real" core basically divided itself in half and presented itself as 2 cores that could do semi-autonomous work on their own. But they were still attached at the hip, and still handicapped by that. Definitely confused now.

Keith Myers
Message 1920724 - Posted: 23 Feb 2018, 21:08:38 UTC - in response to Message 1920718.  
Last modified: 23 Feb 2018, 21:15:04 UTC

My post was describing a one-off situation that only applies to the old AMD FX processors which were unique in that they didn't follow any of the conventional Intel or past AMD cpu designs. The design was very much a badly executed 'kludge'. And the market punished them appropriately.

Read this article for your 'Reader's Digest' explanation of Hyper-Threading and physical versus virtual (HT) cores: "CPU Basics: Multiple CPUs, Cores, and Hyper-Threading Explained".

Essentially, each physical core keeps the register state of two threads at once and interleaves their instructions through the same shared execution units, so the two threads compete for that core's resources rather than each getting a full core.

Keith Myers
Message 1920726 - Posted: 23 Feb 2018, 21:14:16 UTC - in response to Message 1920724.  

Or maybe I misread your confusion about what I call a 'virtual' core, which is just the usual hyperthread (HT) core. I'm not talking about 'virtualization', as in a virtualized operating system that mimics a completely separate computer alongside the real hardware and OS, using a virtual machine image held in a file on disk.

Al
Message 1920727 - Posted: 23 Feb 2018, 21:14:21 UTC - in response to Message 1920724.  

Ahh, ok, gotcha, thanks. I'll give it a perusing tonite when I get home and can grab a beer, and play with some new (and existing) toys.. :-D

Going to hopefully be an interesting night, and if all goes well, I might even post some pics, if I can figure out where and how I did it last year...

kittyman
Message 1920733 - Posted: 23 Feb 2018, 21:43:28 UTC - in response to Message 1920712.  
Last modified: 23 Feb 2018, 21:47:54 UTC

I don't think it's the browser per se. It shows up mostly as keyboard lag and some erratic mouse movement, which I would think are more under the control of the OS. But I am not certain of that.
In any case, wherever the bottleneck may exist, the affinity usage has tamed it to a kitten rather than a tiger.

I am looking to see where hardware acceleration is or is not used in my Seamonkey browser.

EDIT......
I did find the option for hardware acceleration in Seamonkey. It was turned off, so not contributing to any lag I was experiencing.

Keith Myers
Message 1920737 - Posted: 23 Feb 2018, 22:10:40 UTC - in response to Message 1920733.  

Browser hardware acceleration is performed primarily by the GPU, not the CPU. So on low-powered systems you normally see choppy video when it is turned on. It can also impact BOINC GPU processing, since the two processes would be fighting over GPU resources.

juan BFP Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1920741 - Posted: 23 Feb 2018, 22:19:02 UTC

If you increase -period_iterations_num a little (try 10 or 20 instead of 1), your lag will almost disappear and your crunching times will not increase too much.
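Why a larger -period_iterations_num helps with lag can be shown with a toy model. It assumes the option works as the app's Readme describes it, splitting the longest PulseFind kernel call into N shorter calls; in this model anything else that needs the GPU (such as the desktop) only gets a turn between calls, so the worst stall shrinks roughly as 1/N while each extra launch adds a small fixed cost. The 400 ms of work and the 0.05 ms launch overhead are made-up numbers, and split_kernel is a hypothetical helper, not part of the app.

```python
def split_kernel(total_ms: float, n: int, launch_overhead_ms: float) -> tuple[float, float]:
    """Toy model of one long GPU kernel split into n equal calls.

    Returns (longest_call_ms, total_ms_with_overhead):
    - longest_call_ms: how long a single call holds the GPU, i.e. the
      worst-case wait for anything else needing it;
    - total_ms_with_overhead: the work plus a fixed cost per launch.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    longest_call = total_ms / n
    total = total_ms + n * launch_overhead_ms
    return longest_call, total


if __name__ == "__main__":
    for n in (1, 10, 20):
        call, total = split_kernel(400.0, n, 0.05)  # hypothetical numbers
        print(f"N={n:2d}: longest call {call:6.1f} ms, total {total:7.2f} ms")
```

In this model, going from N=1 to N=20 cuts the longest call from 400 ms to 20 ms while adding under a millisecond of total time; real numbers will depend on the card and the task.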
kittyman
Message 1920742 - Posted: 23 Feb 2018, 22:19:41 UTC - in response to Message 1920737.  
Last modified: 23 Feb 2018, 22:25:49 UTC

So, probably best to leave it turned off then. I get a bit of choppy choppy when watching a video at times. Must be due to some resource being stretched too thin. I would think trying to use hardware acceleration would make it worse.

EDIT....
Ahhhh....I just remembered something else. When I watch movies online from my Amazon account, I have to use Internet Exploder, because playback does not work correctly in my Seamonkey browser. And in Exploder, hardware acceleration is an opt-out checkbox. It was not checked, so acceleration was on. I checked the box and will see if that makes any difference the next time I watch a movie.

EDIT 2........
Geez, the ol' kittyman is learning all sorts of thingys today......LOL.

Keith Myers
Message 1920776 - Posted: 24 Feb 2018, 1:02:26 UTC - in response to Message 1920742.  

Always nice to learn new things every day........ instead of pushing up daisies ;-^}

kittyman
Message 1920825 - Posted: 24 Feb 2018, 4:59:49 UTC - in response to Message 1920776.  

You bet your bippy!

Marco Vandebergh ( SETI orphan )
Joined: 27 Aug 10
Posts: 39
Credit: 12,630,994
RAC: 9
Netherlands
Message 1921031 - Posted: 25 Feb 2018, 13:28:48 UTC

Maybe this helps.

I have also been playing a lot with the settings, and ended up using no HT and the settings below.
I run just a single task at a time; one guppi task takes roughly 2 minutes 30 to 40 seconds.
GPU usage is between 80 and 100%, depending on the WU.

<app_config>
<app_version>
<app_name>setiathome_v8</app_name>
<plan_class>opencl_nvidia_SoG</plan_class>
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
<cmdline>-sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256 -pref_wg_num_per_cu 6 -period_iterations_num 1 -hp -high_prec_timer -high_perf -tt 1500</cmdline>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<plan_class>opencl_nvidia_100</plan_class>
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
<cmdline>-sbs 2048 -unroll 28 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 1 64 8 1 -tune 2 64 8 1 -hp</cmdline>
</app_version>
</app_config>
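For comparing setups like this, it can help to pull the per-app command lines out of an app_config.xml programmatically. A minimal sketch using Python's standard xml.etree.ElementTree; the function name is hypothetical, and SAMPLE is a trimmed copy of the file above rather than the full thing.

```python
import xml.etree.ElementTree as ET


def cmdlines_by_app(app_config_xml: str) -> dict[tuple[str, str], str]:
    """Map (app_name, plan_class) -> cmdline for each <app_version> entry."""
    root = ET.fromstring(app_config_xml.strip())
    result = {}
    for av in root.findall("app_version"):
        key = (
            (av.findtext("app_name") or "").strip(),
            (av.findtext("plan_class") or "").strip(),
        )
        result[key] = (av.findtext("cmdline") or "").strip()
    return result


SAMPLE = """
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <cmdline>-sbs 2048 -period_iterations_num 1 -hp -tt 1500</cmdline>
  </app_version>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <cmdline>-sbs 2048 -unroll 28 -hp</cmdline>
  </app_version>
</app_config>
"""

if __name__ == "__main__":
    for (app, plan), cmd in cmdlines_by_app(SAMPLE).items():
        print(f"{app} [{plan}]: {cmd}")
```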
kittyman
Message 1921035 - Posted: 25 Feb 2018, 13:46:43 UTC - in response to Message 1921031.  
Last modified: 25 Feb 2018, 13:56:25 UTC

I will give it a test and see what shakes out.
Much of your command line is the same as mine except for.......
Your wg 512, mine 256
Your bn 256, mine 64
Your cw 256, mine 64

I'll plug your numbers in for a bit.
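Spotting differences by eye is error-prone with command lines this long. The sketch below groups each option with the values that follow it and reports only the options that differ. It assumes every option starts with '-' followed by a letter and that values are never negative numbers; the `mine` line is hypothetical and differs from Marco's only in the three values listed above.

```python
def parse_cmdline(cmdline: str) -> dict[str, list[tuple[str, ...]]]:
    """Group tokens into {option: [values for each occurrence]}.

    Repeated options (e.g. several -tune groups) keep one tuple per use.
    Values appearing before the first option are ignored.
    """
    opts: dict[str, list[list[str]]] = {}
    current = None
    for tok in cmdline.split():
        if tok.startswith("-") and tok[1:2].isalpha():
            current = tok
            opts.setdefault(current, []).append([])
        elif current is not None:
            opts[current][-1].append(tok)
    return {k: [tuple(v) for v in uses] for k, uses in opts.items()}


def diff_cmdlines(a: str, b: str) -> dict:
    """Return {option: (values_in_a, values_in_b)} for options that differ."""
    pa, pb = parse_cmdline(a), parse_cmdline(b)
    return {
        opt: (pa.get(opt), pb.get(opt))
        for opt in sorted(pa.keys() | pb.keys())
        if pa.get(opt) != pb.get(opt)
    }


if __name__ == "__main__":
    marco = ("-sbs 2048 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 "
             "-oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256")
    mine = ("-sbs 2048 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
            "-oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64")  # hypothetical
    for opt, (a_vals, b_vals) in diff_cmdlines(marco, mine).items():
        print(opt, a_vals, "->", b_vals)
```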

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1921052 - Posted: 25 Feb 2018, 16:55:26 UTC - in response to Message 1921035.  

No GPU has 256 memory banks, so -oclfft_tune_bn 256 is useless.


With each crime and every kindness we birth our future.
Keith Myers
Message 1921054 - Posted: 25 Feb 2018, 17:09:47 UTC - in response to Message 1921052.  

I think Mike is right: useless, if I am interpreting Raistmer's docs correctly, both in the Readme and in the Lunatics post "Some considerations regarding OpenCL MultiBeam app tuning from algorithm view".

-pref_wg_size N : Sets preferred workgroup size for Pulsefind kernels.
Should be multiple of wave size (32 for nVidia, 64 for ATi) for better performance
and doesn't exceed maximal possible WG size for particular device (256 for ATi and Intel, less than 2048 for NV, depending on CC of device).


-pref_wg_num_per_cu N : Sets preferred number of workgroups per compute unit. Currently used only in PulseFind kernels.


This class of options tunes oclFFT performance
-oclfft_tune_gr N : Global radix
-oclfft_tune_lr N : Local radix
-oclfft_tune_wg N : Workgroup size
-oclfft_tune_ls N : Max size of local memory FFT
-oclfft_tune_bn N : Number of local memory banks
-oclfft_tune_cw N : Memory coalesce width
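The -pref_wg_size rules quoted above are easy to check mechanically. This sketch encodes only what the Readme text says: the size should be a multiple of the wave size (32 for nVidia, 64 for ATi) and must not exceed the device's maximum workgroup size. The Readme gives no wave size for Intel, and the NVIDIA maximum depends on the card, so the caller supplies the device maximum (for example the CL_DEVICE_MAX_WORK_GROUP_SIZE value that a tool such as clinfo reports). The function name and the 1024 used in the example are assumptions.

```python
# Wave sizes as stated in the Readme excerpt; Intel is not listed there.
WAVE_SIZE = {"nvidia": 32, "ati": 64}


def check_pref_wg_size(n: int, vendor: str, device_max_wg: int) -> list[str]:
    """Return a list of problems with -pref_wg_size n (empty list = OK)."""
    if n <= 0:
        return [f"{n} is not a positive size"]
    problems = []
    if n > device_max_wg:
        # Hard limit: the device cannot run a workgroup this large.
        problems.append(f"{n} exceeds the device maximum of {device_max_wg}")
    wave = WAVE_SIZE.get(vendor.lower())
    if wave is None:
        problems.append(f"no wave size known for vendor '{vendor}'")
    elif n % wave != 0:
        # Soft rule: the Readme says multiples of the wave size perform better.
        problems.append(f"{n} is not a multiple of the {vendor} wave size {wave}")
    return problems


if __name__ == "__main__":
    # 1024 is an assumed device maximum, not a measured 1080 Ti value.
    for size in (256, 100, 2048):
        print(size, check_pref_wg_size(size, "nvidia", 1024) or "OK")
```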


I played around with memory coalesce width in the past and it seemed to help on some tasks, but not others. That was back in the day of only Arecibo. Haven't experimented with any BLC work yet.

Keith Myers
Message 1921055 - Posted: 25 Feb 2018, 17:12:04 UTC

In my opinion, the tunings have to be a compromise in a 'mixed' card scenario, i.e. multiple cards of different models. A maximal tune could work better for a single 1080 Ti, as in Marco's case.
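One way to picture the compromise: if, as assumed here, a single command line is shared by every card running the same app, then each value has to satisfy the tightest constraint among those cards. The sketch applies that idea to the workgroup-size rule from the Readme, picking the largest size that is a multiple of every card's wave size and within every card's maximum. The card list and the maximum sizes in it are hypothetical.

```python
from math import lcm  # Python 3.9+


def compromise_wg_size(cards: list[tuple[int, int]]) -> int:
    """cards: (wave_size, max_wg_size) per GPU sharing one command line.

    Returns the largest size that is a multiple of every wave size and
    does not exceed any card's maximum.
    """
    if not cards:
        raise ValueError("need at least one card")
    step = lcm(*(wave for wave, _ in cards))
    limit = min(max_wg for _, max_wg in cards)
    size = (limit // step) * step
    if size == 0:
        raise ValueError("no workgroup size satisfies every card")
    return size


if __name__ == "__main__":
    single = [(32, 1024)]            # one card on its own
    mixed = [(32, 1024), (32, 512)]  # hypothetical newer + older card
    print(compromise_wg_size(single), compromise_wg_size(mixed))
```

In this simplified view, a lone card can be tuned up to its own limit, while adding a more limited card drags the shared value down to what the weakest card can handle.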



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.