Best tuning for 1080ti and Process Lasso use.
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"KM: Why do you say use_sleep causes less production? I set mine that way some time ago because the GPU tasks were hogging CPU threads (IIRC?). When I made that change, things seemed to get a lot better on that score. Did I misread what was happening?" Raistmer is the one to ask, since he wrote the application. Based on his Lunatics post "Re: Some considerations regarding OpenCL MultiBeam app tuning from algorithm view", which covers how sleep functions, I believe that cpu affinity has nothing to do with -use_sleep. The option has more to do with the app doing spin waiting in the video driver. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"KM: Why do you say use_sleep causes less production? I set mine that way some time ago because the GPU tasks were hogging CPU threads (IIRC?). When I made that change, things seemed to get a lot better on that score. Did I misread what was happening?" I believe Cruncher American was just asking about the use_sleep option and my contention that it slows down crunching. I don't think he was connecting that with the affinity assertion that we are playing with. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"KM: Why do you say use_sleep causes less production? I set mine that way some time ago because the GPU tasks were hogging CPU threads (IIRC?). When I made that change, things seemed to get a lot better on that score. Did I misread what was happening?" Gotcha. Raistmer's post explains what use_sleep does. It HAS to be slower than not using it, simply because it adds polling time on the cpu so that the gpu kernel processes have enough time to accumulate enough data to make it worthwhile for the cpu to service the gpu data transfer request. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I used Process Lasso to take the affinity usage one step further. I assigned my browser to the even-numbered cores, which are running the CPU tasks, leaving the odd-numbered cores to handle the more 'needy' GPU SOG app. Seems to have helped reduce any user lag just a little bit more. I am quite pleased with the result so far. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
That is how I use ProcessLasso, and I have done so for the longest time. I started doing that with my FX processor crunchers, since they are handicapped by having to share the FPU registers between each module's physical and virtual cores. That way the physical cores get exclusive use of the FPU registers, and that is where the cpu task does most of its calculations. The gpu tasks use almost none of the FPU registers because all of the math calculations take place in the gpu. The virtual cores simply have to shovel data to and fro for each gpu task. Seems to me to be the most efficient use of any cpu. I'm surprised that the browser requires that much cpu support. Do you have the "use hardware acceleration" option turned on in the browser? I guess if it further improves the noticeable system lag, no harm, no foul. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Could someone try to give a Reader's Digest version of virtual vs. physical cores? The way I thought it worked, which, reading your last post, Keith, is obviously wrong, was that if HT was enabled, it basically took one "real" core, divided it in half, and presented it as 2 cores, which could do semi-autonomous work on their own. But they were still attached at the hip, and were still handicapped by being such. Def confused now. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
My post was describing a one-off situation that only applies to the old AMD FX processors, which were unique in that they didn't follow any of the conventional Intel or past AMD cpu designs. The design was very much a badly executed 'kludge', and the market punished them appropriately. Read this article for your 'Readers Digest' explanation of Hyperthreading and physical versus virtual or HT cores: "CPU Basics: Multiple CPUs, Cores, and Hyper-Threading Explained". Essentially, the cpu core starts two threads in parallel pipelines and then context switches between them to allow access to the necessary registers for the thread execution. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Or maybe I misinterpreted your confusion about what I describe as 'virtual' instead of the usual hyperthread term. I'm not talking about 'virtualization' as in a virtualized operating system which mimics a completely separate computer alongside the real computer hardware and OS by utilizing a virtual machine image that is held in a computer file on disk. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Ahh, ok, gotcha, thanks. I'll give it a perusing tonite when I get home and can grab a beer, and play with some new (and existing) toys.. :-D Going to hopefully be an interesting night, and if all goes well, I might even post some pics, if I can figure out where and how I did it last year... |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"I'm surprised that the browser requires that much cpu support. Do you have the "use hardware acceleration" option turned on in the browser? I guess if it further improves the noticeable system lag, no harm, no foul." I don't think it's the browser per se. It is manifested mostly in keyboard lag and some erratic mouse movement, which I would think are more in the control of the OS. But I am not certain of that. In any case, wherever the bottleneck may exist, the affinity usage has tamed it to a kitten rather than a tiger. I am looking to see where hardware acceleration is or is not used in my Seamonkey browser. EDIT...... I did find the option for hardware acceleration in Seamonkey. It was turned off, so not contributing to any lag I was experiencing. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Primarily, the hardware acceleration in browsers is performed by the gpu not the cpu. So normally you see choppiness in video display when it is turned on in low powered systems. That also can impact BOINC gpu processing since the two processes would be fighting over gpu resources. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
If you increase -period_iterations_num a little (instead of 1, try 10 or 20), your lag will almost disappear and the crunching times will not increase too much. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"Primarily, the hardware acceleration in browsers is performed by the gpu not the cpu. So normally you see choppiness in video display when it is turned on in low powered systems. That also can impact BOINC gpu processing since the two processes would be fighting over gpu resources." So, probably best to leave it turned off then. I get a bit of choppy choppy when watching a video at times. Must be due to some resource being stretched too thin. I would think trying to use hardware acceleration would make it worse. EDIT.... Ahhhh....I just remembered something else. When I watch movies online from my Amazon account, I have to use Internet Exploder because playback does not work correctly in my Seamonkey browser. And in Exploder, hardware acceleration is an opt-out checkbox. It was not checked. I checked the box and will see if that makes any difference the next time I watch a movie. EDIT 2........ Geez, the ol' kittyman is learning all sorts of thingys today......LOL. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Always nice to learn new things every day........ instead of pushing up daisies ;-^} Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"Always nice to learn new things every day........ instead of pushing up daisies ;-^}" You bet your bippy! "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Marco Vandebergh ( SETI orphan ) Joined: 27 Aug 10 Posts: 39 Credit: 12,630,994 RAC: 9 |
Maybe this helps. I have been playing a lot with the settings too, and ended up using no HT and the settings below, running just a single task at a time. One guppi task takes about 2 minutes plus 30 to 40 seconds per unit. GPU usage is between 80 and 100%, depending on the WU. `<app_config> <app_version> <app_name>setiathome_v8</app_name> <plan_class>opencl_nvidia_SoG</plan_class> <avg_ncpus>1</avg_ncpus> <ngpus>1</ngpus> <cmdline>-sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256 -pref_wg_num_per_cu 6 -period_iterations_num 1 -hp -high_prec_timer -high_perf -tt 1500</cmdline> </app_version> <app_version> <app_name>astropulse_v7</app_name> <plan_class>opencl_nvidia_100</plan_class> <avg_ncpus>1</avg_ncpus> <ngpus>1</ngpus> <cmdline>-sbs 2048 -unroll 28 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 1 64 8 1 -tune 2 64 8 1 -hp</cmdline> </app_version> </app_config>` |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I will give it a test and see what shakes out. Much of your command line is the same as mine except for: your wg 512, mine 256; your bn 256, mine 64; your cw 256, mine 64. I'll plug your numbers in for a bit. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Mike Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
"I will give it a test and see what shakes out." No GPU has 256 memory banks, so that setting is useless. With each crime and every kindness we birth our future. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I think Mike is right: useless, if I am interpreting correctly Raistmer's docs in the Readme and in the Lunatics post "Some considerations regarding OpenCL MultiBeam app tuning from algorithm view". From the docs: "-pref_wg_size N : Sets preferred workgroup size for Pulsefind kernels." "-pref_wg_num_per_cu N : Sets preferred number of workgroups per compute unit. Currently used only in PulseFind kernels." "This class of options tunes oclFFT performance". I played around with memory coalesce width in the past and it seemed to help on some tasks, but not others. That was back in the day of only Arecibo. Haven't experimented with any BLC work yet. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The tunings in my opinion have to be a compromise in a 'mixed' card scenario, as in multiple cards of different versions. A max tune could work better for a single 1080Ti like Marco's case. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
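
Editor's sketch: the wg/bn/cw comparison between Marco's and kittyman's command lines can be done mechanically. This is only an illustration in Python, not part of any SETI@home tool. `parse_cmdline` and `diff_cmdlines` are hypothetical helper names, and kittyman's full command line was never posted, so `KITTYMAN` below is an assumption built by taking Marco's line and swapping in the three values kittyman listed.

```python
def parse_cmdline(cmdline):
    """Split a command line into {flag: (values...)}.

    A token counts as a flag when it starts with '-' followed by a letter.
    Limitation: a repeated flag keeps only the values of its last occurrence.
    """
    flags = {}
    current = None
    for token in cmdline.split():
        if token.startswith("-") and token[1:2].isalpha():
            current = token
            flags[current] = ()
        elif current is not None:
            flags[current] += (token,)
    return flags


def diff_cmdlines(a, b):
    """Return {flag: (values_in_a, values_in_b)} for every flag that differs."""
    fa, fb = parse_cmdline(a), parse_cmdline(b)
    return {
        flag: (fa.get(flag), fb.get(flag))
        for flag in sorted(set(fa) | set(fb))
        if fa.get(flag) != fb.get(flag)
    }


# Marco's MultiBeam command line, copied from his app_config.xml above.
MARCO = (
    "-sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 "
    "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256 "
    "-pref_wg_num_per_cu 6 -period_iterations_num 1 "
    "-hp -high_prec_timer -high_perf -tt 1500"
)

# ASSUMPTION: kittyman's line is Marco's with only wg, bn and cw changed.
KITTYMAN = (
    MARCO.replace("-oclfft_tune_wg 512", "-oclfft_tune_wg 256")
    .replace("-oclfft_tune_bn 256", "-oclfft_tune_bn 64")
    .replace("-oclfft_tune_cw 256", "-oclfft_tune_cw 64")
)

for flag, (marco, kitty) in diff_cmdlines(MARCO, KITTYMAN).items():
    print(flag, "Marco:", " ".join(marco), "kittyman:", " ".join(kitty))
```

Under that assumption the loop reports only the three oclFFT flags. The same diff could be run on two real command lines pasted from different hosts, which is handy when settling on a compromise tuning for a mixed-card machine.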
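
Editor's sketch: the even/odd core split that kittyman and Keith Myers describe is set through the Process Lasso GUI, but underneath an affinity setting is just a bitmask with one bit per logical CPU. The Python below only shows that arithmetic. It assumes a host with 8 logical CPUs, the helper names are hypothetical, and it does not call Process Lasso or any OS API. Which logical CPU numbers correspond to physical versus HT cores depends on how the OS enumerates them, so check that on the actual machine.

```python
def split_even_odd(logical_cpus):
    """Return (even_cores, odd_cores) as lists of logical CPU indices."""
    even = [c for c in range(logical_cpus) if c % 2 == 0]
    odd = [c for c in range(logical_cpus) if c % 2 == 1]
    return even, odd


def affinity_mask(cores):
    """Build an affinity bitmask: bit N set means logical CPU N is allowed."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


# ASSUMPTION: 8 logical CPUs (e.g. 4 cores with HT enabled).
even, odd = split_even_odd(8)

# Even cores: CPU tasks plus the browser, as in kittyman's setup.
print("even cores", even, "mask", hex(affinity_mask(even)))  # 0x55

# Odd cores: left free to feed the GPU SoG app.
print("odd cores ", odd, "mask", hex(affinity_mask(odd)))  # 0xaa
```

The two masks are complementary, so no logical CPU ends up assigned to both groups.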