To Hyperthread or not to Hyperthread, that is the question
Author | Message |
---|---|
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I have a new-to-me version of this proposal. I have an older Z-400 that does not have AVX and a newer Optiplex that does. For reasons unrelated to hyperthreading, I upgraded the Z-400 to Windows 10 Pro with a clean install. The BIOS also reset itself to its defaults (I went back in and turned the keyboard number lock on), so here I am with my Z-400 running without hyperthreading. I have years of experience on this system running both "stock" and Lunatics seti, and the difference has been consistent: Lunatics on the cpu is MUCH faster. Stock seti has been known to take upwards of 7 hours (wall clock time) to process a task with hyperthreading engaged. Anyway, it is now beginning to look like a pre-AVX cpu running 8.05 tasks is a bunch faster without hyperthreading. I have an AVX-capable cpu that is running them rapidly (2.5 hours or so?) with hyperthreading enabled. Since this is the 2nd re-install in 24 hours, the Seti scheduler has started sending me CUDA 42/50 tasks for the gpu, so my examination of the cpu wall clock times (not the ones currently being displayed) may be off. I thought I would offer up this idea for munching/crunching/discussion. Tom A proud member of the OFA (Old Farts Association). |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
Well, in getting New Prometheus up and running, I've noticed that on a Gigabyte GA-Z270-HD3 MOBO with an Intel i7 7700K at 4.2GHz, 32GB DDR4 Corsair Vengeance RAM and TWO EVGA GTX-1050 Gaming 2GB GDDR5 VRAM GPUs, I CAN crunch 6 Units at a time. Lunatics 0.45 Beta 6 is installed, and I've set it to run AVX on the CPU and to allow Astropulse on CPU and GPU. Four of the 6 Units are crunched on the CPU, and two of the Units are on GPU, one task per GPU card. The GPUs are set by Lunatics to run SOG Units. The CPU crunches an Estimated 9 Hour Unit in 36 to 37.5 Minutes while crunching 4 of these at a time. The GPUs crunch each Unit in 10 to 11 Minutes. I run the system from 18:00 to 09:00 daily. Oh, and Hyperthreading is on, so of 8 possible CPU Threads I'm using four for CPU crunching, two to feed the GPUs, and leaving two free for running the system. I hope this info helps. :-) TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
I have 6 real cores and 6 HT cores. I run 6 CPU tasks and 4 GPU tasks and leave 2 cores free. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
TL has a 4.2 GHz cpu running AVX. I have a 3.4 GHz cpu running AVX. I am currently running stock Seti again to see how it performs against the Lunatics beta distro. Hmmm. Looks like your cpu is about 25% faster than mine, but you are getting much more than a 25% speedup over mine under the Lunatics distro. I don't remember coming even close to that level of cpu production. Oh, well. The reason I raised the idea is that I think Hyperthreading in computationally intense environments keeps getting better. So an older Intel cpu that is not using HT may turn out to be more productive than when HT is turned on. I am certain of this for the very early generation(s) of hyperthreading cpus. I had one where the HT thread ran at maybe 18% of the speed of the main cpu. I had to use "cpu lasso" to set things up so it was running a bit more rationally. I guess after a while I will turn HT back on, on the Z-400, and see how the cpu processing speed changes. Tom A proud member of the OFA (Old Farts Association). |
Gene Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
I did some HT testing (see the thread back in May) on a Ryzen 1700 system and confirmed the (reasonable) expectation that HT does, in fact, increase system production, but only by 10% to 20% overall: each task runs slower, and that is more than compensated by running more tasks concurrently. Those tests were for a CPU-only configuration, i.e. no GPU active. Adding a GPU into the mix makes things more complicated. It is widely advised to "allocate" one CPU for each GPU device, but the problem is getting the task scheduler to use that CPU in the ideal way. The "ideal" way (in my view) is for the scheduler, when the GPU needs a CPU action, to assign a CPU that does NOT have its HT component in use. In my Linux environment I have not yet found a way to do that. To illustrate the point: imagine a 3-core system <Aa><Bb><Cc> with CPUs A, B, C and their respective HT components a, b, c. Suppose 2 tasks are running and a GPU task needs a CPU as a 3rd task. Possible states are: A+B in use and C allocated (what I think of as ideal); but since ALL CPUs look identical to the scheduler, it is also possible to have A+B in use and <a> allocated; or A+B in use and <b> allocated; or A+a in use and B assigned. You get the idea: the task serving the GPU may, more likely than not, run on an HT component and have to contend for core resources with the other task in that core. You may notice that the "A+a in use and B assigned" situation would be ideal for the GPU (and it is), but it implies that the two running tasks are jammed into one core, with performance suffering badly, while idle cores are available. I am trying to find uses of the "cpuset" and "cgroup" Linux kernel features to address this dilemma, but so far without success. What does the Windows scheduler do in similar situations? I do not know. Maybe it's better, maybe it's worse. Since Tom is working in a Windows environment, I only point out the (above) Linux perspective as an example of the HT complications. |
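One way to see which logical CPUs share a physical core on Linux is to read the kernel's topology files. The sketch below is only an illustration under stated assumptions: `list_siblings` is a hypothetical helper name, and it assumes the usual sysfs layout under /sys/devices/system/cpu. Each output line is one physical core (the <Aa> pairs in the example above), written either as a comma list or as a range depending on how the kernel numbers the threads.

```shell
#!/bin/bash
# Print one line per physical core, listing the logical CPUs that share it.
# Assumption: topology is exposed under /sys/devices/system/cpu (standard sysfs).
# list_siblings is a hypothetical helper name, not part of any existing tool.
list_siblings() {
    local root="${1:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        # Skip silently if the file is missing (e.g. non-Linux system).
        [ -r "$f" ] || continue
        cat "$f"
    done | sort -t, -k1,1n -u   # every sibling reports the same list; keep one copy
}

# Optional first argument: an alternative root directory (useful for testing).
list_siblings "$@"
```

The groups it prints could then be fed to an affinity tool such as taskset or schedtool, for instance to reserve one logical CPU of a core for a GPU feeder while leaving its sibling idle.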
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I keep cpu tasks on the physical cores and keep the gpu tasks on the HT or virtual cores. The tool to use is 'schedtool', with which I assign affinity for each Seti application to the type of core I want a task to run on. I also use it to set the nice level of the gpu tasks to high priority.

    #Run in root terminal, NOT sudo
    nvidia-smi -pm 1
    for (( ; ; ))
    do
        # Assign CPU Priority (19=Nice/LowPriority, 0=Normal, -20=HighPriority)
        # This was code Petri gave out
        # GPU Tasks get high Priority
        schedtool -n -20 `pidof setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90`
        schedtool -n -20 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
        # CPU Tasks get (a little) Below Normal Priority (0 being normal) to make sure it doesn't choke the OS
        schedtool -n 5 `pidof ap_7.05r2728_sse3_linux64`
        schedtool -n 5 `pidof MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu`
        # Assign CPU Usage Threads (0-15)
        # Brent added this to Petri's code
        # Keep GPU tasks on threads 1 3 5 7 9 11 13 15
        schedtool -a 1,3,5,7,9,11,13,15 `pidof setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90`
        schedtool -a 1,3,5,7,9,11,13,15 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
        # Keep CPU tasks on threads 0 2 4 6 8 10 12 14
        schedtool -a 0,2,4,6,8,10,12,14 `pidof MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu`
        schedtool -a 0,2,4,6,8,10,12,14 `pidof ap_7.05r2728_sse3_linux64`
        # CPU Priority Assignment Script
        date
        # lscpu | grep MHz
        sleep 5
        echo " CPU Priority and Assignment Script (16 Threads)"
    done

Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
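The script above hardcodes its even and odd thread lists for a 16-thread machine. As a hedged sketch, those lists could be generated from the thread count instead. `cpu_list` is a hypothetical helper, and the even = physical, odd = HT split simply follows the script above; how logical CPUs are actually numbered varies between systems, so the split should be checked against the real topology before relying on it.

```shell
#!/bin/bash
# cpu_list START COUNT: print every second logical CPU number below COUNT,
# beginning at START, as a comma-separated list usable with "schedtool -a".
# Hypothetical helper; assumes GNU seq (for the -s separator option).
cpu_list() {
    local start="$1" count="$2"
    seq -s, "$start" 2 "$((count - 1))"
}

# nproc reports the number of logical CPUs available to this process.
threads=$(nproc)
echo "CPU tasks: $(cpu_list 0 "$threads")"
echo "GPU tasks: $(cpu_list 1 "$threads")"
```

For a 16-thread machine, `cpu_list 0 16` and `cpu_list 1 16` reproduce the two lists used in the script.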
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I googled around on this topic, and someone quoting what they read from Intel said that "in general" you can get up to a 30% boost in performance. So I threw up my hands and turned HT back on, on my two Windows boxes. I have, I think, managed to disable all the "speed step" and/or wait states mentioned in all my BIOSes. I sometimes wish I didn't seem to have a black thumb on Linux (I keep killing the setups because I don't understand what I am screwing up). Oh well, thank you for the conversation. Tom A proud member of the OFA (Old Farts Association). |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I got up this morning and the Z-400, which HAD been munching 8.05's in about an hour and a half, was suddenly running at least 30% slower per task. I just switched it back to no hyperthreading to see if the time goes back down. I remember reading something about "the individual tasks will run slower but the total production will increase." I wonder if that is what I am experiencing? A proud member of the OFA (Old Farts Association). |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Probably not wise. The change in task runtimes was more likely simply the difference between BLC02 and BLC04 tasks. Some tasks just take longer or shorter than others. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
ericlp Joined: 11 Aug 08 Posts: 14 Credit: 14,151,505 RAC: 0 |
HT (Hyper-Threading) is just Intel's marketing brand name for SMT (Simultaneous Multithreading). AMD just calls it SMT. I've heard that AMD (Ryzen) does a better job than Intel when it comes to multithreading, as they use larger cache memory and wider bus lanes. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I don't know if it is the difference in SMT (or HT, as Intel calls it) or simply the difference in architecture between Ryzen and Intel Broadwell-E. But the Ryzens kick butt on the cpu tasks compared to the i7-6850K that runs 300 MHz faster. I always heard that Intel had superior math performance. Sure not seeing it in my example. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The Windows scheduler has an affinity property that allows pinning a process to a particular logical CPU. So it's possible to group CPU processes on one real core while GPU processes are allocated to another real CPU core. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Tom M wrote: "I got up this morning and the Z-400, which HAD been munching 8.05's in about an hour and a half, was suddenly running at least 30% slower per task. I just switched it back to no hyperthreading to see if the time goes back down." . . As has been said, each task takes longer but you are doing twice as many tasks at a time. The question then is not how long each task takes but how long each pair of tasks takes (averaged). If it takes 70 mins to complete 2 tasks with HT off, then the issue is whether it takes more or less than 70 mins to complete two tasks with HT on. Stephen |
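The comparison described above comes down to simple throughput arithmetic. The sketch below uses made-up figures purely for illustration: 4 concurrent tasks with HT off and 8 with HT on are assumptions (the thread does not give the Z-400's core count), 90 minutes stands in for "about an hour and a half", and 117 minutes is that figure made 30% slower. `tasks_per_day` is a hypothetical helper name.

```shell
#!/bin/bash
# tasks_per_day CONCURRENT MINUTES_PER_TASK: whole tasks finished in 24 hours
# when CONCURRENT tasks run side by side and each takes MINUTES_PER_TASK.
# Integer arithmetic only, so the result is rounded down.
tasks_per_day() {
    local concurrent="$1" minutes="$2"
    echo $(( concurrent * 1440 / minutes ))
}

# Hypothetical figures, not measurements from this thread:
#   HT off: 4 tasks at a time, 90 minutes each
#   HT on:  8 tasks at a time, 117 minutes each (30% slower per task)
echo "HT off: $(tasks_per_day 4 90) tasks/day"
echo "HT on:  $(tasks_per_day 8 117) tasks/day"
```

With these figures HT on finishes 98 tasks a day against 64 with HT off; the per-task time would have to double to 180 minutes before the two tie.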
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Raistmer wrote: "Windows scheduler has affinity property that allows to pin process to particular logical CPU. So it's possible to group CPU processes on one real core while GPU processes allocate to another real CPU core." Thank you, Raistmer. Linux has a similar mechanism too. Please search the site pages and Google as well; if you still have a problem, we're glad to help. P.R.

    #!/bin/bash
    for (( ; ; ))
    do
        schedtool -a 1,2,3,4 `pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda65_v8`
        schedtool -a 1,2,3,4 `pidof ap_7.01r2793_sse3_clGPU_x86_64`
        schedtool -a 1,2,3,4 `pidof axo`
        schedtool -a 6,7,8,9,10,11 `pidof MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu`
        schedtool -a 5 `pidof compiz`
        sleep 2
    done

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Keith Myers wrote: "I don't know if it is the difference in SMT (or HT) as Intel calls it or simply the difference in architecture between Ryzen and Intel Broadwell-E. But the Ryzens kick butt on the cpu tasks compared to the i7-6850K that runs 300 Mhz faster. I always heard that Intel had superior math performance. Sure not seeing it on my example." . . Because you are using Petri's technique for optimising the CPU usage, that is not a fair comparison unless you are using it in the same manner on both the Intel and AMD CPUs. The big advantage of that technique is that the CPU tasks have dibs on the FPU of each hardware core. The GPU support on the HT threads doesn't need it, as I understand it. So the CPU tasks should run at speeds somewhere between the HT-off case and the normal HT-on case. Stephen |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Keith Myers wrote: "I don't know if it is the difference in SMT (or HT) as Intel calls it or simply the difference in architecture between Ryzen and Intel Broadwell-E. But the Ryzens kick butt on the cpu tasks compared to the i7-6850K that runs 300 Mhz faster. I always heard that Intel had superior math performance. Sure not seeing it on my example." I favored AMD CPUs some ten to twenty years ago. If I'm going to build a new system, I'll still consider the AMD alternative a viable option. My recent build needed some serious backup from the MOBO: support for 4 GPUs with the maximum number of PCIe lanes. Things may change. -- To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Keith Myers wrote: "I don't know if it is the difference in SMT (or HT) as Intel calls it or simply the difference in architecture between Ryzen and Intel Broadwell-E. But the Ryzens kick butt on the cpu tasks compared to the i7-6850K that runs 300 Mhz faster. I always heard that Intel had superior math performance. Sure not seeing it on my example." Nope. I am running the exact same applications with the exact same parameters on both the Intel and AMD systems. The only differences are quad-channel memory at 3000 MHz and a 4250 MHz cpu on the Intel, versus only dual-channel memory at 3466 MHz and a 4000 MHz cpu on the AMD systems. All systems have 16GB of memory. CPU tasks only run on the physical cores and gpu tasks only run on the virtual cores. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Then that is a fair comparison sure enough. So what sort of differences are you seeing? Stephen |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Keith Myers wrote: "Running the same exact applications with the same exact parameters on both Intel and AMD systems." It's not unusual for an application optimised for a particular hardware/instruction set to not run as well on different hardware, even if using the same instruction set in each case. Grant Darwin NT |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Cpu tasks on Ryzen run in 29-35 minutes. Cpu tasks on X99 Intel i7-6850K run in 45-55 minutes. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.