To Hyperthread or not to Hyperthread, that is the question

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1946779 - Posted: 29 Jul 2018, 9:17:09 UTC

I have a new-to-me version of this proposal.

I have an older Z-400 that does not have AVX and a newer Optiplex that does have AVX. For reasons unrelated to hyper threading I have upgraded/clean installed the Z-400 to Windows 10 Pro.

The BIOS also reset itself to its defaults (I went back in and turned the keyboard number lock on).
So here I am with my Z-400 running without hyperthreading.

I have years of experience on this system running both "stock" and Lunatics SETI, and the difference has been consistent: Lunatics on the CPU is MUCH faster.
Stock SETI has been known to take upwards of 7 hours (wall clock time) per task with hyperthreading engaged.

Anyway, it is now beginning to look like, for a pre-AVX CPU crunching 8.05 tasks, it is a good deal faster without hyperthreading.

I have an AVX-capable CPU that is running rapidly (2.5 hours or so?) with hyperthreading enabled.

Since this is the second reinstall in 24 hours, the SETI scheduler has started sending me CUDA 42/50 tasks for the GPU, so my examination of the CPU wall clock times (not the ones currently being displayed) may be off.

I thought I would offer up this idea for munching/crunching/discussion.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1946779
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1946790 - Posted: 29 Jul 2018, 11:17:14 UTC
Last modified: 29 Jul 2018, 11:19:38 UTC

Well, in getting New Prometheus up and running, I've noticed that on a Gigabyte GA-Z270-HD3 MOBO with an Intel i7 7700K at 4.2GHz with 32GB DDR4 Corsair Vengeance RAM and TWO EVGA GTX-1050 Gaming 2GB GDDR5 VRAM GPUs that I CAN crunch 6 Units at a time. Lunatics 0.45 Beta 6 is installed, and I've set it to run AVX on the CPU, allow Astropulse on CPU and GPU. Four of the 6 Units are crunched on the CPU, and two of the Units are on GPU - one task per GPU card. The GPUs are set by Lunatics to run SOG Units.

The CPU crunches an Estimated 9 Hour Unit in 36 to 37.5 Minutes while crunching 4 of these at a time. The GPUs crunch each Unit in 10 to 11 Minutes. I run the system from 18:00 to 09:00 daily. Oh, and Hyperthreading is on, so of 8 possible CPU Threads I'm using four for CPU crunching and two to feed the GPUs, leaving two free for running the system.

I hope this info. helps. :-)


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1946790
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1946804 - Posted: 29 Jul 2018, 13:12:24 UTC - in response to Message 1946790.  

I have 6 real cores and 6 HT cores.
I run 6 CPU tasks and 4 GPU tasks and leave 2 cores free.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1946804
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1946976 - Posted: 30 Jul 2018, 12:46:49 UTC

TL has a 4.2 GHz CPU running AVX. I have a 3.4 GHz CPU running AVX. I am currently running stock SETI again to see how it performs against the Lunatics beta distro.

Hmmm. Looks like your CPU is about 25% faster than mine on clock speed. But you are getting much more than a 25% advantage under the Lunatics distro. I don't remember coming even close to that level of CPU production.

Oh, well. The reason I raised the idea is that I think Hyperthreading in computationally intense environments keeps getting better with each generation. So an older Intel CPU may turn out to be more productive with HT turned off than with it turned on. I am certain of this for the very early generation(s) of hyperthreading CPUs.

I had one where the HT thread ran at maybe 18% of the speed of the main CPU. I had to use "cpu lasso" to set things up so it was running a bit more rationally.

I guess after a while I will turn HT back on on the Z-400 and see how the CPU processing speed changes.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1946976
Gene
Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1947066 - Posted: 31 Jul 2018, 0:59:16 UTC

I did some HT testing (see the thread back in May) on a Ryzen 1700 system and confirmed the (reasonable) expectation that HT does, in fact, increase system production, but only by 10% to 20% overall: each task runs slower, which is compensated by more concurrent tasks. Those tests were for a CPU-only configuration, i.e. no GPU active.

Adding a GPU into the mix makes things more complicated. It is widely advised to "allocate" one CPU for each GPU device. But the problem is getting the task scheduler to use that CPU in the ideal way. The "ideal" way (in my view) is for the scheduler, when the GPU needs a CPU action, to assign a CPU that does NOT have its HT component in use. In my Linux environment I have not yet found a way to do that.

To illustrate the point: imagine a 3-core system <Aa><Bb><Cc> with CPUs A, B, C and their respective HT components a, b, c. Suppose 2 tasks are running and a GPU task demands a 3rd. Possible states are: A+B in use and C allocated (what I think of as ideal); but since ALL CPUs look identical to the scheduler it is also possible to have A+B in use and <a> allocated; or A+B in use and <b> allocated; or A+a in use and B assigned. Well, you get the idea: the task serving the GPU may, more likely than not, run on an HT sibling and have to contend for core resources with the other task in that core. You may notice that the "A+a in use and B assigned" situation would be ideal for the GPU (and it is), but it implies that the two running tasks are jammed into one core, with performance suffering badly, while idle cores are available. I am trying to find uses of the "cpuset" and "cgroup" Linux kernel features to address this dilemma, but so far without success.

What does the Windows scheduler do in similar situations? I do not know. Maybe it's better, maybe it's worse. Since Tom is working in a Windows environment, I only point out the (above) Linux perspective as an example of the HT complications.
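
As a starting point for the Linux side of this, the kernel does expose which logical CPUs share a physical core through sysfs, so the <Aa><Bb><Cc> layout can at least be read out before trying to pin anything. The following is a minimal sketch, assuming a Linux system with the standard /sys/devices/system/cpu tree. The function name list_core_siblings and its optional directory argument are made up for illustration.

```bash
#!/bin/bash
# Print one line per physical core, listing the logical CPUs that share it.
# The optional first argument (an alternative sysfs CPU directory) is an
# assumption of this sketch; it only exists so the function can be tried
# against a copy of the tree.
list_core_siblings() {
  local root="${1:-/sys/devices/system/cpu}"
  # Every logical CPU reports the same sibling list as its partner,
  # so sort -u collapses the duplicates to one line per core.
  cat "$root"/cpu[0-9]*/topology/thread_siblings_list 2>/dev/null | sort -u
}
```

Each output line is one core, for example 0,8 or 0-1 depending on how the kernel numbers the siblings on that machine. Those numbers are the ones that tools such as taskset -c or schedtool -a expect.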
ID: 1947066
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947071 - Posted: 31 Jul 2018, 1:16:11 UTC - in response to Message 1947066.  

I keep cpu tasks on the physical cores and keep the gpu tasks on the HT or virtual cores. The tool to use is 'schedtool', with which I assign affinity for each Seti application to the type of core I want a task to run on. I also use it to give the gpu tasks high priority (a nice level of -20).
#!/bin/bash
# Run in a root terminal, NOT via sudo

# Enable persistence mode on the NVIDIA GPUs
nvidia-smi -pm 1

# Loop forever so that newly started tasks are picked up as well
for (( ; ; ))
do
  # Assign CPU priority (19 = nice/low priority, 0 = normal, -20 = high priority)
  # This was code Petri gave out
  # GPU tasks get high priority
  schedtool -n -20 `pidof setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90`
  schedtool -n -20 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
  # CPU tasks get (a little) below normal priority (0 being normal) so they don't choke the OS
  schedtool -n   5 `pidof ap_7.05r2728_sse3_linux64`
  schedtool -n   5 `pidof MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu`

  # Assign CPU threads (0-15)
  # Brent added this to Petri's code
  # Keep GPU tasks on threads 1 3 5 7 9 11 13 15
  schedtool -a 1,3,5,7,9,11,13,15 `pidof setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90`
  schedtool -a 1,3,5,7,9,11,13,15 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
  # Keep CPU tasks on threads 0 2 4 6 8 10 12 14
  schedtool -a 0,2,4,6,8,10,12,14 `pidof MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu`
  schedtool -a 0,2,4,6,8,10,12,14 `pidof ap_7.05r2728_sse3_linux64`

  # Print a timestamp and a banner on every pass
  date
  # lscpu | grep MHz
  sleep 5
  echo  "  CPU Priority and Assignment Script (16 Threads)"
done
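
One thing the loop above does not handle is an application that is not running at the moment: pidof then prints nothing, and schedtool is called without a PID to act on. A small wrapper can skip that case. This is only a sketch; pin_app is a hypothetical helper name, not part of Petri's or Brent's code, and it makes the same schedtool -n and -a calls as the script.

```bash
#!/bin/bash
# Hypothetical helper: set the nice level and the CPU affinity for every
# running instance of one application, and do nothing if none is running.
#   $1 = executable name, $2 = nice level, $3 = comma-separated CPU list
pin_app() {
  local app="$1" nice_level="$2" cpu_list="$3"
  local pids
  # pidof exits non-zero when no process matches; skip quietly in that case.
  pids=$(pidof "$app") || return 0
  # $pids is left unquoted on purpose so each PID is passed as its own argument.
  schedtool -n "$nice_level" $pids
  schedtool -a "$cpu_list" $pids
}
```

Inside the loop, a call such as pin_app MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu 5 0,2,4,6,8,10,12,14 would then replace the two separate schedtool lines for that application.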

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947071
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1947075 - Posted: 31 Jul 2018, 1:39:48 UTC
Last modified: 31 Jul 2018, 1:41:21 UTC

I googled around on this topic, and someone quoting what they had read from Intel said that "in general" you can get up to a 30% boost in performance.

So I threw up my hands and turned HT back on on my two Windows boxes. I have, I think, managed to disable all the "speed step" and/or wait states mentioned in all my BIOSes.

I sometimes wish I didn't seem to have a black thumb (I keep killing the setups because I don't understand what I am screwing up) on Linux.

Oh well, thank you for the conversation.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1947075
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1947138 - Posted: 31 Jul 2018, 13:56:23 UTC - in response to Message 1947075.  

I got up this morning and the Z-400, which HAD been munching 8.05's in about an hour and a half, was suddenly running at least 30% slower per task. I just switched it back to no hyperthreading to see if the time goes back down.

I remember reading something about "the individual tasks will run slower but the total production will increase." I wonder if that is what I am experiencing?
A proud member of the OFA (Old Farts Association).
ID: 1947138
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947143 - Posted: 31 Jul 2018, 14:16:30 UTC

Probably not wise. The change in task runtimes was more likely just the difference between BLC02 and BLC04 tasks. Some tasks simply take longer or shorter than others.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947143
ericlp
Joined: 11 Aug 08
Posts: 14
Credit: 14,151,505
RAC: 0
United States
Message 1947284 - Posted: 1 Aug 2018, 15:44:13 UTC

HT (HyperThreading) is just Intel's marketing brand name for SMT (Simultaneous Multithreading); AMD just calls it SMT. I've heard that AMD (Ryzen) does a better job than Intel when it comes to multithreading, as they use larger cache memory and wider bus lanes.
ID: 1947284
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947289 - Posted: 1 Aug 2018, 16:19:15 UTC - in response to Message 1947284.  

I don't know if it is the difference in SMT (or HT, as Intel calls it) or simply the difference in architecture between Ryzen and Intel Broadwell-E. But the Ryzens kick butt on the cpu tasks compared to the i7-6850K, which runs 300 MHz faster. I always heard that Intel had superior math performance. Sure not seeing it in my example.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947289
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1947598 - Posted: 2 Aug 2018, 22:02:03 UTC - in response to Message 1947066.  

The Windows scheduler has an affinity property that allows a process to be pinned to a particular logical CPU. So it's possible to group CPU processes on one real core while the GPU processes are allocated to another real CPU core.
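
On Windows the affinity is usually given as a bitmask, for example to start /affinity, which takes the mask in hexadecimal. The sketch below only does the arithmetic of turning a list of logical CPU numbers into such a mask. cpus_to_mask is an illustrative name, and which logical CPUs actually share a real core has to be checked on the machine in question.

```bash
#!/bin/bash
# Convert a comma-separated list of logical CPU numbers into a hex
# affinity mask (bit N set means logical CPU N is allowed).
cpus_to_mask() {
  local mask=0 cpu
  for cpu in $(printf '%s' "$1" | tr ',' ' '); do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '%X\n' "$mask"
}
```

For example, cpus_to_mask 0,2,4,6 prints 55 and cpus_to_mask 1,3,5,7 prints AA. The same hexadecimal masks are also accepted by taskset on Linux.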
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1947598
Stephen "Heretic"
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947601 - Posted: 2 Aug 2018, 22:26:52 UTC - in response to Message 1947138.  

As has been said: each task takes longer, but you are doing twice as many tasks at a time. The question then is not how long each task takes but how long each pair of tasks takes (averaged). If with HT off it takes 70 mins to complete 2 tasks, then the issue is whether it takes more or less than 70 mins to complete two tasks with HT on.
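
The comparison can be put into numbers by looking at tasks completed per day instead of minutes per task. A minimal sketch follows; the function name tasks_per_day and all the figures in the comments are hypothetical and not measurements from this thread.

```bash
#!/bin/bash
# Whole tasks finished per day when a fixed number of tasks run side by side.
#   $1 = tasks running concurrently, $2 = wall-clock minutes per task
tasks_per_day() {
  local concurrent="$1" minutes_per_task="$2"
  # 1440 minutes in a day; integer division rounds down.
  echo $(( concurrent * 1440 / minutes_per_task ))
}

# Hypothetical 4-core machine:
#   tasks_per_day 4 70    HT off, 4 tasks at 70 min each
#   tasks_per_day 8 100   HT on,  8 tasks at 100 min each
#   tasks_per_day 8 140   HT on,  8 tasks at 140 min each (break-even)
```

With these made-up numbers the three calls give 82, 115 and 82 tasks per day. In other words, HT comes out ahead as long as the per-task time less than doubles when twice as many tasks are running.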

Stephen

ID: 1947601
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1947602 - Posted: 2 Aug 2018, 22:29:21 UTC - in response to Message 1947598.  
Last modified: 2 Aug 2018, 22:34:22 UTC

Thank you, Raistmer. Linux has a similar mechanism too.
Please search the site pages and Google as well. If there is still a problem after that, we're glad to help.

P.R.

#!/bin/bash

for (( ; ; ))
do
  schedtool -a 1,2,3,4 `pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda65_v8`
  schedtool -a 1,2,3,4 `pidof ap_7.01r2793_sse3_clGPU_x86_64`
  schedtool -a 1,2,3,4 `pidof axo`
  schedtool -a 6,7,8,9,10,11 `pidof MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu`
  schedtool -a 5 `pidof compiz`
  sleep 2

done
	

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1947602
Stephen "Heretic"
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947603 - Posted: 2 Aug 2018, 22:32:43 UTC - in response to Message 1947289.  

Because you are using Petri's technique for optimising the CPU usage, that is not a fair comparison unless you are using it in the same manner on both the Intel and AMD CPUs. The big advantage of that technique is that the CPU tasks have dibs on the FPU of each hardware core. The GPU support threads on the HT siblings don't need it, as I understand it. So the CPU tasks should run at speeds somewhere between the HT-off case and the normal HT-on case.

Stephen

ID: 1947603
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1947609 - Posted: 2 Aug 2018, 22:41:17 UTC - in response to Message 1947289.  

I was an affiliate of AMD CPUs some ten-twenty years ago.
If I'm going to build a new system, I'll still consider the AMD alternative a viable option.

My recent build needed some serious backup from the MOBO supporting 4xGPU with maximum PCIEx lanes.
Things may change.

--
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1947609
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947615 - Posted: 2 Aug 2018, 23:12:09 UTC - in response to Message 1947603.  

Nope. Running the same exact applications with the same exact parameters on both Intel and AMD systems. The only differences are quad-channel memory at 3000 MHz and a 4250 MHz CPU clock on the Intel system, versus only dual-channel memory at 3466 MHz and a 4000 MHz CPU clock on the AMD systems. All systems have 16GB of memory. CPU tasks only run on the physical cores and gpu tasks only run on the virtual cores.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947615
Stephen "Heretic"
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947626 - Posted: 3 Aug 2018, 0:02:04 UTC - in response to Message 1947615.  


Then that is a fair comparison sure enough. So what sort of differences are you seeing?

Stephen

ID: 1947626
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1947647 - Posted: 3 Aug 2018, 3:22:25 UTC - in response to Message 1947615.  

Keith Myers wrote: "Running the same exact applications with the same exact parameters on both Intel and AMD systems."

It's not unusual for an application optimised for a particular hardware/instruction set to not run as well on different hardware, even if the same instruction set is used in each case.
Grant
Darwin NT
ID: 1947647
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947651 - Posted: 3 Aug 2018, 3:37:00 UTC - in response to Message 1947626.  


Cpu tasks on Ryzen run in 29-35 minutes. Cpu tasks on X99 Intel i7-6850K run in 45-55 minutes.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947651


 