Re: [RESEND PATCH v6] x86/hpet: Reduce HPET counter read contention

From: Waiman Long
Date: Tue Sep 06 2016 - 11:46:12 EST


On 09/06/2016 11:27 AM, Waiman Long wrote:
On a large system with many CPUs, using HPET as the clock source can
have a significant impact on the overall system performance because
of the following reasons:
1) There is a single HPET counter shared by all the CPUs.
2) HPET counter reading is a very slow operation.

Using HPET as the default clock source may happen when, for example,
the TSC clock calibration exceeds the allowable tolerance. Sometimes
the performance slowdown can be so severe that the system may crash
because of an NMI watchdog soft lockup, for example.

During the TSC clock calibration process, the default clock source
will be set temporarily to HPET. For systems with many CPUs, it is
possible that an NMI watchdog soft lockup may occur occasionally during
that short time period where HPET clocking is active, as is shown in
the kernel log below:

[ 71.618132] NetLabel: Initializing
[ 71.621967] NetLabel: domain hash size = 128
[ 71.626848] NetLabel: protocols = UNLABELED CIPSOv4
[ 71.632418] NetLabel: unlabeled traffic allowed by default
[ 71.638679] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
[ 71.646504] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[ 71.655313] Switching to clocksource hpet
[ 95.679135] BUG: soft lockup - CPU#144 stuck for 23s! [swapper/144:0]
[ 95.693363] BUG: soft lockup - CPU#145 stuck for 23s! [swapper/145:0]
[ 95.694203] Modules linked in:
[ 95.694697] CPU: 145 PID: 0 Comm: swapper/145 Not tainted 3.10.0-327.el7.x86_64 #1
[ 95.695580] BUG: soft lockup - CPU#582 stuck for 23s! [swapper/582:0]
[ 95.696145] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.001.006 SFW: 041.063.152 01/16/2016
[ 95.698128] BUG: soft lockup - CPU#357 stuck for 23s! [swapper/357:0]

This patch attempts to address the above issues by reducing HPET read
contention using the fact that if more than one CPU is trying to
access HPET at the same time, it will be more efficient when only
one CPU in the group reads the HPET counter and shares it with the
rest of the group instead of each group member trying to read the
HPET counter individually.

This is done by using a combination word with a sequence number and
a bit lock. The CPU that gets the bit lock will be responsible for
reading the HPET counter and updating the sequence number. The others
will monitor the change in sequence number and grab the HPET counter
value accordingly. This change is only enabled on SMP configurations.
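
The scheme described above can be illustrated with a minimal user-space
sketch. This is not the code from the patch: it uses C11 atomics instead
of kernel primitives, and every name in it (hpet_seqlock, hpet_cached,
slow_hpet_read, shared_hpet_read) is hypothetical. The slow MMIO read is
replaced by a fake counter so that the sketch can run anywhere.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the lock, the upper bits are the sequence. */
#define HPET_LOCK_BIT 1u
#define HPET_SEQ_INC  2u

static _Atomic uint32_t hpet_seqlock;  /* combined sequence number + bit lock */
static _Atomic uint32_t hpet_cached;   /* last counter value that was shared */

/* Stand-ins for the real hardware read; only the lock holder touches them. */
static uint32_t fake_counter;
static unsigned int slow_reads;

static uint32_t slow_hpet_read(void)
{
	slow_reads++;
	return ++fake_counter;
}

uint32_t shared_hpet_read(void)
{
	for (;;) {
		uint32_t old = atomic_load_explicit(&hpet_seqlock,
						    memory_order_acquire);
		uint32_t now;

		if (!(old & HPET_LOCK_BIT)) {
			uint32_t val;

			/* Lock looks free: try to become the one reader. */
			if (!atomic_compare_exchange_strong_explicit(
					&hpet_seqlock, &old,
					old | HPET_LOCK_BIT,
					memory_order_acquire,
					memory_order_relaxed))
				continue;	/* lost the race, look again */

			val = slow_hpet_read();
			atomic_store_explicit(&hpet_cached, val,
					      memory_order_relaxed);
			/* Bump the sequence and drop the lock in one store. */
			atomic_store_explicit(&hpet_seqlock,
					      old + HPET_SEQ_INC,
					      memory_order_release);
			return val;
		}

		/*
		 * Another CPU holds the lock. Any change of the word means
		 * the sequence number moved, i.e. a value read after we
		 * started waiting has been published.
		 */
		do {
			now = atomic_load_explicit(&hpet_seqlock,
						   memory_order_acquire);
		} while (now == old);

		return atomic_load_explicit(&hpet_cached,
					    memory_order_relaxed);
	}
}
```

In this sketch the release store on the combined word orders the cached
value before the sequence change, so a waiter that sees the new sequence
number also sees a counter value at least as recent as that update. A
real implementation would also need a cpu_relax()-style pause in the wait
loop and a decision about contexts, such as NMI, where spinning is unsafe.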

On a 4-socket Haswell-EX box with 144 threads (HT on), running the
AIM7 compute workload (1500 users) on a 4.8-rc1 kernel (HZ=1000)
with and without the patch has the following performance numbers
(with HPET or TSC as clock source):

TSC = 1042431 jobs/min
HPET w/o patch = 798068 jobs/min
HPET with patch = 1029445 jobs/min

The perf profile showed a reduction of the %CPU time consumed by
read_hpet from 11.19% without patch to 1.24% with patch.

Signed-off-by: Waiman Long <Waiman.Long@xxxxxxx>

Will this patch be good enough to get merged for the next kernel version, probably 4.9? We have problems with our latest large SMP systems because of this issue. Even some new 4-socket systems have exhibited this problem, as noted by Prarit. We really need to get this fix upstream so that it can be merged into the distros.

Cheers,
Longman