Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

From: Zhenzhong Duan
Date: Wed Jan 30 2019 - 22:49:22 EST


On 2019/1/30 16:06, Thomas Gleixner wrote:
On Tue, 22 Jan 2019, Zhenzhong Duan wrote:

On a large system with many CPUs, using PMTMR as the clock source can
have a significant impact on the overall system performance because
of the following reasons:
1) There is a single PMTMR counter shared by all the CPUs.
2) PMTMR counter reading is a very slow operation.

Using PMTMR as the default clock source may happen when, for example,
the TSC clock calibration exceeds the allowable tolerance and HPET is
disabled by nohpet on the kernel command line. Sometimes the performance

The question is why would anyone disable HPET on a larger machine when the
TSC is wreckaged?

There may be broken hardware where TSC is wreckaged.
On our instances (X8-8/X7-8), the TSC isn't wreckaged. When we are lucky, we get through the bootup stage and TSC ends up as the final default clocksource. See the log:
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 13.963224] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 19.903175] clocksource: Switched to clocksource refined-jiffies
[ 20.190467] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 20.201634] clocksource: Switched to clocksource acpi_pm
[ 39.082577] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2113ba2fe3c, max_idle_ns: 440795266816 ns
[ 39.138781] clocksource: Switched to clocksource tsc
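
To confirm which clocksource a running system ended up with, the current selection can be read back from sysfs. This is only a minimal sketch: the helper name read_clocksource() is made up for illustration, and the path in the note below is the usual sysfs location, assumed to be present on the system being checked.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a sysfs-style file into buf
 * and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_clocksource(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* sysfs attributes end with '\n'; cut the string there */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling it with "/sys/devices/system/clocksource/clocksource0/current_clocksource" should yield a name such as tsc or acpi_pm, matching the last "Switched to clocksource" line in dmesg.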

When we are unlucky, the log looks like this:
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 19.905741] clocksource: Switched to clocksource refined-jiffies
[ 20.181521] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 44.273786] watchdog: BUG: soft lockup - CPU#48 stuck for 23s! [swapper/48:0]
[ 44.279992] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [migration/49:307]

So we panicked while acpi_pm was initializing and had been chosen temporarily as the default clocksource. It panicked only because we added the nohpet parameter.
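
For reference, the mask 0xffffff in the acpi_pm log line means the counter is only 24 bits wide. The sketch below shows how a delta between two raw reads stays correct across one wrap, and roughly how long a wrap takes. The function names are made up for illustration and this is user-space code, not the kernel implementation; the 3579545 Hz rate is the standard ACPI PM timer frequency.

```c
#include <stdint.h>

#define PMTMR_HZ      3579545ULL   /* ACPI PM timer frequency in Hz */
#define ACPI_PM_MASK  0xffffffULL  /* 24-bit mask, as in the log above */

/*
 * Illustrative helper: cycles elapsed between two raw counter reads.
 * Masking the unsigned difference keeps the result correct when the
 * counter has wrapped at most once between 'last' and 'now'.
 */
uint64_t pmtmr_delta(uint64_t now, uint64_t last)
{
	return (now - last) & ACPI_PM_MASK;
}

/* Illustrative helper: nanoseconds for one full wrap of the 24-bit counter. */
uint64_t pmtmr_wrap_ns(void)
{
	return (ACPI_PM_MASK + 1) * 1000000000ULL / PMTMR_HZ;
}
```

With these numbers a full wrap takes roughly 4.69 seconds, so the max_idle_ns of about 2.09 seconds in the log stays well inside one wrap period.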

I'm not against the change per se, but I really want to understand why we
need all the complexity for something which should never be used in a real
world deployment.

Hmm, "never be used" is a strong phrase. Customers may happen to use nohpet (as a sanity test?) and report a bug to us. Sometimes they do report a bug that reproduces with their customized config. HPET may also be disabled in the BIOS settings.

Thanks
Zhenzhong