Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

From: Kin Cho
Date: Sat Feb 02 2019 - 02:56:29 EST


Zhenzhong,
The machine is running test this weekend, I'll try your simple fix next week.

All,
We're not aware of a specific customer need for acpi_pm/PMTMR,
but if we must keep acpi_pm/PMTMR in the kernel, let's fix it so it
actually works, even on machine like ours.
On our hardware currently it's broken both during clocksource selection
and as a permanent clocksource.

Thanks,

-kin

On 2/1/19 6:52 PM, Zhenzhong Duan wrote:
On 2019/1/31 22:26, Thomas Gleixner wrote:
On Thu, 31 Jan 2019, Zhenzhong Duan wrote:

On 2019/1/30 16:06, Thomas Gleixner wrote:
On Tue, 22 Jan 2019, Zhenzhong Duan wrote:

On a large system with many CPUs, using PMTMR as the clock source can
have a significant impact on the overall system performance because
of the following reasons:
ÂÂ 1) There is a single PMTMR counter shared by all the CPUs.
ÂÂ 2) PMTMR counter reading is a very slow operation.

Using PMTMR as the default clock source may happen when, for example,
the TSC clock calibration exceeds the allowable tolerance and HPET
disabled by nohpet on kernel command line. Sometimes the performance

The question is why would anyone disable HPET on a larger machine when the
TSC is wreckaged?

There may be broken hardware where TSC is wreckaged.

I know that.

I'm not against the change per se, but I really want to understand why we
need all the complexity for something which should never be used in a real
world deployment.

Hmm, it's a strong word of "never be used". Customers may happen to use
nohpet(sanity test?) and report bug to us. Sometimes they does report a bug
that reproduce with their customed config. There may also be BIOS setting HPET
disabled.

And because the customer MAY do completely nonsensical things (and there
are a lot more than the HPET) the kernel has to handle all of them?

Ok, then. I don't have more suggestion to convince you. I just think of a simple fix as below. I think it will work for both hpet and pmtmr. We will test it when the env is available.

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1353,6 +1353,7 @@ static int change_clocksource(void *data)

ÂÂÂÂÂÂÂ write_seqcount_end(&tk_core.seq);
ÂÂÂÂÂÂÂ raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+ÂÂÂÂÂÂ tick_clock_notify();

ÂÂÂÂÂÂÂ return 0;
Â}
@@ -1371,7 +1372,6 @@ int timekeeping_notify(struct clocksource *clock)
ÂÂÂÂÂÂÂ if (tk->tkr_mono.clock == clock)
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return 0;
ÂÂÂÂÂÂÂ stop_machine(change_clocksource, clock, NULL);
-ÂÂÂÂÂÂ tick_clock_notify();
ÂÂÂÂÂÂÂ return tk->tkr_mono.clock == clock ? 0 : -1;
Â}


Thanks
Zhenzhong