Re: [PATCH] acpi_pm: Reduce PMTMR counter read contention

From: Zhenzhong Duan
Date: Wed Feb 27 2019 - 22:33:23 EST


On 2019/2/18 11:48, Zhenzhong Duan wrote:
On 2019/2/11 5:08, Thomas Gleixner wrote:
On Sat, 2 Feb 2019, Zhenzhong Duan wrote:
On 2019/1/31 22:26, Thomas Gleixner wrote:

I'm not against the change per se, but I really want to understand
why we need all the complexity for something which should never be
used in a real world deployment.

Hmm, it's a strong word of "never be used". Customers may happen to
use nohpet(sanity test?) and report bug to us. Sometimes they does
report a bug that reproduce with their customed config. There may
also be BIOS setting HPET disabled.

And because the customer MAY do completely nonsensical things (and there
are a lot more than the HPET) the kernel has to handle all of them?

Ok, then. I don't have more suggestion to convince you.

You give up too fast :)

Ah, because I thought of a simple fix.

The point is, that we really want proper justifications for changes like
this. Some 'may, could and more handwaving' simply does not cut it.

So if you can just describe a realistic scenario, which does not involve
thoughtless flipping of BIOS options, then this becomes way more palatable.

I indeed don't see a realistic scenario in a product env needing to use nohpet. My only justification is now that we have nohpet as kernel parameter, we should fix the softlockup in large machines for enterprise use.

I just think of a simple fix as below. I think it will work for both hpet
and pmtmr. We will test it when the env is available.

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1353,6 +1353,7 @@ static int change_clocksource(void *data)

ÂÂÂÂÂÂÂÂ write_seqcount_end(&tk_core.seq);
ÂÂÂÂÂÂÂÂ raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+ÂÂÂÂÂÂ tick_clock_notify();

ÂÂÂÂÂÂÂÂ return 0;
 }
@@ -1371,7 +1372,6 @@ int timekeeping_notify(struct clocksource *clock)
ÂÂÂÂÂÂÂÂ if (tk->tkr_mono.clock == clock)
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return 0;
ÂÂÂÂÂÂÂÂ stop_machine(change_clocksource, clock, NULL);
-ÂÂÂÂÂÂ tick_clock_notify();
ÂÂÂÂÂÂÂÂ return tk->tkr_mono.clock == clock ? 0 : -1;
 }

This won't resolve the concurrency issues of HPET or PMTIMER in any
way.

Just got chance to test and Kin confirmed it fix the softlockup of PMTMR(with nohpet) and HPET(without nohpet, revert previous hpet commit) at bootup stage.

My understandig is, at bootup stage tick device is firstly initialized in periodic mode and then switch to one-shot mode. In periodic mode clock event interrupt is triggered every 1ms(HZ=1000), contention in HPET or PMTIMER exceeds 1ms and delayed the clock interrupt. Then CPUs continue to process interrupt one by one without a break, tick_clock_notify() have no chance to be called and we never switch to one-shot mode.

In one-shot mode, the contention is still there but next event is always set with a future value. We may missed some ticks, but the timer code is smart enough to pick up those missed ticks.

By moving tick_clock_notify() in stop_machine, kernel changes to one-shot mode early before the contention accumulate and lockup system.

Instead it breaks the careful orchestrated mechanism of clocksource
change.

Sorry, I didn't get a idea how it breaks, tick_clock_notify() is a simple function setting bitmask in percpu variable. Could you explain a bit?

Hi Thomas,

May I have your further comments? I think applying a simple patch to fix both hpet and pmtmr softlockup is better?

Thanks
Zhenzhong