Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

From: Nicholas Piggin
Date: Tue May 10 2022 - 07:16:32 EST


Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET hardlockup detector relies on tsc_khz to estimate the value of
> that the TSC will have when its HPET channel fires. A refined tsc_khz
> helps to estimate better the expected TSC value.
>
> Using the early value of tsc_khz may lead to a large error in the expected
> TSC value. Restarting the NMI watchdog detector has the effect of kicking
> its HPET channel and make use of the refined tsc_khz.
>
> When the HPET hardlockup is not in use, restarting the NMI watchdog is
> a noop.
>
> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Cc: "Ravi V. Shankar" <ravi.v.shankar@xxxxxxxxx>
> Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> Cc: x86@xxxxxxxxxx
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> ---
> Changes since v5:
> * Introduced this patch
>
> Changes since v4
> * N/A
>
> Changes since v3
> * N/A
>
> Changes since v2:
> * N/A
>
> Changes since v1:
> * N/A
> ---
> arch/x86/kernel/tsc.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cafacb2e58cc..cc1843044d88 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> /* Inform the TSC deadline clockevent devices about the recalibration */
> lapic_update_tsc_freq();
>
> + /*
> + * If in use, the HPET hardlockup detector relies on tsc_khz.
> + * Reconfigure it to make use of the refined tsc_khz.
> + */
> + lockup_detector_reconfigure();

I don't know if the API is conceptually good.

You change something that the lockup detector is currently using,
*while* the detector is running asynchronously, and then reconfigure
it. What happens in the window? If this code is only used for small
adjustments maybe it does not really matter but in principle it's
a bad API to export.

lockup_detector_reconfigure as an internal API is okay because it
reconfigures things while the watchdog is stopped [actually that
looks untrue for soft dog which uses watchdog_thresh in
is_softlockup(), but that should be fixed].

You're the arch so you're allowed to stop the watchdog and configure
it, e.g., hardlockup_detector_perf_stop() is called in arch/.

So you want to disable HPET watchdog if it was enabled, then update
wherever you're using tsc_khz, then re-enable.

Thanks,
Nick