Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM

From: Laurent Dufour
Date: Wed Jul 13 2022 - 10:23:25 EST


Le 12/07/2022 à 11:47, Laurent Dufour a écrit :
> Le 12/07/2022 à 03:46, Nicholas Piggin a écrit :
>> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>>> During a LPM, while the memory transfer is in progress on the arrival side,
>>> some latencies is generated when accessing not yet transferred pages on the
>>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
>>> leading to a kernel panic.
>>>
>>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>>> strong work around, some users would want this timeout to be eventually
>>> triggered if the system is hanging even during LPM.
>>>
>>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
>>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>> watchdog_tresh + factor%
>>>
>>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
>>> Once the memory transfer is achieved, the factor is reset to 0.
>>>
>>> Setting this value to a high number is like disabling the NMI watchdog
>>> during a LPM.
>>>
>>> Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxx>
>>> ---
>>> Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>> arch/powerpc/platforms/pseries/mobility.c | 43 +++++++++++++++++++++
>>> 2 files changed, 55 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>>> index ddccd1077462..0bb0b7f27e96 100644
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>> Documentation/admin-guide/kernel-parameters.rst).
>>>
>>>
>>> +nmi_watchdog_factor (PPC only)
>>> +==================================
>>> +
>>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
>>> +set to 1). This factor represents the percentage added to
>>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
>>> +LPM. The soft lockup timeout is not impacted.
>>
>> Could "LPM" or "mobility" be a bit more prominent in the parameter name
>> and documentation? Something else might want to add a factor as well,
>> one day.
>
> In the V2 version, Nathan suggested "making the user-visible
> name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
> apply this to other contexts in the future."
>
> So I made the change to a more generic name. I think this is a good option
> since the documentation is explicit about the LPM particular case.
> If in the future this factor needs to apply during an other operation that
> name will be generic enough.
>
> Do you agree ?

Nick and I discussed that.
Nick prefers to have LPM in the tunable names, and thinks we can add a new
tunable if a separate user came up which required it.

We agree that 'nmi_wd_lpm_factor' is a good name.
I'll send a v5 updating that name.

>>
>> Otherwise the code looks okay.
>>
>> Reviewed-by: Nicholas Piggin <npiggin@xxxxxxxxx>
>>
>>> +
>>> +A value of 0 means no change. The default value is 200 meaning the NMI
>>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>>> +
>>> +
>>> numa_balancing
>>> ==============
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index 907a779074d6..649155faafc2 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>>> #define MIGRATION_SCOPE (1)
>>> #define PRRN_SCOPE -2
>>>
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> +static unsigned int nmi_wd_factor = 200;
>>> +
>>> +#ifdef CONFIG_SYSCTL
>>> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
>>> + {
>>> + .procname = "nmi_watchdog_factor",
>>> + .data = &nmi_wd_factor,
>>> + .maxlen = sizeof(int),
>>> + .mode = 0644,
>>> + .proc_handler = proc_douintvec_minmax,
>>> + },
>>> + {}
>>> +};
>>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
>>> + {
>>> + .procname = "kernel",
>>> + .mode = 0555,
>>> + .child = nmi_wd_factor_ctl_table,
>>> + },
>>> + {}
>>> +};
>>> +
>>> +static int __init register_nmi_wd_factor_sysctl(void)
>>> +{
>>> + register_sysctl_table(nmi_wd_factor_sysctl_root);
>>> +
>>> + return 0;
>>> +}
>>> +device_initcall(register_nmi_wd_factor_sysctl);
>>> +#endif /* CONFIG_SYSCTL */
>>> +#endif /* CONFIG_PPC_WATCHDOG */
>>> +
>>> static int mobility_rtas_call(int token, char *buf, s32 scope)
>>> {
>>> int rc;
>>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>>> static int pseries_migrate_partition(u64 handle)
>>> {
>>> int ret;
>>> + unsigned int factor = 0;
>>>
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> + factor = nmi_wd_factor;
>>> +#endif
>>> ret = wait_for_vasi_session_suspending(handle);
>>> if (ret)
>>> return ret;
>>>
>>> vas_migration_handler(VAS_SUSPEND);
>>>
>>> + if (factor)
>>> + watchdog_nmi_set_lpm_factor(factor);
>>> +
>>> ret = pseries_suspend(handle);
>>> if (ret == 0) {
>>> post_mobility_fixup();
>>> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>>> } else
>>> pseries_cancel_migration(handle, ret);
>>>
>>> + if (factor)
>>> + watchdog_nmi_set_lpm_factor(0);
>>> +
>>> vas_migration_handler(VAS_RESUME);
>>>
>>> return ret;
>>> --
>>> 2.36.1
>>>
>>>
>