Re: [PATCH v2 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model

From: Petr Mladek
Date: Fri Mar 18 2022 - 06:42:34 EST


On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().

The above is clear.

> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
>
> The async model is achieved by expanding watchdog_nmi_probe() with
> -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.

These two paragraphs are a bit confusing to me. It might be just a
problem with translation. I am not a native speaker. Anyway, I wonder
if the following is more clear:

<proposal>
But it is easy to take an opposite approach and try to initialize
the watchdog once again later.

The delayed probe is called using workqueues. It need to allocate
memory and must be proceed in a normal context.

The delayed probe is queued only when the early one returns -EBUSY.
It is the return code returned when PMU is not ready yet.
</proposal>

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
> hardlockup_detector_perf_disable();
> }
>
> -/* Return 0, if a NMI watchdog is available. Error code otherwise */
> +/*
> + * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
> + * ready, and arch code should wake up hld_detector_wait when ready. Other
> + * negative value if not support.
> + */

I wonder if the following is slightly more clear:

/*
* Arch specific API.
*
* Return 0 when NMI watchdog is available, negative value otherwise.
* The error code -EBUSY is special. It means that a deferred probe
* might succeed later.
*/

> int __weak __init watchdog_nmi_probe(void)
> {
> return hardlockup_detector_perf_init();
> @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> #define watchdog_sysctl_init() do { } while (0)
> #endif /* CONFIG_SYSCTL */
>
> +static void lockup_detector_delay_init(struct work_struct *work);
> +bool lockup_detector_pending_init __initdata;
> +
> +struct wait_queue_head hld_detector_wait __initdata =
> + __WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> +
> +static struct work_struct detector_work __initdata =
> + __WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> +
> +static void __init lockup_detector_delay_init(struct work_struct *work)
> +{
> + int ret;
> +
> + wait_event(hld_detector_wait,
> + lockup_detector_pending_init == false);
> +
> + /*
> + * Here, we know the PMU should be ready, so set pending to true to
> + * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> + */
> + lockup_detector_pending_init = true;

This does not make sense to me. We are here only when:

1. lockup_detector_init() queued this work.

2. Someone cleared @lockup_detector_pending_init and woke the
worker via wait_queue. IT might be either PMU init code
or the late lockup_detector_check().

watchdog_nmi_probe() might still return -EBUSY when PMU init failed.

If you wanted to try the delayed probe once again (3rd attempt) from
lockup_detector_check(), you would need to queue the work once again.
But you need to be sure that lockup_detector_check() was not called
yet. Otherwise, the 2nd work might wait forewer.

IMHO, it is not worth the complexity.

> + ret = watchdog_nmi_probe();
> + if (ret) {
> + pr_info("Delayed init of the lockup detector failed: %d\n", ret);
> + pr_info("Perf NMI watchdog permanently disabled\n");
> + return;
> + }
> +
> + nmi_watchdog_available = true;
> + lockup_detector_setup();
> + lockup_detector_pending_init = false;
> +}

Otherwise, it looks good to me.

Best Regards,
Petr