Re: [PATCHv2 3/4] kernel/watchdog: adapt the watchdog_hld interface for async model

From: Pingfan Liu
Date: Fri Oct 08 2021 - 01:54:17 EST


On Tue, Oct 05, 2021 at 09:03:17AM +0200, Petr Mladek wrote:
[...]
> > +static void lockup_detector_delay_init(struct work_struct *work);
> > +bool hld_detector_delay_initialized __initdata;
> > +
> > +struct wait_queue_head hld_detector_wait __initdata =
> > + __WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > +
> > +static struct work_struct detector_work __initdata =
> > + __WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > +
> > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > +{
> > + int ret;
> > +
> > + wait_event(hld_detector_wait, hld_detector_delay_initialized);
> > + ret = watchdog_nmi_probe();
> > + if (!ret) {
> > + nmi_watchdog_available = true;
> > + lockup_detector_setup();
>
> Is it really safe to call the entire lockup_detector_setup()
> later?
>
> It manipulates also softlockup detector. And more importantly,
> the original call is before smp_init(). It means that it was
> running when only single CPU was on.
>
For the race analysis, lockup_detector_reconfigure() takes center stage.
Since proc_watchdog_update() can call lockup_detector_reconfigure() at
runtime to re-initialize both the soft and hard lockup detectors, that
race should already be taken into consideration.

> It seems that x86 has some problem with hardlockup detector as
> well. It later manipulates only the hardlockup detector. Also it uses
> cpus_read_lock() to prevent races with CPU hotplug, see
> fixup_ht_bug().
>
Yes. But hardlockup_detector_perf_{stop,start}() cannot meet the
requirement, since no perf_event has been created yet. So there is no
handy interface to re-initialize the hardlockup detector directly.

>
> > + } else {
> > + WARN_ON(ret == -EBUSY);
> > + pr_info("Perf NMI watchdog permanently disabled\n");
> > + }
> > +}
> > +
> > void __init lockup_detector_init(void)
> > {
> > + int ret;
> > +
> > if (tick_nohz_full_enabled())
> > pr_info("Disabling watchdog on nohz_full cores by default\n");
> >
> > cpumask_copy(&watchdog_cpumask,
> > housekeeping_cpumask(HK_FLAG_TIMER));
> >
> > - if (!watchdog_nmi_probe())
> > + ret = watchdog_nmi_probe();
> > + if (!ret)
> > nmi_watchdog_available = true;
> > + else if (ret == -EBUSY)
> > + queue_work_on(smp_processor_id(), system_wq, &detector_work);
>
> IMHO, this is not acceptable. It will block one worker until someone
> wakes it. Only arm64 will have a code to wake up the work and only
> when pmu is successfully initialized. In all other cases, the worker
> will stay blocked forever.
>
What about considering -EBUSY and hld_detector_delay_initialized as a
unit? If watchdog_nmi_probe() returns -EBUSY, set the state of
hld_detector_delay_initialized to "waiting"; once the probe can
complete, move it to "finished".

And at the end of do_initcalls(), check whether the state is "finished".
If not, throw a warning and wake up the worker.

> The right solution is to do it the other way. Queue the work
> from arm64-specific code when armv8_pmu_driver_init() succeeded.
>
Wouldn't it be better for the watchdog to provide a common framework for
future extension, instead of arch-specific code? A second argument is to
avoid the message "Perf NMI watchdog permanently disabled" when the
detector is enabled later. (See
lockup_detector_init()->watchdog_nmi_probe()->hardlockup_detector_perf_init();
with an arch-specific probe method, the message can be avoided.)

> Also I suggest to flush the work to make sure that it is finished
> before __init code gets removed.
>
Good point, and very interesting. I will look into it.

>
> The open question is what code the work will call. As mentioned
> above, I am not sure that lockup_detector_delay_init() is safe.
> IMHO, we need to manipulate only hardlockup detector and
> we have to serialize it against CPU hotplug.
>
As explained above, the race against CPU hotplug has already been taken
into consideration.

Thanks,

Pingfan