Re: [PATCH 3/4] watchdog/hardlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

From: John Ogness
Date: Fri Dec 22 2023 - 04:43:21 EST


On 2023-12-20, Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
> The interleaving problem was less bad with the "buddy" hardlockup
> detector. With "buddy" we always end up calling
> `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running
> one. trigger_single_cpu_backtrace() always at least serializes the
> individual stack crawls because it eventually uses
> printk_cpu_sync_get_irqsave(). Unfortunately the fact that
> trigger_single_cpu_backtrace() eventually calls
> printk_cpu_sync_get_irqsave() (on a different CPU) means that we have
> to drop the "lock" before calling it and we can't fully serialize all
> printouts associated with a given hardlockup.

I think that is good enough. Otherwise there would need to be some kind
of CPU handshaking to ensure things are synchronized correctly in case
multiple CPUs have triggered the situation.

> However, we still do get
> the advantage of serializing the output of print_modules() and
> print_irqtrace_events().
>
> Aside from serializing hardlockups from each other, this change also
> has the advantage of serializing hardlockups and softlockups from each
> other if they happen to occur at the same time, since they are both
> using the same "lock".
>
> Even though nobody is expected to hang while holding the lock
> associated with printk_cpu_sync_get_irqsave(), out of an abundance of
> caution, we don't call printk_cpu_sync_get_irqsave() until after we
> print out about the hardlockup. This makes extra sure that, even if
> printk_cpu_sync_get_irqsave() somehow never returns, we at least print
> that we saw the hardlockup.

I agree with calling printk() before trying to acquire ownership of the
cpu_sync.

> This is different than the choice made for
> softlockup because hardlockup is really our last resort.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

Reviewed-by: John Ogness <john.ogness@xxxxxxxxxxxxx>