Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

From: Petr Mladek
Date: Wed Feb 07 2024 - 08:04:45 EST


On Tue 2024-02-06 11:51:50, John Ogness wrote:
> On 2024-02-06, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > I just had an idea for making printk_cpu_sync_get_irqsave()
> > less prone to deadlock on the panic() CPU. The idea is
> > to ignore the lock, or give up locking after a timeout, on
> > the panic CPU.
>
> This idea is out of scope for this series. But it is something we should
> think about. This has always been a potential problem in panic().
>
> > AFAIK, the lock is currently used only to serialize related
> > printk() calls. The only risk is that some messages might be
> > interleaved when it is ignored.
> >
> > I am not sure if this is a good idea, though. It might create
> > another risk if the lock is later used to serialize more
> > things; a race could then cause a real problem.
>
> With the printk series we are currently working on [0], only the panic
> CPU can store new printk messages anyway. So there would be nothing to
> synchronize against (and it could be safely ignored).

Right.
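For illustration, the "give up locking after a timeout" option could look
roughly like the user-space sketch below. It is not the kernel
implementation: sync_owner, panic_cpu, cpu_sync_get(), cpu_sync_put() and
PANIC_SPIN_LIMIT are hypothetical names, C11 atomics stand in for the kernel
primitives, a bounded spin count stands in for a real timeout, and the
nesting support of printk_cpu_sync_get_irqsave() is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_CPU (-1)
#define PANIC_SPIN_LIMIT 1000   /* hypothetical stand-in for a timeout */

/* Hypothetical globals modelling the lock owner and the panic CPU. */
static atomic_int sync_owner = NO_CPU;
static atomic_int panic_cpu = NO_CPU;

/*
 * Acquire the CPU-sync lock for @cpu. Ordinary CPUs spin until the lock
 * is free. The panic CPU spins at most PANIC_SPIN_LIMIT times and then
 * continues without the lock. Returns true only if the lock was taken.
 */
static bool cpu_sync_get(int cpu)
{
    bool on_panic_cpu = atomic_load(&panic_cpu) == cpu;
    long spins = 0;

    for (;;) {
        int expected = NO_CPU;

        if (atomic_compare_exchange_weak(&sync_owner, &expected, cpu))
            return true;
        if (on_panic_cpu && ++spins >= PANIC_SPIN_LIMIT)
            return false;   /* give up: output may interleave */
    }
}

/* Release the lock; @locked is the value returned by cpu_sync_get(). */
static void cpu_sync_put(int cpu, bool locked)
{
    int expected = cpu;
    bool released;

    if (!locked)
        return;             /* lock was never taken, nothing to release */
    released = atomic_compare_exchange_strong(&sync_owner, &expected, NO_CPU);
    assert(released);       /* only the owner may release the lock */
    (void)released;
}
```

The cost of giving up is the one mentioned above: messages from the panic
CPU may interleave with those of the CPU that still holds the lock.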

> kgdb uses the same technique to quiesce the CPUs. It does not use the
> printk_cpu_sync for this, but it is an example of a possible future
> usage not related to printk.
>
> My vote is to make it a NOP for the panic CPU and then keep an eye on
> any future uses.

Sounds good.
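A NOP on the panic CPU could be sketched roughly as follows. This is a
user-space illustration, not the kernel code: sync_owner, panic_cpu,
cpu_sync_get() and cpu_sync_put() are hypothetical names, C11 atomics stand
in for the kernel primitives, nesting is omitted, and it assumes, as with
the printk series referenced below, that only the panic CPU can store new
messages once panic() is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical globals modelling the lock owner and the panic CPU. */
static atomic_int sync_owner = NO_CPU;
static atomic_int panic_cpu = NO_CPU;

/*
 * Returns true if the lock was taken and must be released with
 * cpu_sync_put(). On the panic CPU this is a NOP: no other CPU can
 * store messages, so there is nothing to serialize against.
 */
static bool cpu_sync_get(int cpu)
{
    if (atomic_load(&panic_cpu) == cpu)
        return false;       /* panic CPU: never wait on another CPU */

    for (;;) {
        int expected = NO_CPU;

        if (atomic_compare_exchange_weak(&sync_owner, &expected, cpu))
            return true;
    }
}

/* Release the lock; @locked is the value returned by cpu_sync_get(). */
static void cpu_sync_put(int cpu, bool locked)
{
    int expected = cpu;
    bool released;

    if (!locked)
        return;             /* NOP on the panic CPU: nothing to release */
    released = atomic_compare_exchange_strong(&sync_owner, &expected, NO_CPU);
    assert(released);       /* only the owner may release the lock */
    (void)released;
}
```

Unlike a timeout, the panic CPU never waits on a lock holder that may itself
be stopped, so this deadlock cannot happen. The trade-off is that any future
non-printk user of the lock gets no serialization on the panic CPU, which is
why new users need watching.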

> Should I add this to v4 of [0]?

Let's not complicate this series any more. It is almost ready ;-)
We could do it in a separate patch on top of it or in another
patchset.

>
> [0] https://lore.kernel.org/lkml/20231214214201.499426-1-john.ogness@xxxxxxxxxxxxx

Best Regards,
Petr