Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

From: John Ogness
Date: Tue Feb 06 2024 - 05:46:33 EST


On 2024-02-06, Petr Mladek <pmladek@xxxxxxxx> wrote:
> I have just got an idea how to make printk_cpu_sync_get_irqsave()
> less error prone for deadlock on the panic() CPU. The idea is
> to ignore the lock or give up locking after a timeout on
> the panic CPU.

This idea is out of scope for this series. But it is something we should
think about. The issue has always been a possible problem in panic().

> AFAIK, the lock is currently used only to serialize related
> printk() calls. The only risk is that some messages might be
> interleaved when it is ignored.
>
> I am not sure if this is a good idea though. It might create
> another risk when the lock gets used to serialize more
> things in the future and a race might create a real problem.

With the printk series we are currently working on [0], only the panic
CPU can store new printk messages anyway. So there would be nothing to
synchronize against (and it could be safely ignored).

kgdb uses the same technique to quiesce the CPUs. It does not use the
printk_cpu_sync for this, but it is an example of a possible future
usage not related to printk.

My vote is to make it a NOP for the panic CPU and then keep an eye on
any future uses. Should I add this to v4 of [0]?

John

[0] https://lore.kernel.org/lkml/20231214214201.499426-1-john.ogness@xxxxxxxxxxxxx