Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl

From: Petr Mladek
Date: Wed Jul 24 2024 - 11:08:37 EST

Next message: Peter Schneider: "Re: [PATCH 6.10 00/11] 6.10.1-rc2 review"
Previous message: Alexander Duyck: "Re: [RFC v11 09/14] mm: page_frag: use __alloc_pages() to replace alloc_pages_node()"
In reply to: John Ogness: "Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl"
Next in thread: Rik van Riel: "Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed 2024-07-24 16:51:46, John Ogness wrote:
> On 2024-07-24, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > My guess is that it looked like:
> >
> > CPU A                          CPU B
> >
> >                                printk()
> >                                  console_try_lock_spinning()
> >                                  console_unlock()
> >                                    console_emit_next_record()
> >                                      console_lock_spinning_enable();
> >                                      con->write()
> >                                        spin_lock(port->lock);
> >
> > printk_cpu_sync_get()
> > printk()
> >   console_try_lock_spinning()
> >     # spinning and waiting for CPU B
> >
> >                                NMI:
> >
> >                                printk_cpu_sync_get()
> >                                # waiting for CPU A
> >
> > => DEADLOCK
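The wait-for cycle in this scenario can be made concrete with a small userspace model. This is only a sketch, not kernel code: `struct model`, the CPU/lock enums, `abba_scenario()` and `has_deadlock()` are hypothetical names invented for illustration. CPU A owns printk_cpu_sync and waits for the console lock; CPU B owns the console lock and, from NMI, waits for printk_cpu_sync.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two CPUs and two locks in the scenario. */
enum { CPU_A, CPU_B, NR_CPUS };
enum { LOCK_CPU_SYNC, LOCK_CONSOLE, NR_LOCKS };
#define NOBODY (-1)

struct model {
	int owner[NR_LOCKS];	/* CPU holding each lock, or NOBODY */
	int waits_for[NR_CPUS];	/* lock each CPU spins on, or NOBODY */
};

/*
 * Follow "CPU waits for lock" -> "lock is owned by CPU" edges starting at
 * @start. Returns true if the chain leads back to @start (a wait-for cycle).
 * Cycles that do not pass through @start are not reported.
 */
static bool has_deadlock(const struct model *m, int start)
{
	int cpu = start;

	for (int steps = 0; steps < NR_CPUS; steps++) {
		int lock = m->waits_for[cpu];
		int next;

		if (lock == NOBODY)
			return false;	/* this CPU is not waiting */
		next = m->owner[lock];
		if (next == NOBODY)
			return false;	/* lock is free, the waiter gets it */
		if (next == start)
			return true;	/* back at the start: ABBA */
		cpu = next;
	}
	return false;
}

/*
 * CPU A holds printk_cpu_sync and spins on the console lock; CPU B holds
 * the console lock and, from NMI, waits for printk_cpu_sync.
 */
static struct model abba_scenario(void)
{
	struct model m = {
		.owner = { [LOCK_CPU_SYNC] = CPU_A, [LOCK_CONSOLE] = CPU_B },
		.waits_for = { [CPU_A] = LOCK_CONSOLE, [CPU_B] = LOCK_CPU_SYNC },
	};
	return m;
}
```

Setting `waits_for[CPU_A]` to `NOBODY`, which is what a trylock that fails immediately on CPU A amounts to in this model, breaks the cycle for both CPUs.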
> >
> >
> > The deadlock happens under printk_cpu_sync_get(), but only because
> > console_try_lock_spinning() blocks. It is not a true "try_lock"
> > operation, which should never block.
> >
> > => The above patch should solve the problem as well. It causes
> > console_try_lock_spinning() to fail immediately on CPU A.
> >
> > Note that port->lock can't cause any deadlock in this scenario.
> > console_try_lock_spinning() will always fail on CPU A until
> > the NMI gets handled on CPU B.
> >
> > In other words, printk_cpu_sync_get() will behave as a tail lock
> > on CPU A because of the failing trylock.
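A hypothetical sketch of a console trylock that refuses to wait while the caller owns printk_cpu_sync. The globals, `enum trylock_result` and `sketch_console_trylock_spinning()` are assumptions, modelled in userspace with C11 atomics; the kernel's actual implementation is structured differently. The sketch only shows the decision order: take a free console lock, fail immediately when this CPU holds printk_cpu_sync, and only otherwise consider spinning for a hand-over.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical model state; not the real printk internals. */
static atomic_int cpu_sync_owner = NO_OWNER;	/* CPU owning printk_cpu_sync */
static atomic_int console_owner = NO_OWNER;	/* CPU owning the console lock */
static atomic_bool console_spinning_enabled = false; /* owner allows a hand-over */

enum trylock_result {
	TRYLOCK_ACQUIRED,	/* console lock taken */
	TRYLOCK_FAILED,		/* gave up without waiting */
	TRYLOCK_WOULD_SPIN,	/* real code would busy-wait for the owner here */
};

static enum trylock_result sketch_console_trylock_spinning(int this_cpu)
{
	int expected = NO_OWNER;

	/* Console lock is free: take it. */
	if (atomic_compare_exchange_strong(&console_owner, &expected, this_cpu))
		return TRYLOCK_ACQUIRED;

	/*
	 * Another CPU owns the console. If this CPU owns printk_cpu_sync,
	 * waiting for that owner could close an ABBA cycle (the owner may be
	 * waiting for printk_cpu_sync from NMI), so fail immediately.
	 */
	if (atomic_load(&cpu_sync_owner) == this_cpu)
		return TRYLOCK_FAILED;

	/* Only wait when the owner has announced that it will hand over. */
	if (atomic_load(&console_spinning_enabled))
		return TRYLOCK_WOULD_SPIN;

	return TRYLOCK_FAILED;
}
```

The check on `cpu_sync_owner` is what keeps CPU A from waiting on CPU B in the diagram. It does not cover the case where CPU B holds port->lock for reasons unrelated to console printing.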
>
> But only in _this_ scenario. The port lock could be taken by CPU B for
> non-console-printing reasons. Then you still have a deadlock, due to
> spinning on the port lock.

I see. I agree that deferring printk on the CPU that owns printk_cpu_sync [0]
is the right solution.

> [0] https://lore.kernel.org/lkml/87plrcqyii.fsf@xxxxxxxxxxxxxxxxxxxxx
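A minimal sketch of the deferral approach, assuming a single-threaded userspace model. All names here (`sketch_printk()`, `sketch_cpu_sync_get()`, `sketch_cpu_sync_put()`, `flush_consoles()` and the sequence counters) are hypothetical. The record is always stored, but the CPU that owns printk_cpu_sync never calls into the console drivers, so it cannot spin on the console lock or on port->lock. For simplicity the deferred records are flushed when printk_cpu_sync is released; in the kernel, deferred console output is handed off to a later context instead (as with printk_deferred()).

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical userspace model, not kernel code. */
static int cpu_sync_owner = NO_OWNER;	/* models printk_cpu_sync ownership */
static unsigned long stored_seq;	/* records stored in the log model */
static unsigned long printed_seq;	/* records already written to consoles */

/* Stands in for console_unlock() -> con->write(), which may take port->lock. */
static void flush_consoles(void)
{
	while (printed_seq < stored_seq)
		printed_seq++;	/* pretend to write one record */
}

/* This model does not support nesting or contention. */
static void sketch_cpu_sync_get(int this_cpu)
{
	assert(cpu_sync_owner == NO_OWNER);
	cpu_sync_owner = this_cpu;
}

/*
 * printk() model: always store the record, but never touch the consoles
 * while this CPU owns printk_cpu_sync. Returns true if it flushed now.
 */
static bool sketch_printk(int this_cpu)
{
	stored_seq++;
	if (cpu_sync_owner == this_cpu)
		return false;	/* deferred: flushed after release */
	flush_consoles();
	return true;
}

/* Releasing printk_cpu_sync makes it safe to flush what was deferred. */
static void sketch_cpu_sync_put(int this_cpu)
{
	assert(cpu_sync_owner == this_cpu);
	cpu_sync_owner = NO_OWNER;
	flush_consoles();
}
```

With this ordering, nothing the cpu_sync owner does in printk() can wait for a lock held by another CPU, regardless of why that CPU holds port->lock.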

Best Regards,
Petr
