Re: Deadlock?: console_waiter/serial8250_ports/low_water_lock with 6.12-rc
From: John Ogness
Date: Tue Oct 29 2024 - 06:22:13 EST
On 2024-10-28, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> I think the cause of the issue is:
>
> CPU X                                          CPU Y
> =====                                          =====
> uart_write():                                  console_unlock(): // console lock is held by Y.
>   uart_port_lock();                              __console_flush_and_unlock():
>   __uart_start():                                  __console_flush_all():
>     pm_runtime_get():                                console_emit_next_record():
>       __pm_runtime_resume():                           con->write(); <- serial8250_console_write() // will try to acquire uart_port_lock();
>         spin_lock_irqsave(&dev->power.lock, flags):
>         <this triggers the lockdep splats, probably because
>          PM has done some print under "&dev->power.lock">
>           lock_acquire():
>             printk():

It is a known problem that calling printk() while holding the
uart_port_lock for non-printing purposes (such as pm) will deadlock the
system. You don't even need CPU-Y to be involved. CPU-X will deadlock
itself after acquiring the console_lock.
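
The single-CPU case can be modelled in userspace. The following is a
minimal sketch, not kernel code: it assumes a POSIX error-checking mutex
as a stand-in for the uart_port_lock, so that the relock is reported as
EDEADLK instead of hanging the way a real spinlock would. The console_lock
is left out of the model, and the names port_lock,
legacy_console_write() and pm_path_with_print() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the uart_port_lock; not the kernel spinlock. */
static pthread_mutex_t port_lock;

static void port_lock_init(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* ERRORCHECK makes a relock by the owner fail instead of hanging. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&port_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Models a legacy console write path, which needs the port lock. */
static int legacy_console_write(void)
{
	int ret = pthread_mutex_lock(&port_lock);

	if (ret)
		return ret;	/* EDEADLK: this thread already owns the lock */
	pthread_mutex_unlock(&port_lock);
	return 0;
}

/* Models a non-printing user (such as pm) that prints under the lock. */
static int pm_path_with_print(void)
{
	int ret;

	pthread_mutex_lock(&port_lock);
	ret = legacy_console_write();	/* the "printk()" under the port lock */
	pthread_mutex_unlock(&port_lock);
	return ret;
}
```

After port_lock_init(), pm_path_with_print() returns EDEADLK because the
same thread asks for a lock it already owns, while a plain
legacy_console_write() with the lock free returns 0.
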

One possible solution would be to enable deferred_printk if the
uart_port_lock of a console is taken for non-printing purposes. The
correct solution is to convert the console driver to the new nbcon
model.

The reasons why nbcon avoids this issue:

1. It does not use the BKL-like console lock.

2. It is aware that something else is using the driver and will instead
   just write to the lockless ringbuffer rather than endlessly spinning on
   the lock (that it itself is already holding).
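
The behaviour in point 2 can be sketched as a small single-threaded
userspace model. This is not the nbcon API: every name below (port_busy,
model_printk(), port_acquire(), port_release() and the ring buffer
itself) is hypothetical, and flushing on release is a simplifying
assumption of the model. The idea shown is only that a message printed
while the driver is in use is stored and emitted later, rather than
spinning on a lock the caller already holds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RB_SLOTS   8
#define RB_MSG_LEN 64

/* Hypothetical model state; none of these names exist in the kernel. */
static bool port_busy;				/* driver used for non-printing work */
static char ringbuf[RB_SLOTS][RB_MSG_LEN];
static size_t rb_head;				/* records stored so far */
static size_t rb_tail;				/* records already emitted */
static size_t emitted;				/* records written to the "hardware" */

/* Storing a record never needs the port, so it cannot block on it. */
static void store_record(const char *msg)
{
	char *slot = ringbuf[rb_head % RB_SLOTS];

	strncpy(slot, msg, RB_MSG_LEN - 1);
	slot[RB_MSG_LEN - 1] = '\0';
	rb_head++;
}

/* Emit every pending record; overwritten records are skipped. */
static void flush_records(void)
{
	if (rb_head - rb_tail > RB_SLOTS)
		rb_tail = rb_head - RB_SLOTS;
	while (rb_tail < rb_head) {
		rb_tail++;
		emitted++;
	}
}

/* Always store; only touch the port if nobody else is using it. */
static void model_printk(const char *msg)
{
	store_record(msg);
	if (!port_busy)
		flush_records();
}

static void port_acquire(void)
{
	port_busy = true;
}

/* Assumption of this model: pending records are flushed on release. */
static void port_release(void)
{
	port_busy = false;
	flush_records();
}
```

In this model, a model_printk() call made between port_acquire() and
port_release() only increments rb_head; emitted catches up once the port
is released.
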

@jstultz: Is it possible that you could run your tests using the latest
version [0] of the proposed nbcon-based 8250 driver? This will not have
the issue and should cleanly apply to any recent kernel.

John Ogness

[0] https://lore.kernel.org/lkml/20241025105728.602310-1-john.ogness@xxxxxxxxxxxxx