On 2024-10-04, Petr Mladek <pmladek@xxxxxxxx> wrote:
On Fri 2024-10-04 02:08:52, Breno Leitao wrote:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.12.0-rc1-kbuilder-virtme-00033-gd4ac164bde7a #50 Not tainted
-----------------------------------------------------
swapper/0/1 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ff1100010a260518 (_xmit_ETHER#2){+.-.}-{2:2}, at: virtnet_poll_tx (./include/linux/netdevice.h:4361 drivers/net/virtio_net.c:2969)
and this task is already holding:
ffffffff86f2b5b8 (target_list_lock){....}-{2:2}, at: write_ext_msg (drivers/net/netconsole.c:?)
which would create a new lock dependency:
(target_list_lock){....}-{2:2} -> (_xmit_ETHER#2){+.-.}-{2:2}
but this new dependency connects a HARDIRQ-irq-safe lock:
(console_owner){-...}-{0:0}
...
to a HARDIRQ-irq-unsafe lock:
(_xmit_ETHER#2){+.-.}-{2:2}
...
other info that might help us debug this:
Chain exists of:
console_owner --> target_list_lock --> _xmit_ETHER#2
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(_xmit_ETHER#2);
                               local_irq_disable();
                               lock(console_owner);
                               lock(target_list_lock);
  <Interrupt>
    lock(console_owner);
I can trigger this lockdep splat on v6.11 as well.
It only requires a printk() call within any interrupt handler, sometime
after the netconsole is initialized and has had at least one run from
softirq context.
My understanding is that the fix is to always take the "_xmit_ETHER#2"
lock with interrupts disabled.
That seems to be one possible solution. But maybe there is a reason why
that should not be done. (??) Right now it is clearly a spinlock that is
taken from both interrupt and softirq contexts without disabling
interrupts.
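To make the rule behind the splat concrete, here is a simplified
userspace model of the check. It is only a sketch, not lockdep's actual
implementation: the struct, the helper names and the enum values are
all hypothetical, and only the hardirq usage bit from the report is
modelled. A lock used in hardirq context ("-" in the first usage slot)
must not have a dependency chain leading to a lock that is taken with
hardirqs enabled ("+" in the first usage slot).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the ones in the splat. */
enum lock_id { CONSOLE_OWNER, TARGET_LIST_LOCK, XMIT_ETHER, NR_LOCKS };

/* Simplified model state; this is not how lockdep stores its data. */
struct lock_model {
	bool used_in_hardirq[NR_LOCKS];   /* '-' in the first usage slot */
	bool taken_hardirqs_on[NR_LOCKS]; /* '+' in the first usage slot */
	bool dep[NR_LOCKS][NR_LOCKS];     /* dep[a][b]: b acquired while holding a */
};

/* Record that lock "to" was acquired while lock "from" was held. */
static void add_dependency(struct lock_model *m, enum lock_id from,
			   enum lock_id to)
{
	assert(from < NR_LOCKS && to < NR_LOCKS);
	m->dep[from][to] = true;
}

/* Depth-first search: is there a dependency chain from -> ... -> to? */
static bool reaches(const struct lock_model *m, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (m->dep[from][next] && !seen[next] &&
		    reaches(m, next, to, seen))
			return true;
	return false;
}

/* Count pairs where a hardirq-safe lock leads to a hardirq-unsafe lock. */
static int count_irq_inversions(const struct lock_model *m)
{
	int count = 0;

	for (int safe = 0; safe < NR_LOCKS; safe++) {
		if (!m->used_in_hardirq[safe])
			continue;
		for (int unsafe = 0; unsafe < NR_LOCKS; unsafe++) {
			bool seen[NR_LOCKS] = { false };

			if (unsafe != safe && m->taken_hardirqs_on[unsafe] &&
			    reaches(m, safe, unsafe, seen))
				count++;
		}
	}
	return count;
}

/* Build the state described by the report above. */
static struct lock_model splat_model(void)
{
	struct lock_model m = { 0 };

	m.used_in_hardirq[CONSOLE_OWNER] = true;  /* {-...} */
	m.taken_hardirqs_on[XMIT_ETHER] = true;   /* {+.-.} */
	add_dependency(&m, CONSOLE_OWNER, TARGET_LIST_LOCK);
	add_dependency(&m, TARGET_LIST_LOCK, XMIT_ETHER);
	return m;
}
```

With the state from splat_model(), count_irq_inversions() finds the one
chain console_owner --> target_list_lock --> _xmit_ETHER#2. Clearing
taken_hardirqs_on[XMIT_ETHER], which is what always taking that lock
with interrupts disabled would amount to in this model, makes the count
drop to zero.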
I will check if there is some previous kernel release where this problem
does not exist.