Re: 6.12-rc1: Lockdep regression bisected (virtio-net/console/scheduler)

From: Pavel Begunkov
Date: Wed Oct 09 2024 - 12:07:15 EST


On 10/8/24 16:18, John Ogness wrote:
> On 2024-10-04, Petr Mladek <pmladek@xxxxxxxx> wrote:
>> On Fri 2024-10-04 02:08:52, Breno Leitao wrote:
>>> =====================================================
>>> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>>> 6.12.0-rc1-kbuilder-virtme-00033-gd4ac164bde7a #50 Not tainted
>>> -----------------------------------------------------
>>> swapper/0/1 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>>> ff1100010a260518 (_xmit_ETHER#2){+.-.}-{2:2}, at: virtnet_poll_tx (./include/linux/netdevice.h:4361 drivers/net/virtio_net.c:2969)
>>>
>>> and this task is already holding:
>>> ffffffff86f2b5b8 (target_list_lock){....}-{2:2}, at: write_ext_msg (drivers/net/netconsole.c:?)
>>> which would create a new lock dependency:
>>> (target_list_lock){....}-{2:2} -> (_xmit_ETHER#2){+.-.}-{2:2}
>>>
>>> but this new dependency connects a HARDIRQ-irq-safe lock:
>>> (console_owner){-...}-{0:0}
>>>
>>> ...
>>>
>>> to a HARDIRQ-irq-unsafe lock:
>>> (_xmit_ETHER#2){+.-.}-{2:2}
>>>
>>> ...
>>>
>>> other info that might help us debug this:
>>>
>>> Chain exists of:
>>> console_owner --> target_list_lock --> _xmit_ETHER#2
>>>
>>> Possible interrupt unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(_xmit_ETHER#2);
>>>                                local_irq_disable();
>>>                                lock(console_owner);
>>>                                lock(target_list_lock);
>>>   <Interrupt>
>>>     lock(console_owner);

> I can trigger this lockdep splat on v6.11 as well.
>
> It only requires a printk() call within any interrupt handler, sometime
> after the netconsole is initialized and has had at least one run from
> softirq context.
>
>> My understanding is that the fix is to always take "_xmit_ETHER#2"
>> lock with interrupts disabled.
>
> That seems to be one possible solution. But maybe there is reasoning why
> that should not be done. (??) Right now it is clearly a spinlock that is

It's expensive, and it's a hot path, if I understand correctly which
lock that is. Also, IIRC the driver might spend some time there, and
it's always nicer to keep irqs enabled if possible.

> being taken from both interrupt and softirq contexts and does not
> disable interrupts.

Rather, it seems the xmit lock is BH-protected, and printk is a
one-off case that takes it with irqs disabled. I wonder if the printk
side could help with that, e.g. by offloading the send from hardirq to
softirq?
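
To make the idea concrete, here is a toy userspace model in Python. It
is not kernel code, and it is not how lockdep or netconsole are actually
implemented. Every name in it (ToyLockdep, direct_send, deferred_send,
the lock strings) is made up for illustration. It records which locks
are taken in hardirq context, which are taken with hardirqs enabled, and
which are acquired while others are held, then reports any chain leading
from the first kind to the second:

```python
from collections import deque


class ToyLockdep:
    """Hypothetical, heavily simplified stand-in for lockdep's irq-safety check."""

    def __init__(self):
        self.irq_safe = set()    # locks ever taken in hardirq context
        self.irq_unsafe = set()  # locks ever taken with hardirqs enabled
        self.deps = {}           # held lock -> locks acquired while holding it

    def acquire(self, lock, held=(), in_hardirq=False, irqs_enabled=True):
        if in_hardirq:
            self.irq_safe.add(lock)
        if irqs_enabled:
            self.irq_unsafe.add(lock)
        for h in held:
            self.deps.setdefault(h, set()).add(lock)

    def bad_chains(self):
        """Return (safe, unsafe) pairs where `unsafe` is reachable from `safe`."""
        found = []
        for start in sorted(self.irq_safe):
            seen, stack = set(), [start]
            while stack:
                cur = stack.pop()
                for nxt in self.deps.get(cur, ()):
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            found += [(start, lock) for lock in sorted(seen & self.irq_unsafe)]
        return found


def direct_send(ld):
    """printk in hardirq transmits immediately, taking the xmit lock itself."""
    # Normal softirq TX path: xmit lock taken with hardirqs enabled.
    ld.acquire("xmit", irqs_enabled=True)
    # printk from an interrupt handler, then netconsole sends right away.
    ld.acquire("console_owner", in_hardirq=True, irqs_enabled=False)
    ld.acquire("target_list_lock", held=["console_owner"], irqs_enabled=False)
    ld.acquire("xmit", held=["console_owner", "target_list_lock"],
               irqs_enabled=False)


def deferred_send(ld):
    """printk in hardirq only queues; a later softirq pass does the transmit."""
    pending = deque()
    ld.acquire("xmit", irqs_enabled=True)
    ld.acquire("console_owner", in_hardirq=True, irqs_enabled=False)
    ld.acquire("target_list_lock", held=["console_owner"], irqs_enabled=False)
    pending.append("message")  # no xmit lock taken under the console locks
    # Later, in softirq context, with no console locks held:
    while pending:
        pending.popleft()
        ld.acquire("xmit", irqs_enabled=True)


if __name__ == "__main__":
    for scenario in (direct_send, deferred_send):
        ld = ToyLockdep()
        scenario(ld)
        print(scenario.__name__, ld.bad_chains())
```

With direct_send() the model reports a console_owner -> xmit chain,
which is the same shape as the splat above. With deferred_send() the
xmit lock is only ever taken from the softirq side, so no chain is
reported. Whether printk can realistically defer like this is the open
question.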

> I will check if there is some previous kernel release where this problem
> does not exist.

--
Pavel Begunkov