Re: [PATCH v3 0/2] PCI/IOV: Fix deadlock when removing PF with enabled SR-IOV
From: Benjamin Block
Date: Mon Feb 23 2026 - 12:34:04 EST
On Mon, Feb 23, 2026 at 03:10:35PM +0100, Dragos Tatulea wrote:
> After pulling in these commits in our internal tree we can see the
> lockdep splat from below in many internal tests. We are still trying to
> find an easy repro for this. We had to internally revert both of them.
>
> I noticed some similar discussion in another thread [1] but there it
> seems that these changes are actually fixing the issue which is not
> the case for us.
>
> ------------[ cut here ]------------
> WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x39/0x40, CPU#2: modprobe/12956
> Modules linked in: mlx5_core(-) act_tunnel_key vxlan dummy act_mirred act_gact cls_flower act_police act_ct nf_flow_table [...]
> CPU: 2 UID: 0 PID: 12956 Comm: modprobe Not tainted 6.19.0net_next_e834b5e #1 PREEMPT
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:pci_stop_and_remove_bus_device+0x39/0x40
> Code: [...]
> RSP: 0018:ffff888164c9fd10 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff888188ff2000 RCX: 0000000000000001
> RDX: 0000000000000046 RSI: ffffffff8307e068 RDI: ffff88816bf4c9c0
> RBP: ffff888188ff2000 R08: 00000000000000f4 R09: ffff88816bf4c080
> R10: 0000000000000001 R11: 0000000000000003 R12: 0000000000000000
> R13: ffff888164c9fd27 R14: 0000000000000002 R15: 0000000000000000
> FS: 00007f52364bd740(0000) GS:ffff8885a9019000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005622dbf749d8 CR3: 0000000169132004 CR4: 0000000000372eb0
> Call Trace:
> <TASK>
> pci_iov_remove_virtfn+0xbd/0x120
> sriov_disable+0x30/0xe0
> mlx5_sriov_disable+0x50/0xa0 [mlx5_core]
> remove_one+0x68/0xe0 [mlx5_core]
> pci_device_remove+0x39/0xa0
> device_release_driver_internal+0x1e4/0x240
> driver_detach+0x47/0x90
> bus_remove_driver+0x84/0x110
> pci_unregister_driver+0x3b/0x90
This looks pretty much like what Ionut is trying to fix in
v1: https://lore.kernel.org/linux-pci/20260214193235.262219-3-ionut.nechita@xxxxxxxxxxxxx/T/
v2: https://lore.kernel.org/linux-pci/20260219212648.82606-1-ionut.nechita@xxxxxxxxxxxxx/T/
Maybe try giving those patches a spin. I think one easy way to hit this sort
of thing is to unbind a PF from its device driver while it has one or more
VFs attached. The "trick" is that SR-IOV has to be active.
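
As a rough illustration of that kind of reproducer, here is a minimal sketch that
enables VFs on a PF and then unbinds the PF through the standard PCI sysfs
attributes (sriov_numvfs and driver/unbind). The function name, the sysfs_root
parameter, and the example address in the note after the block are assumptions
made for illustration; none of this has been checked against the splat quoted in
this mail.

```python
import os


def unbind_pf_with_vfs(bdf, num_vfs, sysfs_root="/sys"):
    """Enable num_vfs VFs on the PF at `bdf`, then unbind the PF's driver.

    Hypothetical helper: `bdf` is the PF's PCI address, and `sysfs_root`
    exists only so the function can be pointed at a scratch directory.
    """
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", bdf)
    numvfs_path = os.path.join(dev, "sriov_numvfs")
    unbind_path = os.path.join(dev, "driver", "unbind")

    # Step 1: make SR-IOV active by asking the PF driver for VFs.
    with open(numvfs_path, "w") as f:
        f.write(str(num_vfs))

    # Step 2: unbind the PF from its driver while the VFs still exist.
    with open(unbind_path, "w") as f:
        f.write(bdf)

    return numvfs_path, unbind_path
```

On a real system this would be run as root with the PF's own address, for
example unbind_pf_with_vfs("0000:08:00.0", 2), where the address is a
placeholder. The write to sriov_numvfs is expected to fail if VFs are already
enabled, so it may be necessary to write 0 there first.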
> mlx5_cleanup+0x13/0x40 [mlx5_core]
> __x64_sys_delete_module+0x16f/0x290
> ? kmem_cache_free+0x221/0x520
> do_syscall_64+0xa8/0x13f0
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f5235f2c3fb
> Code: [...]
> RSP: 002b:00007ffc6ba11518 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> RAX: ffffffffffffffda RBX: 00005558c4278f30 RCX: 00007f5235f2c3fb
> RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005558c4278f98
> RBP: 00007ffc6ba11540 R08: 1999999999999999 R09: 0000000000000000
> R10: 00007f5235fa5fe0 R11: 0000000000000206 R12: 0000000000000000
> R13: 00007ffc6ba11570 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> irq event stamp: 44859
> hardirqs last enabled at (44869): [<ffffffff814af7ca>] __up_console_sem+0x5a/0x70
> hardirqs last disabled at (44878): [<ffffffff814af7af>] __up_console_sem+0x3f/0x70
> softirqs last enabled at (44844): [<ffffffff81430312>] irq_exit_rcu+0x82/0xe0
> softirqs last disabled at (44821): [<ffffffff81430312>] irq_exit_rcu+0x82/0xe0
> ---[ end trace 0000000000000000 ]---
>
> [1] https://lore.kernel.org/all/20260222112904.171858-1-ionut.nechita@xxxxxxxxxxxxx/
--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt / Geschäftsführung: David Faller
Sitz der Ges.: Ehningen / Registergericht: AmtsG Stuttgart, HRB 243294