[PATCH v9 0/1] PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect sriov_add_vfs/sriov_del_vfs
From: Ionut Nechita (Wind River)
Date: Tue Mar 10 2026 - 03:44:51 EST
From: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
Hi Bjorn,
This is v9 of the fix for the SR-IOV race between driver .remove()
and concurrent hotplug events (particularly on s390).
This race has been independently observed by multiple organizations:
- IBM (s390 platform-generated hot-unplug events racing with
sriov_del_vfs during PF driver unload)
- NVIDIA (tested by Dragos Tatulea in earlier versions)
- Intel (xe driver hitting lockdep warnings and deadlocks when
calling pci_disable_sriov from .remove, as reported and discussed
in https://lore.kernel.org/all/20260227214048.12649-1-michal.wajdeczko@xxxxxxxxx/)
- Wind River (original reporter and patch author)
Changes since v8 (Mar 9):
- Added Reviewed-by from Niklas Schnelle (IBM) and Tested-by (s390)
- Added Fixes tags for commits 05703271c3cd ("PCI/IOV: Add PCI
rescan-remove locking when enabling/disabling SR-IOV") and
a5338e365c45 ("PCI/IOV: Fix race between SR-IOV enable/disable
and hotplug"), as suggested by Niklas Schnelle
- Removed the rescan/remove locking from sriov_numvfs_store() that
was introduced by commit a5338e365c45, since the locking is now
handled directly in sriov_add_vfs() and sriov_del_vfs() where it
is actually needed, reducing the lock scope (suggested by Niklas
Schnelle)
- Rebased on linux-next (20260309)
Changes since v7 (Mar 8):
- Added Reviewed-by and Tested-by from Benjamin Block (IBM), who
ran tests in the IBM s390 test lab
- Rebased on linux-next (20260309)
- No code changes from v7
Changes since v6 (Mar 6):
- Replaced local pci_rescan_remove_owner / pci_rescan_remove_count
variables with mutex_get_owner() for owner checking and a single
pci_rescan_remove_reentrant_count depth counter, as tested and
suggested by Benjamin Block
- Dropped Reviewed-by and Tested-by tags per Benjamin Block's
feedback, since the implementation changed substantially between
the reviewed version and the current one
- Added Suggested-by for Benjamin Block
- Rebased on linux-next (20260306)
Changes since v5 (Mar 3):
- Reworked based on Lukas Wunner's suggestion: instead of introducing
separate pci_lock_rescan_remove_reentrant() /
pci_unlock_rescan_remove_reentrant() helpers, make the existing
pci_lock_rescan_remove() / pci_unlock_rescan_remove() themselves
reentrant using owner tracking and a depth counter
- No new API: callers simply use pci_lock/unlock_rescan_remove()
without needing to track any return value
- No changes to include/linux/pci.h
- Rebased on linux-next (20260306)
Changes since v4 (Feb 28):
- Replaced local pci_rescan_remove_owner variable with
mutex_get_owner() to check lock ownership, as suggested by
Manivannan Sadhasivam and agreed to by Benjamin Block
- Removed owner tracking from pci_lock_rescan_remove() and
pci_unlock_rescan_remove() - they are now unchanged from upstream
- Rebased on linux-next (20260302)
Changes since v3 (Feb 25):
- Rebased on linux-next (next-20260227)
- Declared pci_rescan_remove_owner as a pointer to const
(const struct task_struct *) to make clear that the task is not
meant to be modified through it (Benjamin Block)
- Added Reviewed-by and Tested-by from Benjamin Block (IBM)
Changes since v2 (Feb 19):
- Rebased on linux-next (next-20260225)
- Added Tested-by from Dragos Tatulea (NVIDIA)
- No code changes from v2
Changes since v1 (Feb 14):
- Renamed from pci_lock_rescan_remove_nested() to
pci_lock_rescan_remove_reentrant() to avoid confusion with
mutex_lock_nested() lockdep annotations (Benjamin Block)
- Added pci_unlock_rescan_remove_reentrant(const bool locked) helper
to avoid open-coding conditional unlock at each call site
(Benjamin Block)
- Moved declarations from drivers/pci/pci.h to include/linux/pci.h
alongside existing lock/unlock declarations (Benjamin Block)
- Simplified callers: removed negation of return value and manual
conditional unlock in favor of the paired lock/unlock helpers
The problem: on s390, platform-generated hot-unplug events for VFs
can race with sriov_del_vfs() when a PF driver is being unloaded.
The platform event handler takes pci_rescan_remove_lock, but
sriov_del_vfs() does not, leading to double removal and list
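corruption. A plain mutex_lock() in sriov_del_vfs() is not an option,
because sriov_del_vfs() may be called from paths that already hold
the lock, which would deadlock. mutex_trylock() does not help either:
when it fails, the caller cannot tell whether the lock is held by the
current task or by another one.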
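
For illustration only, here is a small userspace analogue of the
trylock ambiguity, using a default pthread mutex as a stand-in for
pci_rescan_remove_lock. All names in it are hypothetical and nothing
here is kernel code; it only shows that a failed trylock looks the
same whether the caller or another thread holds the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Userspace stand-in for pci_rescan_remove_lock (default, non-recursive). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: try the lock from a thread that does not hold it. */
static void *try_from_other_thread(void *arg)
{
	int *result = arg;

	*result = pthread_mutex_trylock(&demo_lock);
	if (*result == 0)
		pthread_mutex_unlock(&demo_lock);
	return NULL;
}

/* Case 1: the caller itself already holds the lock. */
int trylock_result_as_holder(void)
{
	int ret;

	pthread_mutex_lock(&demo_lock);
	ret = pthread_mutex_trylock(&demo_lock);
	pthread_mutex_unlock(&demo_lock);
	return ret;
}

/* Case 2: a different thread holds the lock while we try it. */
int trylock_result_as_other_thread(void)
{
	pthread_t thread;
	int ret = -1;
	int err;

	pthread_mutex_lock(&demo_lock);
	err = pthread_create(&thread, NULL, try_from_other_thread, &ret);
	assert(err == 0);
	(void)err;
	pthread_join(thread, NULL);
	pthread_mutex_unlock(&demo_lock);
	return ret;
}
```

Both helpers report the same busy error code, so the caller has no
way to decide between "skip locking, I already hold it" and "wait,
somebody else is in the critical section".
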
The same class of problem has been observed on Intel xe, where
pci_disable_sriov() is called from the driver's .remove() callback
without pci_rescan_remove_lock, but .remove() may itself be called
from a path that already holds the lock (e.g. remove_store ->
pci_stop_and_remove_bus_device_locked), leading to lockdep warnings
and potential deadlocks.
The fix makes pci_lock_rescan_remove() reentrant using
mutex_get_owner() and a depth counter: if the current task already
holds the lock, the counter is incremented instead of acquiring the
mutex again; pci_unlock_rescan_remove() decrements the counter and
only releases the mutex when it reaches zero. This keeps the existing
API unchanged while providing correct serialization.
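
The owner check plus depth counter can be sketched in userspace as
follows. This is not the patch itself, only a minimal model of the
idea under stated assumptions: a pthread mutex replaces the kernel
mutex, an atomic owner token replaces mutex_get_owner(), and every
demo_* name is hypothetical. The counter here counts nested
acquisitions on top of the outermost one, which may differ in detail
from the actual implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_mutex_t demo_rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;
/* Assumed stand-in for mutex_get_owner(): token of the current holder. */
static _Atomic(void *) demo_owner;
/* Nested acquisitions by the holder; only the holder touches this. */
static int demo_reentrant_count;
/* The address of this per-thread variable identifies the calling thread. */
static _Thread_local char demo_self;

int demo_held_by_me(void)
{
	return atomic_load(&demo_owner) == (void *)&demo_self;
}

void demo_lock_rescan_remove(void)
{
	if (demo_held_by_me()) {
		/* Already the holder: record the nesting, do not block. */
		demo_reentrant_count++;
		return;
	}
	pthread_mutex_lock(&demo_rescan_remove_lock);
	atomic_store(&demo_owner, (void *)&demo_self);
}

void demo_unlock_rescan_remove(void)
{
	assert(demo_held_by_me());
	if (demo_reentrant_count > 0) {
		/* Leaving a nested section: keep the mutex held. */
		demo_reentrant_count--;
		return;
	}
	/* Outermost unlock: clear the owner, then release the mutex. */
	atomic_store(&demo_owner, NULL);
	pthread_mutex_unlock(&demo_rescan_remove_lock);
}

/* Hypothetical inner path that takes the lock itself (cf. sriov_del_vfs()). */
void demo_inner_path(void)
{
	demo_lock_rescan_remove();
	/* ... work that must be serialized against rescan/remove ... */
	demo_unlock_rescan_remove();
}

/*
 * Hypothetical outer path that already holds the lock when it reaches
 * the inner one. Returns 1 if the lock is still held after the inner
 * path has returned.
 */
int demo_outer_path(void)
{
	int still_held;

	demo_lock_rescan_remove();
	demo_inner_path();
	still_held = demo_held_by_me();
	demo_unlock_rescan_remove();
	return still_held;
}
```

With this shape the inner path can be called both on its own and
from a caller that already holds the lock, and neither side needs to
track a return value or use a different helper.
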
The rescan/remove locking in sriov_numvfs_store() (from commit
a5338e365c45) is removed since the locking is now handled directly
in sriov_add_vfs() and sriov_del_vfs(), reducing the lock scope.
Link: https://lore.kernel.org/linux-pci/20260214193235.262219-3-ionut.nechita@xxxxxxxxxxxxx/ [v1]
Link: https://lore.kernel.org/linux-pci/20260219212648.82606-1-ionut.nechita@xxxxxxxxxxxxx/ [v2]
Link: https://lore.kernel.org/linux-pci/20260225202434.18737-1-ionut.nechita@xxxxxxxxxxxxx/ [v3]
Link: https://lore.kernel.org/linux-pci/20260228120138.51197-2-ionut.nechita@xxxxxxxxxxxxx/ [v4]
Link: https://lore.kernel.org/linux-pci/20260303080903.28693-1-ionut.nechita@xxxxxxxxxxxxx/ [v5]
Link: https://lore.kernel.org/linux-pci/20260306082108.17322-1-ionut.nechita@xxxxxxxxxxxxx/ [v6]
Link: https://lore.kernel.org/linux-pci/20260308135352.80346-1-ionut.nechita@xxxxxxxxxxxxx/ [v7]
Link: https://lore.kernel.org/linux-pci/20260309194920.16459-1-ionut.nechita@xxxxxxxxxxxxx/ [v8]
Ionut Nechita (Wind River) (1):
  PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect
    sriov_add_vfs/sriov_del_vfs

 drivers/pci/iov.c   |  9 +++++----
 drivers/pci/probe.c | 11 +++++++++--
 2 files changed, 14 insertions(+), 6 deletions(-)
base-commit: c8be6ef92d9bc54f012627375b87b44d3eefe451
--
2.53.0