Re: [RFC] Observed lockdep circular dependency in SR-IOV paths
From: Raghavendra Rao Ananta
Date: Wed Apr 08 2026 - 11:54:46 EST
On Mon, Apr 6, 2026 at 9:00 PM Alex Williamson <alex@xxxxxxxxxxx> wrote:
>
> On Mon, Apr 6, 2026, at 4:34 PM, Jason Gunthorpe wrote:
> > On Tue, Mar 31, 2026 at 10:23:06AM -0700, Raghavendra Rao Ananta wrote:
> >> Hi Alex,
> >>
> >> While running the vfio_pci_sriov_uapi_test [1] on a CONFIG_LOCKDEP
> >> enabled kernel (7.0-rc1), we observed the following lockdep circular
> >> locking dependency warning:
> >
> > This loos more like a class issue, the locks used by the VF driver are
> > different than the locks used by the PF driver, and even though the
> > devsets are shared a devset should never have both the PF and VF.
> >
> > Maybe shifting memory_lock into a different lock class for VFs is
> > enough.
> >
> > However, I think it is a bad idea to hold the memory_lock while
> > probing a device, I'd prefer to revisit f4162eb1e2fc ("vfio/pci:
> > Change the PF power state to D0 before enabling VFs") and change it's
> > logic to rely on a private flag instead of pci_num_vf()
> >
> > Then we don't need to hold the lock any longer than just setting the
> > flag.
>
> I agree, that's cleaner, only hold the write-lock on memory_lock in vfio_pci_core_sriov_configure() to bring the device to D0 and to set a flag on the PF vdev. That flag would replace the pci_num_vfs() test in vfio_pci_set_power_state(). The flag is cleared without memory_lock if pci_enable_sriov() fails or after pci_disable_sriov().
>
> It's a nice compartmentalized fix. The lockdep splat is a false positive, but this would eliminate the memory_lock -> work_completion arc of the detection.
Thanks for the suggestion. The pointer to f4162eb1e2fc helped provide
context. Based on the suggestion, I tried this diff and we no longer
see the lockdep warning:
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d43745fe4c843..f857a23c034a4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -271,7 +271,7 @@ int vfio_pci_set_power_state(struct
vfio_pci_core_device *vdev, pci_power_t stat
int ret;
/* Prevent changing power state for PFs with VFs enabled */
- if (pci_num_vf(pdev) && state > PCI_D0)
+ if (vdev->sriov_pwr_active && state > PCI_D0)
return -EBUSY;
if (vdev->needs_pm_restore) {
@@ -2294,8 +2294,9 @@ int vfio_pci_core_sriov_configure(struct
vfio_pci_core_device *vdev,
down_write(&vdev->memory_lock);
vfio_pci_set_power_state(vdev, PCI_D0);
- ret = pci_enable_sriov(pdev, nr_virtfn);
+ vdev->sriov_pwr_active = true;
up_write(&vdev->memory_lock);
+ ret = pci_enable_sriov(pdev, nr_virtfn);
if (ret) {
pm_runtime_put(&pdev->dev);
goto out_del;
@@ -2309,6 +2310,7 @@ int vfio_pci_core_sriov_configure(struct
vfio_pci_core_device *vdev,
}
out_del:
+ vdev->sriov_pwr_active = false;
mutex_lock(&vfio_pci_sriov_pfs_mutex);
list_del_init(&vdev->sriov_pfs_item);
out_unlock:
My only concern was clearing 'sriov_pwr_active' after disabling sriov,
since the PF would still remain in D0. But from the f4162eb1e2fc 's
commit description, it seems like that was intentional?
Thank you.
Raghavendra