Re: [RFC] Observed lockdep circular dependency in SR-IOV paths
From: Alex Williamson
Date: Fri Apr 10 2026 - 12:51:26 EST
On Wed, 8 Apr 2026 08:46:30 -0700
Raghavendra Rao Ananta <rananta@xxxxxxxxxx> wrote:
> On Mon, Apr 6, 2026 at 9:00 PM Alex Williamson <alex@xxxxxxxxxxx> wrote:
> >
> > On Mon, Apr 6, 2026, at 4:34 PM, Jason Gunthorpe wrote:
> > > On Tue, Mar 31, 2026 at 10:23:06AM -0700, Raghavendra Rao Ananta wrote:
> > >> Hi Alex,
> > >>
> > >> While running the vfio_pci_sriov_uapi_test [1] on a CONFIG_LOCKDEP
> > >> enabled kernel (7.0-rc1), we observed the following lockdep circular
> > >> locking dependency warning:
> > >
> > > This loos more like a class issue, the locks used by the VF driver are
> > > different than the locks used by the PF driver, and even though the
> > > devsets are shared a devset should never have both the PF and VF.
> > >
> > > Maybe shifting memory_lock into a different lock class for VFs is
> > > enough.
> > >
> > > However, I think it is a bad idea to hold the memory_lock while
> > > probing a device, I'd prefer to revisit f4162eb1e2fc ("vfio/pci:
> > > Change the PF power state to D0 before enabling VFs") and change it's
> > > logic to rely on a private flag instead of pci_num_vf()
> > >
> > > Then we don't need to hold the lock any longer than just setting the
> > > flag.
> >
> > I agree, that's cleaner, only hold the write-lock on memory_lock in
> > vfio_pci_core_sriov_configure() to bring the device to D0 and to
> > set a flag on the PF vdev. That flag would replace the
> > pci_num_vfs() test in vfio_pci_set_power_state(). The flag is
> > cleared without memory_lock if pci_enable_sriov() fails or after
> > pci_disable_sriov().
> >
> > It's a nice compartmentalized fix. The lockdep splat is a false
> > positive, but this would eliminate the memory_lock ->
> > work_completion arc of the detection.
>
> Thanks for the suggestion. The pointer to f4162eb1e2fc helped provide
> context. Based on the suggestion, I tried this diff and we no longer
> see the lockdep warning:
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c
> b/drivers/vfio/pci/vfio_pci_core.c index d43745fe4c843..f857a23c034a4
> 100644 --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -271,7 +271,7 @@ int vfio_pci_set_power_state(struct
> vfio_pci_core_device *vdev, pci_power_t stat
> int ret;
>
> /* Prevent changing power state for PFs with VFs enabled */
> - if (pci_num_vf(pdev) && state > PCI_D0)
> + if (vdev->sriov_pwr_active && state > PCI_D0)
> return -EBUSY;
>
> if (vdev->needs_pm_restore) {
> @@ -2294,8 +2294,9 @@ int vfio_pci_core_sriov_configure(struct
> vfio_pci_core_device *vdev,
>
> down_write(&vdev->memory_lock);
> vfio_pci_set_power_state(vdev, PCI_D0);
> - ret = pci_enable_sriov(pdev, nr_virtfn);
> + vdev->sriov_pwr_active = true;
> up_write(&vdev->memory_lock);
> + ret = pci_enable_sriov(pdev, nr_virtfn);
> if (ret) {
> pm_runtime_put(&pdev->dev);
> goto out_del;
> @@ -2309,6 +2310,7 @@ int vfio_pci_core_sriov_configure(struct
> vfio_pci_core_device *vdev,
> }
>
> out_del:
> + vdev->sriov_pwr_active = false;
> mutex_lock(&vfio_pci_sriov_pfs_mutex);
> list_del_init(&vdev->sriov_pfs_item);
> out_unlock:
>
> My only concern was clearing 'sriov_pwr_active' after disabling sriov,
> since the PF would still remain in D0. But from the f4162eb1e2fc 's
> commit description, it seems like that was intentional?
I don't see any obvious problem with that ordering, once SR-IOV is
disabled the PF is again subject to both PM and directed power
control. Thanks,
Alex