Re: [PATCH] PCI: Fix devres regression in pci_intx()

From: Alex Williamson
Date: Wed Sep 04 2024 - 08:59:09 EST


On Wed, 04 Sep 2024 09:06:37 +0200
Philipp Stanner <pstanner@xxxxxxxxxx> wrote:

> On Tue, 2024-09-03 at 09:44 -0600, Alex Williamson wrote:
> > On Thu, 25 Jul 2024 14:07:30 +0200
> > Philipp Stanner <pstanner@xxxxxxxxxx> wrote:
> >
> > > pci_intx() is a function that becomes managed if
> > > pcim_enable_device()
> > > has been called in advance. Commit 25216afc9db5 ("PCI: Add managed
> > > pcim_intx()") changed this behavior so that pci_intx() always leads
> > > to
> > > creation of a separate device resource for itself, whereas earlier,
> > > a
> > > shared resource was used for all PCI devres operations.
> > >
> > > Unfortunately, pci_intx() seems to be used in some drivers'
> > > remove()
> > > paths; in the managed case this causes a device resource to be
> > > created
> > > on driver detach.
> > >
> > > Fix the regression by only redirecting pci_intx() to its managed
> > > twin
> > > pcim_intx() if the pci_command changes.
> > >
> > > Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
> >
> > I'm seeing another issue from this, which is maybe a more general
> > problem with managed mode.  In my case I'm using vfio-pci to assign
> > an
> > ahci controller to a VM.
>
> "In my case" doesn't mean OOT, does it? I can't fully follow.

"OOT" Out Of Tree? No, "In my case" is simply introducing the scenario
in which I see the issue. vfio-pci is an in-tree driver used to attach
devices to userspace drivers, such as QEMU. The ahci driver is loaded
during system boot, setting the is_managed flag. The ahci driver is
then unbound from the device and the vfio-pci driver is bound. The
vfio-pci driver provides a uAPI for userspace drivers to operate a
device in an an IOMMU protected context.

> >   ahci_init_one() calls pcim_enable_device()
> > which sets is_managed = true.  I notice that nothing ever sets
> > is_managed to false.  Therefore now when I call pci_intx() from vfio-
> > pci
> > under spinlock, I get a lockdep warning
>
> I suppose you see the lockdep warning because the new pcim_intx() can
> now allocate, whereas before 25216afc9db5 it was pcim_enable_device()
> which allocated *everything* related to PCI devres.
>
> > as I no go through pcim_intx()
> > code after 25216afc9db5 
>
> You alwas went through pcim_intx()'s logic. The issue seems to be that
> the allocation step was moved.

Unintentionally, yes, I believe so. vfio-pci is not a managed, devres
driver and therefore had no expectation of using the managed code path.

> > since the previous driver was managed.
>
> what do you mean by "previous driver"?

As noted, the ahci driver is first bound to the device at boot,
unbound, and the vfio-pci driver bound to the device. The ahci driver
is the previous driver.

> >   It seems
> > like we should be setting is_managed to false is the driver release
> > path, right?
>
> So the issue seems to be that the same struct pci_dev can be used by
> different drivers, is that correct?

Yes, and more generically, the driver release should undo everything
that has been configured by the driver probe.

> If so, I think that can be addressed trough having
> pcim_disable_device() set is_managed to false as you suggest.

If that's sufficient and drivers only call pcim_disable_device() in
their release function. I also note that f748a07a0b64 ("PCI: Remove
legacy pcim_release()") claims that:

Thanks to preceding cleanup steps, pcim_release() is now not needed
anymore and can be replaced by pcim_disable_device(), which is the
exact counterpart to pcim_enable_device().

However, that's not accurate as pcim_enable_device() adds a devm
action, unconditionally calls pci_enable_device() and sets is_managed
to true. If we assume pcim_pin_device() is a valid concept, don't we
still need to remove the devm action as well?

> Another solution can could at least consider would be to use a
> GFP_ATOMIC for allocation in get_or_create_intx_devres().

If we look at what pci_intx() does without devres, it's simply reading
and setting or clearing a bit in config space. I can attest that a
driver author would have no expectation that such a function allocates
memory and there are scenarios where we want to call this with
interrupts disabled, such as within an interrupt context. So, TBH, it
might make sense to consider whether an allocation in this path is
appropriate at all, but I'm obviously no expert in devres.

> I suppose your solution is the better one, though.

I see you've posted a patch, I'll test it as soon as I'm able. Thanks,

Alex