Re: [PATCH] PCI: Fix devres regression in pci_intx()

From: Philipp Stanner
Date: Fri Sep 06 2024 - 02:46:54 EST


On Fri, 2024-09-06 at 09:37 +0900, Damien Le Moal wrote:
> On 9/5/24 16:13, Philipp Stanner wrote:
> > On Thu, 2024-09-05 at 09:33 +0900, Damien Le Moal wrote:
> > > On 2024/09/05 6:10, Alex Williamson wrote:
> > > > On Wed, 4 Sep 2024 23:24:53 +0300
> > > > Andy Shevchenko <andy.shevchenko@xxxxxxxxx> wrote:
> > > >
> > > > > Wed, Sep 04, 2024 at 12:07:21PM -0600, Alex Williamson
> > > > > kirjoitti:
> > > > > > On Wed, 04 Sep 2024 15:37:25 +0200
> > > > > > Philipp Stanner <pstanner@xxxxxxxxxx> wrote: 
> > > > > > > On Wed, 2024-09-04 at 17:25 +0900, Damien Le Moal wrote: 
> > > > >
> > > > > ...
> > > > >
> > > > > > > If vfio-pci can get rid of pci_intx() alltogether, that
> > > > > > > might
> > > > > > > be a good
> > > > > > > thing. As far as I understood Andy Shevchenko, pci_intx()
> > > > > > > is
> > > > > > > outdated.
> > > > > > > There's only a hand full of users anyways. 
> > > > > >
> > > > > > What's the alternative? 
> > > > >
> > > > > From API perspective the pci_alloc_irq_vectors() & Co should
> > > > > be
> > > > > used.
> > > >
> > > > We can't replace a device level INTx control with a vector
> > > > allocation
> > > > function.
> > > >  
> > > > > > vfio-pci has a potentially unique requirement
> > > > > > here, we don't know how to handle the device interrupt, we
> > > > > > only
> > > > > > forward
> > > > > > it to the userspace driver.  As a level triggered
> > > > > > interrupt,
> > > > > > INTx will
> > > > > > continue to assert until that userspace driver handles the
> > > > > > device.
> > > > > > That's obviously unacceptable from a host perspective, so
> > > > > > INTx
> > > > > > is
> > > > > > masked at the device via pci_intx() where available, or at
> > > > > > the
> > > > > > interrupt controller otherwise.  The API with the userspace
> > > > > > driver
> > > > > > requires that driver to unmask the interrupt, again
> > > > > > resulting
> > > > > > in a call
> > > > > > to pci_intx() or unmasking the interrupt controller, in
> > > > > > order
> > > > > > to receive
> > > > > > further interrupts from the device.  Thanks, 
> > > > >
> > > > > I briefly read the discussion and if I understand it
> > > > > correctly
> > > > > the problem here
> > > > > is in the flow: when the above mentioned API is being called.
> > > > > Hence it's design
> > > > > (or architectural) level of issue and changing call from
> > > > > foo() to
> > > > > bar() won't
> > > > > magically make problem go away. But I might be mistaken.
> > > >
> > > > Certainly from a vector allocation standpoint we can change to
> > > > whatever
> > > > is preferred, but the direct INTx manipulation functions are a
> > > > different thing entirely and afaik there's nothing else that
> > > > can
> > > > replace them at a low level, nor can we just get rid of our
> > > > calls
> > > > to
> > > > pci_intx().  Thanks,
> > >
> > > But can these calls be moved out of the spinlock context ? If
> > > not,
> > > then we need
> > > to clarify that pci_intx() can be called from any context, which
> > > will
> > > require
> > > changing to a GFP_ATOMIC for the resource allocation, even if the
> > > use
> > > case
> > > cannot trigger the allocation. This is needed to ensure the
> > > correctness of the
> > > pci_intx() function use.
> >
> > We could do that I guess. As I keep saying, it's not intended to
> > have
> > pci_intx() allocate _permanently_. This is a temporary situation.
> > pci_intx() should have neither devres nor allocation.
> >
> > > Frankly, I am surprised that the might sleep splat you
> > > got was not already reported before (fuzzying, static analyzers
> > > might
> > > eventually
> > > catch that though).
> >
> > It's a super rare situation:
> >  * pci_intx() has very few callers
> >  * It only allocates if pcim_enable_device() instead of
> >    pci_enable_device() ran.
> >  * It only allocates when it's called for the FIRST TIME
> >  * All of the above is only a problem while you hold a lock
> >
> > >
> > > The other solution would be a version of pci_intx() that has a
> > > gfp
> > > flags
> > > argument to allow callers to use the right gfp flags for the call
> > > context.
> >
> > I don't think that's a good idea. As I said, I want to clean up all
> > that in the mid term.
> >
> > As a matter of fact, there is already __pcim_intx() in pci/devres.c
> > as
> > a pure unmanaged pci_intx() as a means to split and then cleanup
> > the
> > APIs.
>
> Yeah. That naming is in fact confusing. __pcim_intx() should really
> be named
> pci_intx()...
>
> > One path towards getting the hybrid behavior out of pci_intx()
> > could be
> > to rename __pcim_intx() to pci_intx_unmanaged() and port everyone
> > who
> > uses pci_enable_device() + pci_intx() to that version. That would
> > be
> > better than to have a third version with a gfp_t argument.
>
> Sounds good. But ideally, all users that rely on the managed variant
> should be
> converted to use pcim_intx() and pci_intx() changed to not call in
> devres. But
> that may be too much code churn (I have not checked).

Exactly that is the plan.

I looked into that yesterday and my idea is to publish __pcim_intx()
under the name pci_intx_unmanaged(), then port everyone who doesn't
rely on managed behavior to that function and then port everyone who
does rely on it to pcim_intx() (as you suggest).
Afterwards you can remove pci_intx_unmanaged() again and also remove
the hybrid behavior from pci_intx().

It's doable on not that much code. Getting it merged might be
politically difficult, though.

The only thing I'm really a bit afraid of is the pci_intx() user in
pci/msi/ – MSI calls that through yet another implicit devres call,
pcim_msi_release().
I *suspect* that pci_intx() in MSI can just be made
pci_intx_unmanaged(), but that should be checked carefully. The doubly
implicit devres magic caused some trouble in the past already, as we
had discussed here [1].


P.

[1] https://lore.kernel.org/all/ee44ea7ac760e73edad3f20b30b4d2fff66c1a85.camel@xxxxxxxxxx/