Re: [PATCH] powerpc/powernv/npu: Fix reference leak

From: Greg Kurz
Date: Tue May 14 2019 - 09:04:30 EST


On Tue, 14 May 2019 21:13:40 +1000
Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:

> Greg Kurz <groug@xxxxxxxx> writes:
> > Michael,
> >
> > Any comments on this patch ? Should I repost with a shorter comment
> > as suggested by Alexey ?
>
> No the longer comment seems fine to me.
>
> I'm not a big fan of the patch, it's basically a hack :)
>

Yeah :)

> But for a backportable fix I guess it is OK.
>
> I would be happier though if we eventually fix up the code to do the
> refcounting properly.
>

I had started to do just that before deciding to go for the backportable
hack. Should I rebase the other patches I have on top of this patch and
repost the whole thing, so that we have both the ugly fix for stable and
the pretty one for 5.2 ?

Cheers,

--
Greg

> cheers
>
> > On Mon, 29 Apr 2019 12:36:59 +0200
> > Greg Kurz <groug@xxxxxxxx> wrote:
> >> On Mon, 29 Apr 2019 16:01:29 +1000
> >> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> >>
> >> > On 20/04/2019 01:34, Greg Kurz wrote:
> >> > > Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
> >> > > has the effect of incrementing the reference count of the PCI device, as
> >> > > explained in drivers/pci/search.c:
> >> > >
> >> > > * Given a PCI domain, bus, and slot/function number, the desired PCI
> >> > > * device is located in the list of PCI devices. If the device is
> >> > > * found, its reference count is increased and this function returns a
> >> > > * pointer to its data structure. The caller must decrement the
> >> > > * reference count by calling pci_dev_put(). If no device is found,
> >> > > * %NULL is returned.
> >> > >
> >> > > Nothing was done to call pci_dev_put() and the reference count of GPU and
> >> > > NPU PCI devices rockets up.
> >> > >
> >> > > A natural way to fix this would be to teach the callers about the change,
> >> > > so that they call pci_dev_put() when done with the pointer. This turns
> >> > > out to be quite intrusive, as it affects many paths in npu-dma.c,
> >> > > pci-ioda.c and vfio_pci_nvlink2.c.
> >> >
> >> >
> >> > afaict this referencing is only done to protect the current traverser
> >> > and what you've done is actually a natural way (and the generic
> >> > pci_get_dev_by_id() does exactly the same), although this looks a bit weird.
> >> >
> >>
> >> Not exactly the same: pci_get_dev_by_id() always increment the refcount
> >> of the returned PCI device. The refcount is only decremented when this
> >> device is passed to pci_get_dev_by_id() to continue searching.
> >>
> >> That means that the users of the PCI device pointer returned by
> >> pci_get_dev_by_id() or its exported variants pci_get_subsys(),
> >> pci_get_device() and pci_get_class() do handle the refcount. They
> >> all pass the pointer to pci_dev_put() or continue the search,
> >> which calls pci_dev_put() internally.
> >>
> >> Direct and indirect callers of get_pci_dev() don't care for the
> >> refcount at all unless I'm missing something.
> >>
> >> >
> >> > > Also, the issue appeared in 4.16 and
> >> > > some affected code got moved around since then: it would be problematic
> >> > > to backport the fix to stable releases.
> >> > >
> >> > > All that code never cared for reference counting anyway. Call pci_dev_put()
> >> > > from get_pci_dev() to revert to the previous behavior.
> >> > >> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
> >> > from pci_dn")
> >> > > Cc: stable@xxxxxxxxxxxxxxx # v4.16
> >> > > Signed-off-by: Greg Kurz <groug@xxxxxxxx>
> >> > > ---
> >> > > arch/powerpc/platforms/powernv/npu-dma.c | 15 ++++++++++++++-
> >> > > 1 file changed, 14 insertions(+), 1 deletion(-)
> >> > >
> >> > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> >> > > index e713ade30087..d8f3647e8fb2 100644
> >> > > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> >> > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> >> > > @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock);
> >> > > static struct pci_dev *get_pci_dev(struct device_node *dn)
> >> > > {
> >> > > struct pci_dn *pdn = PCI_DN(dn);
> >> > > + struct pci_dev *pdev;
> >> > >
> >> > > - return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
> >> > > + pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
> >> > > pdn->busno, pdn->devfn);
> >> > > +
> >> > > + /*
> >> > > + * pci_get_domain_bus_and_slot() increased the reference count of
> >> > > + * the PCI device, but callers don't need that actually as the PE
> >> > > + * already holds a reference to the device.
> >> >
> >> > Imho this would be just enough.
> >> >
> >> > Anyway,
> >> >
> >> > Reviewed-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> >> >
> >>
> >> Thanks !
> >>
> >> I now realize that I forgot to add the --cc option for stable on my stgit
> >> command line :-\.
> >>
> >> Cc'ing now.
> >>
> >> >
> >> > How did you find it? :)
> >> >
> >>
> >> While reading code to find some inspiration for OpenCAPI passthrough. :)
> >>
> >> I saw the following in vfio_pci_ibm_npu2_init():
> >>
> >> if (!pnv_pci_get_gpu_dev(vdev->pdev))
> >> return -ENODEV;
> >>
> >> and simply followed the function calls.
> >>
> >> >
> >> > > Since callers aren't
> >> > > + * aware of the reference count change, call pci_dev_put() now to
> >> > > + * avoid leaks.
> >> > > + */
> >> > > + if (pdev)
> >> > > + pci_dev_put(pdev);
> >> > > +
> >> > > + return pdev;
> >> > > }
> >> > >
> >> > > /* Given a NPU device get the associated PCI device. */
> >> > >
> >> >
> >>