Re: [PATCH] PCI: Fix devres regression in pci_intx()

From: Philipp Stanner
Date: Mon Jul 29 2024 - 11:46:03 EST


On Mon, 2024-07-29 at 20:29 +0900, Damien Le Moal wrote:
> On 7/27/24 03:43, pstanner@xxxxxxxxxx wrote:
> > > That said, I do not see that as an issue in itself. What I fail to
> > > understand though is why that intx devres is not deleted on device
> > > teardown. I think this may have something to do with the fact that
> > > pcim_intx() always does "res->orig_intx = !enable;", that is, it
> > > assumes that if there is a call to pcim_intx(dev, 0), then it is
> > > because INTx was enabled already, which I do not think is true for
> > > most drivers... So we end up with INTx being wrongly enabled on
> > > device teardown by pcim_intx_restore(), and because of that, the
> > > intx resource is not deleted?
> >
> > Spoiler: The device resources that have initially been created do
> > get deleted. Devres works as intended. The issue is that the forces
> > of evil invoke pci_intx() through another path, hidden behind an
> > API, through another devres callback.
> >
> > So the device resource never gets deleted, because it is *created*
> > on driver detach, when devres has already run.
>
> That explains the issue :)
>
> > > Re-enabling intx on teardown is wrong I think, but that still does
> > > not explain why that resource is not deleted. I fail to see why.
> >
> > You came very close to the truth ;)
> >
> > With some help from my favorite coworker I did some tracing today
> > and found this when doing `rmmod ahci`:
> >
> > => pci_intx
> > => pci_msi_shutdown
> > => pci_disable_msi
> > => devres_release_all
> > => device_unbind_cleanup
> > => device_release_driver_internal
> > => driver_detach
> > => bus_remove_driver
> > => pci_unregister_driver
> > => __do_sys_delete_module
> > => do_syscall_64
> > => entry_SYSCALL_64_after_hwframe
> >
> > The SYSCALL is my `rmmod`.
> >
> > As you can see, pci_intx() is invoked indirectly through
> > pci_disable_msi() – which gets invoked by devres, which is precisely
> > one reason why you could not find the suspicious pci_intx() call in
> > the ahci code base.
> >
> > Now the question is: Who set up that devres callback which
> > indirectly calls pci_intx()?
> >
> > It is indeed MSI, here in msi/msi.c:
> >
> > static void pcim_msi_release(void *pcidev)
> > {
> >         struct pci_dev *dev = pcidev;
> >
> >         dev->is_msi_managed = false;
> >         pci_free_irq_vectors(dev); // <-- calls pci_disable_msi(), which
> >                                    //     calls pci_intx(), which
> >                                    //     re-registers yet another
> >                                    //     devres callback
> > }
> >
> > /*
> >  * Needs to be separate from pcim_release to prevent an ordering
> >  * problem
> >
> > ==> Oh, here they even had a warning about that interacting with
> >     devres somehow...
> >
> >  * vs. msi_device_data_release() in the MSI core code.
> >  */
> > static int pcim_setup_msi_release(struct pci_dev *dev)
> > {
> >         int ret;
> >
> >         if (!pci_is_managed(dev) || dev->is_msi_managed)
> >                 return 0;
> >
> >         ret = devm_add_action(&dev->dev, pcim_msi_release, dev);
> >         if (ret)
> >                 return ret;
> >
> >         dev->is_msi_managed = true;
> >         return 0;
> > }
> >
> > I don't know enough about AHCI to see where exactly it jumps into
> > these, but a candidate would be:
> >  * pci_enable_msi(), called among others in acard-ahci.c
> >
> > Another path is:
> >    1. ahci_init_one() calls
> >    2. ahci_init_msi() calls
> >    3. pci_alloc_irq_vectors() calls
> >    4. pci_alloc_irq_vectors_affinity() calls
> >    5. __pci_enable_msi_range() OR __pci_enable_msix_range() call
> >    6. pci_setup_msi_context() calls
> >    7. pcim_setup_msi_release() which registers the callback to
> >       pci_intx()
>
> ahci_init_one() is the function used by the default AHCI driver
> (ahci.ko), so this path is correct.
>
> > Ha!
> >
> > I think I earned myself a Friday evening beer now 8-)
>
> I was way ahead of you :)
>
> > Now the interesting question will be how the heck we're supposed to
> > clean that up.
>
> If pcim_intx() always gets called on device release, AND MSI/MSIX are
> also managed interrupts with a devres action on release, I would say
> that pcim_msi_release() should NOT lead to a path calling
> pci_intx(dev, 0), as that would create the intx devres again.

Yes, that is obviously wrong – the thing is that we cannot remove the
devres aspect of pci_intx() as long as the other callers (especially
drivers) rely on it.
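
For context, the devres aspect essentially means that pcim_intx()
records which INTx state to restore in a small devres entry and
registers a callback that re-applies it on driver detach. Simplified
from memory, so details may be off:

struct pcim_intx_devres {
        int orig_intx;
};

static void pcim_intx_restore(struct device *dev, void *data)
{
        struct pci_dev *pdev = to_pci_dev(dev);
        struct pcim_intx_devres *res = data;

        /* Re-apply the recorded INTx state on driver detach */
        pci_intx(pdev, res->orig_intx);
}

pcim_intx() then fills in res->orig_intx = !enable, i.e. it assumes the
device previously had the opposite state, which is exactly the
assumption you questioned above.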

And, most notably, we (= I) don't know if *MSI* relies on the devres
aspect being present in pci_intx() in MSI's cleanup path.

Looking at what pci_intx_for_msi() is good for and where it's used
might give a clue.
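
For reference, if I remember correctly, pci_intx_for_msi() itself is
only a thin quirk wrapper around pci_intx(), along these lines:

static inline void pci_intx_for_msi(struct pci_dev *dev, int enable)
{
        /* Skip the toggle for devices with the INTx disable quirk */
        if (!(dev->dev_flags & PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG))
                pci_intx(dev, enable);
}

The wrapper itself is not the interesting part; the interesting part is
that (IIRC) the MSI enable path calls it with enable == 0 and the
shutdown path with enable == 1, and thereby inherits whatever devres
behavior pci_intx() has.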

@Krzysztof: Aren't you the MSI expert? Any idea what this might be all
about?

There are definitely some hacks one could do to remove the devres
aspect from pci_intx() _just for MSI_, for example:
* set struct pci_dev.is_managed = false in the disable path in MSI
* Use yet another function, pci_intx_unmanaged(), in MSI
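
For illustration, such an unmanaged variant would be nothing more than
the bare register toggle that pci_intx() does, minus any devres
registration. A rough sketch (the name is just the proposal from above;
nothing of this exists yet):

void pci_intx_unmanaged(struct pci_dev *pdev, int enable)
{
        u16 pci_command, new;

        pci_read_config_word(pdev, PCI_COMMAND, &pci_command);

        if (enable)
                new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
        else
                new = pci_command | PCI_COMMAND_INTX_DISABLE;

        /* No devres registration, just the plain config space write */
        if (new != pci_command)
                pci_write_config_word(pdev, PCI_COMMAND, new);
}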

It's obviously some work, and I don't think one should even try it
without an understanding of what MSI is supposed to do.


> But given the comment you point out above, it seems that there are
> some ordering constraints between disabling msi and intx? If that is
> the case, then maybe this indeed will be tricky.

No, I don't think that's related.
I think the ordering issue should have disappeared now in v6.11
(although I wouldn't bet my hand on it).

The issue was that, up to and including v6.10, struct pci_devres
bundled everything needed by several managed PCI functions into just
that one struct, handled by one creator function and one cleanup
function. (It should never have been implemented that way, since it
violates the basic concepts of devres, but here we are; should, would,
could...)

I presume the author was about to register that separate cleanup devres
callback in MSI and discovered that it would be racy if the MSI code
used the same struct pci_devres, which is administered by a different
function: the struct could already have been freed by the time the MSI
callback wakes up -> UAF.
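
To make the ordering concern concrete: devres actions run in reverse
order of registration on teardown, so a callback registered later fires
earlier. A contrived sketch (made-up names, not kernel code) of the
release order that makes sharing state between independently registered
callbacks fragile:

static void registered_first(void *data)
{
        /* Runs *last* on teardown */
}

static void registered_second(void *data)
{
        /* Runs *first* on teardown */
}

static int example_setup(struct pci_dev *pdev)
{
        int ret;

        ret = devm_add_action(&pdev->dev, registered_first, pdev);
        if (ret)
                return ret;

        /* Released before registered_first() on driver detach */
        return devm_add_action(&pdev->dev, registered_second, pdev);
}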

>
> > Another interesting question is: Did that only work by coincidence
> > during the last 15 years, or is it by design that the check in
> > pci_intx():
> >
> > if (new != pci_command)
> >
> > only evaluates to true if we are not in a detach path.
>
> I would say that it evaluates to true for any device using intx,
> which tends to be rare these days. In such a case, enabling on device
> attach and disabling on detach would lead to new != pci_command,
> including in the hidden intx disable path triggered by disabling msi.
> But the devres being recreated on detach definitely seems to depend
> on the disabling order. So we may have been lucky indeed.

Hm. I hope you are right. If not, then other drivers would experience
the same regression you found in AHCI.

Let's keep our eyes open...

P.

>
> >
> > If it were coincidence, it would not have caused faults as it does
> > now with my recent work, because the old version did not allocate
> > in pci_intx().
> >
> > But it could certainly have been racy and might have run into a
> > UAF, since the old pci_intx() would have worked on memory that is
> > also managed by devres, but registered at a different place. I
> > guess that is what the comment in the MSI code quoted above is
> > hinting at.
> >
> >
> > Christoph indeed rightfully called it voodoo ^^
> >
> >
> > Cheers,
> > P.
> >
> >
>