Re: [PATCH] PCI: Fix devres regression in pci_intx()
From: Damien Le Moal
Date: Mon Jul 29 2024 - 07:30:08 EST
On 7/27/24 03:43, pstanner@xxxxxxxxxx wrote:
>> That said, I do not see that as an issue in itself. What I fail to
>> understand
>> though is why that intx devres is not deleted on device teardown. I
>> think this
>> may have something to do with the fact that pcim_intx() always does
>> "res->orig_intx = !enable;", that is, it assumes that if there is a
>> call to
>> pcim_intx(dev, 0), then it is because intx where enabled already,
>> which I do not
>> think is true for most drivers... So we endup with INTX being wrongly
>> enabled on
>> device teardown by pcim_intx_restore(), and because of that, the intx
>> resource
>> is not deleted ?
>
> Spoiler: The device resources that have initially been created do get
> deleted. Devres works as intended. The issue is that the forces of evil
> invoke pci_intx() through another path, hidden behind an API, through
> another devres callback.
>
> So the device resource never gets deleated because it is *created* on
> driver detach, when devres already ran.
That explains the issue :)
>> Re-enabling intx on teardown is wrong I think, but that still does
>> not explain
>> why that resource is not deleted. I fail to see why.
>
> You came very close to the truth ;)
>
> With some help from my favorite coworker I did some tracing today and
> found this when doing `rmmod ahci`:
>
> => pci_intx
> => pci_msi_shutdown
> => pci_disable_msi
> => devres_release_all
> => device_unbind_cleanup
> => device_release_driver_internal
> => driver_detach
> => bus_remove_driver
> => pci_unregister_driver
> => __do_sys_delete_module
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe
>
> The SYSCALL is my `rmmod`.
>
> As you can see, pci_intx() is invoked indirectly through
> pci_disable_msi() – which gets invoked by devres, which is precisely
> one reason why you could not find the suspicious pci_intx() call in the
> ahci code base.
>
> Now the question is: Who set up that devres callback which indirectly
> calls pci_intx()?
>
> It is indeed MSI, here in msi/msi.c:
>
> static void pcim_msi_release(void *pcidev)
> {
> struct pci_dev *dev = pcidev;
>
> dev->is_msi_managed = false;
> pci_free_irq_vectors(dev); // <-- calls pci_disable_msi(), which calls pci_intx(), which re-registers yet another devres callback
> }
>
> /*
> * Needs to be separate from pcim_release to prevent an ordering problem
>
> ==> Oh, here they even had a warning about that interacting with devres somehow...
>
> * vs. msi_device_data_release() in the MSI core code.
> */
> static int pcim_setup_msi_release(struct pci_dev *dev)
> {
> int ret;
>
> if (!pci_is_managed(dev) || dev->is_msi_managed)
> return 0;
>
> ret = devm_add_action(&dev->dev, pcim_msi_release, dev);
> if (ret)
> return ret;
>
> dev->is_msi_managed = true;
> return 0;
> }
>
> I don't know enough about AHCI to see where exactly it jumps into
> these, but a candidate would be:
> * pci_enable_msi(), called among others in acard-ahci.c
>
> Another path is:
> 1. ahci_init_one() calls
> 2. ahci_init_msi() calls
> 3. pci_alloc_irq_vectors() calls
> 4. pci_alloc_irq_vectors_affinity() calls
> 5. __pci_enable_msi_range() OR __pci_enable_msix_range() call
> 6. pci_setup_msi_context() calls
> 7. pcim_setup_msi_release() which registers the callback to
> pci_intx()
ahci_init_one() is the function used by the default AHCI driver (ahci.ko), so
this path is correct.
> Ha!
>
> I think I earned myself a Friday evening beer now 8-)
I was way ahead of you :)
> Now the interesting question will be how the heck we're supposed to
> clean that up.
If pcim_intx() always gets called on device release, AND MSI/MSIX are also
managed interrupts with a devres action on release, I would say that
pcim_msi_release() should NOT lead to a path calling pci_intx(dev, 0), as that
would create the intx devres again. But given the comment you point out above,
it seems that there are some ordering constraints between disabling msi and intx
? If that is the case, then may be this indeed will be tricky.
> Another interesting question is: Did that only work by coincidence
> during the last 15 years, or is it by design that the check in
> pci_intx():
>
> if (new != pci_command)
>
> only evaluates to true if we are not in a detach path.
I would say that it evaluates to true for any device using intx, which tend to
be rare these days. In such case, then enabling on device attach and disabling
on detach would lead to new != pci_command, including in the hidden intx disable
path triggered by disabling msi. But the devres being recreated on detach
definitely seem to depend on the disabling order though. So we may have been
lucky indeed.
>
> If it were coincidence, it would not have caused faults as it did now
> with my recent work, because the old version did not allocate in
> pci_intx().
>
> But it could certainly have been racy and might run into a UAF since
> the old pci_intx() would have worked on memory that is also managed by
> devres, but has been registered at a different place. I guess that is
> what that comment in the MSI code quoted above is hinting at.
>
>
> Christoph indeed rightfully called it voodoo ^^
>
>
> Cheers,
> P.
>
>
--
Damien Le Moal
Western Digital Research