Re: [PATCH 2/2] NTB: PCI Quirk to Enable Switchtec NT Functionality with IOMMU On

From: Bjorn Helgaas
Date: Wed May 23 2018 - 08:40:02 EST


On Tue, May 22, 2018 at 04:23:13PM -0600, Logan Gunthorpe wrote:
> On 22/05/18 03:51 PM, Bjorn Helgaas wrote:
> > I don't think the question of when the aliases need to be added is
> > quite closed. Logan said "it seems pci_add_dma_alias() must be called
> > before the driver is initialized and therefore in a quirk", but that
> > doesn't make clear *why* the alias needs to be added before the driver
> > is initialized. The alias shouldn't be needed until the device does a
> > DMA, and it shouldn't do that until after the driver initializes.
>
> No, Doug tried it in the driver first and it didn't work. The symbol is
> also not exported which was probably done because it can't be used in
> the driver.
>
> > I suspect the reason the existing quirks are in drivers/pci/quirks.c
> > is because the IOMMU driver is in the host OS, but the host may not
> > have a driver for the device if the device is passed through to a
> > guest OS. In that case, the only way to add the alias is by using a
> > quirk that is always built into the host OS.
>
> Digging into the code a bit, it's not because it must be done by the
> Host OS but because it must be done before the IOMMU groups are created.
> The IOMMU code registers a bus_notifier and creates the groups based on
> the dma_alias mask when it receives the BUS_NOTIFY_ADD_DEVICE event.
> This event is notified in device_add() just before a call to
> bus_probe_device()[1]. Therefore, if a driver attempts to use
> pci_add_dma_alias() as part of its probe routine, it will be too late
> as the IOMMU has already set up the groups based on the original version
> of the dma_alias_mask.

This (and Alex's) analysis is very useful and I'd like to capture it
somehow, perhaps by expanding the poor pci_add_dma_alias() function
comment I added with f0af9593372a ("PCI: Add pci_add_dma_alias() to
abstract implementation").

The admonition to "call early" without any details about *how* early
or *why* it needs to be called early is not very useful.

If we added your analysis, it would be a great help to anybody who
reworks IOMMU groups in the future.
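
To make the ordering concrete, here is a rough userspace model of what
Logan describes. It is only a sketch, not kernel code: every toy_* name
is invented for illustration, and it assumes the quirk runs before the
add-device notification while the driver's probe runs after it.

```c
/* Toy model of the ordering; none of these names exist in the kernel. */
struct toy_dev {
	unsigned long dma_alias_mask;   /* bit N set => devfn N is a DMA alias */
	unsigned long iommu_group_mask; /* snapshot taken when the group is built */
};

typedef void (*toy_hook)(struct toy_dev *dev);

/* Stand-in for pci_add_dma_alias(): record one more alias devfn. */
static void toy_add_dma_alias(struct toy_dev *dev, unsigned int devfn)
{
	dev->dma_alias_mask |= 1UL << devfn;
}

/*
 * Stand-in for the IOMMU bus notifier handling BUS_NOTIFY_ADD_DEVICE:
 * the group is built from whatever the alias mask holds right now.
 */
static void toy_iommu_add_device(struct toy_dev *dev)
{
	dev->iommu_group_mask = dev->dma_alias_mask;
}

/*
 * Assumed order: quirk (if any), then the add-device notification,
 * then the driver's probe (if any).
 */
static void toy_register_device(struct toy_dev *dev, toy_hook quirk,
				toy_hook probe)
{
	if (quirk)
		quirk(dev);
	toy_iommu_add_device(dev);
	if (probe)
		probe(dev);
}

/* Example hook that adds devfn 1 as an alias. */
static void toy_add_alias_1(struct toy_dev *dev)
{
	toy_add_dma_alias(dev, 1);
}

/* Did the group snapshot include this alias? */
static int toy_group_has_alias(const struct toy_dev *dev, unsigned int devfn)
{
	return (int)((dev->iommu_group_mask >> devfn) & 1UL);
}
```

Passing toy_add_alias_1 as the quirk argument leaves the alias in the
group snapshot. Passing the same hook as the probe argument still
updates dma_alias_mask, but the snapshot was already taken, which is the
"too late" case described above.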

> I suspect this is by design as the groups must be created before any
> dma_maps are done on the device and some drivers may create dma_maps
> during probe.
>
> Logan
>
> [1]
> https://elixir.bootlin.com/linux/v4.17-rc6/source/drivers/base/core.c#L1863