Re: PCIe host controller behind IOMMU on ARM

From: Liviu.Dudau@xxxxxxx
Date: Thu Nov 12 2015 - 05:32:10 EST


On Thu, Nov 12, 2015 at 09:26:33AM +0000, Phil Edworthy wrote:
> Hi Liviu, Arnd,
>
> On 11 November 2015 18:25, Liviu wrote:
> > On Mon, Nov 09, 2015 at 12:32:13PM +0000, Phil Edworthy wrote:
> > > Hi Liviu, Will,
> > >
> > > On 04 November 2015 15:19, Phil wrote:
> > > > On 04 November 2015 15:02, Liviu wrote:
> > > > > On Wed, Nov 04, 2015 at 02:48:38PM +0000, Phil Edworthy wrote:
> > > > > > Hi Liviu,
> > > > > >
> > > > > > On 04 November 2015 14:24, Liviu wrote:
> > > > > > > On Wed, Nov 04, 2015 at 01:57:48PM +0000, Phil Edworthy wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > > I am trying to hook up a PCIe host controller that sits behind an
> > > > > > > > > IOMMU, but having some problems.
> > > > > > > >
> > > > > > > > > I'm using the pcie-rcar PCIe host controller and it works fine without
> > > > > > > > > the IOMMU, and I can attach the IOMMU to the controller such that any
> > > > > > > > > calls to dma_alloc_coherent made by the controller driver uses the
> > > > > > > > > iommu_ops version of dma_ops.
> > > > > > > >
> > > > > > > > > However, I can't see how to make the endpoints to utilise the dma_ops
> > > > > > > > > that the controller uses. Shouldn't the endpoints inherit the dma_ops
> > > > > > > > > from the controller?
> > > > > > >
> > > > > > > No, not directly.
> > > > > > >
> > > > > > > > Any pointers for this?
> > > > > > >
> > > > > > > > You need to understand the process through which a driver for endpoint
> > > > > > > > get an address to be passed down to the device. Have a look at
> > > > > > > > Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> > > > > > > > (Hint: EP driver needs to call dma_map_single).
> > > > > > >
> > > > > > > > Also, you need to make sure that the bus address that ends up being set
> > > > > > > > into the endpoint gets translated correctly by the host controller into
> > > > > > > > an address that the IOMMU can then translate into physical address.
> > > > > > > Sure, though since this is bog standard Intel PCIe ethernet card which works
> > > > > > > fine when the IOMMU is effectively unused, I don't think there is a problem
> > > > > > > with that.
> > > > > >
> > > > > > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > > > > > do a test call to dma_alloc_coherent() in the controller's driver. i.e. when I
> > > > > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > > > > __iommu_alloc_buffer() and __alloc_iova().
> > > > > >
> > > > > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > > > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> > > > >
> > > > > > Why do you think that? Remember that the only thing attached to the IOMMU
> > > > > > is the host controller. The endpoint is on the PCIe bus, which gets a
> > > > > > different translation that the IOMMU knows nothing about. If it helps you
> > > > > > to visualise it better, think of the host controller as another IOMMU
> > > > > > device. It's the ops of the host controller that should be invoked, not
> > > > > > the IOMMU's.
> > > > Ok, that makes sense. I'll have a think and poke it a bit more...
> >
> > Hi Phil,
> >
> > Not trying to ignore your email, but I thought this is more in Will's backyard.
> >
> > > Somewhat related to this, since our PCIe controller HW is limited to
> > > 32-bit AXI address range, before trying to hook up the IOMMU I have
> > > tried to limit the dma_mask for PCI cards to DMA_BIT_MASK(32). The
> > > reason being that Linux uses a 1 to 1 mapping between PCI addresses
> > > and cpu (phys) addresses when there isn't an IOMMU involved, so I
> > > think that we need to limit the PCI address space used.
> >
> > I think you're mixing things a bit or not explaining them very well. Having the
> > PCIe controller limited to 32-bit AXI does not mean that the PCIe bus cannot
> > carry 64-bit addresses. It depends on how they get translated by the host bridge
> > or its associated ATS block. I can't see why you can't have a setup where
> > the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
> > You just have to be careful on how you setup your mem64 ranges so that they
> > don't overlap with the 32-bit ranges when translated.
> From a HW point of view I agree that we can setup the PCI host bridge such that
> it uses 64-bit PCI address, with 32-bit cpu addresses. Though in practice doesn't
> this mean that the dma ops used by card drivers has to be provided by our PCI
> host bridge driver so we can apply the translation to those PCI addresses?

I thought all addresses that are set into the cards go through
pcibios_resource_to_bus(), which gives you the PCI address to set, although I
have to admit that when DMA gets involved I'm not 100% sure of the whole flow.
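
To make the idea of a CPU-to-bus translation concrete, here is a minimal
userspace sketch, not kernel code. As far as I understand it, the resource
translation amounts to finding the host bridge window that covers an address
and applying that window's offset. The struct bridge_window type and the
cpu_to_bus() helper are hypothetical names invented for this illustration,
and it says nothing about how the DMA path behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one host bridge window (not a kernel structure). */
struct bridge_window {
    uint64_t cpu_start; /* first CPU physical address covered */
    uint64_t cpu_end;   /* last CPU physical address covered (inclusive) */
    uint64_t bus_start; /* PCI bus address that cpu_start maps to */
};

/* Return the PCI bus address for cpu_addr; identity if no window covers it. */
static uint64_t cpu_to_bus(const struct bridge_window *win, size_t n,
                           uint64_t cpu_addr)
{
    assert(win != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpu_addr >= win[i].cpu_start && cpu_addr <= win[i].cpu_end)
            return win[i].bus_start + (cpu_addr - win[i].cpu_start);
    }
    return cpu_addr;
}
```

With a window that maps a 32-bit CPU range onto a bus range above 4GB, a
32-bit CPU address comes out as a 64-bit bus address, which is the kind of
setup discussed above.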

Best regards,
Liviu

> This comes back to my point below about how to do this. Adding a bus notifier
> to do this may be too late, and arm64 doesn't implement set_dma_ops().
>
> > And no, you should not limit at the card driver the DMA_BIT_MASK() unless the
> > card is not capable of supporting more than 32-bit addresses.
> If there was infrastructure that checked all parents dma-ranges when the
> dma_set_mask() function is called as Arnd pointed out, this would nicely solve
> the problem.
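
A rough userspace sketch of what "checking all parents" could look like,
purely to illustrate the idea: walk up from the card and clamp the requested
mask to the most restrictive limit found on the way. The struct model_dev
type, its bus_limit field and effective_dma_mask() are hypothetical and do
not correspond to existing kernel infrastructure. DMA_BIT_MASK() is written
to mirror the kernel macro of the same name. Taking the minimum is only
valid because every limit here has the form 2^n - 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the kernel macro of the same name. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device node: only the fields this sketch needs. */
struct model_dev {
    const struct model_dev *parent; /* NULL at the top of the hierarchy */
    uint64_t bus_limit;             /* highest address this level can pass on */
};

/* Clamp the requested mask to the tightest limit of dev and its ancestors. */
static uint64_t effective_dma_mask(const struct model_dev *dev,
                                   uint64_t requested)
{
    uint64_t mask = requested;

    assert(dev != NULL);
    for (; dev != NULL; dev = dev->parent) {
        if (dev->bus_limit < mask)
            mask = dev->bus_limit;
    }
    return mask;
}
```

In this model a card that asks for DMA_BIT_MASK(64) behind a bridge limited
to DMA_BIT_MASK(32) ends up with the 32-bit mask, without the card driver
having to know about the bridge.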
>
> > > Since pci_setup_device() sets up dma_mask, I added a bus notifier in the
> > > PCIe controller driver so I can change the mask, if needed, on the
> > > BUS_NOTIFY_BOUND_DRIVER action.
> > > However, I think there is the potential for card drivers to allocate and
> > > map buffers before the bus notifier get called. Additionally, I've seen
> > > drivers change their behaviour based on the success or failure of
> > > dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)), so the
> > > driver could, theoretically at least, operate in a way that is not
> > > compatible with a more restricted dma_mask (though I can't think
> > > of any way this would not work with hardware I've seen).
> > >
> > > So, I think that using a bus notifier is the wrong way to go, but I don't
> > > know what other options I have. Any suggestions?
> >
> > I would first have a look at how the PCIe bus addresses are translated by the
> > host controller.
> >
> > Best regards,
> > Liviu
> >
> Thanks
> Phil

--
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯