Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

From: Arnd Bergmann
Date: Wed May 21 2014 - 04:55:12 EST


On Wednesday 21 May 2014 10:16:09 Thierry Reding wrote:
> On Tue, May 20, 2014 at 10:31:29PM +0200, Arnd Bergmann wrote:
> > On Tuesday 20 May 2014 16:00:02 Thierry Reding wrote:
> > > On Tue, May 20, 2014 at 03:34:46PM +0200, Arnd Bergmann wrote:
> > > > On Tuesday 20 May 2014 15:17:43 Thierry Reding wrote:
> > > > > On Tue, May 20, 2014 at 02:41:18PM +0200, Arnd Bergmann wrote:
> > > > > > On Tuesday 20 May 2014 14:02:43 Thierry Reding wrote:
> > > > > [...]
> > > > > > > Couldn't a single-master IOMMU be windowed?
> > > > > >
> > > > > > Ah, yes. That would actually be like an IBM pSeries, which has a windowed
> > > > > > IOMMU but uses one window per virtual machine. In that case, the window could
> > > > > > be a property of the iommu node though, rather than part of the address
> > > > > > in the link.
> > > > >
> > > > > Does that mean that the IOMMU has one statically configured window which
> > > > > is the same for each virtual machine? That would require some other
> > > > > mechanism to assign separate address spaces to each virtual machine,
> > > > > wouldn't it? But I suspect that if the IOMMU allows that it could be
> > > > > allocated dynamically at runtime.
> > > >
> > > > The way it works on pSeries is that upon VM creation, the guest is assigned
> > > > one 256MB window for use by assigned DMA capable devices. When the guest
> > > > creates a mapping, it uses a hypercall to associate a bus address in that
> > > > range with a guest physical address. The hypervisor checks that the bus
> > > > address is within the allowed range, and translates the guest physical
> > > > address into a host physical address, then enters both into the I/O page
> > > > table or I/O TLB.
> > >
> > > So when a VM is booted it is passed a device tree with that DMA window?
> >
> > Correct.
> >
> > > Given what you describe above this seems to be more of a configuration
> > > option to restrict the IOMMU to a subset of the physical memory for
> > > purposes of virtualization. So I agree that this wouldn't be a good fit
> > > for what we're trying to achieve with iommus or dma-ranges in this
> > > binding.
> >
> > Thinking about it again now, I wonder if there are any other use cases
> > for windowed IOMMUs. If this is the only one, there might be no use
> > in the #address-cells model I suggested instead of your original
> > #iommu-cells.
>
> So in this case virtualization is the reason why we need the DMA window.
> The reason for that is that the guest has no other way of knowing what
> other guests might be using, so it's essentially a mechanism for the
> host to manage the DMA region and allocate subregions for each guest. If
> virtualization isn't an issue then it seems to me that the need for DMA
> windows goes away because the operating system will track DMA regions
> anyway.
>
> The only reason I can think of why a windowed IOMMU would be useful is
> to prevent two or more devices from stepping on each others' toes. But
> that's a problem that the OS should already be handling during DMA
> buffer allocation, isn't it?

Right. As long as we always unmap the buffers from the IOMMU after they
have stopped being in use, it's very unlikely that even a broken device
driver causes a DMA into some bus address that happens to be mapped for
another device.

> > > > > > I would like to add an explanation about dma-ranges to the binding:
> > > > > >
> > > > > > 8<--------
> > > > > > The parent bus of the iommu must have a valid "dma-ranges" property
> > > > > > describing how the physical address space of the IOMMU maps into
> > > > > > memory.
> > > > >
> > > > > With physical address space you mean the addresses after translation,
> > > > > not the I/O virtual addresses, right? But even so, how will this work
> > > > > when there are multiple IOMMU devices? What determines which IOMMU is
> > > > > mapped via which entry?
> > > > >
> > > > > Perhaps having multiple IOMMUs implies that there will have to be some
> > > > > partitioning of the parent address space to make sure two IOMMUs don't
> > > > > translate to the same ranges?
> > > >
> > > > These dma-ranges properties would almost always be for the entire RAM,
> > > > and we can treat anything else as a bug.
> > >
> > > Would it typically be a 1:1 mapping? In that case could we define an
> > > empty dma-ranges property to mean exactly that? That would make it
> > > consistent with the ranges property.
> >
> > Yes, I believe that is how it's already defined.
>
> I've gone through the proposal at [0] again, but couldn't find a mention
> of an empty "dma-ranges" property. But regardless I think that a 1:1
> mapping is the obvious meaning of an empty "dma-ranges" property.
>
> [0]: http://www.openfirmware.org/ofwg/proposals/Closed/Accepted/410-it.txt
>
> One thing I'm not sure about is whether dma-ranges should be documented
> in this binding at all. Since there's an accepted standard proposal it
> would seem that it doesn't need to be specifically mentioned. One other
> option would be to link to the above proposal from the binding and then
> complement that with what an empty "dma-ranges" property means.
>
> Or we could possible document this in a file along with other standard
> properties. I don't think we currently do that for any properties, but
> my concern is that there will always be a limited number of people
> knowing about how such properties are supposed to work. If all of a
> sudden all these people would disappear, everybody else would be left
> with references to these properties but nowhere to look for their
> meaning.

I think it makes sense to document how the standard dma-ranges
interacts with the new iommu binding, because it's not obvious
what happens if you have both together, or iommu without a parent
dma-ranges.

> > > > > > A device with an "iommus" property will ignore the "dma-ranges" property
> > > > > > of the parent node and rely on the IOMMU for translation instead.
> > > > >
> > > > > Do we need to consider the case where an IOMMU listed in iommus isn't
> > > > > enabled (status = "disabled")? In that case presumably the device would
> > > > > either not function or may optionally continue to master onto the parent
> > > > > untranslated.
> > > >
> > > > My reasoning was that the DT should specify whether we use the IOMMU
> > > > or not. Being able to just switch on or off the IOMMU sounds nice as
> > > > well, so we could change the text above to do that.
> > > >
> > > > Another option would be to do this in the IOMMU code, basically
> > > > falling back to the IOMMU parent's dma-ranges property and using
> > > > linear dma_map_ops when that is disabled.
> > >
> > > Yes, it should be trivial for the IOMMU core code to take care of this
> > > special case. Still I think it's worth mentioning it in the binding so
> > > that it's clearly specified.
> >
> > Agreed.
>
> Okay, I have a new version of the binding that I think incorporates all
> the changes discussed so far. It uses #address-cells and #size-cells to
> define the length of the specifier, but if we decide against that it can
> easily be changed again.

Ok.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/