Re: Summary of LPC guest MSI discussion in Santa Fe

From: Alex Williamson
Date: Fri Nov 11 2016 - 11:05:50 EST


On Fri, 11 Nov 2016 08:50:56 -0700
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> On Fri, 11 Nov 2016 12:19:44 +0100
> Joerg Roedel <joro@xxxxxxxxxx> wrote:
>
> > On Thu, Nov 10, 2016 at 10:46:01AM -0700, Alex Williamson wrote:
> > > In the case of x86, we know that DMA mappings overlapping the MSI
> > > doorbells won't be translated correctly, it's not a valid mapping for
> > > that range, and therefore the iommu driver backing the IOMMU API
> > > should describe that reserved range and reject mappings to it.
> >
> > The drivers actually allow mappings to the MSI region via the IOMMU-API,
> > and I think it should stay this way also for other reserved ranges.
> > Address space management is done by the IOMMU-API user already (and has
> > to be done there nowadays), be it a DMA-API implementation which just
> > reserves these regions in its address space allocator or be it VFIO with
> > QEMU, which doesn't map RAM there anyway. So there is no point in checking
> > this again in the IOMMU drivers, and we can keep that out of the
> > mapping/unmapping fast-path.
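
For reference, "reserves these regions in its address space allocator"
boils down to something like the sketch below for a DMA-API backend
built on the IOMMU-API. Purely illustrative - the function name and the
hard-coded window are made up - but reserve_iova() is the real
allocator hook:

  #include <linux/iova.h>

  /* Hypothetical init step: carve the x86 MSI window out of the IOVA
   * allocator so DMA addresses are simply never handed out from it.
   */
  static void foo_dma_reserve_msi(struct iova_domain *iovad)
  {
          /* 0xfee00000 - 0xfeefffff, inclusive pfn range */
          reserve_iova(iovad, 0xfee00000UL >> PAGE_SHIFT,
                       0xfeefffffUL >> PAGE_SHIFT);
  }
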
>
> It's really just happenstance that we don't map RAM over the x86 MSI
> range though. That property really can't be guaranteed once we mix
> architectures, such as running an aarch64 VM on an x86 host via TCG.
> AIUI, the MSI range is actually handled differently from other DMA
> ranges, so an iommu_map() overlapping a range that the iommu cannot map
> should fail just like an attempt to map beyond the address width of the
> iommu.

(Clarification: this is x86 specific. The MSI controller - the interrupt
remapper - is embedded in the iommu AIUI, so the iommu is actually not
able to provide DMA translation for this range. On architectures where
the MSI controller is separate from the iommu, I agree that the iommu
has no responsibility to fault mappings of iova ranges in the shadow of
an external MSI controller.)
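
What I have in mind is only a range check in the driver's map callback,
roughly the sketch below. The names and the hard-coded window are
illustrative, not the existing VT-d code:

  #include <linux/iommu.h>

  #define FOO_MSI_BASE    0xfee00000UL
  #define FOO_MSI_SIZE    0x00100000UL

  /* Refuse an iova the hardware cannot translate (here the MSI window
   * claimed by the interrupt remapper), just like an iova beyond the
   * domain's address width is refused.
   */
  static int foo_iommu_map(struct iommu_domain *domain, unsigned long iova,
                           phys_addr_t paddr, size_t size, int prot)
  {
          if (iova < FOO_MSI_BASE + FOO_MSI_SIZE &&
              iova + size > FOO_MSI_BASE)
                  return -EINVAL;

          /* ... normal page table insertion ... */
          return 0;
  }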

> > > For PCI devices userspace can examine the topology of the iommu group
> > > and exclude MMIO ranges of peer devices based on the BARs, which are
> > > exposed in various places, pci-sysfs as well as /proc/iomem. For
> > > non-PCI or MSI controllers... ???
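
Concretely, for the PCI side the user already has what it needs from
pci-sysfs. A rough, purely illustrative userspace sketch of collecting
the MMIO ranges to punch out of the iova space - the device names would
come from the group's devices directory under /sys/kernel/iommu_groups/:

  #include <stdio.h>

  /* Each line of /sys/bus/pci/devices/<bdf>/resource is
   * "start end flags" in hex; flag 0x200 is IORESOURCE_MEM.
   */
  static void print_mmio_ranges(const char *bdf)
  {
          char path[256];
          unsigned long long start, end, flags;
          FILE *f;

          snprintf(path, sizeof(path),
                   "/sys/bus/pci/devices/%s/resource", bdf);
          f = fopen(path, "r");
          if (!f)
                  return;

          while (fscanf(f, "%llx %llx %llx", &start, &end, &flags) == 3)
                  if ((flags & 0x200) && end > start)
                          printf("%s: exclude 0x%llx-0x%llx\n",
                                 bdf, start, end);

          fclose(f);
  }
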
> >
> > Right, the hardware resources can be examined. But maybe this can be
> > extended to also cover RMRR ranges? Then we would be able to assign
> > devices with RMRR mappings to guests.
>
> RMRRs are special in a different way: the VT-d spec requires that the
> OS honor RMRRs, but the user has no responsibility (and currently no
> visibility) to make that same arrangement. In order to potentially
> protect the physical host platform, the iommu drivers should prevent a
> user from remapping RMRRs. Maybe there needs to be a different
> interface used by untrusted users vs. in-kernel drivers, but I think the
> kernel really needs to be defensive in the case of user mappings, which
> is where the IOMMU API is rooted. Thanks,
>
> Alex
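
To sketch what separate handling for untrusted users could look like
(entirely hypothetical, none of these names exist today): the driver
could flag domains owned by a user and check RMRR overlap only there,
keeping in-kernel DMA-API users on the fast path:

  #include <linux/iommu.h>

  struct foo_domain {
          struct iommu_domain domain;
          bool user_owned;        /* set when attached on behalf of a user */
  };

  /* Walk the RMRRs of the devices attached to this domain;
   * details omitted in this sketch.
   */
  static bool foo_overlaps_rmrr(struct foo_domain *fd,
                                unsigned long iova, size_t size)
  {
          return false;
  }

  static int foo_map(struct iommu_domain *domain, unsigned long iova,
                     phys_addr_t paddr, size_t size, int prot)
  {
          struct foo_domain *fd =
                  container_of(domain, struct foo_domain, domain);

          if (fd->user_owned && foo_overlaps_rmrr(fd, iova, size))
                  return -EPERM;

          /* ... */
          return 0;
  }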