Re: Summary of LPC guest MSI discussion in Santa Fe
From: Christoffer Dall
Date: Wed Nov 09 2016 - 14:23:15 EST
On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> On 11/09/2016 12:03 PM, Will Deacon wrote:
> >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> >>>On Tue, 8 Nov 2016 21:29:22 +0100
> >>>Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
> >>>>Is my understanding correct, that you need to tell userspace about the
> >>>>location of the doorbell (in the IOVA space) in case (2), because even
> >>>>though the configuration of the device is handled by the (host) kernel
> >>>>through trapping of the BARs, we have to avoid the VFIO user programming
> >>>>the device to create other DMA transactions to this particular address,
> >>>>since that will obviously conflict and either not produce the desired
> >>>>DMA transactions or result in unintended weird interrupts?
> >
> >Yes, that's the crux of the issue.
> >
> >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> >>>the device, and lose data and potentially trigger spurious interrupts on
> >>>DMA write from the device. Thanks,
> >>>
> >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> >>they address match before the SMMU checks are done. if
> >>all DMA addrs had to go through SMMU first, then the DMA access could
> >>be ignored/rejected.
> >
> >That's actually not true :( The SMMU can't generally distinguish between MSI
> >writes and DMA writes, so it would just see a write transaction to the
> >doorbell address, regardless of how it was generated by the endpoint.
> >
> >Will
> >
> So, we have real systems where MSI doorbells are placed at the same IOVA
> that could have memory for a guest
I don't think this is a property of a hardware system. THe problem is
userspace not knowing where in the IOVA space the kernel is going to
place the doorbell, so you can end up (basically by chance) that some
IPA range of guest memory overlaps with the IOVA space for the doorbell.
>, but not at the same IOVA as memory on real hw ?
On real hardware without an IOMMU the system designer would have to
separate the IOVA and RAM in the physical address space. With an IOMMU,
the SMMU driver just makes sure to allocate separate regions in the IOVA
space.
The challenge, as I understand it, happens with the VM, because the VM
doesn't allocate the IOVA for the MSI doorbell itself, but the host
kernel does this, independently from the attributes (e.g. memory map) of
the VM.
Because the IOVA is a single resource, but with two independent entities
allocating chunks of it (the host kernel for the MSI doorbell IOVA, and
the VFIO user for other DMA operations), you have to provide some
coordination between those to entities to avoid conflicts. In the case
of KVM, the two entities are the host kernel and the VFIO user (QEMU/the
VM), and the host kernel informs the VFIO user to never attempt to use
the doorbell IOVA already reserved by the host kernel for DMA.
One way to do that is to ensure that the IPA space of the VFIO user
corresponding to the doorbell IOVA is simply not valid, ie. the reserved
regions that avoid for example QEMU to allocate RAM there.
(I suppose it's technically possible to get around this issue by letting
QEMU place RAM wherever it wants but tell the guest to never use a
particular subset of its RAM for DMA, because that would conflict with
the doorbell IOVA or be seen as p2p transactions. But I think we all
probably agree that it's a disgusting idea.)
> How are memory holes passed to SMMU so it doesn't have this issue for bare-metal
> (assign an IOVA that overlaps an MSI doorbell address)?
>
As I understand it, the SMMU driver manages the whole IOVA space when
VFIO is *not* involved, so it simply allocates non-overlapping regions.
The problem occurs when you have two independent entities essentially
attempting to mange the same resource (and the problem is exacerbated by
the VM potentially allocating slots in the IOVA space which may have
other limitations it doesn't know about, for example the p2p regions,
because the VM doesn't know anything about the topology of the
underlying physical system).
Christoffer