Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
From: Eric Auger
Date: Fri May 20 2016 - 12:02:50 EST
Alex, Robin,
While my 3 part series primarily addresses the problematic of mapping
MSI doorbells into arm-smmu, it fails in :
1) determining whether the MSI controller is downstream or upstream to
the IOMMU,
=> indicates whether the MSI doorbell must be mapped
=> participates in the decision about 2)
2) determining whether it is safe to assign a PCIe device.
I think we share this understanding with Robin. All above of course
stands for ARM.
I get stuck with those 2 issues and I have few questions about iommu
group setup, PCIe, iommu dt/ACPI description. I would be grateful to you
if you could answer part of those questions and advise about the
strategy to fix those.
Best Regards
Eric
QUESTIONS:
1) Robin, you pointed some host controllers which also are MSI
controllers
(http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
that case MSIs never reach the IOMMU. I failed in finding anything about
MSIs in PCIe ACS spec. What should be the iommu groups in that
situation. Isn't the upstreamed code able to see some DMA transfers are
not properly isolated and alias devices in the same group? According to
your security warning, Alex, I would think the code does not recognize
it, can you confirm please?
2) can other PCIe components be MSI controllers?
3) Am I obliged to consider arbitrary topologies where an MSI controller
stands between the PCIe host and the iommu? in the PCIe space or
platform space? If this only relates to PCIe couldn' I check if an MSI
controller exists in the PCIe tree?
4) Robin suggested in a private thread to enumerate through a list of
"registered" doorbells and if any belongs to an unsafe MSI controller,
consider the assignment is unsafe. This would be a first step before
doing something more complex. Alex, would that be acceptable to you for
issue #2?
5) About issue #1: don't we miss tools in dt/ACPI to describe the
location of the iommu on ARM? This is not needed on x86 because
irq_remapping and IOMMU are at the same place but my understanding is
that it is on ARM where
- there is no connection between the MSI controller - which implements
irq remapping - and the iommu
- MSI are conveyed on the same address space as standard memory
transactions.
6) can't we live with iommu/MSI controller respective location uncertainty?
- in my current series, with the above Xilinx MSI controller, I would
see there is an arm-smmu requiring mapping behind the PCI host, would
query the characteristics of the MSI doorbell (not implemented by that
controller), so no mapping would be done. So it would work I think.
- However in case we have this topology: PCIe host -> MSI controller
generally used behind an IOMMU (so registering a doorbell) -> IOMMU,
this wouldn't work since the doorbell would be mapped.
On 05/04/2016 01:54 PM, Eric Auger wrote:
> This series allows the user-space to register a reserved IOVA domain.
> This completes the kernel integration of the whole functionality on top
> of part 1 (v9) & 2 (v8).
>
> It also depends on [PATCH 1/3] iommu: Add MMIO mapping type series,
> http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>
> We reuse the VFIO DMA MAP ioctl with a new flag to bridge to the
> msi-iommu API. The need for provisioning such MSI IOVA range is reported
> through capability chain, using VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY.
>
> vfio_iommu_type1 checks if the MSI mapping is safe when attaching the
> vfio group to the container (allow_unsafe_interrupts modality).
>
> On ARM/ARM64, the IOMMU does not astract IRQ remapping. the modality is
> abstracted on MSI controller side. The GICv3 ITS is the first controller
> advertising the modality.
>
> More details & context can be found at:
> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>
> Best Regards
>
> Eric
>
> Testing:
> - functional on ARM64 AMD Overdrive HW (single GICv2m frame) with
> Intel X540-T2 (SR-IOV capable)
> - also tested on Armada-7040 using an intel IXGBE (82599ES) by
> Yehuda Yitschak (v8)
> - Not tested: ARM GICv3 ITS
>
> References:
> [1] [RFC 0/2] VFIO: Add virtual MSI doorbell support
> (https://lkml.org/lkml/2015/7/24/135)
> [2] [RFC PATCH 0/6] vfio: Add interface to map MSI pages
> (https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html)
> [3] [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
> (http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858)
>
> Git: complete series available at
> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9
>
> previous version at
> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc5-pcie-passthrough-v8
>
> History:
> v8 -> v9:
> - report MSI geometry through capability chain (last patch only);
> with the current limitation that an arbitrary number of 16 page
> requirement is reported. To be improved later on.
>
>
> v7 -> v8:
> - use renamed msi-iommu API
> - VFIO only responsible for setting the IOVA aperture
> - use new DOMAIN_ATTR_MSI_GEOMETRY iommu domain attribute
>
> v6 -> v7:
> - vfio_find_dma now accepts a dma_type argument.
> - should have recovered the capability to unmap the whole user IOVA range
> - remove computation of nb IOVA pages -> will post a separate RFC for that
> while respinning the QEMU part
>
> RFC v5 -> patch v6:
> - split to ease the review process
>
> RFC v4 -> RFC v5:
> - take into account Thomas' comments on MSI related patches
> - split "msi: IOMMU map the doorbell address when needed"
> - increase readability and add comments
> - fix style issues
> - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
> - platform ITS now advertises IOMMU_CAP_INTR_REMAP
> - fix compilation issue with CONFIG_IOMMU API unset
> - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
>
> RFC v3 -> v4:
> - Move doorbell mapping/unmapping in msi.c
> - fix ref count issue on set_affinity: in case of a change in the address
> the previous address is decremented
> - doorbell map/unmap now is done on msi composition. Should allow the use
> case for platform MSI controllers
> - create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
> to reserved IOVA management (looking like dma-iommu glue)
> - series reordering to ease the review:
> - first part is related to IOMMU
> - second related to MSI sub-system
> - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
> - expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
> [this partially addresses Marc's comments on iommu_get/put_single_reserved
> size/alignment problematic - which I did not ignore - but I don't know
> how much I can do at the moment]
>
> RFC v2 -> RFC v3:
> - should fix wrong handling of some CONFIG combinations:
> CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
> - fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)
>
> PATCH v1 -> RFC v2:
> - reverted to RFC since it looks more reasonable ;-) the code is split
> between VFIO, IOMMU, MSI controller and I am not sure I did the right
> choices. Also API need to be further discussed.
> - iova API usage in arm-smmu.c.
> - MSI controller natively programs the MSI addr with either the PA or IOVA.
> This is not done anymore in vfio-pci driver as suggested by Alex.
> - check irq remapping capability of the group
>
> RFC v1 [2] -> PATCH v1:
> - use the existing dma map/unmap ioctl interface with a flag to register a
> reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
> - a single reserved IOVA contiguous region now is allowed
> - use of an RB tree indexed by PA to store allocated reserved slots
> - use of a vfio_domain iova_domain to manage iova allocation within the
> window provided by the userspace
> - vfio alloc_map/unmap_free take a vfio_group handle
> - vfio_group handle is cached in vfio_pci_device
> - add ref counting to bindings
> - user modality enabled at the end of the series
>
>
> Eric Auger (7):
> vfio: introduce a vfio_dma type field
> vfio/type1: vfio_find_dma accepting a type argument
> vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
> vfio: allow reserved msi iova registration
> vfio/type1: also check IRQ remapping capability at msi domain
> iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
> vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability
> chains
>
> drivers/iommu/arm-smmu-v3.c | 3 +-
> drivers/iommu/arm-smmu.c | 3 +-
> drivers/vfio/vfio_iommu_type1.c | 270 +++++++++++++++++++++++++++++++++++++---
> include/uapi/linux/vfio.h | 40 +++++-
> 4 files changed, 298 insertions(+), 18 deletions(-)
>