Re: [RFC PATCH v4 7/7] powerpc/powernv/pci-ioda: Add IOMMU_CAP_INTR_REMAP for IODA host bridge

From: Yongji Xie
Date: Fri Mar 18 2016 - 07:53:13 EST

On 2016/3/17 20:48, Alex Williamson wrote:
On Thu, 17 Mar 2016 19:38:29 +0800
Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx> wrote:

On 2016/3/17 0:32, Alex Williamson wrote:
On Mon, 7 Mar 2016 15:48:38 +0800
Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx> wrote:
This patch adds IOMMU_CAP_INTR_REMAP for the IODA host bridge so that
we can mmap the MSI-X table in the vfio driver.

Signed-off-by: Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx>
arch/powerpc/platforms/powernv/pci-ioda.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f90dc04..f01b9ab 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1955,6 +1955,20 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.free = pnv_ioda2_table_free,
 };
 
+static bool pnv_ioda_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_INTR_REMAP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static struct iommu_ops pnv_ioda_iommu_ops = {
+	.capable = pnv_ioda_iommu_capable,
+};
+
 static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 		struct pnv_ioda_pe *pe, unsigned int base,
 		unsigned int segs)
@@ -3078,6 +3092,9 @@ static void pnv_pci_ioda_fixup(void)
 	/* Link NPU IODA tables to their PCI devices. */
+	bus_set_iommu(&pci_bus_type, &pnv_ioda_iommu_ops);
Doesn't this set you up for a world of hurt? bus_set_iommu() calls
iommu_bus_init() which sets up notifiers, which maybe you don't care
about, but it also means that iommu_domain_alloc(&pci_bus_type) will
segfault because you're not providing a domain_alloc callback here.
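
The concern above can be illustrated with a minimal userspace sketch. It is
not kernel code: every sketch_* type and function is a hypothetical stand-in
invented here, and the only thing it models is that an ops table initialised
with just .capable leaves its domain_alloc pointer NULL, so an unguarded call
through that pointer would fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
enum sketch_iommu_cap { SKETCH_CAP_CACHE_COHERENCY, SKETCH_CAP_INTR_REMAP };

struct sketch_domain { unsigned int type; };

struct sketch_iommu_ops {
	bool (*capable)(enum sketch_iommu_cap cap);
	struct sketch_domain *(*domain_alloc)(unsigned int type);
};

struct sketch_bus { const struct sketch_iommu_ops *iommu_ops; };

static bool sketch_ioda_capable(enum sketch_iommu_cap cap)
{
	return cap == SKETCH_CAP_INTR_REMAP;
}

/* Mirrors the shape of the patch: only .capable is set, so the
 * designated initializer leaves .domain_alloc as NULL. */
static const struct sketch_iommu_ops sketch_ioda_ops = {
	.capable = sketch_ioda_capable,
};

/* True when an unguarded bus->iommu_ops->domain_alloc() call would
 * jump through a NULL function pointer. */
static bool sketch_domain_alloc_would_crash(const struct sketch_bus *bus)
{
	assert(bus != NULL);
	return bus->iommu_ops != NULL && bus->iommu_ops->domain_alloc == NULL;
}

/* Defensive variant: refuse to allocate instead of calling through NULL. */
static struct sketch_domain *
sketch_domain_alloc_checked(const struct sketch_bus *bus, unsigned int type)
{
	if (bus == NULL || bus->iommu_ops == NULL ||
	    bus->iommu_ops->domain_alloc == NULL)
		return NULL;
	return bus->iommu_ops->domain_alloc(type);
}
```

With a bus whose iommu_ops points at sketch_ioda_ops, the capability query
succeeds while any domain allocation has nothing to call, which is the
mismatch being pointed out.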
It seems hard to add IOMMU_CAP_INTR_REMAP on the PPC64
platform.

Could we instead add a new ioctl to vfio_iommu_driver that
checks whether interrupt remapping is supported, so that we
can determine that in our own way on the PPC64 platform?
I'd prefer not. At the vfio user API level, the question is whether
the user can mmap over the msix table, testing a property/ioctl on the
iommu driver seems like an odd way to discover that. We should be
determining whether that's safe in the kernel and exporting that info
on the vfio device itself, where it seems like we have various ways we
could do this within the existing ioctls. Thanks,


Yes, you are right. It's not a good idea to add a new ioctl to
vfio_iommu_driver. Now I'd like to discuss how to determine
whether it's safe to mmap over the MSI-X table.

We currently use IOMMU_CAP_INTR_REMAP to determine that.
But this does not work on PPC64, which never sets iommu_ops,
or on ARM SMMU, which sets this capability but does not
provide interrupt isolation. Can we add a variable/property
that is set in vfio_iommu_driver->ops->attach_group() and
used in vfio_pci_driver to determine whether we can allow
mmapping the MSI-X table? Then we could still use
IOMMU_CAP_INTR_REMAP where it is reliable, and fall back to
some arch-independent mechanism where it is not.

Yongji Xie
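
The proposal in the last paragraph could look roughly like the sketch below.
This is only an illustration in plain userspace C: the sketch_* structures,
the intr_isolated flag and the arch_isolates_msi input are all assumptions
made up for the example, not existing vfio or IOMMU API. The attach step
records whether interrupts are isolated, and the mmap check consults that
flag only when the requested range overlaps the MSI-X table.

```c
#include <stdbool.h>

/* Hypothetical per-group state, set once when the group is attached. */
struct sketch_group { bool intr_isolated; };

/* Hypothetical inputs available at attach time. */
struct sketch_platform {
	bool has_iommu_ops;     /* false on a platform that never sets iommu_ops */
	bool cap_intr_remap;    /* what an IOMMU_CAP_INTR_REMAP query would report */
	bool arch_isolates_msi; /* assumed arch-specific fallback knowledge */
};

/* Stand-in for the attach_group() step: decide once, store the result. */
static void sketch_attach_group(struct sketch_group *g,
				const struct sketch_platform *p)
{
	bool via_cap = p->has_iommu_ops && p->cap_intr_remap;

	g->intr_isolated = via_cap || p->arch_isolates_msi;
}

/* Half-open ranges [start, start + len) overlap test. */
static bool sketch_ranges_overlap(unsigned long a_start, unsigned long a_len,
				  unsigned long b_start, unsigned long b_len)
{
	return a_start < b_start + b_len && b_start < a_start + a_len;
}

/* Stand-in for the vfio-pci mmap check: ranges that avoid the MSI-X table
 * are always fine; ranges that cover it need the isolation flag. */
static bool sketch_mmap_allowed(const struct sketch_group *g,
				unsigned long req_start, unsigned long req_len,
				unsigned long msix_start, unsigned long msix_len)
{
	if (!sketch_ranges_overlap(req_start, req_len, msix_start, msix_len))
		return true;
	return g->intr_isolated;
}
```

The sketch does not model the ARM SMMU case, where the capability is reported
without interrupt isolation; that would need an extra override on top of the
capability check.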