RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

From: Shameerali Kolothum Thodi
Date: Tue Nov 12 2019 - 12:56:48 EST


Hi Eric,

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 12 November 2019 14:21
> To: 'Auger Eric' <eric.auger@xxxxxxxxxx>; eric.auger.pro@xxxxxxxxx;
> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> kvm@xxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx; joro@xxxxxxxxxx;
> alex.williamson@xxxxxxxxxx; jacob.jun.pan@xxxxxxxxxxxxxxx;
> yi.l.liu@xxxxxxxxx; jean-philippe.brucker@xxxxxxx; will.deacon@xxxxxxx;
> robin.murphy@xxxxxxx
> Cc: kevin.tian@xxxxxxxxx; vincent.stehle@xxxxxxx; ashok.raj@xxxxxxxxx;
> marc.zyngier@xxxxxxx; tina.zhang@xxxxxxxxx; Linuxarm
> <linuxarm@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>
> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>
[...]
> > >>> I am trying to get this running on one of our platforms that has SMMUv3
> > >>> dual-stage support. I am seeing some issues when an ixgbe VF device is
> > >>> passed through and sits behind a vSMMUv3 in the guest.
> > >>>
> > >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> > >>>
> > >>> And this is my QEMU command line:
> > >>>
> > >>> ./qemu-system-aarch64 \
> > >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> > >>> -kernel Image \
> > >>> -drive if=none,file=ubuntu,id=fs \
> > >>> -device virtio-blk-device,drive=fs \
> > >>> -device vfio-pci,host=0000:01:10.1 \
> > >>> -bios QEMU_EFI.fd \
> > >>> -net none \
> > >>> -m 4G \
> > >>> -nographic -D -d -enable-kvm \
> > >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> > >>>
> > >>> The basic ping from the guest works fine:
> > >>> root@ubuntu:~# ping 10.202.225.185
> > >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> > >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> > >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> > >>> ...
> > >>>
> > >>> But if I increase the ping packet size:
> > >>>
> > >>> root@ubuntu:~# ping -s 1024 10.202.225.185
> > >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> > >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> > >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> > >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> > >>>
> > >>> And from the host kernel I get:
> > >>> [ 819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> > >>> [ 824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> > >>> [ 828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> > >>> [ 830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
> > >>> [ 832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> > >>> [ 834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> > >>>
> > >>> Also noted that iperf does not work, as it fails to establish a
> > >>> connection with the iperf server.
> > >>>
> > >>> Please find attached the trace logs (vfio*, smmuv3*) from QEMU for your
> > >>> reference. I haven't debugged this further yet and thought of checking
> > >>> with you whether this is something you have seen already or not. Or
> > >>> maybe I am missing something here?
> > >>
> > >> Please can you try to modify hw/vfio/common.c, function
> > >> vfio_iommu_unmap_notify, as follows:
> > >>
> > >>
> > >>     /*
> > >>     if (size <= 0x10000) {
> > >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> > >>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
> > >>         ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
> > >>         if (iotlb->leaf) {
> > >>             ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
> > >>         }
> > >>         ustruct.info.addr_info.archid = iotlb->arch_id;
> > >>         ustruct.info.addr_info.addr = start;
> > >>         ustruct.info.addr_info.granule_size = size;
> > >>         ustruct.info.addr_info.nb_granules = 1;
> > >>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
> > >>                                         iotlb->leaf);
> > >>     } else {
> > >>     */
> > >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> > >>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
> > >>         ustruct.info.pasid_info.archid = iotlb->arch_id;
> > >>         ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
> > >>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> > >>     // }
> > >>
> > >> This modification invalidates the whole ASID each time we get a
> > >> guest TLBI, instead of invalidating the single IOVA. On my end, I
> > >> saw this was the cause of this kind of issue. Please let me know if it
> > >> fixes your perf issues
> > >
> > > Yes, this seems to fix the issue.
> > >
> > > root@ubuntu:~# iperf -c 10.202.225.185
> > > ------------------------------------------------------------
> > > Client connecting to 10.202.225.185, TCP port 5001
> > > TCP window size: 85.0 KByte (default)
> > > ------------------------------------------------------------
> > > [ 3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port 5001
> > > [ ID] Interval Transfer Bandwidth
> > > [ 3] 0.0-10.0 sec 2.27 GBytes 1.95 Gbits/sec
> > > root@ubuntu:~#
> > >
> > > But the performance seems to be very poor given that this is a 10 Gbps
> > > interface (of course, invalidating the whole ASID may not be very helpful).
> > > It is interesting that the single IOVA invalidation is not working, though.
> > >
> > >> and then we may discuss further about the test
> > >> configuration.
> > >
> > > Sure. Please let me know.
> >
> > I reported that issue earlier on the ML. I have not been able to find
> > any integration issue in the kernel/QEMU code, but maybe I am too blind
> > now as I wrote it ;-) When I get a guest stage 1 TLBI, I cascade it down
> > to the physical IOMMU. I also pass the LEAF flag.
>
> Ok.
>
> > As you are an expert on the SMMUv3 PMU, if your implementation has one
> > and you have cycles to look at this, it would be helpful to run it and
> > see if something weird gets highlighted.
>
> :). Sure. I will give it a try and report back if I find anything suspicious.

I just noted that CMDQ_OP_TLBI_NH_VA is missing the vmid field, which seems
to be the cause of the single IOVA TLBI not working properly. Since TLB entries
are tagged with the VMID when stage 2 is in use, a TLBI_NH_VA command that
leaves the VMID at zero will not match the entries installed for the guest.

I have this fix in arm-smmu-v3.c:

@@ -947,6 +947,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
 		break;
 	case CMDQ_OP_TLBI_NH_VA:
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
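
(For reference, a minimal standalone sketch of what the command words end up
looking like. The field positions below mirror the driver's CMDQ_TLBI_0_VMID
(bits 47:32), CMDQ_TLBI_0_ASID (bits 63:48), CMDQ_TLBI_1_LEAF (bit 0) and
CMDQ_TLBI_1_VA_MASK (bits 63:12) definitions; the packing macros here are
only for illustration, not driver code.)

#include <stdint.h>
#include <stdio.h>

/* Illustrative field packing for CMD_TLBI_NH_VA (opcode 0x12). */
#define TLBI_0_OP(op)    ((uint64_t)(op) & 0xff)
#define TLBI_0_VMID(v)   (((uint64_t)(v) & 0xffff) << 32)
#define TLBI_0_ASID(a)   (((uint64_t)(a) & 0xffff) << 48)
#define TLBI_1_LEAF(l)   ((uint64_t)(l) & 0x1)
#define TLBI_1_VA(addr)  ((uint64_t)(addr) & ~0xfffULL)

int main(void)
{
        uint64_t cmd[2] = { TLBI_0_OP(0x12), 0 };

        /* Without the fix above, bits 47:32 of cmd[0] stay 0, so the
         * command only matches TLB entries tagged with VMID 0 -- not the
         * entries installed for the guest's stage 2 VMID. */
        cmd[0] |= TLBI_0_VMID(1);               /* guest's VMID (example) */
        cmd[0] |= TLBI_0_ASID(5);               /* guest's ASID (example) */
        cmd[1] |= TLBI_1_LEAF(1);
        cmd[1] |= TLBI_1_VA(0x80001000ULL);

        printf("cmd[0]=%016llx cmd[1]=%016llx\n",
               (unsigned long long)cmd[0], (unsigned long long)cmd[1]);
        return 0;
}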


With this, your original QEMU branch is working:

root@ubuntu:~# iperf -c 10.202.225.185
------------------------------------------------------------
Client connecting to 10.202.225.185, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.202.225.169 port 44894 connected with 10.202.225.185 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 3.21 GBytes 2.76 Gbits/sec

Could you please check this?

I also have a rebase of your patches on top of 5.4-rc5. This has some
optimizations from Will, such as batched TLBI invalidation. Please find it here:

https://github.com/hisilicon/kernel-dev/tree/private-vSMMUv3-v9-v5.4-rc5
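
(As far as I understand, the batching queues the per-granule TLBI commands for
a range and only issues a single CMD_SYNC at the end, instead of syncing after
every command. A rough sketch of the idea, with hypothetical
queue_tlbi_cmd()/issue_cmd_sync() helpers standing in for the actual driver
machinery:)

/* Rough sketch of batched range invalidation: emit one TLBI_NH_VA per
 * granule without waiting, then one sync covering the whole batch.
 * queue_tlbi_cmd()/issue_cmd_sync() are illustrative, not real APIs. */
static void tlbi_range_batched(struct arm_smmu_device *smmu,
                               u16 asid, u16 vmid,
                               unsigned long iova, size_t size,
                               size_t granule, bool leaf)
{
        struct arm_smmu_cmdq_ent cmd = {
                .opcode = CMDQ_OP_TLBI_NH_VA,
                .tlbi = {
                        .asid = asid,
                        .vmid = vmid,   /* the tag the fix above adds */
                        .leaf = leaf,
                },
        };
        unsigned long end = iova + size;

        while (iova < end) {
                cmd.tlbi.addr = iova;
                queue_tlbi_cmd(smmu, &cmd);     /* queued, no per-cmd sync */
                iova += granule;
        }
        issue_cmd_sync(smmu);                   /* single sync for the batch */
}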

This gives me better performance with iperf:

root@ubuntu:~# iperf -c 10.202.225.185
------------------------------------------------------------
Client connecting to 10.202.225.185, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.202.225.169 port 55450 connected with 10.202.225.185 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 4.91 GBytes 4.22 Gbits/sec
root@ubuntu:~#

If possible, please check this branch as well.

Thanks,
Shameer

> Thanks,
> Shameer
>
>
> > Thanks
> >
> > Eric
> > >
> > > Cheers,
> > > Shameer
> > >
> > >> Thanks
> > >>
> > >> Eric
> > >>
> > >>
> > >>
> > >>>
> > >>> Please let me know.
> > >>>
> > >>> Thanks,
> > >>> Shameer
> > >>>
> > >>>> Best Regards
> > >>>>
> > >>>> Eric
> > >>>>
> > >>>> This series can be found at:
> > >>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > >>>>
> > >>>> The series includes Tina's patch stemming from
> > >>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> > >>>> specific irq", plus patches originally contributed by Yi.
> > >>>>
> > >>>> History:
> > >>>>
> > >>>> v8 -> v9:
> > >>>> - introduce specific irq framework
> > >>>> - single fault region
> > >>>> - iommu_unregister_device_fault_handler failure case not handled
> > >>>> yet.
> > >>>>
> > >>>> v7 -> v8:
> > >>>> - rebase on top of v5.2-rc1 and especially
> > >>>>   8be39a1a04c1 iommu/arm-smmu-v3: Add a master->domain pointer
> > >>>> - dynamic alloc of s1_cfg/s2_cfg
> > >>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> > >>>> - check there are no HW MSI regions
> > >>>> - asid invalidation using pasid extended struct (change in the uapi)
> > >>>> - add s1_live/s2_live checks
> > >>>> - move check about support of nested stages in domain finalise
> > >>>> - fixes in error reporting according to the discussion with Robin
> > >>>> - reordered the patches to have first iommu/smmuv3 patches and then
> > >>>> VFIO patches
> > >>>>
> > >>>> v6 -> v7:
> > >>>> - removed device handle from bind/unbind_guest_msi
> > >>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
> > >>>> enforcement"
> > >>>> - added a few uapi comments as suggested by Jean, Jacob and Alex
> > >>>>
> > >>>> v5 -> v6:
> > >>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
> > >>>>
> > >>>> v4 -> v5:
> > >>>> - fix bug reported by Vincent: fault handler unregistration now happens
> > >>>>   in vfio_pci_release
> > >>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
> > >>>>   uapi changes suggested by Jean-Philippe (except fetch_addr)
> > >>>> - iommu: introduce device fault report API: removed the PRI part.
> > >>>> - see individual logs for more details
> > >>>> - reset the ste abort flag on detach
> > >>>>
> > >>>> v3 -> v4:
> > >>>> - took into account Alex's, Jean-Philippe's and Robin's comments on v3
> > >>>> - rework of the smmuv3 driver integration
> > >>>> - add tear down ops for msi binding and PASID table binding
> > >>>> - fix S1 fault propagation
> > >>>> - put fault reporting patches at the beginning of the series following
> > >>>> Jean-Philippe's request
> > >>>> - update of the cache invalidate and fault API uapis
> > >>>> - VFIO fault reporting rework with 2 separate regions and one mmappable
> > >>>>   segment for the fault queue
> > >>>> - moved to PATCH
> > >>>>
> > >>>> v2 -> v3:
> > >>>> - When registering the S1 MSI binding we now store the device handle.
> > >>>>   This addresses Robin's comment about discrimination of devices
> > >>>>   belonging to different S1 groups and using different physical MSI
> > >>>>   doorbells.
> > >>>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX
> > >>>>   to set the eventfd and expose the faults through an mmappable fault
> > >>>>   region
> > >>>>
> > >>>> v1 -> v2:
> > >>>> - Added the fault reporting capability
> > >>>> - asid properly passed on invalidation (fix assignment of multiple
> > >>>> devices)
> > >>>> - see individual change logs for more info
> > >>>>
> > >>>>
> > >>>> Eric Auger (8):
> > >>>> vfio: VFIO_IOMMU_SET_MSI_BINDING
> > >>>> vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
> > >>>> vfio/pci: Register an iommu fault handler
> > >>>> vfio/pci: Allow to mmap the fault queue
> > >>>> vfio: Add new IRQ for DMA fault reporting
> > >>>> vfio/pci: Add framework for custom interrupt indices
> > >>>> vfio/pci: Register and allow DMA FAULT IRQ signaling
> > >>>> vfio: Document nested stage control
> > >>>>
> > >>>> Liu, Yi L (2):
> > >>>> vfio: VFIO_IOMMU_SET_PASID_TABLE
> > >>>> vfio: VFIO_IOMMU_CACHE_INVALIDATE
> > >>>>
> > >>>> Tina Zhang (1):
> > >>>> vfio: Use capability chains to handle device specific irq
> > >>>>
> > >>>>  Documentation/vfio.txt              |  77 ++++++++
> > >>>>  drivers/vfio/pci/vfio_pci.c         | 283 ++++++++++++++--
> > >>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
> > >>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
> > >>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
> > >>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
> > >>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
> > >>>>  7 files changed, 747 insertions(+), 19 deletions(-)
> > >>>>
> > >>>> --
> > >>>> 2.20.1
> > >>>>
> > >>>> _______________________________________________
> > >>>> kvmarm mailing list
> > >>>> kvmarm@xxxxxxxxxxxxxxxxxxxxx
> > >>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> > >