RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

From: Shameerali Kolothum Thodi
Date: Mon Feb 22 2021 - 03:57:24 EST

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
> Sent: 21 February 2021 18:21
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> eric.auger.pro@xxxxxxxxx; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> kvmarm@xxxxxxxxxxxxxxxxxxxxx; will@xxxxxxxxxx; joro@xxxxxxxxxx;
> maz@xxxxxxxxxx; robin.murphy@xxxxxxx; alex.williamson@xxxxxxxxxx
> Cc: jean-philippe@xxxxxxxxxx; zhangfei.gao@xxxxxxxxxx;
> zhangfei.gao@xxxxxxxxx; vivek.gautam@xxxxxxx;
> jacob.jun.pan@xxxxxxxxxxxxxxx; yi.l.liu@xxxxxxxxx; tn@xxxxxxxxxxxx;
> nicoleotsuka@xxxxxxxxx; yuzenghui <yuzenghui@xxxxxxxxxx>; Zengtao (B)
> <prime.zeng@xxxxxxxxxxxxx>; linuxarm@xxxxxxxxxxxxx
> Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>
> Hi Shameer,
> On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:eric.auger@xxxxxxxxxx]
> >> Sent: 18 November 2020 11:22
> >> To: eric.auger.pro@xxxxxxxxx; eric.auger@xxxxxxxxxx;
> >> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >> kvm@xxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx; will@xxxxxxxxxx;
> >> joro@xxxxxxxxxx; maz@xxxxxxxxxx; robin.murphy@xxxxxxx;
> >> alex.williamson@xxxxxxxxxx
> >> Cc: jean-philippe@xxxxxxxxxx; zhangfei.gao@xxxxxxxxxx;
> >> zhangfei.gao@xxxxxxxxx; vivek.gautam@xxxxxxx; Shameerali Kolothum
> >> Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> >> jacob.jun.pan@xxxxxxxxxxxxxxx; yi.l.liu@xxxxxxxxx; tn@xxxxxxxxxxxx;
> >> nicoleotsuka@xxxxxxxxx; yuzenghui <yuzenghui@xxxxxxxxxx>
> >> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> >> This series brings the IOMMU part of HW nested paging support
> >> in the SMMUv3. The VFIO part is submitted separately.
> >>
> >> The IOMMU API is extended to support 2 new API functionalities:
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >>
> >> Then those capabilities get implemented in the SMMUv3 driver.
> >>
> >> The virtualizer passes information through the VFIO user API
> >> which cascades it to the iommu subsystem. This allows the guest
> >> to own stage 1 tables and context descriptors (so-called PASID
> >> table) while the host owns stage 2 tables and main configuration
> >> structures (STE).
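
For anyone following along, the two extensions above correspond to new IOMMU
API entry points added earlier in the series ("iommu: Introduce
attach/detach_pasid_table API" and "iommu: Introduce bind/unbind_guest_msi").
Below is a minimal sketch of how a caller might hand the guest-owned stage 1
to a nested domain; the struct fields and exact signature follow one reading
of the series and should be treated as illustrative, not authoritative (the
MSI-binding counterpart is sketched further down the thread, where the fault
is analyzed):

#include <linux/iommu.h>

/*
 * Illustrative only: attach the guest-owned stage 1 (the CD table,
 * i.e. the so-called PASID table) to a nested domain. Field names
 * mirror the uapi proposed in this series; check the actual patches
 * for the authoritative layout.
 */
static int attach_guest_stage1(struct iommu_domain *domain, u64 guest_cd_gpa)
{
	struct iommu_pasid_table_config cfg = {
		.version  = PASID_TABLE_CFG_VERSION_1,
		.format   = IOMMU_PASID_FORMAT_SMMUV3,
		.base_ptr = guest_cd_gpa,	/* guest-owned CD table */
		.config   = IOMMU_PASID_CONFIG_TRANSLATE,
	};

	/* The host still owns stage 2 and the STEs. */
	return iommu_attach_pasid_table(domain, &cfg);
}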
> >
> > I am seeing an issue with a guest testpmd run with this series.
> > I have two different setups, and testpmd works fine with the
> > first one but not with the second.
> >
> > 1). The guest doesn't have a kernel driver built in for the pass-through device.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 0000
> > Flags: fast devsel
> > Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> > Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> >
> > root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL: Invalid NUMA socket, default to 0
> > EAL: using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> > EAL: No legacy callbacks, legacy socket not created
> > Interactive-mode selected
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> >
> > Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> >
> > Configuring Port 0 (socket 0)
> > Port 0: 8E:A6:8C:43:43:45
> > Checking link statuses...
> > Done
> > testpmd>
> >
> > 2). The guest has a kernel driver built in for the pass-through device.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 0000
> > Flags: bus master, fast devsel, latency 0
> > Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> > Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> > Kernel driver in use: hns3
> >
> > root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL: Invalid NUMA socket, default to 0
> > EAL: using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> > 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
> > hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> > hns3vf_init_vf(): Failed to fetch configuration: -62
> > hns3vf_dev_init(): Failed to init vf: -62
> > EAL: Releasing pci mapped resource for 0000:00:02.0
> > EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> > EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> > EAL: Requested device 0000:00:02.0 cannot be used
> > EAL: Bus (pci) probe failed.
> > EAL: No legacy callbacks, legacy socket not created
> > testpmd: No probed ethernet devices
> > Interactive-mode selected
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> > Done
> > testpmd>
> >
> > And in this case, the SMMU (host) reports a translation fault:
> >
> > [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> > [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> > [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> > [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> > [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
> >
> > Tested with an Intel 82599 card (ixgbevf) as well, but hit the same error.
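
For reference, the four 64-bit words in the dump above form one SMMUv3 event
queue record, and event 0x10 is F_TRANSLATION. A short sketch of pulling out
the interesting fields - the masks mirror the EVTQ layout defined by the
SMMUv3 spec (and the mainline driver's arm-smmu-v3.h), and the decoded values
are taken from the dump:

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/printk.h>
#include <linux/types.h>

#define EVTQ_0_ID	GENMASK_ULL(7, 0)	/* 0x10 == F_TRANSLATION */
#define EVTQ_0_SID	GENMASK_ULL(63, 32)	/* faulting StreamID */

static void decode_evt(const u64 evt[4])
{
	/* evt[0] = 0x00007d1200000010 -> event 0x10 from StreamID 0x7d12 */
	pr_info("event 0x%02llx, SID 0x%llx\n",
		FIELD_GET(EVTQ_0_ID, evt[0]),
		FIELD_GET(EVTQ_0_SID, evt[0]));

	/*
	 * evt[2] = 0x00000000fffef040 is the faulting input address:
	 * the MSI gIOVA 0xfffef000 plus the 0x40 GITS_TRANSLATER
	 * doorbell offset (see the analysis below).
	 */
	pr_info("input address 0x%016llx\n", evt[2]);
}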
>
> So this should be fixed in the next release. The problem came from the
> fact that the MSI gIOVA was not duly unregistered. When vfio is not in
> use on the guest side, the guest kernel allocates gIOVAs for MSIs
> @fffef000 - the extra 0x40 in the fault address is the ITS translater
> offset ;-) - whereas when passthrough is in use, the IOVA is allocated
> @0x8000000. As the fffef000 MSI gIOVA was not properly unregistered,
> the host kernel kept using it - even though it had been unmapped by the
> guest kernel - hence the translation fault. So the fix is to
> unregister the MSI in the VFIO QEMU code when MSI-X is disabled. So to
> me this is a QEMU integration issue.
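
In other words, the gIOVA-to-doorbell binding registered when the guest first
enabled MSI-X has to be torn down when MSI-X is disabled, so the host never
keeps resolving MSIs through a mapping the guest has already released. A
rough sketch of the pairing, assuming the bind/unbind_guest_msi API from this
series - the QEMU/VFIO plumbing that would drive these calls is omitted, and
the hook names are made up for illustration:

#include <linux/iommu.h>
#include <linux/sizes.h>

/*
 * Illustrative pairing only: the real calls are issued from QEMU via
 * the VFIO uapi. What matters is that every bind gets a matching
 * unbind when the guest disables MSI-X.
 */
static int msix_enabled_hook(struct iommu_domain *domain,
			     dma_addr_t giova, phys_addr_t doorbell_gpa)
{
	/* e.g. giova = 0xfffef000 when the guest kernel owns the device */
	return iommu_bind_guest_msi(domain, giova, doorbell_gpa, SZ_4K);
}

static void msix_disabled_hook(struct iommu_domain *domain, dma_addr_t giova)
{
	/*
	 * Without this, the host keeps targeting MSIs at the stale
	 * 0xfffef000 gIOVA after the guest has unmapped it at stage 1,
	 * and the next doorbell write faults exactly as in the log
	 * above (0xfffef040 = 0xfffef000 + GITS_TRANSLATER offset).
	 */
	iommu_unbind_guest_msi(domain, giova);
}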

Super! I was focusing on the TLBI side and was slightly worried it was
somehow related to our specific hardware. That's a relief :).

Thanks,
Shameer