Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

From: Auger Eric
Date: Sun Feb 21 2021 - 13:23:15 EST


Hi Shameer,
On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.auger@xxxxxxxxxx]
>> Sent: 18 November 2020 11:22
>> To: eric.auger.pro@xxxxxxxxx; eric.auger@xxxxxxxxxx;
>> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> kvm@xxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx; will@xxxxxxxxxx;
>> joro@xxxxxxxxxx; maz@xxxxxxxxxx; robin.murphy@xxxxxxx;
>> alex.williamson@xxxxxxxxxx
>> Cc: jean-philippe@xxxxxxxxxx; zhangfei.gao@xxxxxxxxxx;
>> zhangfei.gao@xxxxxxxxx; vivek.gautam@xxxxxxx; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
>> jacob.jun.pan@xxxxxxxxxxxxxxx; yi.l.liu@xxxxxxxxx; tn@xxxxxxxxxxxx;
>> nicoleotsuka@xxxxxxxxx; yuzenghui <yuzenghui@xxxxxxxxxx>
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
>
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
>
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
>
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: fast devsel
> Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
>
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
>
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Invalid NUMA socket, default to 0
> EAL: using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
>
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
>
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
>
> 2). Guest have kernel driver built-in for pass-through dev.
>
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: bus master, fast devsel, latency 0
> Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
>
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
>
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Invalid NUMA socket, default to 0
> EAL: using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
> hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> hns3vf_init_vf(): Failed to fetch configuration: -62
> hns3vf_dev_init(): Failed to init vf: -62
> EAL: Releasing pci mapped resource for 0000:00:02.0
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: Bus (pci) probe failed.
> EAL: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Done
> testpmd>
>
> And in this case, smmu(host) reports a translation fault,
>
> [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
>
> Tested with Intel 82599 card(ixgbevf) as well. but same errror.

So this should be fixed in the next release. The problem came from the
fact the MSI giova was not duly unregistered. When vfio is not in used
on guest side, the guest kernel allocates giovas for MSIs @fffef000 - 40
is the ITS translater offset ;-) - When passthrough is in use, the iova
is allocated @0x8000000. As fffef000 MSI giova was not properly
unregistered, the host kernel used it - despite it has been unmapped by
the guest kernel -, hence the translation fault. So the fix is to
unregister the MSI in the VFIO QEMU code when msix are disabled. So to
me this is a QEMU integration issue.

Thank you very much for testing and reporting!

Thanks

Eric
>
> Not able to root cause the problem yet. With the hope that, this is
> related to tlb entries not being invlaidated properly, I tried explicitly
> issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
> the STE update, but no luck yet :(
>
> Please let me know if I am missing something here or has any clue if you
> can replicate this on your setup.
>
> Thanks,
> Shameer
>
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
>> (including the VFIO part in his last version: v11)
>>
>> The series includes a patch from Jean-Philippe. It is better to
>> review the original patch:
>> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> The VFIO series is sent separately.
>>
>> History:
>>
>> v12 -> v13:
>> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>> reported by Shameer. This urged me to revisit patch 4 into
>> iommu/smmuv3: Allow s1 and s2 configs to coexist where
>> s1_cfg and s2_cfg are not dynamically allocated anymore.
>> Instead I use a new set field in existing structs
>> - fixed 2 others config checks
>> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>> according to the last version
>>
>> v11 -> v12:
>> - rebase on top of v5.10-rc4
>>
>> Eric Auger (14):
>> iommu: Introduce attach/detach_pasid_table API
>> iommu: Introduce bind/unbind_guest_msi
>> iommu/smmuv3: Allow s1 and s2 configs to coexist
>> iommu/smmuv3: Get prepared for nested stage support
>> iommu/smmuv3: Implement attach/detach_pasid_table
>> iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>> iommu/smmuv3: Implement cache_invalidate
>> dma-iommu: Implement NESTED_MSI cookie
>> iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>> iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>> regions
>> iommu/smmuv3: Implement bind/unbind_guest_msi
>> iommu/smmuv3: Report non recoverable faults
>> iommu/smmuv3: Accept configs with more than one context descriptor
>> iommu/smmuv3: Add PASID cache invalidation per PASID
>>
>> Jean-Philippe Brucker (1):
>> iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
>> ++++++++++++++++++--
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
>> drivers/iommu/dma-iommu.c | 142 ++++-
>> drivers/iommu/iommu.c | 105 ++++
>> include/linux/dma-iommu.h | 16 +
>> include/linux/iommu.h | 41 ++
>> include/uapi/linux/iommu.h | 54 ++
>> 7 files changed, 1042 insertions(+), 78 deletions(-)
>>
>> --
>> 2.21.3
>