RE: [PATCH 2/2] iommu/vt-d: Remove caching mode check before devtlb flush

From: Tian, Kevin
Date: Tue Apr 09 2024 - 03:30:29 EST


> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Sunday, April 7, 2024 10:43 PM
>
> The Caching Mode (CM) of the Intel IOMMU indicates whether the hardware
> implementation caches not-present or erroneous translation-structure
> entries, except for the first-stage translation. The caching mode is
> unrelated to the device TLB, so there is no need to check it before a
> device TLB invalidation operation.
>
> Before the scalable mode was introduced, caching mode was treated as
> an indication that the driver is running in a VM guest. This was just
> a software convention, since a shadow page table was the only way to
> implement a virtual IOMMU; the VT-d spec doesn't state it anywhere.
> After the introduction of the scalable mode, this assumption no longer
> holds, as caching mode is not relevant to the first-stage translation.
> A virtual IOMMU implementation is free to support only the first-stage
> translation with caching mode cleared.

I didn't get the connection to the scalable mode.

If required, we can still use caching mode to imply running as a guest.
We just need to make sure its implementation conforms to the VT-d spec.
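
For reference, a minimal sketch of the heuristic in question (the
cap_caching_mode() macro mirrors drivers/iommu/intel/iommu.h; the
helper name below is made up purely for illustration):

/* CM is bit 7 of the IOMMU capability register. */
#define cap_caching_mode(c)	(((c) >> 7) & 1)

/* Hypothetical helper, only to illustrate the heuristic under discussion. */
static bool running_under_viommu(u64 iommu_cap)
{
	/*
	 * Historically CM=1 was read as "a shadowing viommu sits
	 * underneath us".  A first-stage-only viommu may legitimately
	 * report CM=0, so this heuristic alone is no longer reliable.
	 */
	return cap_caching_mode(iommu_cap);
}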

>
> Remove the caching mode check before device TLB invalidation to ensure
> compatibility with the scalable mode use cases.
>
> Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
> Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> ---
> drivers/iommu/intel/iommu.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 493b6a600394..681789b1258d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1501,7 +1501,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
> 	else
> 		__iommu_flush_iotlb_psi(iommu, did, pfn, pages, ih);
>
> -	if (!cap_caching_mode(iommu->cap) && !map)
> +	if (!map)
> 		iommu_flush_dev_iotlb(domain, addr, mask);

As commented earlier, it would be better to squash this into patch 1.

> }
>
> @@ -1575,8 +1575,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
> 			iommu->flush.flush_iotlb(iommu, did, 0, 0,
> 						 DMA_TLB_DSI_FLUSH);
>
> -		if (!cap_caching_mode(iommu->cap))
> -			iommu_flush_dev_iotlb(dmar_domain, 0, MAX_AGAW_PFN_WIDTH);
> +		iommu_flush_dev_iotlb(dmar_domain, 0, MAX_AGAW_PFN_WIDTH);
> }
>

I'm hesitant to agree with this change. Strictly speaking it's correct,
but without batch invalidation support it implies a performance drop on
the viommu due to more VM-exits, and users may complain when their VMs
upgrade to a newer kernel version.

So it'd be better to keep the current behavior and fix it together with
batch invalidation support. Anyway, none of the viommu implementations
today (either shadow or nested translation) relies on correct devtlb
behavior from the guest; otherwise it would already be broken.
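
To make the batching point concrete, here is a rough sketch (hypothetical,
not an existing VT-d driver API) of what a batched device-TLB flush path
could look like: invalidation requests are accumulated and submitted as
one unit, so a trapping viommu takes a single VM-exit for the whole batch
rather than one per flushed range.

/* Hypothetical sketch only -- names and sizes are illustrative. */
struct devtlb_inv {
	u64 addr;		/* base IOVA of the range to invalidate */
	unsigned int mask;	/* size order of the range */
};

#define DEVTLB_BATCH_MAX	16

struct devtlb_batch {
	struct devtlb_inv reqs[DEVTLB_BATCH_MAX];
	unsigned int nr;
};

/* Queue one range; hand the whole batch to the submit callback when full. */
static void devtlb_batch_add(struct devtlb_batch *b, u64 addr,
			     unsigned int mask,
			     void (*submit)(struct devtlb_batch *b))
{
	b->reqs[b->nr].addr = addr;
	b->reqs[b->nr].mask = mask;
	if (++b->nr == DEVTLB_BATCH_MAX) {
		submit(b);	/* one trap covers many ranges */
		b->nr = 0;
	}
}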