Re: [PATCH 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
From: Jason Gunthorpe
Date: Fri Jul 04 2025 - 09:38:26 EST
On Fri, Jul 04, 2025 at 09:30:56PM +0800, Lu Baolu wrote:
> The vmalloc() and vfree() functions manage virtually contiguous, but not
> necessarily physically contiguous, kernel memory regions. When vfree()
> unmaps such a region, it tears down the associated kernel page table
> entries and frees the physical pages.
>
> In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
> shares and walks the CPU's page tables. Architectures like x86 share
> static kernel address mappings across all user page tables, allowing the
> IOMMU to access the kernel portion of these tables.
>
> Modern IOMMUs often cache page table entries to optimize walk performance,
> even for intermediate page table levels. If kernel page table mappings are
> changed (e.g., by vfree()), but the IOMMU's internal caches retain stale
> entries, a use-after-free (UAF) condition arises. If these
> freed page table pages are reallocated for a different purpose, potentially
> by an attacker, the IOMMU could misinterpret the new data as valid page
> table entries. This allows the IOMMU to walk into attacker-controlled
> memory, leading to arbitrary physical memory DMA access or privilege
> escalation.
>
> To mitigate this, introduce a new iommu interface to flush IOMMU caches
> and fence pending page table walks when kernel page mappings are updated.
> This interface should be invoked from architecture-specific code that
> manages combined user and kernel page tables.
>
> Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices")
> Cc: stable@xxxxxxxxxxxxxxx
> Co-developed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> ---
> arch/x86/mm/tlb.c | 2 ++
> drivers/iommu/iommu-sva.c | 32 +++++++++++++++++++++++++++++++-
> include/linux/iommu.h | 4 ++++
> 3 files changed, 37 insertions(+), 1 deletion(-)
Reported-by: Jann Horn <jannh@xxxxxxxxxx>
> @@ -1540,6 +1541,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> kernel_tlb_flush_range(info);
>
> put_flush_tlb_info();
> + iommu_sva_invalidate_kva_range(start, end);
> }
This is far fewer call sites than I guessed!
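
For any other architecture with the same shared-kernel-mapping property, I'd
expect the hookup to look identical: a one-liner at the tail of that arch's
kernel-range TLB flush. A sketch only; the arch-local flush helper name here
is made up:

	void flush_tlb_kernel_range(unsigned long start, unsigned long end)
	{
		/* arch-local CPU TLB invalidation first ... */
		arch_flush_kernel_tlb(start, end);	/* hypothetical helper */

		/* ... then fence IOMMU walks still using the stale KVA range */
		iommu_sva_invalidate_kva_range(start, end);
	}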
> +void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end)
> +{
> + struct iommu_mm_data *iommu_mm;
> +
> + might_sleep();
> +
> + if (!static_branch_unlikely(&iommu_sva_present))
> + return;
> +
> + guard(mutex)(&iommu_sva_lock);
> + list_for_each_entry(iommu_mm, &iommu_sva_mms, mm_list_elm)
> + mmu_notifier_arch_invalidate_secondary_tlbs(iommu_mm->mm, start, end);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_invalidate_kva_range);
I don't think it needs to be exported if only arch code is calling it?
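
For anyone reading along: the hunk doesn't show the supporting state, but
from the symbols used above it presumably looks something like this (the
bind/unbind maintenance is my guess at the rest of the patch):

	static DEFINE_STATIC_KEY_FALSE(iommu_sva_present);
	static LIST_HEAD(iommu_sva_mms);
	static DEFINE_MUTEX(iommu_sva_lock);

	/* on first SVA bind of an mm: */
		guard(mutex)(&iommu_sva_lock);
		if (list_empty(&iommu_sva_mms))
			static_branch_enable(&iommu_sva_present);
		list_add(&iommu_mm->mm_list_elm, &iommu_sva_mms);

	/* on last unbind, the reverse: */
		list_del(&iommu_mm->mm_list_elm);
		if (list_empty(&iommu_sva_mms))
			static_branch_disable(&iommu_sva_present);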
Looks OK to me:
Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Jason