Re: [PATCH 7/9] vfio/iommufd: use iova_to_phys_length for efficient unmap

From: Jason Gunthorpe

Date: Sun May 31 2026 - 19:58:43 EST


On Sun, May 31, 2026 at 05:36:35PM +0800, Guanghui Feng wrote:
> /*
> - * This is pretty slow, it would be nice to get the page size
> - * back from the driver, or have the driver directly fill the
> - * batch.
> + * Use iova_to_phys_length to get both the physical address
> + * and the PTE page size in a single page table walk, allowing
> + * us to skip ahead by the contiguous region size instead of
> + * walking the page tables for every PAGE_SIZE step.
> */
> - phys = iommu_iova_to_phys(domain, iova) - page_offset;
> - if (!batch_add_pfn(batch, PHYS_PFN(phys)))
> - return;
> - iova += PAGE_SIZE - page_offset;
> + phys = iommu_iova_to_phys_length(domain, iova, &pgsize) -
> + page_offset;
> + if (!pgsize || pgsize < PAGE_SIZE)
> + pgsize = PAGE_SIZE;

It is actually a bug if it returns something < PAGE_SIZE; it should
WARN_ON() and try to continue.
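A minimal userspace sketch of that suggestion. It assumes a hypothetical helper, sanitize_pgsize(), and a local stand-in for the kernel's WARN_ON(); neither is existing kernel code. Note that the `!pgsize` test in the patch is already covered by `pgsize < PAGE_SIZE`, since 0 is below PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE ((size_t)4096)

static unsigned int warn_count;

/*
 * Userspace stand-in (assumption, not the kernel macro) for WARN_ON():
 * report the condition and hand its value back so it can drive an if ().
 */
static int warn_on(int cond, const char *expr)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond)

/*
 * Hypothetical helper: a PTE size below PAGE_SIZE (including 0) is a
 * driver bug, so warn, then keep going one page at a time.
 */
static size_t sanitize_pgsize(size_t pgsize)
{
	if (WARN_ON(pgsize < PAGE_SIZE))
		return PAGE_SIZE;
	return pgsize;
}
```

In the patch itself, each of the two clamps could then shrink to a single `if (WARN_ON(pgsize < PAGE_SIZE)) pgsize = PAGE_SIZE;`.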

> @@ -1177,25 +1177,41 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>
> iommu_iotlb_gather_init(&iotlb_gather);
> while (pos < dma->size) {
> - size_t unmapped, len;
> + size_t unmapped, len, pgsize;
> phys_addr_t phys, next;
> dma_addr_t iova = dma->iova + pos;
>
> - phys = iommu_iova_to_phys(domain->domain, iova);
> + /* Single page table walk returns both phys and PTE size */
> + phys = iommu_iova_to_phys_length(domain->domain, iova,
> + &pgsize);
> if (WARN_ON(!phys)) {
> pos += PAGE_SIZE;
> continue;
> }
> + if (!pgsize || pgsize < PAGE_SIZE)
> + pgsize = PAGE_SIZE;
>
> /*
> * To optimize for fewer iommu_unmap() calls, each of which
> * may require hardware cache flushing, try to find the
> * largest contiguous physical memory chunk to unmap.
> + *
> + * Calculate remaining contiguous bytes within this PTE from
> + * our position, then try to join following physically
> + * contiguous PTEs.
> */
> - for (len = PAGE_SIZE; pos + len < dma->size; len += PAGE_SIZE) {
> - next = iommu_iova_to_phys(domain->domain, iova + len);
> + len = pgsize - (iova & (pgsize - 1));
> + for (; pos + len < dma->size; ) {
> + size_t next_pgsize;

Things should be arranged so that iommu_iova_to_phys_length() always
returns the best length, either because it called into iommupt to get
it or because it accumulated it internally on an old driver.
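A rough userspace model of the old-driver case. It assumes a fake one-level page table and the hypothetical names old_iova_to_phys() and fallback_iova_to_phys_length(); none of these are existing kernel functions. The point is that the core code, not vfio, does the per-page accumulation, so every caller sees the longest contiguous run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE ((size_t)4096)
#define NR_PAGES  8

typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/*
 * Hypothetical old-driver page table: one physical address per 4 KiB
 * IOVA page, 0 meaning "not mapped".
 */
static const phys_addr_t fake_pt[NR_PAGES] = {
	0x100000, 0x101000, 0x102000, 0x103000, /* first contiguous run */
	0x200000, 0x201000,                     /* second run */
	0, 0,                                   /* unmapped */
};

/* Old-driver op: translates a single address, reports no PTE size. */
static phys_addr_t old_iova_to_phys(dma_addr_t iova)
{
	size_t idx = iova / PAGE_SIZE;

	if (idx >= NR_PAGES || !fake_pt[idx])
		return 0;
	return fake_pt[idx] + iova % PAGE_SIZE;
}

/*
 * Hypothetical core fallback: when the driver cannot report a size,
 * step page by page while the physical addresses stay contiguous and
 * return the whole run length in *len.
 */
static phys_addr_t fallback_iova_to_phys_length(dma_addr_t iova, size_t *len)
{
	phys_addr_t phys = old_iova_to_phys(iova);
	size_t run;

	*len = 0;
	if (!phys)
		return 0;

	/* Bytes left in the first page, then whole following pages. */
	run = PAGE_SIZE - iova % PAGE_SIZE;
	while (old_iova_to_phys(iova + run) == phys + run)
		run += PAGE_SIZE;

	*len = run;
	return phys;
}
```

The per-page walk for old drivers is not eliminated, only moved out of the callers. As written, the loop also runs until the mapping stops being contiguous, which may be well past the range the caller actually wants.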

To make this work well, the API should probably also include the last
address to reach, so it can stop iterating at the right point.
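One possible shape for that, sketched in userspace. It assumes the bound is an inclusive last IOVA (as the kernel's interval trees use), and iova_to_phys_length(), lookup() and unmap_range() here are hypothetical names, not the patch's signature. With the length coming back already bounded, the caller loop reduces to "translate, unmap len bytes, advance by len".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE ((size_t)4096)
#define NR_PAGES  8

typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Hypothetical page table: pages 0-3 and 4-7 are two contiguous runs. */
static const phys_addr_t fake_pt[NR_PAGES] = {
	0x100000, 0x101000, 0x102000, 0x103000,
	0x200000, 0x201000, 0x202000, 0x203000,
};

static unsigned int lookups;	/* counts single-page translations */

static phys_addr_t lookup(dma_addr_t iova)
{
	size_t idx = iova / PAGE_SIZE;

	lookups++;
	if (idx >= NR_PAGES)
		return 0;
	return fake_pt[idx] + iova % PAGE_SIZE;
}

/*
 * Hypothetical API: @last is the inclusive final IOVA the caller cares
 * about, so the walk stops there instead of running to the end of a
 * large contiguous mapping. Assumes last != UINT64_MAX.
 */
static phys_addr_t iova_to_phys_length(dma_addr_t iova, dma_addr_t last,
				       size_t *len)
{
	phys_addr_t phys = lookup(iova);
	dma_addr_t next = (iova | (PAGE_SIZE - 1)) + 1; /* next page boundary */

	*len = 0;
	if (!phys)
		return 0;
	while (next <= last && lookup(next) == phys + (next - iova))
		next += PAGE_SIZE;
	/* Never report bytes beyond @last. */
	*len = (next <= last ? next : last + 1) - iova;
	return phys;
}

/* Count the physically contiguous chunks needed to unmap the range. */
static unsigned int unmap_range(dma_addr_t start, size_t size)
{
	dma_addr_t last = start + size - 1;
	size_t pos = 0, len;
	unsigned int chunks = 0;

	while (pos < size) {
		if (!iova_to_phys_length(start + pos, last, &len)) {
			pos += PAGE_SIZE;	/* hole: skip a page */
			continue;
		}
		/* iommu_unmap(domain, start + pos, len) would go here. */
		chunks++;
		pos += len;
	}
	return chunks;
}
```

For the range [0x2000, 0x6000), which spans the tail of one run and the head of the next, this issues two unmaps and five single-page lookups; without the bound, the second run would be walked to its end.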

Jason