Re: [PATCH v4 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses
From: Wei Wang
Date: Sun Dec 21 2025 - 21:14:31 EST
On 12/17/25 12:24 PM, Alexey Kardashevskiy wrote:
On 17/12/25 03:13, Wei Wang wrote:
Before requesting the IOMMU driver to map an IOVA to a physical address,
set the IOMMU_MMIO flag in dma->prot when the physical address corresponds
to MMIO. This allows the IOMMU driver to handle MMIO mappings specially.
For example, on AMD CPUs with SME enabled, the IOMMU driver avoids setting
the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. This
prevents issues with PCIe P2P communication when IOVA is used.
Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxxxx>
Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
---
drivers/vfio/vfio_iommu_type1.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5167bec14e36..dfe53da53b80 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -583,7 +583,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
* returned initial pfn are provided; subsequent pfns are contiguous.
*/
static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
- unsigned long npages, int prot, unsigned long *pfn,
+ unsigned long npages, int *prot, unsigned long *pfn,
struct vfio_batch *batch)
{
unsigned long pin_pages = min_t(unsigned long, npages, batch->capacity);
@@ -591,7 +591,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
unsigned int flags = 0;
long ret;
- if (prot & IOMMU_WRITE)
+ if (*prot & IOMMU_WRITE)
flags |= FOLL_WRITE;
mmap_read_lock(mm);
@@ -601,6 +601,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
*pfn = page_to_pfn(batch->pages[0]);
batch->size = ret;
batch->offset = 0;
+ *prot &= ~IOMMU_MMIO;
Do you expect IOMMU_MMIO here, why?
Then, what if vaddr_get_pfns() is called with a vaddr which is some RAM immediately followed by MMIO? The whole vfio_dma descriptor will get IOMMU_MMIO, which is hardly desirable (also quite unlikely, though).
Yeah, thanks for pointing this out. The current implementation allows adjacent RAM and MMIO address ranges to be handled via separate vfio_iommu_map() calls (in vfio_pin_map_dma()). Given the issues you mentioned above, it might not be appropriate to clear the IOMMU_MMIO flag there. I'm considering removing the IOMMU_MMIO flag clearing from vaddr_get_pfns() and instead adding the following changes to vfio_pin_pages_remote():
out:
	dma->has_rsvd |= rsvd;
+	if (!rsvd)
+		dma->prot &= ~IOMMU_MMIO;
	ret = vfio_lock_acct(dma, lock_acct, false);

unpin_out:
	if (ret < 0) {
		if (pinned && !rsvd) {
			for (pfn = *pfn_base; pinned; pfn++, pinned--)
				put_pfn(pfn, dma->prot);
		}
		vfio_batch_unpin(batch, dma);
		return ret;
	}

	return pinned;
}
*pfn_base is the pfn that will be returned to vfio_pin_map_dma() for the vfio_iommu_map() call, and rsvd indicates the status of *pfn_base: MMIO addresses are guaranteed to have rsvd=true. Thus, when !rsvd, *pfn_base is not MMIO, and the IOMMU_MMIO flag needs to be cleared.
Then revisit the two corner cases:
- RAM immediately followed by MMIO: The first call to vaddr_get_pfns() will set *pfn_base to a RAM physical address, and rsvd will be updated to false. In a subsequent iteration of the "while (npage)" loop, vaddr_get_pfns() will detect an MMIO address and update dma->prot to include IOMMU_MMIO. Since the value returned to vfio_pin_map_dma() corresponds to the RAM pfn (with rsvd=false) obtained in the first call above, the IOMMU_MMIO flag will be cleared when going to "out:".
- MMIO immediately followed by RAM: The first vaddr_get_pfns() invocation will identify an MMIO address, set *pfn_base to this MMIO address, and then mark rsvd=true. When going to “out:”, the IOMMU_MMIO flag will remain set.