Re: [PATCH] vfio: fix VFIO_IOMMU_UNMAP_DMA when end of range would overflow u64
From: Alejandro Jimenez
Date: Mon Oct 06 2025 - 21:24:08 EST
Hi Alex, Jason,
On 10/6/25 8:39 PM, Alex Mastro wrote:
On Mon, Oct 06, 2025 at 07:50:39PM -0300, Jason Gunthorpe wrote:
Could we block right at the ioctl inputs that end at ULONG_MAX? Maybe
that is a good enough fix?
That would be a simpler, and perhaps less risky fix than surgically fixing every
affected place.
If going this way, we'd also have to deny the MAP requests. Right now we have this problem because you can map a range ending at the top of the address space, but not unmap it. The map operation doesn't overflow because vfio_dma_do_map() calls vfio_link_dma() with dma->size == 0, and it grows the size later in vfio_pin_map_dma()...
https://elixir.bootlin.com/linux/v6.17.1/source/drivers/vfio/vfio_iommu_type1.c#L1676-L1677
If we were to do that, I think VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE shouldAgree.
report coherent limits -- i.e. it should "lie" and say the last page (or
whatever) of u64 addressable space is inaccessible so that the UAPI is coherent
with itself. It should be legal to map/unmap iova ranges up to the limits it
claims to support.
I have doubts that anyone actually relies on MAP_DMA-ing such
end-of-u64-mappings in practice, so perhaps it's OK?
The AMD IOMMU supports a 64-bit IOVA, so when using the AMD vIOMMU with DMA remapping enabled + VF passthrough + a Linux guest with iommu.forcedac=1 we hit this issue since the driver (mlx5) starts requesting mappings for IOVAs right at the top of the address space.
I mentioned this issue on the cover letter for:
https://lore.kernel.org/qemu-devel/20250919213515.917111-1-alejandro.j.jimenez@xxxxxxxxxx/
and linked to my candidate fix:
https://github.com/aljimenezb/linux/commit/014be8cafe7464d278729583a2dd5d94514e2e2a
The reason why I hadn't send it to the list yet is because I noticed the additional locations Jason mentioned earlier in the thread (e.g. vfio_find_dma(), vfio_iommu_replay()) and wanted to put together a reproducer that also triggered those paths.
I mentioned in the notes for the patch above why I chose a slightly more complex method than the '- 1' approach, since there is a chance that iova+size could also go beyond the end of the address space and actually wrap around.
My goal was not to trust the inputs at all if possible. We could also use check_add_overflow() if we want to add explicit error reporting in vfio_iommu_type1 when an overflow is detected. i.e. catch bad callers that don't sanitize their inputs as opposed to letting them handle the error on the return path...
Thank you,
Alejandro