Re: [PATCH v3 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs

From: Matt Evans

Date: Mon Jun 15 2026 - 14:05:18 EST


Hi Kevin,

On 12/06/2026 09:42, Tian, Kevin wrote:
>> From: Matt Evans <matt@xxxxxxxxxx>
>> Sent: Wednesday, June 10, 2026 11:43 PM
>>
>> +int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *priv,
>> + struct vm_area_struct *vma,
>> + unsigned long address,
>> + unsigned int order,
>> + unsigned long *out_pfn)
>> +{
>> + /*
>> + * Given a VMA (start, end, pgoffs) and a fault address,
>> + * search the corresponding DMABUF's phys_vec[] to find the
>> + * range representing the address's offset into the VMA, and
>> + * its PFN.
>> + *
>> + * The phys_vec[] ranges represent contiguous spans of VAs
>> + * upwards from the buffer offset 0; the actual PFNs might be
>> + * in any order, overlap/alias, etc. Calculate an offset of
>> + * the desired page given VMA start/pgoff and address, then
>> + * search upwards from 0 to find which span contains it.
>> + *
>> + * On success, a valid PFN for a page sized by 'order' is
>> + * returned into out_pfn.
>> + *
>> + * Failure occurs if:
>> + * - The page would cross the edge of the VMA
>> + * - The page isn't entirely contained within a range
>> + * - We find a range, but the final PFN isn't aligned to the
>> + * requested order.
>> + *
>> + * (Upon failure, the caller is expected to try again with a
>> + * smaller order; the tests above will always succeed for
>> + * order=0 as the limit case.)
>> + *
>> + * It's suboptimal if DMABUFs are created with neigbouring
>
> s/neigbouring/neighboring/

Ah, not a typo. :) That is en_GB and AFAIK is permitted.

>> + * ranges that are physically contiguous, since hugepages
>> + * can't straddle range boundaries. (The construction of the
>> + * ranges vector should merge such ranges.)
>
> though the field is called 'phys_vec', removing 'vector' in description
> is clearer here.

Fair, reworded.

>> + *
>> + * Finally, vma_pgoff_adjust is used for a DMABUF representing
>> + * a VFIO BAR mmap, which is created from the start of the
>> + * offset region.
>
> Elaborate it a little bit that the vm_pgoff is already counted in paddr
> of phys_vec so it should be skipped when finding the pfn.

OK! Expanded this paragraph slightly to explain that vma_pgoff_adjust
avoids double-accounting, and that a BAR mmap() DMABUF is created such
that the start of the VMA (even with an offset) equals the start of the
DMABUF and equals the start of the physical range.

>> + */
>> +
>> + const unsigned long pagesize = PAGE_SIZE << order;
>> + unsigned long vma_off = ((vma->vm_pgoff - priv->vma_pgoff_adjust)
>> <<
>> + PAGE_SHIFT) & VFIO_PCI_OFFSET_MASK;
>> + unsigned long rounded_page_addr = ALIGN_DOWN(address,
>> pagesize);
>> + unsigned long rounded_page_end = rounded_page_addr + pagesize;
>> + unsigned long page_buf_offset;
>> + unsigned long page_buf_offset_end;
>
> what about "fault_offset[_end]"? page_buf is a bit confusing.

I went round several times with these names, thanks for the input. Just
tried it out and your suggestion is clearer.

>> + unsigned long range_buf_offset = 0;
>
> could this be called 'range_start' then the 'range_start' in latter loop
> is renamed to 'phys_start'?
>
> Not strong... just feel such naming helps me understand the logic easier

Anything that helps helps, thanks. I ended up renaming this to
range_start_offset (as offset is IMHO important).

I'm a fan of diagrams but this is too large to include in a comment.
But for posterity on the list, and using the new names, an illustration
of a DMABUF with 3 ranges in phys_vec, where a mapping's
faulting page offset lies in range [1]:

fault_addr--+
v VMA
+-----------------+----------+-----------------+
| | Faulting | |
| | (hg)page | |
| | | |
|---- vma_off ---->+-----------------+----------+-----------------+
| . .
| . .
|--------- fault_offset ------------>. . DMABUF
+-------------------------+---------------------------+--------------+
| phys_vec[0] | phys_vec[1] . | phys_vec[2] |
| .paddr | . . | |
| .len | . . | |
+-------------------------+---------------------------+--------------+
0 : . . : L
|-- range_start_offset -->: . . -->: range_len
: . . :
V . . :
+----------+----------+-----+
|.paddr | PFN | |
| | | |
| | | |
+----------+----------+-----+
P

P = paddr + (fault_offset - range_start_offset)
L = sum(phys_vec[0...2].len)

>> + unsigned int i;
>> +
>> + if (rounded_page_addr < vma->vm_start || rounded_page_end >
>> vma->vm_end) {
>> + if (order > 0)
>> + return -EAGAIN;
>> +
>> + /* A fault address outside of the VMA is absurd. */
>> + WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",
>> + address, vma->vm_start, vma->vm_end);
>> + return -EFAULT;
>> + }
>> +
>> + /*
>> + * page_buff_offset[_end] is the span of DMABUF offsets
>> + * corresponding to the faulting page:
>> + */
>
> if the naming is kept then s/page_buff_offset/page_buf_offset/
>
> otherwise,
>
> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>

Thank you,


Matt