Re: [PATCH] vfio: Request THP-aligned mmap for device fds
From: Lorenzo Stoakes
Date: Thu Jun 18 2026 - 12:02:31 EST
On Thu, Jun 18, 2026 at 12:30:49PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 18, 2026 at 04:04:06PM +0100, Matthew Wilcox wrote:
> > On Thu, Jun 18, 2026 at 03:55:58PM +0100, Lorenzo Stoakes wrote:
> > > On Wed, Jun 17, 2026 at 04:29:28PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Jun 17, 2026 at 07:34:06PM +0100, Matthew Wilcox wrote:
> > > >
> > > > > I don't see this as being something that drivers should be involved with
> > > > > at all. The MM should be able to get this right without any hints from
> > > > > the file-provider. Yes, that means I also want to get rid of the setting
> > > > > of get_unmapped_area in ext4/xfs/other filesystems.
> > > > >
> > > > > Looking at generic_get_unmapped_area_topdown(), I think we can do this by
> > > > > making an additional call to vm_unmapped_area() before the existing two,
> > > > > setting info.align_mask and info.align_offset appropriately.
> > > > >
> > > > > Now, what's "appropriately"? I think it's based on length (>= PMD_SIZE,
> > > > > then >= PUD_SIZE), but we should also take CONTPTE architectures into
> > > > > account.
> > > >
> > > > The info.align_mask and info.align_offset do need information from the
> > > > driver based on what it intends to map into the VMA that is being
> > > > created.
> >
> > What you're saying is that offset 0 of the opened file might correspond
> > to a PFN that is not aligned in any way? I had assumed that when trying
> > to do the mapping of (2MB+4KiB to 64MB), that the offset specified to
> > mmap was 2MB+4KiB. But you seem to be saying that the offset in that
> > case would be 0 and someone needs to know that it corresponds to a PFN
> > that is misaligned?
>
> I do expect that the pgoff space is usually aligned to the pfn space,
> most drivers do that or could be improved to do that. There will be
> some off cases, but maybe we don't care, and VFIO should be fine.

Some stuff has weird assumptions about pfn=0 at start of the range (DMA for
instance).

Presumably not applicable to VFIO but that's a thing we need to stop
doing... (I have some patches I deferred from a while back changing the DMA
stuff).

>
> That is certainly an easier place to start.
>
> Jason
Thanks, Lorenzo
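
For readers following the thread, the length-based choice of info.align_mask
and info.align_offset discussed above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the proposed kernel
change: every sketch_* name is hypothetical, the 4KiB page / 2MB PMD / 1GB PUD
sizes are the common x86-64 values, and it assumes the pgoff space is aligned
to the pfn space, as Jason expects for most drivers. CONTPTE sizes are not
modelled.

```c
#include <stdint.h>

/* Assumed sizes (x86-64 with 4KiB base pages); other architectures differ. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SIZE   (1ULL << 21)   /* 2MB */
#define SKETCH_PUD_SIZE   (1ULL << 30)   /* 1GB */

/* Hypothetical stand-in for the two alignment fields discussed in the thread. */
struct sketch_align {
	uint64_t align_mask;    /* 0 means "no extra alignment requested" */
	uint64_t align_offset;  /* required value of (addr & align_mask) */
};

/*
 * Pick the largest alignment the mapping length can benefit from, and
 * derive the offset from the file offset so that huge-page-aligned file
 * offsets land on huge-page-aligned virtual addresses.
 */
static struct sketch_align sketch_pick_align(uint64_t len, uint64_t pgoff)
{
	struct sketch_align a = { 0, 0 };
	uint64_t size = 0;

	if (len >= SKETCH_PUD_SIZE)
		size = SKETCH_PUD_SIZE;
	else if (len >= SKETCH_PMD_SIZE)
		size = SKETCH_PMD_SIZE;

	if (size) {
		a.align_mask = size - 1;
		a.align_offset = (pgoff << SKETCH_PAGE_SHIFT) & a.align_mask;
	}
	return a;
}

/*
 * Lowest address >= hint with (addr & align_mask) == align_offset.
 * Unsigned wrap-around in the subtraction is intentional.
 */
static uint64_t sketch_place(uint64_t hint, struct sketch_align a)
{
	return hint + ((a.align_offset - hint) & a.align_mask);
}
```

Taking Matthew's example of mapping 2MB+4KiB up to 64MB with the mmap offset
set to 2MB+4KiB (pgoff 513), sketch_pick_align() asks for 2MB alignment with an
offset of 4KiB, so file offset 4MB ends up on a 2MB-aligned virtual address.
If offset 0 of the file instead corresponded to a misaligned PFN, this
derivation would pick the wrong align_offset, which is the case where the
driver would have to supply it.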