Re: [PATCH] vfio: Request THP-aligned mmap for device fds
From: Matthew Wilcox
Date: Thu Jun 18 2026 - 11:08:19 EST

On Thu, Jun 18, 2026 at 03:55:58PM +0100, Lorenzo Stoakes wrote:
> On Wed, Jun 17, 2026 at 04:29:28PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 17, 2026 at 07:34:06PM +0100, Matthew Wilcox wrote:
> >
> > > I don't see this as being something that drivers should be involved with
> > > at all. The MM should be able to get this right without any hints from
> > > the file-provider. Yes, that means I also want to get rid of the setting
> > > of get_unmapped_area in ext4/xfs/other filesystems.
> > >
> > > Looking at generic_get_unmapped_area_topdown(), I think we can do this by
> > > making an additional call to vm_unmapped_area() before the existing two,
> > > setting info.align_mask and info.align_offset appropriately.
> > >
> > > Now, what's "appropriately"? I think it's based on length (>= PMD_SIZE,
> > > then >= PUD_SIZE), but we should also take CONTPTE architectures into
> > > account.
> >
> > The info.align_mask and info.align_offset do need information from the
> > driver based on what it intends to map into the VMA that is being
> > created.

What you're saying is that offset 0 of the opened file might correspond
to a PFN that is not aligned in any way? I had assumed that when trying
to do the mapping of (2MB+4KiB to 64MB), that the offset specified to
mmap was 2MB+4KiB. But you seem to be saying that the offset in that
case would be 0 and someone needs to know that it corresponds to a PFN
that is misaligned?
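
To make the two cases concrete, here is a rough userspace sketch (not
kernel code). sketch_place() is modelled on my reading of the rounding
step that vm_unmapped_area() applies with align_mask and align_offset.
All of the sketch_* names, the 2MB PMD size and the idea of passing the
physical start as align_offset are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 2MB PMD size, as on x86-64 with 4KiB base pages. */
#define SKETCH_PMD_SIZE       (2ULL << 20)
#define SKETCH_PMD_ALIGN_MASK (SKETCH_PMD_SIZE - 1)

/*
 * Return the lowest address >= gap_start that is congruent to
 * align_offset modulo (align_mask + 1).
 */
static uint64_t sketch_place(uint64_t gap_start, uint64_t align_mask,
                             uint64_t align_offset)
{
    /* align_mask must be of the form 2^n - 1. */
    assert((align_mask & (align_mask + 1)) == 0);
    return gap_start + ((align_offset - gap_start) & align_mask);
}

/*
 * A PMD-sized mapping is only possible where the virtual and physical
 * addresses agree in their low (sub-PMD) bits.
 */
static bool sketch_pmd_congruent(uint64_t vaddr, uint64_t paddr)
{
    return ((vaddr ^ paddr) & SKETCH_PMD_ALIGN_MASK) == 0;
}
```

Take a free gap starting at 0x7f0000005000 and a region whose first PFN
sits at physical 2MB+4KiB (0x201000). If the mmap offset is 0 and that
is all the MM has to go on, sketch_place() returns 0x7f0000200000, which
is not congruent with the physical start, so no PMD mapping can be used.
If the alignment offset is 0x201000 instead (because the mmap offset
carries it, or because something tells the MM), it returns
0x7f0000201000, which is congruent.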

> > Filesystems probably have quite different requirements than drivers
> > using remap_pfn() or vmf_insert_pfn() that have locked down pfn's.
>
> I think part of the problem here is that we don't differentiate between
> drivers and filesystems, and what might be sensible for one is perhaps not
> sensible for another.
>
> We're too generic really.
>
> With mmap_prepare we have a lot of flexibility as to what we do. That
> callback is idempotent and as limited as possible, and actions like remap
> are achieved through calling a kernel function like mmap_action_remap().

mmap_prepare() is called too late. We've already assigned the virtual
address range before we call __mmap_region(), and there's no attempt to
adjust 'addr' in __mmap_region() after calling mmap_prepare().