Re: [LSF/MM/BPF TOPIC] Per-process page size

From: David Hildenbrand (Arm)

Date: Wed Feb 18 2026 - 04:16:56 EST


On 2/18/26 09:58, Dev Jain wrote:

> On 18/02/26 2:09 pm, Dev Jain wrote:
>> On 17/02/26 8:52 pm, Matthew Wilcox wrote:
>>> Please don't use the term "enlighten". That's used to describe
>>> something or other with hypervisors. Come up with a new term or use
>>> one that already exists.
>> Sure.

>>> That's going to be messy. I don't have a good idea for solving this
>>> problem, but the page cache really isn't set up to change minimum folio
>>> order while the inode is in use.
>> Holding mapping->invalidate_lock, bumping mapping->min_folio_order and
>> dropping and re-reading the range suffers from a race: filemap_fault,
>> operating on some other partially populated 64K range, will observe in
>> filemap_get_folio that nothing is in the pagecache. It will then read
>> the updated min_order in __filemap_get_folio and use filemap_add_folio
>> to add a 64K folio, but since the 64K range is partially populated, we
>> get stuck in an infinite loop due to -EEXIST.

>> So I figured that deleting the entire pagecache is simpler. We will
>> also bail out early in __filemap_add_folio if the order of the folio
>> the caller asks to create is less than mapping_min_folio_order.
>> Eventually the caller is going to read the correct min order. This
>> algorithm avoids the race above, however...

>> My assumption here was that we are synchronized on
>> mapping->invalidate_lock. The kerneldoc above read_cache_folio() and
>> some other comments convinced me of that, but I just checked with a
>> VM_WARN_ON(!rwsem_is_locked()) in __filemap_add_folio and this doesn't
>> seem to be the case for all code paths... If the algorithm sounds
>> reasonable, I wonder what the correct synchronization mechanism here is.

> I may have been vague here... to avoid the race I described above, we
> must ensure that after all folios have been dropped from the pagecache
> and the min order has been bumped up, no other code path remembers the
> old order and partially populates a 64K range. For this we need
> synchronization.

And I don't think you can reliably do that when other processes might be using the files concurrently.

It's best to start as Ryan suggested: lift min_order on these systems for
now and leave dynamically switching the min order as future work.

--
Cheers,

David