Re: [PATCH v2] mm: limit filemap_fault readahead to VMA boundaries

From: Frederick Mayle

Date: Mon Apr 27 2026 - 19:00:02 EST

On Mon, Apr 27, 2026 at 9:23 AM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote:
>
> On Mon, Apr 27, 2026 at 5:41 AM 'David Hildenbrand (Arm)' via
> android-mm <android-mm@xxxxxxxxxx> wrote:
> >
> > On 4/27/26 05:01, Frederick Mayle wrote:
> > > When a file mapping covers a strict subset of a file, an access to the
> > > mapping can trigger readahead of file pages outside the mapped region.
> > > Readahead is meant to prefetch pages likely to be accessed soon, but
> > > these pages aren't accessible via the same means, so it's fair to say we
> > > don't have a good indicator they'll be accessed soon. Take an ELF file
> > > for example: An access to the end of a program's read-only segment isn't
> > > a sign that nearby file contents will be accessed next (they are likely
> > > to be mapped discontiguously, or not at all). The pressure from loading
> > > these pages into the cache can evict more useful pages.
> > >
> > > To improve the behavior, make three changes:
> > >
> > > * Introduce a new readahead_control field, max_index, as a hard limit on
> > > the readahead. The existing file_ra_state->size can't be used as a
> > > limit; it is more of a hint and can be increased by various
> > > heuristics.
> > > * Set readahead_control->max_index to the end of the VMA in all of the
> > > readahead paths that can be triggered from a fault on a file mapping
> > > (both "sync" and "async" readahead).
> > > * Limit the read-around range start to the VMA's start.
> > >
> > > Note that these changes only affect readahead triggered in the context
> > > of a fault, they do not affect readahead triggered by read syscalls. If
> > > a user mixes the two types of accesses, the behavior is expected to be
> > > the following: if a fault causes readahead and places a PG_readahead
> > > marker and then a read(2) syscall hits the PG_readahead marker, the
> > > resulting async readahead *will not* be limited to the VMA end.
> > > Conversely, if a read(2) syscall places a PG_readahead marker and then a
> > > fault hits the marker, the async readahead *will* be limited to the VMA
> > > end.
> > >
> > > There is an edge case that the above motivation glosses over: A single
> > > file mapping might be backed by multiple VMAs. For example, a whole file
> > > could be mapped RW, then part of the mapping made RO using mprotect.
> > > This patch would hurt the performance of a sequential faulted read of such a
> > > mapping, the degree depending on how fragmented the VMAs are. A usage
> > > pattern like that is likely rare and already suffering from sub-optimal
> > > performance because, e.g., the fragmented VMAs limit the fault-around,
> > > so each VMA boundary in a sequential faulted read would cause a minor
> > > fault. Still, this patch would make it worse. See a previous discussion
> > > of this topic at [1].
> >
> > I agree that workloads that do a lot of mprotect() magic likely do not depend on
> > readahead optimizations.
> >
> > But I'm sure we'll learn quickly if that is not the case :)
>
> Hi David,
>
> There is already this limit for the exec VMAs, so perhaps these use
> cases are in fact rare enough; but we'll need to see ...
>
> https://lore.kernel.org/all/20250609092729.274960-6-ryan.roberts@xxxxxxx/
>
> Frederick, could we also now remove that logic (EXEC mappings)? Maybe
> in a follow up patch.

The VM_EXEC branch is still meaningfully different from the read-around branch:
it sets `ra->order = exec_folio_order()` and `ra->async_size = 0`. Still, I think
most of the details could be unified. I can try that in a follow-up.