Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries
From: Frederick Mayle
Date: Thu Apr 23 2026 - 18:03:12 EST
On Wed, Apr 22, 2026 at 6:32 AM Pedro Falcato <pfalcato@xxxxxxx> wrote:
>
> On Tue, Apr 21, 2026 at 05:56:07PM -0700, Frederick Mayle wrote:
> > When a file mapping covers a strict subset of a file, an access to the
> > mapping can trigger readahead of file pages outside the mapped region.
> > Readahead is meant to prefetch pages likely to be accessed soon, but
> > these pages aren't accessible via the same means, so it is fair to say we
> > don't have a good indicator they'll be accessed soon. Take an ELF file
> > for example: An access to the end of a program's read-only segment isn't
> > a sign that nearby file contents will be accessed next (they are likely
> > to be mapped discontiguously, or not at all). The pressure from loading
> > these pages into the cache can evict more useful pages.
> >
> > To improve the behavior, make three changes:
> >
> > * Introduce a new readahead_control option, max_index, as a hard limit
> > on the readahead. The existing file_ra_state->size can't be used as a
> > limit; it is more of a hint and can be increased by various
> > heuristics.
> > * Set readahead_control->max_index to the end of the VMA in all of the
> > readahead paths that can be triggered from a fault on a file mapping
> > (both "sync" and "async" readahead).
> > * Limit the read-around range start to the VMA's start.
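
To illustrate the first two changes, the sync fault path ends up doing
roughly the following (an abbreviated sketch, not the literal diff: the
ractl setup paraphrases mm/filemap.c, max_index is the field the patch
introduces, and everything else is elided):

  static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
  {
          struct vm_area_struct *vma = vmf->vma;
          struct file *file = vmf->vma->vm_file;
          DEFINE_READAHEAD(ractl, file, &file->f_ra, file->f_mapping,
                           vmf->pgoff);

          /* Hard-limit readahead to the file pages this VMA maps. */
          ractl.max_index = vma->vm_pgoff + vma_pages(vma);
          ...
  }
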
> >
> > Note that these changes only affect readahead triggered in the context
> > of a fault, they do not affect readahead triggered by read syscalls. If
> > a user mixes the two types of accesses, the behavior is expected to be
> > the following: if a fault causes readahead and places a PG_readahead
> > marker and then a read(2) syscall hits the PG_readahead marker, the
> > resulting async readahead *will not* be limited to the VMA end.
> > Conversely, if a read(2) syscall places a PG_readahead marker and then a
> > fault hits the marker, the async readahead *will* be limited to the VMA
> > end.
> >
> > There is an edge case that the above motivation glosses over: A single
> > file mapping might be backed by multiple VMAs. For example, a whole file
> > could be mapped RW, then part of the mapping made RO using mprotect.
> > This patch would hurt performance of a sequential read of such a
> > mapping, the degree depending on how fragmented the VMAs are. A usage
> > pattern like that is likely rare and already suffering from sub-optimal
> > performance because, e.g., the fragmented VMAs limit the fault-around,
> > so each VMA boundary in a sequential read would cause a minor fault.
> > Still, this would make it worse. See a previous discussion of this topic
> > at [1].
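
(Concretely, that scenario looks something like the following, where fd is
an open file descriptor and file_len the page-aligned file size; the
mprotect() splits the single file mapping into two adjacent VMAs:

  char *p = mmap(NULL, file_len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE, fd, 0);
  mprotect(p, file_len / 2, PROT_READ);  /* first half is now its own VMA */

A sequential read of p would then have its fault readahead clamped at the
VMA boundary.)
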
> >
> > Tested by mapping and reading a small subset of a large file, then using
> > the cachestat syscall to verify the number of cached pages didn't exceed
> > the mapping size.
> >
> > In practical scenarios, the effect depends on the specific file and
> > usage. Sometimes there is no effect at all, but, for some ELF files in
> > Android, we see ~20% fewer pages pulled into the cache.
>
> Didn't Android have a gigantically modified RA window? Could this be why
> you're seeing such large effects? Or is this no longer the case?
On the device I used for testing, the relevant storage device had a readahead
size of 128KB. In general, the window can be configured by device OEMs, so
perhaps some devices in the ecosystem do have giant RA windows.
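
For reference, the window can be inspected per block device via sysfs, e.g.:

  cat /sys/block/sda/queue/read_ahead_kb

(the device name is just an example; blockdev --getra reports the same
value in 512-byte sectors).
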
Android binaries can have a lot of padding and sometimes unrelated APK sections
adjacent in the same file (see https://lwn.net/Articles/1016860/). Those may be
the source of the effect, but I haven't verified.
AppImage files have some similarities: they are ELF files that include a
mountable filesystem. Here is a quick test I ran on Debian with the latest
Neovim AppImage (./cachestat is a binary that simply invokes the cachestat
syscall on a file):

echo 3 >/proc/sys/vm/drop_caches && \
./cachestat ~/nvim-linux-x86_64.appimage && \
~/nvim-linux-x86_64.appimage -es >/dev/null && \
./cachestat ~/nvim-linux-x86_64.appimage
This patch reduces nr_cache from 4134 to 2131.
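
For completeness, a helper like ./cachestat can be as small as the sketch
below. It assumes a kernel >= 6.5 and no libc wrapper, so it defines the
uapi structs locally (mirroring include/uapi/linux/mman.h) and invokes the
syscall directly:

  /* cachestat.c - print how many pages of a file are resident in the
   * page cache. Queries the whole file (len == 0). */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef __NR_cachestat
  #define __NR_cachestat 451
  #endif

  struct cachestat_range {
          unsigned long long off;
          unsigned long long len;   /* 0 means "up to the end of the file" */
  };

  struct cachestat {
          unsigned long long nr_cache;
          unsigned long long nr_dirty;
          unsigned long long nr_writeback;
          unsigned long long nr_evicted;
          unsigned long long nr_recently_evicted;
  };

  int main(int argc, char **argv)
  {
          struct cachestat_range range = { 0, 0 };
          struct cachestat cs;
          int fd;

          if (argc != 2) {
                  fprintf(stderr, "usage: %s <file>\n", argv[0]);
                  return 1;
          }
          fd = open(argv[1], O_RDONLY);
          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          if (syscall(__NR_cachestat, fd, &range, &cs, 0)) {
                  perror("cachestat");
                  return 1;
          }
          printf("nr_cache: %llu\n", cs.nr_cache);
          return 0;
  }
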
>
> --
> Pedro