Re: [PATCH mm-hotfixes] Revert "mm: limit filemap_fault readahead to VMA boundaries"

From: Frederick Mayle

Date: Tue Jun 23 2026 - 15:29:17 EST


On Mon, Jun 22, 2026 at 3:29 PM Pedro Falcato <pfalcato@xxxxxxx> wrote:
>
> On Mon, Jun 22, 2026 at 10:57:30AM -0700, Suren Baghdasaryan wrote:
> > On Mon, Jun 22, 2026 at 10:11 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Jun 22, 2026 at 09:58:55AM -0700, Andrew Morton wrote:
> > > > > > much as I personally find restricting mmap readahead to a VMA a sensible
> > > > > > thing to do). We just need to figure out how to improve the Android
> > > > > > usecase.
> > > >
> > > > Would a helpful heuristic be to do what 7b32f64bc512 is doing, but only
> > > > if PROT_EXEC?
> > >
> > > I was wondering the same thing. But I think it's right to back this out
> > > for now and try that after -rc1 so it gets some time soaking and
> > > bot-testing.
> >
> > Thanks for the suggestion! That sounds sensible to me.
>
> I don't think this works. Here's an example readelf -a from a random,
> trivial ELF I have:
>
> pfalcato@pedro-suse:~/linux> cc -g main.c
> pfalcato@pedro-suse:~/linux> readelf -a a.out
> ELF Header:
> Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
> Class: ELF64
> Data: 2's complement, little endian
> Version: 1 (current)
> OS/ABI: UNIX - System V
> ABI Version: 0
> Type: EXEC (Executable file)
> Machine: Advanced Micro Devices X86-64
> Version: 0x1
> Entry point address: 0x401040
> Start of program headers: 64 (bytes into file)
> Start of section headers: 18448 (bytes into file)
> Flags: 0x0
> Size of this header: 64 (bytes)
> Size of program headers: 56 (bytes)
> Number of program headers: 14
> Size of section headers: 64 (bytes)
> Number of section headers: 38
> Section header string table index: 37
>
> Section Headers:
> [Nr] Name Type Address Offset
> Size EntSize Flags Link Info Align
> [ 0] NULL 0000000000000000 00000000
> 0000000000000000 0000000000000000 0 0 0
> [ 1] .note.gnu.pr[...] NOTE 0000000000400350 00000350
> 0000000000000040 0000000000000000 A 0 0 8
> [ 2] .note.gnu.bu[...] NOTE 0000000000400390 00000390
> 0000000000000024 0000000000000000 A 0 0 4
> [ 3] .interp PROGBITS 00000000004003b4 000003b4
> 000000000000001c 0000000000000000 A 0 0 1
> [ 4] .hash HASH 00000000004003d0 000003d0
> 0000000000000024 0000000000000004 A 6 0 8
> [ 5] .gnu.hash GNU_HASH 00000000004003f8 000003f8
> 000000000000001c 0000000000000000 A 6 0 8
> [ 6] .dynsym DYNSYM 0000000000400418 00000418
> 0000000000000060 0000000000000018 A 7 1 8
> [ 7] .dynstr STRTAB 0000000000400478 00000478
> 000000000000004a 0000000000000000 A 0 0 1
> [ 8] .gnu.version VERSYM 00000000004004c2 000004c2
> 0000000000000008 0000000000000002 A 6 0 2
> [ 9] .gnu.version_r VERNEED 00000000004004d0 000004d0
> 0000000000000030 0000000000000000 A 7 1 8
> [10] .rela.dyn RELA 0000000000400500 00000500
> 0000000000000030 0000000000000018 A 6 0 8
> [11] .rela.plt RELA 0000000000400530 00000530
> 0000000000000018 0000000000000018 AI 6 24 8
> [12] .init PROGBITS 0000000000401000 00001000
> 000000000000001b 0000000000000000 AX 0 0 4
> [13] .plt PROGBITS 0000000000401020 00001020
> 0000000000000020 0000000000000010 AX 0 0 16
> [14] .text PROGBITS 0000000000401040 00001040
> 000000000000011b 0000000000000000 AX 0 0 16
> [15] .fini PROGBITS 000000000040115c 0000115c
> 000000000000000d 0000000000000000 AX 0 0 4
> [16] .rodata PROGBITS 0000000000402000 00002000
> 0000000000000004 0000000000000004 AM 0 0 4
> [17] .eh_frame_hdr PROGBITS 0000000000402004 00002004
> 000000000000002c 0000000000000000 A 0 0 4
> [18] .eh_frame PROGBITS 0000000000402030 00002030
> 0000000000000088 0000000000000000 A 0 0 8
> [19] .note.ABI-tag NOTE 00000000004020b8 000020b8
> 0000000000000020 0000000000000000 A 0 0 4
> [20] .init_array INIT_ARRAY 0000000000403de8 00002de8
> 0000000000000008 0000000000000008 WA 0 0 8
> [21] .fini_array FINI_ARRAY 0000000000403df0 00002df0
> 0000000000000008 0000000000000008 WA 0 0 8
> [22] .dynamic DYNAMIC 0000000000403df8 00002df8
> 00000000000001e0 0000000000000010 WA 7 0 8
> [23] .got PROGBITS 0000000000403fd8 00002fd8
> 0000000000000010 0000000000000008 WA 0 0 8
> [24] .got.plt PROGBITS 0000000000403fe8 00002fe8
> 0000000000000020 0000000000000008 WA 0 0 8
> [25] .data PROGBITS 0000000000404008 00003008
> 0000000000000010 0000000000000000 WA 0 0 8
> [26] .bss NOBITS 0000000000404018 00003018
> 0000000000000008 0000000000000000 WA 0 0 1
> [27] .comment PROGBITS 0000000000000000 00003018
> 0000000000000019 0000000000000001 MS 0 0 1
> [28] .debug_aranges PROGBITS 0000000000000000 00003040
> 0000000000000150 0000000000000000 0 0 16
> [29] .debug_info PROGBITS 0000000000000000 00003190
> 0000000000000444 0000000000000000 0 0 1
> [30] .debug_abbrev PROGBITS 0000000000000000 000035d4
> 0000000000000245 0000000000000000 0 0 1
> [31] .debug_line PROGBITS 0000000000000000 00003819
> 0000000000000274 0000000000000000 0 0 1
> [32] .debug_str PROGBITS 0000000000000000 00003a8d
> 0000000000000540 0000000000000001 MS 0 0 1
> [33] .debug_line_str PROGBITS 0000000000000000 00003fcd
> 0000000000000163 0000000000000001 MS 0 0 1
> [34] .debug_rnglists PROGBITS 0000000000000000 00004130
> 0000000000000042 0000000000000000 0 0 1
> [35] .symtab SYMTAB 0000000000000000 00004178
> 0000000000000360 0000000000000018 36 20 8
> [36] .strtab STRTAB 0000000000000000 000044d8
> 00000000000001bc 0000000000000000 0 0 1
> [37] .shstrtab STRTAB 0000000000000000 00004694
> 0000000000000176 0000000000000000 0 0 1
> Key to Flags:
> W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
> L (link order), O (extra OS processing required), G (group), T (TLS),
> C (compressed), x (unknown), o (OS specific), E (exclude),
> D (mbind), l (large), p (processor specific)
>
> Notice the section header table, and how it starts after program text and
> program data, and how all the other ELF gunk (debug info, symtab,
> strtab(s)) also goes after .data. So (mostly) the real problematic readahead
> would be on the RW VMA that covers .data.
>
> (This also matches my understanding of linkers, where they generally do
> (to put it simply) ELF headers - program headers - .text - .data - .bss, with
> strippable gunk after it.)
>
> It's also the case that synchronous RA on VM_EXEC is already pretty
> conservative and limited, see the big if (vm_flags & VM_EXEC) in
> do_sync_mmap_readahead(). (I think the underlying logic also implies
> that async RA will not be started against these pages, but I am not
> sure.)
>
> --
> Pedro

Yes, I think readahead of VM_EXEC is already restricted to the VMA.
There may be an edge case where someone does buffered reads on an ELF
file and leaves a PG_readahead flag inside the VM_EXEC range; a later
fault on that page could then trigger async readahead beyond the end
of the VMA, but that sounds minor.

For next steps: Suppose we show that the mmap usage in this video
encoder is significantly less efficient than buffered reads or one big
mmap, and that project accepts a contribution to move away from the
small mmaps. Would we be comfortable attempting this again as is?
There would probably be a lag before all users of the encoder update,
and until then they may see the performance regression.

If it isn't OK as is for the foreseeable future, we could consider
applying the limit conditionally. Maybe a CONFIG or sysfs option. Or,
use some heuristic to turn it on automatically, e.g. only for ELF
files, or files with VM_EXEC mappings.
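
To make the conditional variants concrete, here is a rough userspace
model of the decision, not kernel code. Every name in it
(ra_limit_policy, ra_window, ra_should_clamp, ra_clamp_to_vma,
MODEL_VM_EXEC) is made up for illustration, and the policy enum merely
stands in for whatever CONFIG or sysfs knob we might pick. Windows are
expressed in file page offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's VM_EXEC bit; only meaningful in this model. */
#define MODEL_VM_EXEC 0x4UL

/* Hypothetical knob: when should fault readahead be clamped to the VMA? */
enum ra_limit_policy {
	RA_LIMIT_NEVER,		/* behaviour with the revert applied */
	RA_LIMIT_EXEC_ONLY,	/* clamp only for executable mappings */
	RA_LIMIT_ALWAYS,	/* behaviour of the reverted commit */
};

/* A readahead window: nr pages starting at file page offset start. */
struct ra_window {
	uint64_t start;
	uint64_t nr;
};

static bool ra_should_clamp(enum ra_limit_policy policy,
			    unsigned long vm_flags)
{
	switch (policy) {
	case RA_LIMIT_ALWAYS:
		return true;
	case RA_LIMIT_EXEC_ONLY:
		return (vm_flags & MODEL_VM_EXEC) != 0;
	default:
		return false;
	}
}

/*
 * Clamp the wanted window to the file range [vma_first, vma_end) that
 * the faulting VMA maps, if the policy asks for it. Returns nr == 0
 * when nothing of the window lies inside the VMA.
 */
static struct ra_window ra_clamp_to_vma(struct ra_window want,
					uint64_t vma_first, uint64_t vma_end,
					unsigned long vm_flags,
					enum ra_limit_policy policy)
{
	uint64_t start = want.start;
	uint64_t end = want.start + want.nr;

	assert(vma_first < vma_end);

	if (!ra_should_clamp(policy, vm_flags))
		return want;

	if (start < vma_first)
		start = vma_first;
	if (end > vma_end)
		end = vma_end;

	want.start = start;
	want.nr = end > start ? end - start : 0;
	return want;
}
```

In this model, a 32-page window starting at page 2 of a 4-page
VM_EXEC mapping is cut to 2 pages under the exec-only policy, while
the same window on a non-exec mapping is left alone. That second case
is the RW .data VMA from Pedro's example, which is why an exec-only
heuristic would not limit the readahead he is worried about.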

We are planning to measure how this change affects memory usage of
Android devices in the field, but we don't have the data yet.