Re: [PATCH v2] mm: readahead: make thp readahead conditional to mmap_miss logic
From: Dev Jain
Date: Mon Oct 06 2025 - 00:48:15 EST
On 06/10/25 7:24 am, Roman Gushchin wrote:
Commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
introduced a special handling for VM_HUGEPAGE mappings: even if the
readahead is disabled, 1 or 2 HPAGE_PMD_ORDER pages are
allocated.
This change causes a significant regression for containers with a
tight memory.max limit, if VM_HUGEPAGE is widely used. Prior to this
commit, mmap_miss logic would eventually lead to the readahead
disablement, effectively reducing the memory pressure in the
cgroup. With this change the kernel tries to allocate 1-2 huge
pages for each fault, regardless of whether these pages are actually
used before being evicted, increasing the memory pressure multi-fold.
To fix the regression, let's make the new VM_HUGEPAGE path conditional
on the mmap_miss check, but keep it independent of ra->ra_pages.
This way the main intention of commit 4687fdbb805a ("mm/filemap:
Support VM_HUGEPAGE for file mappings") stays intact, but the
regression is resolved.
The logic behind this change is simple: even if a user explicitly
requests huge pages to back the file mapping (via the VM_HUGEPAGE
flag), under very strong memory pressure it's better to fall back
to ordinary pages.
Signed-off-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>
Reviewed-by: Jan Kara <jack@xxxxxxx>
Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Cc: Dev Jain <dev.jain@xxxxxxx>
Cc: linux-mm@xxxxxxxxx
--
v2: fixed VM_SEQ_READ handling (by Dev Jain)
As you said in a previous mail, we definitely need some other way of measuring
memory pressure here. Since your workload is doing madvise(MADV_HUGEPAGE) (which
means it expects to take advantage of the hugepage by accessing it frequently) and is
still getting lots of cache misses, I guess it must be due to swapping caused by the
tight memory.max. Anyway, the change looks correct to me.
Reviewed-by: Dev Jain <dev.jain@xxxxxxx>