Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
From: Baolin Wang
Date: Wed Apr 15 2026 - 02:11:59 EST
On 4/14/26 4:34 AM, Zi Yan wrote:
On 13 Apr 2026, at 16:20, Matthew Wilcox wrote:
On Mon, Apr 13, 2026 at 03:20:19PM -0400, Zi Yan wrote:
collapse_file() requires FSes supporting large folios of at least
PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
Why? These are bugs. I don't think we gain anything from continuing.
The goal is to catch these issues during development. VM_BUG_ON crashes
the system and that is too much for such issues in collapse_file().
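For readers weighing the two macros: VM_BUG_ON() and VM_WARN_ON_ONCE() are both only active with CONFIG_DEBUG_VM, but the former stops the kernel while the latter reports the first hit and lets the caller recover. A minimal userspace sketch of that difference follows. The macro names and check_folio_index() are hypothetical, abort() stands in for BUG(), and the warn-once macro uses a GNU C statement expression so it can be used as a condition.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int warnings_emitted; /* number of warnings actually printed */

/* Hypothetical BUG_ON analogue: any hit ends the program. */
#define BUG_ON_SKETCH(cond) do { if (cond) abort(); } while (0)

/*
 * Hypothetical WARN_ON_ONCE analogue: evaluates to the condition, but each
 * call site prints at most once (one static flag per macro expansion).
 */
#define WARN_ON_ONCE_SKETCH(cond) ({                            \
	static bool warned_;                                    \
	bool ret_ = !!(cond);                                   \
	if (ret_ && !warned_) {                                 \
		warned_ = true;                                 \
		warnings_emitted++;                             \
		fprintf(stderr, "WARNING at %s:%d: %s\n",       \
			__FILE__, __LINE__, #cond);             \
	}                                                       \
	ret_;                                                   \
})

/* Hypothetical sanity check: bail out of this attempt instead of crashing. */
static int check_folio_index(long index, long expected)
{
	if (WARN_ON_ONCE_SKETCH(index != expected))
		return -1;
	return 0;
}
```

Repeated bad calls keep returning -1, but only the first one prints a warning; with BUG_ON_SKETCH() the first bad call would have ended the process.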
+ /*
+ * skip files without PMD-order folio support
+ * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
+ */
+ if (!shmem_file(file) && mapping_max_folio_order(mapping) < PMD_ORDER)
+ return SCAN_FAIL;
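The decision in the hunk above can be modelled in userspace as a pure function. This is a sketch only: shmem_file() and mapping_max_folio_order() are kernel-internal, so they are replaced by plain parameters, and the enum merely borrows the names of khugepaged's SCAN_SUCCEED/SCAN_FAIL results. file_collapse_gate() is a hypothetical name.

```c
#include <stdbool.h>

/* Stand-ins named after khugepaged's scan results; values are arbitrary. */
enum scan_result { SCAN_SUCCEED, SCAN_FAIL };

/*
 * Userspace model of the new check: a non-shmem file is only a collapse
 * candidate if its mapping can hold folios of at least PMD order. shmem is
 * exempt because MADV_COLLAPSE ignores the shmem huge= configuration.
 */
static enum scan_result file_collapse_gate(bool is_shmem,
					   unsigned int max_folio_order,
					   unsigned int pmd_order)
{
	if (!is_shmem && max_folio_order < pmd_order)
		return SCAN_FAIL;
	return SCAN_SUCCEED;
}
```

On x86-64 with 4 KiB base pages PMD_ORDER is 9 (2 MiB / 4 KiB = 512 = 2^9), so file_collapse_gate(false, 0, 9) rejects a filesystem without large-folio support, while file_collapse_gate(true, 0, 9) still lets shmem through.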
I wonder if it should. If the commit message of 5a90c155defa is
to be believed,
Since 'deny' is for emergencies and 'force' is for testing, performance
issues should not be a problem in real production environments, so don't
call mapping_set_large_folios() in __shmem_get_inode() when large folio is
disabled with mount huge=never option (default policy).
so maybe MADV_COLLAPSE should honour huge=never?
Documentation/filesystems/tmpfs.rst implies that we do!
huge=never        Do not allocate huge pages. This is the default.
huge=always       Attempt to allocate huge page every time a new page is needed.
huge=within_size  Only allocate huge page if it will be fully within i_size.
                  Also respect madvise(2) hints.
huge=advise       Only allocate huge page if requested with madvise(2).
so what's the difference between huge=never and huge=advise?
I think madvise means MADV_HUGEPAGE for the region, not MADV_COLLAPSE.
Right.
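The never/advise distinction, taken literally from the tmpfs.rst wording quoted above, fits in a small table-driven function. This is a hypothetical model of fault-time allocation only, not the kernel's code path: "madvised" stands for a region marked with madvise(MADV_HUGEPAGE), and MADV_COLLAPSE is deliberately not modelled, since it bypasses these options.

```c
#include <stdbool.h>
#include <stdint.h>

enum tmpfs_huge_opt { HUGE_NEVER, HUGE_ALWAYS, HUGE_WITHIN_SIZE, HUGE_ADVISE };

/*
 * Hypothetical model of the documented huge= mount options.
 * huge_start/huge_size: byte offset and size of the candidate huge page.
 */
static bool tmpfs_would_allocate_huge(enum tmpfs_huge_opt opt,
				      uint64_t huge_start, uint64_t huge_size,
				      uint64_t i_size, bool madvised)
{
	switch (opt) {
	case HUGE_ALWAYS:
		return true;
	case HUGE_WITHIN_SIZE:
		/* Whole huge page inside i_size, or honour the madvise hint. */
		return huge_start + huge_size <= i_size || madvised;
	case HUGE_ADVISE:
		return madvised;
	case HUGE_NEVER:
	default:
		return false;
	}
}
```

In this model huge=never and huge=advise differ only when madvised is true, which matches the reading that "advise" refers to MADV_HUGEPAGE rather than MADV_COLLAPSE.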
In v1, I did the check for shmem, but that regressed MADV_COLLAPSE, which
can always collapse THPs on shmem. I know it sounds unreasonable, but
that ship has sailed.
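The shmem behaviour being preserved here can be exercised from userspace. A minimal sketch, assuming a 2 MiB PMD and a kernel with MADV_COLLAPSE (6.1 or later): it uses a shared anonymous mapping, which lives on the internal shmem mount governed by /sys/kernel/mm/transparent_hugepage/shmem_enabled. Per the discussion in this thread, the collapse should still be possible with shmem_enabled=never, while deny or prctl(PR_SET_THP_DISABLE) should make it fail. try_collapse_shmem() is a hypothetical helper, and the exact errno on failure is not asserted.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from include/uapi/asm-generic/mman-common.h */
#endif

/* Assumption: 2 MiB PMD, as on x86-64 with 4 KiB base pages. */
#define PMD_BYTES (2UL << 20)

/*
 * Map 2 MiB of shmem at a PMD-aligned address, fault it in with base pages,
 * then request a synchronous collapse. Returns 0 on success, else -errno.
 */
static int try_collapse_shmem(void)
{
	char *resv, *p;
	uintptr_t aligned;
	long page = sysconf(_SC_PAGESIZE);
	int ret = 0;

	/* Reserve twice the size so that a PMD-aligned 2 MiB window fits. */
	resv = mmap(NULL, 2 * PMD_BYTES, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (resv == MAP_FAILED)
		return -errno;
	aligned = ((uintptr_t)resv + PMD_BYTES - 1) & ~(uintptr_t)(PMD_BYTES - 1);

	/*
	 * Place the shmem mapping at the aligned address so the virtual address
	 * and shmem file offset 0 agree modulo 2 MiB, as a PMD mapping needs.
	 */
	p = mmap((void *)aligned, PMD_BYTES, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED) {
		ret = -errno;
		goto out;
	}

	/* Touch every base page so the range is populated before collapsing. */
	for (size_t off = 0; off < PMD_BYTES; off += (size_t)page)
		p[off] = 1;

	if (madvise(p, PMD_BYTES, MADV_COLLAPSE) != 0)
		ret = -errno;
out:
	/* Unmapping the reservation also removes the MAP_FIXED mapping. */
	munmap(resv, 2 * PMD_BYTES);
	return ret;
}
```

Whether the call returns 0 also depends on free memory and kernel configuration, so a non-zero result on its own does not say which policy rejected it.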
Previously, I tried to make MADV_COLLAPSE also honour the THP configuration of shmem/tmpfs[1], but Hugh strongly objected and explained the original intent of MADV_COLLAPSE[2]. I’ll quote Hugh’s comments:
"
Seldom has a feature been so thoroughly documented as MADV_COLLAPSE,
in its 6.1 commits and in the "man 2 madvise" page: which are
explicit about MADV_COLLAPSE providing a way to get THPs where the
sysfs setting governing automatic behaviour does not insert them.
We would all prefer a less messy world of THP tunables. I certainly
find plenty to dislike there too; and wish that a less assertive name
than "never" had been chosen originally for the default off position.
But please don't break the accepted and documented behaviour of
MADV_COLLAPSE now.
If you want to exclude all possibility of THPs, then please use the
prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
who insisted that be respected by MADV_COLLAPSE back then).
"
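The two opt-outs Hugh names can be sketched from userspace as follows. prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) marks the calling process (and children it later forks) as THP-disabled. The sysfs THP knobs, such as /sys/kernel/mm/transparent_hugepage/shmem_enabled, print every choice with the active one in brackets, e.g. "always within_size advise [never] deny force". The helper names are hypothetical, and the fallback PR_* values are assumed from include/uapi/linux/prctl.h.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/prctl.h>

#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/* Opt this process out of THP. Returns 0, or -1 with errno set. */
static int disable_thp_for_process(void)
{
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/* Returns 1 if THP is disabled for this process, 0 if not, -1 on error. */
static int thp_disabled_for_process(void)
{
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}

/* True if the bracketed (active) choice in a sysfs THP line equals want. */
static bool sysfs_choice_is(const char *line, const char *want)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;

	if (!close)
		return false;
	size_t len = (size_t)(close - open - 1);
	return strlen(want) == len && strncmp(open + 1, want, len) == 0;
}
```

A tool that must never see THPs on shmem could check sysfs_choice_is(line, "deny") on the shmem_enabled contents, or simply call disable_thp_for_process() early, rather than relying on "never".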
Afterwards, we agreed to keep the current logic, and Lorenzo helped update the docs; see commit a27848a03504 (“docs: update THP documentation to clarify sysfs ‘never’ setting”).
[1] https://lore.kernel.org/all/cover.1750815384.git.baolin.wang@xxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/75c02dbf-4189-958d-515e-fa80bb2187fc@xxxxxxxxxx/