On 22/04/2024 08:02, Baolin Wang wrote:
Anonymous pages already support multi-size THP (mTHP) allocation through
commit 19eaf44954df, which allows THP to be configured through the
sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
However, anonymous shared pages ignore the anonymous mTHP rule
configured through the sysfs interface and can only use PMD-mapped
THP, which is not reasonable. Many applications implement anonymous page sharing
through mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios.
Users therefore expect a unified mTHP strategy that applies to all anonymous pages,
including anonymous shared pages, so that they can enjoy the benefits of
mTHP: for example, lower latency than PMD-mapped THP, less memory bloat
than PMD-mapped THP, and contiguous PTEs on the ARM architecture to reduce TLB misses.
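For reference, the kind of mapping being discussed can be created as in the rough sketch below. The helper name map_shared_anon() is hypothetical and exists only for illustration. MADV_HUGEPAGE is a hint, and whether it results in huge pages depends on the shmem_enabled setting and on the kernel configuration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous shared memory.
 * Returns NULL on failure.
 */
static void *map_shared_anon(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Advisory only; the result is ignored because THP may be unavailable. */
	(void)madvise(p, len, MADV_HUGEPAGE);
	return p;
}
```

The mapping is backed by shmem internally, which is why the shmem huge page controls, rather than the anonymous THP ones, currently decide how it is populated.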
This sounds like a very useful addition!
Out of interest, can you point me at any workloads (and off-the-shelf benchmarks
for those workloads) that predominantly use shared anon memory?
The primary strategy is that the use of huge pages for anonymous shared pages
still follows the global control determined by the "huge=" mount option
or the sysfs interface at '/sys/kernel/mm/transparent_hugepage/shmem_enabled'.
mTHP may be used only when the global 'huge' switch is enabled.
The mTHP sysfs interface (/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled)
is then checked to determine which mTHP sizes can be used for large folio allocation
for these anonymous shared pages.
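The two-step check described above could be modelled roughly as follows. This is a userspace sketch, not the code from the series: pick_shmem_order(), PMD_ORDER_MODEL and the bitmask encoding are all hypothetical, and trying the largest enabled size first is an assumption, since the cover letter does not say how a size is chosen. Alignment of the faulting address is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: PMD order is 9 (2M with 4K base pages). */
#define PMD_ORDER_MODEL 9

/*
 * Hypothetical model of the proposed control flow.
 * Bit n of `enabled_orders` stands for "the hugepage-XXkb/enabled knob
 * for order n permits that size". `nr_pages` is the number of base pages
 * the allocation may cover. Returns the chosen order, 0 meaning a base page.
 */
static int pick_shmem_order(bool global_huge_on, unsigned long enabled_orders,
			    unsigned long nr_pages)
{
	int order;

	assert(nr_pages > 0);

	/* Step 1: the global shmem 'huge' switch gates any use of mTHP. */
	if (!global_huge_on)
		return 0;

	/* Step 2: consult the per-size knobs, largest size first (assumption). */
	for (order = PMD_ORDER_MODEL; order > 0; order--) {
		if (!(enabled_orders & (1UL << order)))
			continue;
		if ((1UL << order) <= nr_pages)
			return order;
	}
	return 0;
}
```

With only the 64K (order 4) and 2M (order 9) knobs enabled, a 64-page range would get order 4 in this model, and nothing larger than a base page is used while the global switch is off.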
I'm not sure about this proposed control mechanism; won't it break
compatibility? I could be wrong, but I don't think shmem's use of THP has ever
depended upon the value of /sys/kernel/mm/transparent_hugepage/enabled. So it
doesn't make sense to me that we would now depend upon the
/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled values (which by
default disable all sizes except 2M, which is set to "inherit" from
/sys/kernel/mm/transparent_hugepage/enabled).
The other problem is that shmem_enabled has a different set of options
(always/never/within_size/advise/deny/force) from enabled (always/madvise/never).
Perhaps it would be cleaner to do the same trick we did for enabled: introduce
/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled, which can take all the
same values as the top-level /sys/kernel/mm/transparent_hugepage/shmem_enabled,
plus the additional "inherit" option. By default all sizes would be set to
"never" except 2M, which would be set to "inherit".
Of course the huge= mount option would also need to take a per-size option in
this case, e.g. huge=2048kB:advise,64kB:always
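To make the suggestion a bit more concrete, here is a rough userspace sketch of how a per-size value with "inherit" could be resolved, and how a per-size huge= string of the form shown above might be parsed. Nothing here is existing kernel code: the enum, struct and function names are hypothetical, the option syntax is only the example given above, and the parser accepts every policy name for every size, which a real implementation would probably restrict.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Top-level shmem_enabled values, plus the proposed per-size "inherit". */
enum pol {
	POL_NEVER, POL_ALWAYS, POL_WITHIN_SIZE, POL_ADVISE,
	POL_DENY, POL_FORCE, POL_INHERIT, POL_COUNT
};

static const char *const pol_names[POL_COUNT] = {
	"never", "always", "within_size", "advise", "deny", "force", "inherit",
};

struct size_pol {
	unsigned long size_kb;
	enum pol pol;
};

/* Map a policy name to its enum value, or return -1 if it is unknown. */
static int pol_from_name(const char *name)
{
	for (int i = 0; i < POL_COUNT; i++)
		if (strcmp(name, pol_names[i]) == 0)
			return i;
	return -1;
}

/* "inherit" defers to the top-level setting; anything else wins. */
static enum pol effective_pol(enum pol per_size, enum pol top_level)
{
	assert(top_level != POL_INHERIT);
	return per_size == POL_INHERIT ? top_level : per_size;
}

/*
 * Parse e.g. "2048kB:advise,64kB:always" into `out`.
 * Returns the number of entries, or -1 on malformed input or overflow.
 */
static int parse_huge_opt(const char *val, struct size_pol *out, int max)
{
	int n = 0;

	assert(val && out);
	while (*val) {
		char name[16];
		int used = 0, pol;

		if (n == max ||
		    sscanf(val, "%lukB:%15[a-z_]%n",
			   &out[n].size_kb, name, &used) != 2)
			return -1;
		pol = pol_from_name(name);
		if (pol < 0)
			return -1;
		out[n++].pol = (enum pol)pol;

		val += used;
		if (*val == ',')
			val++;
		else if (*val)
			return -1;
	}
	return n;
}
```

With the defaults suggested above (everything "never" except 2M set to "inherit"), effective_pol() returns the top-level value for 2M only, so existing behaviour would be preserved until a per-size knob is changed.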
TODO:
- More testing, and provide some performance data.
- More discussion is needed about the large folio allocation strategy for a 'regular
file' created by memfd_create(), for example when ftruncate(fd) is used to specify
the 'file' size: does this need to follow the anonymous mTHP rule too?
- Do not split the large folio when shared memory is swapped out.
- Support swapping in a large folio for shared memory.
Baolin Wang (5):
  mm: memory: extend finish_fault() to support large folio
  mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
  mm: shmem: add THP validation for PMD-mapped THP related statistics
  mm: shmem: add mTHP support for anonymous share pages
  mm: shmem: add anonymous share mTHP counters
 include/linux/huge_mm.h |   4 +-
 mm/huge_memory.c        |   8 ++-
 mm/memory.c             |  25 +++++++---
 mm/shmem.c              | 107 ++++++++++++++++++++++++++++++----------
 4 files changed, 108 insertions(+), 36 deletions(-)