Re: [PATCH v4 0/6] add mTHP support for anonymous shmem

From: Daniel Gomez
Date: Mon Jun 10 2024 - 08:10:59 EST


Hi Baolin,

On Tue, Jun 04, 2024 at 06:17:44PM +0800, Baolin Wang wrote:
> Anonymous pages already support multi-size THP (mTHP) allocation since
> commit 19eaf44954df, which allows THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>
> However, anonymous shmem ignores the anonymous mTHP rule configured through
> the sysfs interface and can only use PMD-mapped THP, which is not reasonable.
> Many users implement anonymous page sharing through mmap(MAP_SHARED |
> MAP_ANONYMOUS), especially in database usage scenarios. Users therefore expect
> a unified mTHP strategy for anonymous pages, including anonymous shared
> pages, in order to enjoy the benefits of mTHP: for example, lower latency than
> PMD-mapped THP, smaller memory bloat than PMD-mapped THP, and contiguous PTEs
> on ARM architecture to reduce TLB misses.
>
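
For readers less familiar with the sharing pattern mentioned above, here is a
minimal sketch of it, written in Python for brevity. The helper name
share_via_anon_shmem() and the 1 MiB default size are illustrative assumptions
and do not come from the patch set. Whether the kernel backs the region with
mTHP depends on the shmem_enabled settings and cannot be seen from this code.

```python
import mmap
import os


def share_via_anon_shmem(message: bytes, size: int = 1 << 20) -> bytes:
    """Write `message` in a child process and read it back in the parent.

    The region is created with MAP_SHARED | MAP_ANONYMOUS, so it is backed by
    anonymous shmem and stays shared across fork().
    """
    buf = mmap.mmap(
        -1,  # no file descriptor: anonymous mapping
        size,
        flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    pid = os.fork()
    if pid == 0:
        # Child: touch the shared pages, then exit without running
        # the parent's cleanup handlers.
        try:
            buf[: len(message)] = message
        finally:
            os._exit(0)
    # Parent: wait for the child, then observe its write.
    os.waitpid(pid, 0)
    data = bytes(buf[: len(message)])
    buf.close()
    return data


if __name__ == "__main__":
    print(share_via_anon_shmem(b"hello from the child"))
```

With MAP_PRIVATE instead of MAP_SHARED the parent would not see the child's
write, which is why the shared variant is the one that ends up in shmem.
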
> As discussed in the bi-weekly MM meeting[1], the mTHP controls should control
> all of shmem, not only anonymous shmem, but support will be added iteratively.
> Therefore, this patch set starts with support for anonymous shmem.
>
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> which accepts almost the same values as the top-level
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', adding a new
> "inherit" option and dropping the testing options 'force' and 'deny'.
> By default all sizes are set to "never" except PMD size, which is set to
> "inherit". This ensures backward compatibility with the top-level anonymous
> shmem setting, while also allowing independent control of anonymous shmem
> for each mTHP size.
>
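
As a rough illustration of how a test script might drive the per-size control
described above, here is a small sketch. It assumes the per-size file uses the
same bracketed format as the top-level shmem_enabled file, and that the accepted
values are the top-level ones minus 'force' and 'deny' plus "inherit". The
function names are hypothetical, and the caller supplies the path to whichever
per-size file the patched kernel exposes.

```python
from pathlib import Path

# Assumed per-size values: the top-level set without the testing options
# 'force' and 'deny', plus the new "inherit" option.
PER_SIZE_VALUES = ("always", "inherit", "within_size", "advise", "never")


def parse_selected(contents: str) -> str:
    """Return the active value from text like 'always inherit [never]'."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed selection in {contents!r}")


def read_policy(path: Path) -> str:
    """Read the currently selected policy from a shmem_enabled file."""
    return parse_selected(path.read_text())


def write_policy(path: Path, value: str) -> None:
    """Select a policy, rejecting values the per-size file should not accept."""
    if value not in PER_SIZE_VALUES:
        raise ValueError(f"unsupported per-size value: {value!r}")
    path.write_text(value)
```

Validating in user space before writing keeps a test from silently relying on
'force' or 'deny', which this version drops for the per-size files.
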
> Use the page fault latency tool to measure the performance of 1G anonymous shmem

I'm not familiar with this tool. Could you share which repo/tool you are
referring to?

Also, are you running or are you aware of any other tools/tests available for
shmem that we can use to make sure we do not introduce any regressions?

Thanks!
Daniel

> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> 125G memory:
> base: mm-unstable
> user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
> 0.04s        3.10s       83516.416                 2669684.890
>
> mm-unstable + patchset, anon shmem mTHP disabled
> user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
> 0.02s        3.14s       82936.359                 2630746.027
>
> mm-unstable + patchset, anon shmem 64K mTHP enabled
> user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
> 0.08s        0.31s       678630.231                17082522.495
>
> From the data above, the patchset has minimal impact when mTHP is not enabled
> (some fluctuations were observed during testing). With 64K mTHP enabled, page
> fault latency improves significantly.
>
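
To put the three runs side by side, the ratios can be worked out directly from
the numbers quoted above. This is only arithmetic on the reported figures, not a
new measurement; the dictionary keys and the relative() helper are names made up
for this sketch.

```python
# Figures copied from the cover letter (1G anonymous shmem, 32 threads).
RESULTS = {
    "base": {"sys_time": 3.10, "faults_per_sec": 2669684.890},
    "patched_disabled": {"sys_time": 3.14, "faults_per_sec": 2630746.027},
    "patched_64k": {"sys_time": 0.31, "faults_per_sec": 17082522.495},
}


def relative(metric: str, run: str, baseline: str = "base") -> float:
    """Ratio of `metric` in `run` to the same metric in `baseline`."""
    return RESULTS[run][metric] / RESULTS[baseline][metric]


if __name__ == "__main__":
    for run in ("patched_disabled", "patched_64k"):
        print(
            f"{run}: faults_per_sec x{relative('faults_per_sec', run):.2f}, "
            f"sys_time x{relative('sys_time', run):.2f}"
        )
```

By this arithmetic the 64K run handles roughly 6.4 times as many faults per
second as base with about a tenth of the system time, while the mTHP-disabled
run stays within about 1.5% of base.
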
> [1] https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@xxxxxxxxxx/
>
> Changes from v3:
> - Drop 'force' and 'deny' testing options for each mTHP.
> - Use new helper update_mmu_tlb_range(), per Lance.
> - Update documentation to drop "anonymous thp" terminology, per David.
> - Initialize the 'suitable_orders' in shmem_alloc_and_add_folio(),
>   reported by kernel test robot.
> - Fix the highest mTHP order in shmem_get_unmapped_area().
> - Update some commit messages.
>
> Changes from v2:
> - Rebased to mm/mm-unstable.
> - Remove 'huge' parameter for shmem_alloc_and_add_folio(), per Lance.
>
> Changes from v1:
> - Drop the patch that re-arranges the position of highest_order() and
>   next_order(), per Ryan.
> - Modify finish_fault() to fix a VA alignment issue, per Ryan and
>   David.
> - Fix some build issues, reported by Lance and kernel test robot.
> - Update some commit messages.
>
> Changes from RFC:
> - Rebase the patch set against the new mm-unstable branch, per Lance.
> - Add a new patch to export highest_order() and next_order().
> - Add a new patch to align mTHP size in shmem_get_unmapped_area().
> - Handle the uffd case and the VMA limits case when building the mapping
>   for a large folio in the finish_fault() function, per Ryan.
> - Remove unnecessary 'order' variable in patch 3, per Kefeng.
> - Keep the anon shmem counters' names consistent.
> - Modify the strategy to support mTHP for anonymous shmem, discussed with
>   Ryan and David.
> - Add reviewed tag from Barry.
> - Update the commit message.
>
> Baolin Wang (6):
>   mm: memory: extend finish_fault() to support large folio
>   mm: shmem: add THP validation for PMD-mapped THP related statistics
>   mm: shmem: add multi-size THP sysfs interface for anonymous shmem
>   mm: shmem: add mTHP support for anonymous shmem
>   mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
>   mm: shmem: add mTHP counters for anonymous shmem
>
>  Documentation/admin-guide/mm/transhuge.rst |  23 ++
>  include/linux/huge_mm.h                    |  23 ++
>  mm/huge_memory.c                           |  17 +-
>  mm/memory.c                                |  57 +++-
>  mm/shmem.c                                 | 344 ++++++++++++++++++---
>  5 files changed, 403 insertions(+), 61 deletions(-)
>
> --
> 2.39.3
>