[PATCH v3 0/7] mm: Remember a/d bits for migration entries
From: Peter Xu
Date: Tue Aug 09 2022 - 18:02:13 EST
v3:
- Rebased to akpm/mm-unstable
- Use BIT() [Nadav]
- One more patch to add comment for "ifdef"s [Nadav]
- Added one ascii table for migration swp offset layout [David]
- Move comment to be above "if"s [Ying]
- Separate the dirty bit carry-over of pmd split to separate patch [Ying]
- Added two patches to cache both max_swapfile_size and
migration_entry_supports_ad() at the end
rfc: https://lore.kernel.org/all/20220729014041.21292-1-peterx@xxxxxxxxxx
v1: https://lore.kernel.org/all/20220803012159.36551-1-peterx@xxxxxxxxxx
v2: https://lore.kernel.org/all/20220804203952.53665-1-peterx@xxxxxxxxxx
Problem
=======
When migrating a page, right now we always mark the migrated page as old &
clean.
However that could lead to at least two problems:
(1) We lose the real hot/cold information while we could have preserved it.
That information shouldn't change even if the backing page is changed
after the migration,
(2) There can always be extra overhead on the immediate next access to
any migrated page, because the hardware MMU needs cycles to set the young
bit again for reads, and the dirty bit for writes, as long as the
hardware MMU supports these bits.
Many recent upstream works showed that (2) is not trivial and is actually
very measurable. In my test case, reading a 1G chunk of memory - jumping in
page size intervals - could take 99ms just because of the extra setting of
the young bit on a generic x86_64 system, compared to 4ms if the young bit
is already set.
This issue was originally reported by Andrea Arcangeli.
Solution
========
To solve this problem, this patchset tries to remember the young/dirty bits
in the migration entries and carry them over when recovering the ptes.
We have the chance to do so because in many systems the swap offset is not
really fully used. Migration entries use the swp offset to store the PFN
only, and the PFN is normally smaller than what the swp offset can hold.
It means we do have some free bits in the swp offset that we can use to
store things like A/D bits, and that's how this series approaches the
problem.
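To illustrate the idea only, here is a minimal user-space sketch of packing
young/dirty bits above a PFN inside one offset value. The MODEL_* widths, the
bit positions and the model_*() helper names are all assumptions made up for
this example; they are not the layout or the helpers used by the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical widths for illustration; the real ones are per-arch. */
#define MODEL_PFN_BITS   40u
#define MODEL_PFN_MASK   ((UINT64_C(1) << MODEL_PFN_BITS) - 1)
/* Assumed placement: A/D bits sit right above the PFN field. */
#define MODEL_YOUNG_BIT  (UINT64_C(1) << MODEL_PFN_BITS)
#define MODEL_DIRTY_BIT  (UINT64_C(1) << (MODEL_PFN_BITS + 1))

/* Build an offset that carries the PFN plus the young/dirty state. */
static uint64_t model_make_offset(uint64_t pfn, bool young, bool dirty)
{
	uint64_t off;

	assert(pfn <= MODEL_PFN_MASK);	/* PFN must fit in its field */
	off = pfn;
	if (young)
		off |= MODEL_YOUNG_BIT;
	if (dirty)
		off |= MODEL_DIRTY_BIT;
	return off;
}

/* Extract the PFN, ignoring the extra bits stored above it. */
static uint64_t model_offset_pfn(uint64_t off)
{
	return off & MODEL_PFN_MASK;
}

static bool model_offset_young(uint64_t off)
{
	return (off & MODEL_YOUNG_BIT) != 0;
}

static bool model_offset_dirty(uint64_t off)
{
	return (off & MODEL_DIRTY_BIT) != 0;
}
```

In this model, whoever recovers the pte would read the two bits back and
re-apply them, instead of starting from an old & clean pte.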
max_swapfile_size() is used here to detect the per-arch offset length in swp
entries. We'll automatically remember the A/D bits when we find that the
swp offset field is large enough to keep both the PFN and the extra bits.
Since max_swapfile_size() can be slow, the last two patches cache the
results for it and also swap_migration_ad_supported as a whole.
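As a rough user-space model of the detect-then-cache approach (not the code
in the patches): compute the maximum size once at init time, derive a boolean
from it, and let the fast path only read the cached boolean. The MODEL_*
widths and all model_*() names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical widths for illustration; the real ones are per-arch. */
#define MODEL_OFFSET_BITS 50u	/* assumed width of the swp offset field */
#define MODEL_PFN_BITS    40u	/* assumed bits needed for any PFN */
#define MODEL_AD_BITS     2u	/* one young bit plus one dirty bit */

static int model_slow_calls;		/* how often the slow path ran */
static uint64_t model_cached_max;	/* cached maximum size, in pages */
static bool model_cached_ad;		/* cached "A/D bits fit" answer */

/* Stand-in for an expensive per-arch probe of the offset length. */
static uint64_t model_max_swapfile_size_slow(void)
{
	model_slow_calls++;
	return UINT64_C(1) << MODEL_OFFSET_BITS;
}

/* Run once at init: cache the size and whether PFN + A/D bits fit. */
static void model_swap_init(void)
{
	model_cached_max = model_max_swapfile_size_slow();
	model_cached_ad = model_cached_max >=
		(UINT64_C(1) << (MODEL_PFN_BITS + MODEL_AD_BITS));
}

/* Fast path: only reads the cached result. */
static bool model_entry_supports_ad(void)
{
	return model_cached_ad;
}
```

With these assumed widths the offset has 50 bits and only 42 are needed, so
the model reports that the A/D bits fit; no matter how often the fast path
is queried, the slow probe runs a single time.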
Tests
=====
After the patchset is applied, the immediate read access test [1] of the
above 1G chunk after migration shrinks from 99ms to 4ms. The test is done
by moving 1G of pages from node 0->1->0 and then reading it in page size
jumps. The test was run on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
A similar effect can also be measured when writing the memory for the first
time after migration.
After applying the patchset, both the initial immediate read and write
after page migration perform similarly to before the migration happened.
Patch Layout
============
Patch 1-2: Cleanups from either previous versions or on swapops.h macros.
Patch 3-4: Prepare for the introduction of migration A/D bits
Patch 5: The core patch to remember young/dirty bit in swap offsets.
Patch 6-7: Cache relevant fields to make migration_entry_supports_ad() fast.
Please review, thanks.
[1] https://github.com/xzpeter/clibs/blob/master/misc/swap-young.c
Peter Xu (7):
  mm/x86: Use SWP_TYPE_BITS in 3-level swap macros
  mm/swap: Comment all the ifdef in swapops.h
  mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
  mm/thp: Carry over dirty bit when thp splits on pmd
  mm: Remember young/dirty bit for page migrations
  mm/swap: Cache maximum swapfile size when init swap
  mm/swap: Cache swap migration A/D bits support

 arch/arm64/mm/hugetlbpage.c           |   2 +-
 arch/x86/include/asm/pgtable-3level.h |   8 +-
 arch/x86/mm/init.c                    |   2 +-
 include/linux/swapfile.h              |   1 +
 include/linux/swapops.h               | 145 +++++++++++++++++++++++---
 mm/hmm.c                              |   2 +-
 mm/huge_memory.c                      |  24 ++++-
 mm/memory-failure.c                   |   2 +-
 mm/migrate.c                          |   6 +-
 mm/migrate_device.c                   |   4 +
 mm/page_vma_mapped.c                  |   6 +-
 mm/rmap.c                             |   5 +-
 mm/swapfile.c                         |  18 +++-
 13 files changed, 194 insertions(+), 31 deletions(-)
--
2.32.0