[PATCH v3 00/27] userfaultfd-wp: Support shmem and hugetlbfs
From: Peter Xu
Date: Thu May 27 2021 - 16:19:39 EST
This is v3 of uffd-wp shmem & hugetlbfs support, which completes uffd-wp as a
full kernel feature: before this series it only supported anonymous memory.
It's based on the latest v5.13-rc3-mmots-2021-05-25-20-12.
The rebase was probably the hardest one, as I encountered quite a bit of
breakage here and there across a few mmots tags. But now, after figuring
everything out (which did take time), it's settling.
The whole series can also be found online [1].
Nothing big really changed otherwise. Full changelog listed below.
v3:
- Rebase to v5.13-rc3-mmots-2021-05-25-20-12
- Fix commit message and comment for patch "shmem/userfaultfd: Handle uffd-wp
special pte in page fault handler", dropping all references to FAULT_FLAG_UFFD_WP.
- Reworked patch "shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP" after
Axel's refactoring on uffdio-copy/continue.
- Added patch "mm/hugetlb: Introduce huge pte version of uffd-wp helpers", so
that huge pte helpers are introduced in one patch. Also add huge_pte_uffd_wp
helper, which was missing previously.
- Added patch: "mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs", to let
pagemap uffd-wp bit work for shmem/hugetlbfs
- Added patch: "mm/shmem: Unconditionally set pte dirty in
mfill_atomic_install_pte", to clean up dirty bit together in uffdio-copy
v2:
- Add R-bs
- Added patch "mm/hugetlb: Drop __unmap_hugepage_range definition from
hugetlb.h" as noticed/suggested by Mike Kravetz
- Fix commit message of patch "hugetlb/userfaultfd: Only drop uffd-wp special
pte if required" [MikeK]
- Removing comments for fields in zap_details since they're either incorrect or
not helping [Matthew]
- Rephrase commit message in patch "hugetlb/userfaultfd: Take care of
UFFDIO_COPY_MODE_WP" to better explain why the dirty bit is set for UFFDIO_COPY
in hugetlbfs [MikeK]
- Don't emulate READ for uffd-wp-special on both shmem & hugetlbfs.
- Drop FAULT_FLAG_UFFD_WP flag, by checking vmf->orig_pte directly against
pte_swp_uffd_wp_special()
- Fix race condition of page fault handling on uffd-wp-special [Mike]
About Swap Special PTE
======================
In short, the so-called "swap special pte" in this patchset is a new type of
pte that did not exist before; this series introduces it for file-backed
memory. It is used to persist information when a pte gets dropped while the
page cache still exists. For example, when splitting a file-backed huge pmd,
we can simply drop the pmd entry and wait for the next fault. That was fine
in the past, because all information in the pte could be rebuilt from the page
cache when the next page fault triggered. However, uffd-wp is per-pte
information that cannot be kept in the page cache, so it has to be maintained
somehow in the pgtable entry, even when that entry is about to be dropped.
So instead of replacing the entry with a none entry, we install the "swap
special pte". Then, when the next page fault triggers, we can observe orig_pte
to recover this information.
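The zap-then-fault flow described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: every toy_* name
and the three-state pte are hypothetical and are not the kernel's types or
helpers; the real series encodes this state in pte bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
typedef enum {
    TOY_PTE_NONE,          /* empty slot: next fault rebuilds from page cache */
    TOY_PTE_PRESENT,       /* maps a page-cache page */
    TOY_PTE_UFFD_WP_MARK   /* no page mapped, but "was uffd-wp" is remembered */
} toy_pte_kind;

typedef struct {
    toy_pte_kind kind;
    bool uffd_wp;    /* meaningful only when kind == TOY_PTE_PRESENT */
    bool writable;   /* meaningful only when kind == TOY_PTE_PRESENT */
} toy_pte;

/* Drop a file-backed pte while its page stays in the page cache. */
toy_pte toy_zap_file_pte(toy_pte old)
{
    toy_pte out = { TOY_PTE_NONE, false, false };

    /* Per-pte uffd-wp state cannot live in the page cache, so leave a
     * marker behind instead of a none entry. */
    if (old.kind == TOY_PTE_PRESENT && old.uffd_wp)
        out.kind = TOY_PTE_UFFD_WP_MARK;
    else if (old.kind == TOY_PTE_UFFD_WP_MARK)
        out.kind = TOY_PTE_UFFD_WP_MARK;   /* this model keeps the marker */
    return out;
}

/* Handle a fault on a non-present slot, looking at the original pte. */
toy_pte toy_fault_in(toy_pte orig_pte)
{
    toy_pte out = { TOY_PTE_PRESENT, false, true };

    assert(orig_pte.kind != TOY_PTE_PRESENT);   /* faults hit non-present slots */
    if (orig_pte.kind == TOY_PTE_UFFD_WP_MARK) {
        out.uffd_wp = true;     /* restore the write protection */
        out.writable = false;
    }
    return out;
}
```

In this model, zapping a write-protected pte and then faulting it back in
yields a present pte that is still write-protected, while an unprotected pte
simply becomes a none entry and faults back in writable.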
I'm copy-pasting part of the commit message from the patch "mm/swap: Introduce the
idea of special swap ptes", which tries to explain this pte from another angle:
We used to have special swap entries, like migration entries, hw-poison
entries, device private entries, etc.
Those "special swap entries" reside in the range that they need to be at least
swap entries first, and their types are decided by swp_type(entry).
This patch introduces another idea called "special swap ptes".
It's very easy to confuse these with "special swap entries", but a special
swap pte should never contain a swap entry at all. That is, it's illegal to
call pte_to_swp_entry() upon a special swap pte.
Make the uffd-wp special pte the first special swap pte.
Before this patch, is_swap_pte()==true means one of the below:
  (a.1) The pte has a normal swap entry (non_swap_entry()==false). For
        example, when an anonymous page got swapped out.
  (a.2) The pte has a special swap entry (non_swap_entry()==true). For
        example, a migration entry, a hw-poison entry, etc.
After this patch, is_swap_pte()==true means one of the below, where case (b) is
added:
  (a) The pte contains a swap entry.
    (a.1) The pte has a normal swap entry (non_swap_entry()==false). For
          example, when an anonymous page got swapped out.
    (a.2) The pte has a special swap entry (non_swap_entry()==true). For
          example, a migration entry, a hw-poison entry, etc.
  (b) The pte does not contain a swap entry at all (so it cannot be passed
      into pte_to_swp_entry()). For example, uffd-wp special swap pte.
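The case split above can be written down as a tiny classifier. This is a
hedged userspace sketch, not kernel code: the model_* types and functions are
hypothetical stand-ins that only mirror the roles of is_swap_pte(),
non_swap_entry() and pte_to_swp_entry() as described in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's pte_t or swp_entry_t. */
typedef enum { MODEL_NONE, MODEL_PRESENT, MODEL_SWAP_ENTRY, MODEL_SPECIAL } model_kind;
typedef enum { MODEL_SWP_NORMAL, MODEL_SWP_MIGRATION, MODEL_SWP_HWPOISON } model_swp_type;

typedef struct {
    model_kind kind;
    model_swp_type type;   /* only valid when kind == MODEL_SWAP_ENTRY */
} model_pte;

typedef enum { CASE_NOT_SWAP, CASE_A1, CASE_A2, CASE_B } model_case;

/* Mirrors the role of is_swap_pte(): neither none nor present. */
bool model_is_swap_pte(model_pte pte)
{
    return pte.kind != MODEL_NONE && pte.kind != MODEL_PRESENT;
}

/* Case (b): a swap pte that carries no swap entry at all. */
bool model_is_special_swap_pte(model_pte pte)
{
    return pte.kind == MODEL_SPECIAL;
}

/* Mirrors pte_to_swp_entry() plus swp_type(); illegal on case (b). */
model_swp_type model_pte_to_swp_type(model_pte pte)
{
    assert(pte.kind == MODEL_SWAP_ENTRY);
    return pte.type;
}

model_case model_classify(model_pte pte)
{
    if (!model_is_swap_pte(pte))
        return CASE_NOT_SWAP;
    if (model_is_special_swap_pte(pte))
        return CASE_B;   /* must be ruled out before decoding an entry */
    return model_pte_to_swp_type(pte) == MODEL_SWP_NORMAL ? CASE_A1 : CASE_A2;
}
```

The point of the ordering in model_classify() is that the special-pte check
comes first, so the entry decoder is never reached for case (b).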
Hugetlbfs needs something similar because it's also file-backed. I directly reused
the same special pte there, though the shmem and hugetlb changes that support this
new pte differ, since the two don't share much of a code path.
Patch layout
============
Part (1): Shmem support, this is where the special swap pte is introduced.
Some zap rework is needed within the process:
mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
mm: Clear vmf->pte after pte_unmap_same() returns
mm/userfaultfd: Introduce special pte for unmapped file-backed mem
mm/swap: Introduce the idea of special swap ptes
shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
mm: Drop first_index/last_index in zap_details
mm: Introduce zap_details.zap_flags
mm: Introduce ZAP_FLAG_SKIP_SWAP
mm: Pass zap_flags into unmap_mapping_pages()
shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
shmem/userfaultfd: Handle the left-overed special swap ptes
shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
Part (2): Hugetlb support. The patches that disable huge pmd sharing for uffd-wp
have already been merged. The rest are the changes required to teach hugetlbfs
to understand the special swap pte introduced with the uffd-wp change:
mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
mm/hugetlb: Introduce huge pte version of uffd-wp helpers
hugetlb/userfaultfd: Hook page faults for uffd write protection
hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
mm/hugetlb: Introduce huge version of special swap pte helpers
hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
hugetlb/userfaultfd: Allow wr-protect none ptes
hugetlb/userfaultfd: Only drop uffd-wp special pte if required
Part (3): Enable both features in code and test (plus pagemap support)
mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
userfaultfd: Enable write protection for shmem & hugetlbfs
userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs
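For the pagemap part, a reader checking the result from userspace could decode
a pagemap entry roughly as below. This is a sketch under assumptions: the bit
positions follow the kernel's pagemap documentation (bit 57 reports uffd-wp),
each virtual page has one 64-bit entry in /proc/<pid>/pagemap, and the struct
and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the kernel pagemap documentation. */
#define PM_BIT_SOFT_DIRTY  55
#define PM_BIT_UFFD_WP     57
#define PM_BIT_FILE        61   /* file page or shared anonymous */
#define PM_BIT_SWAPPED     62
#define PM_BIT_PRESENT     63

/* Hypothetical helper type, not part of any kernel header. */
typedef struct {
    bool present;
    bool swapped;
    bool file_or_shared_anon;
    bool uffd_wp;
    bool soft_dirty;
} pagemap_flags;

static bool pm_bit(uint64_t entry, unsigned int bit)
{
    return (entry >> bit) & 1;
}

/* Decode the flag bits of one 64-bit pagemap entry. */
pagemap_flags decode_pagemap_entry(uint64_t entry)
{
    pagemap_flags f;

    f.present = pm_bit(entry, PM_BIT_PRESENT);
    f.swapped = pm_bit(entry, PM_BIT_SWAPPED);
    f.file_or_shared_anon = pm_bit(entry, PM_BIT_FILE);
    f.uffd_wp = pm_bit(entry, PM_BIT_UFFD_WP);
    f.soft_dirty = pm_bit(entry, PM_BIT_SOFT_DIRTY);
    return f;
}

/* Byte offset of the entry for vaddr inside the pagemap file. */
uint64_t pagemap_file_offset(uint64_t vaddr, uint64_t page_size)
{
    assert(page_size > 0);
    return (vaddr / page_size) * sizeof(uint64_t);
}
```

A caller would read 8 bytes at pagemap_file_offset(addr, page size) from the
pagemap file and pass the value to decode_pagemap_entry(); with this series,
uffd_wp is expected to be reported for write-protected shmem and hugetlbfs
pages as well.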
Tests
=====
I've tested it using the userfaultfd kselftest program, and also with
umapsort [2], which should be even stricter. I also tested page swapping in/out
during umapsort.
Anyone who would like to try umapsort needs to use an extremely hacked version
of the umap library [3], because by default umap only supports anonymous memory.
So to test it, build [3] first and then [2].
Any comments would be greatly welcomed. Thanks,
[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/LLNL/umap-apps
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs
Peter Xu (27):
mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
mm: Clear vmf->pte after pte_unmap_same() returns
mm/userfaultfd: Introduce special pte for unmapped file-backed mem
mm/swap: Introduce the idea of special swap ptes
shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
mm: Drop first_index/last_index in zap_details
mm: Introduce zap_details.zap_flags
mm: Introduce ZAP_FLAG_SKIP_SWAP
mm: Pass zap_flags into unmap_mapping_pages()
shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on
thps
shmem/userfaultfd: Handle the left-overed special swap ptes
shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
mm/hugetlb: Introduce huge pte version of uffd-wp helpers
hugetlb/userfaultfd: Hook page faults for uffd write protection
hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
mm/hugetlb: Introduce huge version of special swap pte helpers
hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
hugetlb/userfaultfd: Allow wr-protect none ptes
hugetlb/userfaultfd: Only drop uffd-wp special pte if required
mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
mm/userfaultfd: Enable write protection for shmem & hugetlbfs
userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs
arch/arm64/kernel/mte.c | 2 +-
arch/x86/include/asm/pgtable.h | 28 +++
fs/dax.c | 10 +-
fs/hugetlbfs/inode.c | 15 +-
fs/proc/task_mmu.c | 21 +-
fs/userfaultfd.c | 38 ++--
include/asm-generic/hugetlb.h | 15 ++
include/asm-generic/pgtable_uffd.h | 3 +
include/linux/hugetlb.h | 30 ++-
include/linux/mm.h | 48 ++++-
include/linux/mm_inline.h | 43 +++++
include/linux/shmem_fs.h | 4 +-
include/linux/swapops.h | 39 +++-
include/linux/userfaultfd_k.h | 45 +++++
include/uapi/linux/userfaultfd.h | 10 +-
mm/gup.c | 2 +-
mm/hmm.c | 2 +-
mm/hugetlb.c | 160 +++++++++++++---
mm/khugepaged.c | 14 +-
mm/madvise.c | 4 +-
mm/memcontrol.c | 2 +-
mm/memory.c | 234 +++++++++++++++++------
mm/migrate.c | 4 +-
mm/mincore.c | 2 +-
mm/mprotect.c | 63 +++++-
mm/mremap.c | 2 +-
mm/page_vma_mapped.c | 6 +-
mm/rmap.c | 8 +
mm/shmem.c | 5 +-
mm/swapfile.c | 2 +-
mm/truncate.c | 17 +-
mm/userfaultfd.c | 73 ++++---
tools/testing/selftests/vm/userfaultfd.c | 9 +-
33 files changed, 765 insertions(+), 195 deletions(-)
--
2.31.1