[PATCH v2 00/26] userfaultfd: write protection support

From: Peter Xu
Date: Mon Feb 11 2019 - 21:56:50 EST


This series implements initial write protection support for
userfaultfd. Currently only anonymous memory is supported; shmem and
hugetlbfs are not supported yet. This is the second version of the
series.

The latest code can also be found at:

https://github.com/xzpeter/linux/tree/uffd-wp-merged

Since there was no objection to the design in the previous RFC series,
and the tree has already been run through various tests, I'm removing
the RFC tag starting from this version.

During the previous v1 discussion, Mike asked about using userfaultfd
to track mprotect()-allowed processes. So far I don't have a good idea
of how that could work easily, so I'll assume it's not an initial goal
for the current uffd-wp work.

Note again that the first 5 patches in the series can be seen as
isolated work on the page fault mechanism. I hope they can be
reviewed and picked up earlier than the rest of the series, since
they are useful even for the existing userfaultfd MISSING case [8].

v2 changelog:
- add some r-bs
- split the patch "mm: userfault: return VM_FAULT_RETRY on signals"
  into two: one to focus on the signal behavior change, the other to
  remove the NOPAGE special path in handle_userfault(). Remove the
  ARC-specific change and the related part of the commit message,
  since it's already fixed in 4d447455e73b [Jerome]
- return -ENOENT when the VMA is invalid for UFFDIO_WRITEPROTECT to
  match the UFFDIO_COPY errno [Mike]
- add a new patch to introduce a helper to find a valid VMA for uffd
  [Mike]
- check against VM_MAYWRITE instead of VM_WRITE when registering UFFD
  WP [Mike]
- MM_CP_DIRTY_ACCT was used incorrectly, fix it up [Jerome]
- make sure the lock_page behavior will not be changed [Jerome]
- reorder the whole series, introduce the new ioctl last [Jerome]
- fix up uffdio_writeprotect() following commit df2cc96e77011cf79
  to return -EAGAIN when mm layout changes are detected [Mike]

v1 can be found at: https://lkml.org/lkml/2019/1/21/130

Any comments are very welcome. Thanks.

Overview
====================

The uffd-wp work was initiated by Shaohua Li [1], and later
continued by Andrea [2]. This series is based upon Andrea's latest
userfaultfd tree, and it is a continuation of the work of both
Shaohua and Andrea. Many of the follow-up ideas come from Andrea too.

Besides the existing MISSING register mode of userfaultfd, the new
uffd-wp support provides another register mode called
UFFDIO_REGISTER_MODE_WP, which can be used to listen for write
protection page faults in addition to missing page faults; the two
modes can also be registered together. The new feature also provides
a new userfaultfd ioctl called UFFDIO_WRITEPROTECT, which allows
userspace to write protect a range of memory or to fix up the write
permission of faulted pages.
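
As a rough illustration of the two interface pieces described above,
the sketch below only builds the ioctl arguments: one for registering
a range (MISSING, WP, or both) and one for UFFDIO_WRITEPROTECT. The
helper names uffd_make_register() and uffd_make_writeprotect() are
made up for this example and are not part of the series. It assumes
the installed linux/userfaultfd.h already carries struct
uffdio_writeprotect and the WP flags named above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: build the UFFDIO_REGISTER argument for a range.
 * MISSING and WP are independent mode bits, so both may be requested.
 */
static inline struct uffdio_register
uffd_make_register(void *addr, uint64_t len, int want_missing, int want_wp)
{
	struct uffdio_register reg;

	memset(&reg, 0, sizeof(reg));
	reg.range.start = (uint64_t)(uintptr_t)addr;
	reg.range.len = len;
	if (want_missing)
		reg.mode |= UFFDIO_REGISTER_MODE_MISSING;
	if (want_wp)
		reg.mode |= UFFDIO_REGISTER_MODE_WP;
	assert(reg.mode != 0);	/* at least one mode has to be requested */
	return reg;
}

/*
 * Hypothetical helper: build the UFFDIO_WRITEPROTECT argument.
 * protect != 0 write protects the range; protect == 0 leaves the
 * MODE_WP flag clear, which restores the write permission.
 */
static inline struct uffdio_writeprotect
uffd_make_writeprotect(void *addr, uint64_t len, int protect)
{
	struct uffdio_writeprotect wp;

	memset(&wp, 0, sizeof(wp));
	wp.range.start = (uint64_t)(uintptr_t)addr;
	wp.range.len = len;
	wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
	return wp;
}
```

The returned structures would then be handed to
ioctl(uffd, UFFDIO_REGISTER, &reg) and
ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) on an already opened userfaultfd.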

Please refer to the documentation patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on
the new interface and what it can do.

The major workflow of a uffd-wp program should be:

1. Register a memory region with WP mode using
   UFFDIO_REGISTER_MODE_WP.

2. Write protect part of the whole registered region using
   UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to
   show that we want to write protect the range.

3. Start a worker thread that modifies the protected pages, and
   meanwhile listen for UFFD messages.

4. When a write to the protected range is detected, a page fault
   happens, and a UFFD message is generated and reported to the
   page fault handling thread.

5. The page fault handler thread resolves the page fault using the
   new UFFDIO_WRITEPROTECT ioctl, but this time passing in
   !UFFDIO_WRITEPROTECT_MODE_WP instead, showing that we want to
   restore the write permission. Before this operation, the fault
   handler thread can do anything it wants, e.g., dump the page to
   persistent storage.

6. The worker thread continues running with the write permission
   correctly applied in step 5.
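
The six steps above can be sketched roughly as follows. This is only
an illustration under a few assumptions, not code from the series or
its selftest: the names uffd_wp_demo(), wp_range() and fault_handler()
are invented, the region is four anonymous pages, the calling thread
plays the worker while a single spawned thread handles one fault, and
the installed uapi header is assumed to provide UFFDIO_WRITEPROTECT
and UFFD_PAGEFAULT_FLAG_WP. The caller also needs permission to use
the userfaultfd() syscall; otherwise the function just returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

static int uffd = -1;
static long page_size;
static int wp_faults;	/* WP faults resolved by the handler thread */

/* Steps 2 and 5: set (protect != 0) or clear write protection on a range. */
static int wp_range(uint64_t start, uint64_t len, int protect)
{
	struct uffdio_writeprotect wp = {
		.range = { .start = start, .len = len },
		.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
	};
	return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}

/* Steps 4 and 5: wait for one WP message, then restore write permission. */
static void *fault_handler(void *arg)
{
	struct pollfd pfd = { .fd = uffd, .events = POLLIN };
	struct uffd_msg msg;
	uint64_t addr;

	(void)arg;
	if (poll(&pfd, 1, 2000) <= 0 ||
	    read(uffd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg) ||
	    msg.event != UFFD_EVENT_PAGEFAULT ||
	    !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP))
		return NULL;
	/* A real handler would save the page here, e.g. dump it to storage. */
	addr = msg.arg.pagefault.address & ~((uint64_t)page_size - 1);
	if (wp_range(addr, page_size, 0) == 0)	/* clearing WP wakes the writer */
		wp_faults++;
	return NULL;
}

/* Runs steps 1-6 once; returns the number of WP faults resolved (1 is
 * expected), or -1 if uffd-wp cannot be set up on this system. */
static int uffd_wp_demo(void)
{
	struct uffdio_api api = { .api = UFFD_API,
				  .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP };
	struct uffdio_register reg = { .mode = UFFDIO_REGISTER_MODE_WP };
	pthread_t handler;
	size_t len;
	char *area;
	int ret = -1;

	page_size = sysconf(_SC_PAGESIZE);
	assert(page_size > 0);
	len = 4 * (size_t)page_size;
	wp_faults = 0;
	/* Opened O_NONBLOCK so that poll() can wait for messages. */
	uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0)
		return -1;
	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		goto out_close;
	memset(area, 'a', len);	/* populate the pages before protecting them */
	/* Step 1: negotiate the API, then register the region in WP mode. */
	reg.range.start = (uintptr_t)area;
	reg.range.len = len;
	if (ioctl(uffd, UFFDIO_API, &api) < 0 ||
	    !(api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP) ||
	    ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto out_unmap;
	/* Step 2: write protect the whole registered region. */
	if (wp_range(reg.range.start, len, 1) < 0)
		goto out_unmap;
	/* Step 3: start the handler; the calling thread plays the worker. */
	if (pthread_create(&handler, NULL, fault_handler, NULL) != 0)
		goto out_unmap;
	/* Step 4: this write blocks until the handler performs step 5. */
	*(volatile char *)(area + page_size) = 'b';
	/* Step 6: the write went through; collect the handler's result. */
	pthread_join(handler, NULL);
	ret = wp_faults;
out_unmap:
	munmap(area, len);
out_close:
	close(uffd);
	return ret;
}
```

Built with something like "cc -pthread" and called from main(), the
function is meant to show the ordering of the ioctls rather than a
production fault handler: a real program would loop on the uffd,
handle errors from the unprotect path, and save the page contents
before clearing the protection.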

Currently there are already two projects that are based on this new
userfaultfd feature.

QEMU Live Snapshot: The project provides a way to allow the QEMU
                    hypervisor to take snapshots of VMs without
                    stopping the VM [3].

LLNL umap library:  The project provides an mmap-like interface and
                    "allow to have an application specific buffer of
                    pages cached from a large file, i.e. out-of-core
                    execution using memory map" [4][5].

Before posting the patchset, this series was smoke tested against QEMU
live snapshot and the LLNL umap library (by doing parallel quicksort
using 128 sorting threads + 80 uffd servicing threads). My sincere
thanks to Marty Mcfadden and Denis Plotnikov for the help along the
way.

TODO
=============

- hugetlbfs/shmem support
- performance
- more architectures
- cooperate with mprotect()-allowed processes (???)
- ...

References
==========

[1] https://lwn.net/Articles/666187/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=userfault
[3] https://github.com/denis-plotnikov/qemu/commits/background-snapshot-kvm
[4] https://github.com/LLNL/umap
[5] https://llnl-umap.readthedocs.io/en/develop/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=b245ecf6cf59156966f3da6e6b674f6695a5ffa5
[7] https://lkml.org/lkml/2018/11/21/370
[8] https://lkml.org/lkml/2018/12/30/64

Andrea Arcangeli (5):
  userfaultfd: wp: hook userfault handler to write protection fault
  userfaultfd: wp: add WP pagetable tracking to x86
  userfaultfd: wp: userfaultfd_pte/huge_pmd_wp() helpers
  userfaultfd: wp: add UFFDIO_COPY_MODE_WP
  userfaultfd: wp: add the writeprotect API to userfaultfd ioctl

Martin Cracauer (1):
  userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update

Peter Xu (17):
  mm: gup: rename "nonblocking" to "locked" where proper
  mm: userfault: return VM_FAULT_RETRY on signals
  userfaultfd: don't retake mmap_sem to emulate NOPAGE
  mm: allow VM_FAULT_RETRY for multiple times
  mm: gup: allow VM_FAULT_RETRY for multiple times
  mm: merge parameters for change_protection()
  userfaultfd: wp: apply _PAGE_UFFD_WP bit
  mm: export wp_page_copy()
  userfaultfd: wp: handle COW properly for uffd-wp
  userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork
  userfaultfd: wp: add pmd_swp_*uffd_wp() helpers
  userfaultfd: wp: support swap and page migration
  khugepaged: skip collapse if uffd-wp detected
  userfaultfd: introduce helper vma_find_uffd
  userfaultfd: wp: don't wake up when doing write protect
  userfaultfd: selftests: refactor statistics
  userfaultfd: selftests: add write-protect test

Shaohua Li (3):
  userfaultfd: wp: add helper for writeprotect check
  userfaultfd: wp: support write protection for userfault vma range
  userfaultfd: wp: enabled write protection in userfaultfd API

 Documentation/admin-guide/mm/userfaultfd.rst |  51 +++++
 arch/alpha/mm/fault.c                        |   4 +-
 arch/arc/mm/fault.c                          |  12 +-
 arch/arm/mm/fault.c                          |   9 +-
 arch/arm64/mm/fault.c                        |  11 +-
 arch/hexagon/mm/vm_fault.c                   |   3 +-
 arch/ia64/mm/fault.c                         |   3 +-
 arch/m68k/mm/fault.c                         |   5 +-
 arch/microblaze/mm/fault.c                   |   3 +-
 arch/mips/mm/fault.c                         |   3 +-
 arch/nds32/mm/fault.c                        |   7 +-
 arch/nios2/mm/fault.c                        |   5 +-
 arch/openrisc/mm/fault.c                     |   3 +-
 arch/parisc/mm/fault.c                       |   4 +-
 arch/powerpc/mm/fault.c                      |   7 +-
 arch/riscv/mm/fault.c                        |   9 +-
 arch/s390/mm/fault.c                         |  14 +-
 arch/sh/mm/fault.c                           |   5 +-
 arch/sparc/mm/fault_32.c                     |   4 +-
 arch/sparc/mm/fault_64.c                     |   4 +-
 arch/um/kernel/trap.c                        |   6 +-
 arch/unicore32/mm/fault.c                    |  10 +-
 arch/x86/Kconfig                             |   1 +
 arch/x86/include/asm/pgtable.h               |  67 ++++++
 arch/x86/include/asm/pgtable_64.h            |   8 +-
 arch/x86/include/asm/pgtable_types.h         |  11 +-
 arch/x86/mm/fault.c                          |   7 +-
 arch/xtensa/mm/fault.c                       |   4 +-
 fs/userfaultfd.c                             | 114 ++++++----
 include/asm-generic/pgtable.h                |   1 +
 include/asm-generic/pgtable_uffd.h           |  66 ++++++
 include/linux/huge_mm.h                      |   2 +-
 include/linux/mm.h                           |  21 +-
 include/linux/swapops.h                      |   2 +
 include/linux/userfaultfd_k.h                |  42 +++-
 include/trace/events/huge_memory.h           |   1 +
 include/uapi/linux/userfaultfd.h             |  28 ++-
 init/Kconfig                                 |   5 +
 mm/filemap.c                                 |   2 +-
 mm/gup.c                                     |  61 ++---
 mm/huge_memory.c                             |  28 ++-
 mm/hugetlb.c                                 |   8 +-
 mm/khugepaged.c                              |  23 ++
 mm/memory.c                                  |  28 ++-
 mm/mempolicy.c                               |   2 +-
 mm/migrate.c                                 |   7 +
 mm/mprotect.c                                |  98 ++++++--
 mm/rmap.c                                    |   6 +
 mm/userfaultfd.c                             | 148 ++++++++++---
 tools/testing/selftests/vm/userfaultfd.c     | 222 ++++++++++++++-----
 50 files changed, 919 insertions(+), 276 deletions(-)
 create mode 100644 include/asm-generic/pgtable_uffd.h

--
2.17.1