Re: [RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE

From: Mike Kravetz
Date: Mon Jan 11 2021 - 19:55:57 EST


On 1/7/21 11:04 AM, Axel Rasmussen wrote:
> Overview
> ========
>
> This series adds a new userfaultfd registration mode,
> UFFDIO_REGISTER_MODE_MINOR. This allows userspace to intercept "minor" faults.
> By "minor" fault, I mean the following situation:
>
> Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory).
> One of the mappings is registered with userfaultfd (in minor mode), and the
> other is not. Via the non-UFFD mapping, the underlying pages have already been
> allocated & filled with some contents. The UFFD mapping has not yet been
> faulted in; when it is touched for the first time, this results in what I'm
> calling a "minor" fault. As a concrete example, when working with hugetlbfs, we
> have huge_pte_none(), but find_lock_page() finds an existing page.
>
> We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is,
> userspace resolves the fault by either a) doing nothing if the contents are
> already correct, or b) updating the underlying contents using the second,
> non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA,
> or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel
> "I have ensured the page contents are correct, carry on setting up the mapping".
>

One quick thought.

This is not going to work as expected with hugetlbfs pmd sharing. If you
are not familiar with hugetlbfs pmd sharing, you are not alone. :)

pmd sharing is enabled for x86 and arm64 architectures. If there are multiple
shared mappings of the same underlying hugetlbfs file or shared memory segment
that are 'suitably aligned', then the PMD pages associated with those regions
are shared by all the mappings. 'Suitably aligned' means the region starts
on a 1GB boundary and is 1GB in size.
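
As a rough illustration of the alignment rule above, here is a minimal
userspace sketch. It is not the kernel's code: the helper name
pmd_share_candidate() and the SHARE_UNIT constant are made up for this
example, and the 1GB unit assumes x86-64 with 2MB huge pages. It only checks
the address arithmetic, not the other conditions (such as the mapping being
shared).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one PMD page covers 1GB (x86-64 with 2MB huge pages). */
#define SHARE_UNIT (1ULL << 30)

/*
 * Hypothetical helper: returns true if the 1GB-aligned region that
 * contains addr lies entirely inside the mapping [map_start, map_end).
 * Only such a region could have its PMD page shared.
 */
static bool pmd_share_candidate(uint64_t map_start, uint64_t map_end,
                                uint64_t addr)
{
    uint64_t base = addr & ~(SHARE_UNIT - 1); /* round down to 1GB */
    uint64_t end = base + SHARE_UNIT;

    if (end < base) /* guard against wrap-around at the top of the range */
        return false;
    return map_start <= base && end <= map_end;
}
```

For example, a 1GB mapping that starts at 0x40000000 qualifies, while a 1GB
mapping that starts at 0x40200000 does not, because no full 1GB-aligned
region fits inside it.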

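When pmds are shared, your mappings will never see a 'minor fault'. This
is because the PMD page (the page table entries themselves) is shared: once
a page has been faulted in via the non-UFFD mapping, the entry is already
populated for the UFFD mapping as well, so touching it there does not fault.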
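
To make the effect concrete, here is a toy userspace model, not kernel code.
All of the names (struct pmd_page, struct page_cache, touch(), and so on)
are invented for this sketch, and the 512-entry size assumes x86-64 with 2MB
huge pages. Two mappings either point at the same PMD page or each have
their own.

```c
#include <stdbool.h>
#include <stddef.h>

#define ENTRIES 512 /* assumption: 512 x 2MB entries per PMD page */

enum fault_kind { NO_FAULT, MINOR_FAULT, MISSING_FAULT };

struct pmd_page { bool present[ENTRIES]; };    /* toy page table page */
struct page_cache { bool has_page[ENTRIES]; }; /* toy backing file state */
struct mapping { struct pmd_page *pmd; struct page_cache *file; };

/* Toy model of touching an address: classify the fault, then resolve it. */
static enum fault_kind touch(struct mapping *m, size_t idx)
{
    enum fault_kind kind;

    if (m->pmd->present[idx])
        return NO_FAULT; /* entry already populated, nothing to intercept */

    /* Page exists in the file but is not mapped here: a "minor" fault. */
    kind = m->file->has_page[idx] ? MINOR_FAULT : MISSING_FAULT;
    m->file->has_page[idx] = true; /* page now exists in the file */
    m->pmd->present[idx] = true;   /* entry installed in this PMD page */
    return kind;
}
```

In this model, if two mappings use separate PMD pages, the second mapping to
touch an index gets MINOR_FAULT. If they point at the same PMD page, the
second touch returns NO_FAULT, which is the case described above.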

--
Mike Kravetz