Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE

From: David Hildenbrand
Date: Thu Mar 02 2023 - 09:13:55 EST



uffd-wp protecting a range:
* !pte_none() -> set uffd-wp bit and wrprotect
* pte_none() -> nothing to do
* PTE_UFFD_WP -> nothing to do
* PTE_UFFD_NO_WP -> set PTE_UFFD_WP

unmapping a page (old way: !pte_none() -> pte_none()):
* uffd-wp bit set: set PTE_UFFD_WP
* uffd-wp bit not set: set PTE_UFFD_NO_WP

(re)mapping a page (old: pte_none() -> !pte_none()):
* PTE_UFFD_WP -> set pte bit for new PTE
* PTE_UFFD_NO_WP -> don't set pte bit for new PTE
* pte_none() -> set pte bit for new PTE

Zapping an anon page using MADV_DONTNEED is a bit confusing. It's actually
similar to a memory write (-> write zeroes), but we don't notify uffd-wp for
that (I think that's something you comment on below). Theoretically, we'd
want to set PTE_UFFD_NO_WP ("dirty") in the async mode. But that might need
more thought of what the expected semantics actually are.

When we walk over the page tables we would get the following information
after protecting the range:

* PTE_UFFD_WP -> clean, not modified since last protection round
* PTE_UFFD_NO_WP -> dirty, modified since last protection round
* pte_none() -> not mapped and therefore not modified since beginning of
protection.
* !pte_none() -> uffd-wp bit decides

I can't say I thought a lot but I feel like it may work. I'd probably avoid
calling it PTE_UFFD_NO_WP or it'll be confusing.. maybe WP_WRITTEN or
WP_RESOLVED instead.

I haven't thought about this further, but I maybe we might be able to just use a single PTE marker , because pte_none() would translate to PTE_UFFD_WP in such VMAs. So we could make the meaning of the PTE_UFFD_WP marker simply depend on the type of VMA.

If I am not wrong, we could stop setting PTE_UFFD_WP completely then, for any memory type (shmem/anon/hugetlb).


But that interface looks weird in that the protection happens right after
VM_UFFD_WP applied to VMA and that keeps true until unregister. One needs
to reprotect using ioctl(UFFDIO_WRITEPROTECT) OTOH after the 1st round of
tracking. It just looks a little bit over-complicated, not to mention we
will need two markers only for userfault-wp. I had a feeling this
complexity can cause us some trouble elsewhere.

When to apply this logic is indeed the interesting part. Doing it right from the beginning when the feature is enabled sounds like the simplest approach to me. For background snapshots and dirty tracking that might just be good enough.



IIUC this can be something done on top even if it'll work (I think the

I think the API would have to change eventually, to enable it via a new feature ("unpopulated implies uffd-wp is active").

userspace API doesn't need to change at all), so I'd suggest giving it some
more thoughts and we start with simple and working.

Yes.

--
Thanks,

David / dhildenb