Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

From: Hugh Dickins
Date: Thu Dec 03 2020 - 00:37:41 EST


On Wed, 2 Dec 2020, Peter Xu wrote:
> On Wed, Dec 02, 2020 at 02:37:33PM -0800, Hugh Dickins wrote:
> > On Tue, 1 Dec 2020, Andrea Arcangeli wrote:
> > >
> > > Any suggestions on how to have the per-vaddr per-mm _PAGE_UFFD_WP bit
> > > survive the pte invalidates in a way that remains associated to a
> > > certain vaddr in a single mm (so it can shoot itself in the foot if it
> > > wants, but it can't interfere with all other mm sharing the shmem
> > > file) would be welcome...
> >
> > I think it has to be a new variety of swap-like non_swap_entry() pte,
> > see include/linux/swapops.h. Anything else would be more troublesome.
> >
> > Search for non_swap_entry and for migration_entry, to find places that
> > might need to learn about this new variety.
> >
> > IIUC you only need a single value, no need to carve out another whole
> > swp_type: could probably be swp_offset 0 of any swp_type other than 0.
> >
> > Note that fork's copy_page_range() does not "copy ptes where a page
> > fault will fill them correctly", so would in effect put a pte_none
> > into the child where the parent has this uffd_wp entry. I don't know
> > anything about uffd versus fork, whether that would pose a problem.
>
> Thanks for the idea, Hugh!
>
> I thought about something similar today, but instead of swap entries, I was
> thinking about constantly filling in a pte with a value of "_PAGE_PROTNONE |
> _PAGE_UFFD_WP" when e.g. we'd like to zap a page with shmem+uffd-wp. I feel
> like the fundamental idea is similar - we can somehow keep the pte with uffd-wp
> information even if zapped/swapped-out, so as long as the shmem access will
> further trap into the fault handler, then we can operate on that pte and read
> that information out, like recover that pte into a normal pte (with swap/page
> cache, and vma/addr information, we'll be able to) and then we can retry the
> fault.

Yes, I think that should work too: I can't predict which way would cause
less trouble.
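
For comparison, the swap-like entry variety can be sketched in userspace.
This is only a toy model: the toy_* names, the 5-bit type field and the
choice of type 1 for the marker are all made up for illustration, and the
real encoding in include/linux/swapops.h and the per-arch pte layouts
differ. It just shows that one reserved (type, offset 0) value is enough
to be told apart from both an empty pte and an ordinary swap entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy layout (an assumption, NOT the kernel's): top 5 bits = type. */
#define TOY_TYPE_BITS   5
#define TOY_TYPE_SHIFT  (64 - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_TYPE_SHIFT) - 1)

/* Hypothetical: reserve offset 0 of type 1 as the uffd-wp marker. */
#define TOY_UFFD_WP_TYPE 1u

typedef struct { uint64_t val; } toy_swp_entry_t;

static inline toy_swp_entry_t toy_swp_entry(unsigned type, uint64_t offset)
{
	toy_swp_entry_t e;

	e.val = ((uint64_t)type << TOY_TYPE_SHIFT) | (offset & TOY_OFFSET_MASK);
	return e;
}

static inline unsigned toy_swp_type(toy_swp_entry_t e)
{
	return (unsigned)(e.val >> TOY_TYPE_SHIFT);
}

static inline uint64_t toy_swp_offset(toy_swp_entry_t e)
{
	return e.val & TOY_OFFSET_MASK;
}

/* In this toy encoding an all-zero value stands for an empty pte. */
static inline bool toy_is_none(toy_swp_entry_t e)
{
	return e.val == 0;
}

static inline toy_swp_entry_t toy_uffd_wp_marker(void)
{
	return toy_swp_entry(TOY_UFFD_WP_TYPE, 0);
}

static inline bool toy_is_uffd_wp_marker(toy_swp_entry_t e)
{
	return toy_swp_type(e) == TOY_UFFD_WP_TYPE && toy_swp_offset(e) == 0;
}

/* Small self-check: the marker is neither "none" nor a normal entry. */
static inline void toy_swp_selfcheck(void)
{
	assert(toy_is_uffd_wp_marker(toy_uffd_wp_marker()));
	assert(!toy_is_none(toy_uffd_wp_marker()));
	assert(!toy_is_uffd_wp_marker(toy_swp_entry(TOY_UFFD_WP_TYPE, 42)));
}
```

In this toy encoding, type 0 with offset 0 would be all zeroes and so
indistinguishable from an empty pte, which is one reason to pick a
non-zero type for the marker.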

We usually tend to keep away from protnone games, because NUMA balancing's
use of protnone is already confusing enough.

But those ptes will be pte_present(), so you must provide a pfn, and I
think if you use the zero_pfn, vm_normal_page() will return NULL for it,
so you avoid special casing (and reference counting) it in various places.
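
A rough userspace sketch of that protnone variant, again with made-up
toy_* names, bit positions and zero pfn value (none of this is kernel
code): the marker pte counts as present, carries the uffd-wp bit, and
points at the zero pfn, so a vm_normal_page()-like lookup hands back
NULL and callers never touch a struct page or its refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout: flags in the low 12 bits, pfn above them. */
#define TOY_PAGE_PRESENT  (UINT64_C(1) << 0)
#define TOY_PAGE_PROTNONE (UINT64_C(1) << 1)
#define TOY_PAGE_UFFD_WP  (UINT64_C(1) << 2)
#define TOY_PFN_SHIFT     12
#define TOY_ZERO_PFN      UINT64_C(1)	/* assumed pfn of the zero page */

struct toy_page { int refcount; };
typedef struct { uint64_t val; } toy_pte_t;

static inline toy_pte_t toy_mk_pte(uint64_t pfn, uint64_t flags)
{
	toy_pte_t pte;

	pte.val = (pfn << TOY_PFN_SHIFT) | flags;
	return pte;
}

/* A protnone pte still counts as present, as with NUMA balancing. */
static inline bool toy_pte_present(toy_pte_t pte)
{
	return (pte.val & (TOY_PAGE_PRESENT | TOY_PAGE_PROTNONE)) != 0;
}

static inline bool toy_pte_uffd_wp(toy_pte_t pte)
{
	return (pte.val & TOY_PAGE_UFFD_WP) != 0;
}

static inline uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte.val >> TOY_PFN_SHIFT;
}

/* The marker: protnone + uffd-wp, pointing at the zero pfn. */
static inline toy_pte_t toy_mk_uffd_wp_protnone(void)
{
	return toy_mk_pte(TOY_ZERO_PFN, TOY_PAGE_PROTNONE | TOY_PAGE_UFFD_WP);
}

/* Returns NULL for the zero pfn (or an out-of-range pfn). */
static inline struct toy_page *toy_vm_normal_page(toy_pte_t pte,
						  struct toy_page *map,
						  size_t nr_pages)
{
	uint64_t pfn = toy_pte_pfn(pte);

	if (pfn == TOY_ZERO_PFN || pfn >= nr_pages)
		return NULL;
	return &map[pfn];
}

/* Small self-check: the marker is present but has no normal page. */
static inline void toy_pte_selfcheck(struct toy_page *map, size_t nr_pages)
{
	toy_pte_t marker = toy_mk_uffd_wp_protnone();

	assert(toy_pte_present(marker) && toy_pte_uffd_wp(marker));
	assert(toy_vm_normal_page(marker, map, nr_pages) == NULL);
}
```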

Hugh