On Fri, Dec 02, 2022 at 01:27:48PM +0100, David Hildenbrand wrote:
Currently, we don't enable writenotify when enabling userfaultfd-wp on
a shared writable mapping (for now we only support SHMEM). The consequence
and hugetlbfs
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).
The thing is by default I think we want the write bit..
The simple example is (1) register UFFD_WP on shmem writable, (2) write a
page. Here we didn't wr-protect anything, so we want the write bit there.
Or the other example is when UFFDIO_COPY with flags==0 even if with
VM_UFFD_WP. We definitely wants the write bit.
We only doesn't want the write bit when uffd-wp is explicitly set.
I think fundamentally the core is uffd-wp is pte-based, so the information
resides in pte not vma. I'm not strongly objecting this patch, especially
you mentioned auto-numa so I need to have a closer look later there.
However I do think uffd-wp is slightly special because we always need to
consider pte information anyway, so a per-vma information doesn't hugely
help, IMHO.
Running the mprotect() reproducer [1] without this commit:
$ ./uffd-wp-mprotect
FAIL: uffd-wp did not fire
Running the mprotect() reproducer with this commit:
$ ./uffd-wp-mprotect
PASS: uffd-wp fired
[1] https://lore.kernel.org/all/222fc0b2-6ec0-98e7-833f-ea868b248446@xxxxxxxxxx/T/#u
I still hope for a formal patch (non-rfc) we can have a reproducer outside
mprotect(). IMHO mprotect() is really ambiguously here being used with
uffd-wp, so not a good example IMO as I explained in the other thread [1].
I'll need to off-work most of the rest of today, but maybe I can also have
a look in the weekend or Monday more on the numa paths. Before that, can
we first reach a consensus that we have the mm/migrate patch there to be
merged first? These are two issues, IMHO.
I know you're against me for some reason, but until now I sincerely don't
know why. That patch sololy recovers write bit status (by removing it for
read-only) for a migration entry and that definitely makes sense to me. As
I also mentioned in the old version of that thread, we can rework migration
entries and merge READ|WRITE entries into a GENERIC entry one day if you
think proper, but that's for later.
Let me provide another example why I think recovering write bit may matter
outside uffd too so it's common and it's always good to explicit check it.
If you still remember for sparc64 (I think you're also in the loop)
pte_mkdirty will also apply write bit there; even though I don't know why
but it works like that for years. Consider below sequence:
- map a page writable, write to page (fault in, set dirty)
- mprotect(RO) on the page (keep dirty bit, vma/pte becomes RO)
- migrate the page
- mk_pte returns with WRITE bit cleared (which is correct)
- pte_mkdirty set dirty and write bit (because dirty used to set)
If without the previous mm/migrate patch [1] IIUC it'll allow the pte
writable even without VM_WRITE here after migration.
I'm not sure whether I missed something, nor can I write a reproducer
because I don't have sparc64 systems on hand, not to mention time. But
hopefully I explained why I think it's safer to just always double check on
the write bit to be the same as when before migration, irrelevant of
uffd-wp, vma pgprot or mk_pte().
For this patch itself, I'll rethink again when I read more on numa.