Good question. I'd imagine, e.g., that file sealing could forbid uffd
registration (or however it is called) on a file, and there would have to be a
way to reject files that already have a uffd registered. But it's certainly a
valid concern - and it raises the question of *what* we actually want to apply
such a concept to. Random files? Random memfds? Most probably not. Special
memfds created with an ALLOW_UFFD flag? Sounds like a good idea.
Note that when daemons open files, they may not be aware of what's underneath;
they just read the file directly. The attacker could still create the file
with uffd-wp enabled, using whatever flag we introduce.
I also don't know the initial concept behind uffd's design, or why it was
designed at the pte level. Avoiding VMA manipulation should be a major factor,
but I can't say I understand all of the reasons. Not sure whether Andrea has
any input here.
AFAIU, originally it was to a) avoid signal handler madness, b) avoid VMA
modifications, and c) avoid taking the mmap lock in write mode (well, that
didn't work out completely for uffd-wp for now, IIRC).
Nadav fixed that; it takes the read lock now, just as it did when it was
introduced. Please see mwriteprotect_range() and commit 6ce64428d62026a10c.
That's why I think the current uffd can still make sense as a per-process
concept, and we should keep it that way. Yes, to register uffd-wp we need to
do it for multiple processes, but that also means each process is fully aware
that this is happening, so it's verified that this behavior is wanted for that
process. It'll happen with fewer "surprises", and smells safer.
I do think that can work out. It may require all the processes to support the
uffd-wp APIs and cooperate, but so far that's how I think it should work, in a
safe and self-contained way: every process should be aware of what's going to
happen on blocked page faults.
That's a valid concern, although I wonder if it can just be handled via
specially marked memfds ("this memfd might get a uffd handler registered
later").
Yes, please see my concern above. So I think we at least reached consensus on:
(1) that idea is already not userfaultfd but something else; what that is
remains to be defined. And (2) it definitely needs further thought and context
to support its validity and safety. uffd has already got people worried about
safety; that's why all the uffd selinux work and the unprivileged_userfaultfd
sysctl came to mainline. We'd wish the new concept good luck!
OTOH, the whole uffd idea is already in mainline. It has the limitation of
requiring all processes to be reworked to support uffd-wp, but the same thing
has actually already happened with MISSING messages, and our QE is testing
those: that's what we do with, e.g., postcopy-migrating vhost-user enabled
OVS-DPDK - we pass over a uffd registered in missing mode and let QEMU handle
the page faults. So it's a bit complicated, but it should work. And I hope you
can also agree that we don't need to block uffd until that idea settles.
The pte markers idea needs comments; that's about implementation, and it'll be
great to have comments there, or even a NACK (better with a better suggestion,
though :). But the original idea of uffd being pte-based has never changed.
Again, I am not sure if uffd-wp or softdirty make too much sense in general
when applied to shmem. But I'm happy to learn more.
Me too; I'm more than glad to learn whether the page cache idea would be
welcomed, or whether I'm just wrong about it. Until I understand more around
this, I still think the per-process, fd-based uffd solution makes sense.
I'd be curious about applications where the per-process approach would
actually solve something a per-fd approach couldn't solve. Maybe there are
some that I just can't envision.
Right, that's a good point.
Actually, it could be a case like virtio-mem, where some process shouldn't
have write privilege while we still allow some other process to write to the
shmem. Something like that.
(using shmem for a single process only isn't a use case I consider important
:) )
Do you still remember the discussion about "having QEMU start to use memfd and
shmem by default"? :)
shmem is hard, but it's indeed useful in many cases, even for a single
process. For example, shmem-based VMs can do a local binary update without
migrating guest RAM (because memory is shared between the old and new
binaries!). To me it's always a valid request to enable both shmem and write
protection.