Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
From: Andrea Arcangeli
Date: Wed Oct 23 2019 - 17:16:55 EST
On Wed, Oct 23, 2019 at 12:21:18PM -0700, Andy Lutomirski wrote:
> There are two things going on here.
>
> 1. Daniel wants to add LSM labels to userfaultfd objects. This seems
> reasonable to me. The question, as I understand it, is: who is the
> subject that creates a uffd referring to a forked child? I'm sure
> this is solvable in any number of straightforward ways, but I think
> it's less important than:
The new uffd created during fork would definitely need to be accounted
to the criu monitor, not to the parent or the child, so it'd need to
be accounted to the process/context that has the fd in its file
descriptors array. But since this is less important, let's ignore this
for a second.
> 2. The existing ABI is busted independently of #1. Suppose you call
> userfaultfd to get a userfaultfd and enable UFFD_FEATURE_EVENT_FORK.
> Then you do:
>
> $ sudo <&[userfaultfd number]
>
> Sudo will read it and get a new fd unexpectedly added to its fd table.
> It's worse if SCM_RIGHTS is involved.
So the problem is just that a new fd is created. For this to turn
out to be a practical issue, it requires finding a reckless suid binary
that won't even bother checking the return value of the open/socket
syscalls, or some equivalent fd number related side effect. All right,
that makes more sense now and of course I agree it needs fixing.
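For reference, the new fd reaches the reader inside the message it
reads from the uffd: the kernel installs the fd in the reader's fd
table and reports its number in the fork event. A minimal sketch of
decoding it, assuming the uffd_msg layout from <linux/userfaultfd.h>;
fork_event_fd() is a hypothetical helper name, not something that
exists in the kernel tree or in criu:

```c
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: given a message already read from a uffd,
 * return the fd number reported by a UFFD_EVENT_FORK message, or -1
 * for any other message type.
 */
static int fork_event_fd(const struct uffd_msg *msg)
{
	if (msg->event != UFFD_EVENT_FORK)
		return -1;
	/* arg.fork.ufd is the fd the kernel added to the reader's table. */
	return (int)msg->arg.fork.ufd;
}
```

A cooperating monitor knows to look at this field; a process that was
merely handed the uffd as stdin does not, which is the whole problem.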
> So I think we either need to declare that UFFD_FEATURE_EVENT_FORK is
> only usable by global root or we need to remove it and maybe re-add it
> in some other form.
If I had a time machine, I'd prefer to do the below:
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fe6d804a38dc..574062051678 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1958,7 +1958,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 		return -ENOMEM;
 	refcount_set(&ctx->refcount, 1);
-	ctx->flags = flags;
+	ctx->flags = flags | UFFD_CLOEXEC;
 	ctx->features = 0;
 	ctx->state = UFFD_STATE_WAIT_API;
 	ctx->released = false;
I mean there's no strong requirement to allow any uffd to survive exec,
even if UFFD_FEATURE_EVENT_FORK never existed; it's enough if it can
be passed through unix domain sockets.
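Passing an fd through a unix domain socket is the usual SCM_RIGHTS
dance. A minimal sketch, assuming an already connected AF_UNIX socket;
send_fd() and recv_fd() are hypothetical helper names, and the receiver
asks for close-on-exec on the new fd with MSG_CMSG_CLOEXEC:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send fd over the connected unix socket. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* one byte of real data carries the cmsg */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from the socket. Returns the new fd or -1. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	/* MSG_CMSG_CLOEXEC: the received fd will not survive exec. */
	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Unlike inheriting across exec, the receiver here has to call recvmsg()
with a control buffer, so it explicitly opts in to getting an fd.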
Until UFFD_FEATURE_EVENT_FORK came around, there was no particular
reason to implicitly enforce O_CLOEXEC on all uffds; it was totally
possible to clone() and exec() to pass the fd to a different
process. So it never rang a bell that this would turn out to be a
problem after UFFD_FEATURE_EVENT_FORK was introduced.
There are various ways to approach this:
1) drop all non-cooperative features and mark their feature bitflags
reserved (no ABI break)
2) enforce UFFD_CLOEXEC with the above patch (potential ABI break for all
userfaultfd users)
3) enforce UFFD_CLOEXEC if UFFD_FEATURE_EVENT_FORK is set (ABI break
only if UFFD_FEATURE_EVENT_FORK is set). Note all forked uffd
are opened with the same flags inherited from the original uffd.
4) enforce the global root permission check when creating the uffd only if
UFFD_FEATURE_EVENT_FORK is set.
5) drop all non-cooperative features from API 0xaa and introduce API
0xab with the features back, but with UFFD_CLOEXEC implicitly
enforced and with UFFD_CLOEXEC forbidden to be set in the flags
6) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
UFFD_FEATURE_EVENT_FORK2 that requires UFFD_CLOEXEC to be set
(instead of implicitly enforcing it)
7) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
UFFD_FEATURE_EVENT_FORK2 that does the global root permission check
5 is the non-ABI-break version of 2.
6 is the non-ABI-break version of 3.
7 is the non-ABI-break version of 4.
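The difference between implicitly enforcing the flag (option 3) and
requiring the caller to set it (option 6) can be modeled in a few lines
of userspace C. This is only a toy model and not kernel code: the
MODEL_* constants and both function names are made up for illustration,
their values do not match the real UAPI bits, and returning -EINVAL for
a dropped or rejected feature is an assumption.

```c
#include <errno.h>

/* Made-up bits for the model; not the real UAPI values. */
#define MODEL_CLOEXEC		0x1u	/* stands in for UFFD_CLOEXEC */
#define MODEL_EVENT_FORK	0x2u	/* stands in for UFFD_FEATURE_EVENT_FORK */
#define MODEL_EVENT_FORK2	0x4u	/* stands in for the proposed FORK2 */

/*
 * Option 3: if the fork event feature is requested, close-on-exec is
 * silently forced on. Existing callers keep working, but their fd
 * flags change behind their back.
 */
static int model_option3(unsigned int *flags, unsigned int features)
{
	if (features & MODEL_EVENT_FORK)
		*flags |= MODEL_CLOEXEC;
	return 0;
}

/*
 * Option 6: the old feature bit is dropped, and the new bit is only
 * accepted if the caller already asked for close-on-exec.
 */
static int model_option6(unsigned int flags, unsigned int features)
{
	if (features & MODEL_EVENT_FORK)
		return -EINVAL;		/* dropped feature (assumed error) */
	if ((features & MODEL_EVENT_FORK2) && !(flags & MODEL_CLOEXEC))
		return -EINVAL;		/* must opt in explicitly */
	return 0;
}
```

In the model, option 3 never fails and mutates the flags, while option 6
never mutates anything and fails loudly instead.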
My favorite is 1) for the reason explained in the previous email.
However, if postcopy live migration of bare metal containers already
runs in production anywhere, or is at least very close to reaching that
milestone, or if the non-cooperative features are used in production in
any other way, we'd like to know where, and in that case it will
totally change my mind about it. I'd then suggest picking any
of the other options except 1).
In short, there should be a good reason for taking on further
maintenance burden.
Thanks,
Andrea