Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

From: Andrea Arcangeli
Date: Thu Mar 14 2019 - 12:16:42 EST


On Thu, Mar 14, 2019 at 11:58:15AM +0100, Paolo Bonzini wrote:
> On 14/03/19 00:44, Andrea Arcangeli wrote:
> > Then I thought we can add a tristate so an open of /dev/kvm would also
> > allow the syscall to make things more user friendly because
> > unprivileged containers ideally should have writable mounts done with
> > nodev and no matter the privilege they shouldn't ever get a hold on
> > the KVM driver (and those who do, like kubevirt, will then just work).
>
> I wouldn't even bother with the KVM special case. Containers can use
> seccomp if they want a fine-grained policy.

We can have a single boolean 0|1 and stick to a simpler sysctl with no
gid: if you want to use userfaultfd, you need to enable it for all
users. I agree seccomp already provides more than enough granularity
for finer-grained choices.
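For reference, the seccomp route can be quite small. The following is
only a sketch: a classic-BPF seccomp filter that makes userfaultfd(2)
fail with EPERM and allows every other syscall. The helper name
block_userfaultfd() is hypothetical, and for brevity the filter does
not check seccomp_data.arch, which a real policy should do.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: install a seccomp filter in the calling process
 * that makes userfaultfd(2) return -1/EPERM and allows everything else.
 * NOTE: sketch only, it does not validate seccomp_data.arch.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int block_userfaultfd(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* userfaultfd: fall through to the ERRNO return, else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_userfaultfd, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* Any other syscall is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

This does require the container runtime or the program itself to
install the filter, which is exactly the userland change the sysctl
would avoid.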

So this will be for whoever is paranoid and prefers to disable
userfaultfd as a whole as a hardening feature, like the bpf sysctl
allows: it will allow blocking the uffd syscall without having to
rebuild the kernel with CONFIG_USERFAULTFD=n, in environments where
seccomp cannot easily be enabled (i.e. without requiring userland
changes).
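A hedged sketch of how userland could tell the two configurations
apart. It assumes the sysctl makes the syscall fail with EPERM (the
proposal above does not pin down the errno), while a kernel built with
CONFIG_USERFAULTFD=n returns ENOSYS. The enum and function names are
hypothetical, and EPERM can also come from seccomp or other policy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical classification of userfaultfd(2) availability. */
enum uffd_status {
    UFFD_AVAILABLE,   /* syscall returned a file descriptor */
    UFFD_NOT_BUILT,   /* ENOSYS: e.g. CONFIG_USERFAULTFD=n */
    UFFD_DENIED,      /* EPERM: refused by policy (assumed sysctl, seccomp, ...) */
    UFFD_OTHER_ERROR, /* any other errno */
};

/* Map a raw syscall result to a status; pure, so it is easy to test. */
static enum uffd_status classify_uffd(long ret, int err)
{
    if (ret >= 0)
        return UFFD_AVAILABLE;
    switch (err) {
    case ENOSYS:
        return UFFD_NOT_BUILT;
    case EPERM:
        return UFFD_DENIED;
    default:
        return UFFD_OTHER_ERROR;
    }
}

/* Try to create a userfaultfd, close it if we got one, report the case. */
static enum uffd_status probe_userfaultfd(void)
{
#ifdef __NR_userfaultfd
    long fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    int err = errno; /* capture before close() can clobber it */

    if (fd >= 0)
        close((int)fd);
    return classify_uffd(fd, err);
#else
    /* Headers too old to know the syscall number. */
    return UFFD_NOT_BUILT;
#endif
}
```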

That's perfectly fine with me, but then it wasn't me complaining in the
first place. Kees?

If the above is ok, we can implement it as a static key. Not that the
syscall itself is particularly performance critical, but it'll be
simple enough as a boolean (only the ioctls are performance critical,
and those are unaffected).

The blog post about UAF is not particularly interesting in my view,
unless both of the following points are true: 1) it can also be proven
that the very same two UAF bugs cannot be exploited by other means (as
far as I can tell, they can be exploited by other means regardless of
userfaultfd), and 2) slab randomization was actually enabled. 99% of
the time, in all PoCs, randomization features like kaslr are
incidentally disabled first to facilitate publishing papers and blog
posts, but those are really the features intended to reduce the
reproducibility of exploits against UAF bugs; disabling userfaultfd
only provides a minor advantage. And unlike in PoC environments, we
enable slab randomization in production 100% of the time whenever it
is available in the kernel.

Thanks,
Andrea