Re: [PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

From: Lokesh Gidra
Date: Thu Nov 19 2020 - 22:11:14 EST


On Thu, Nov 19, 2020 at 7:04 PM Lokesh Gidra <lokeshgidra@xxxxxxxxxx> wrote:
>
> With this change, when the knob is set to 0, it allows unprivileged
> users to call userfaultfd, like when it is set to 1, but with the
> restriction that only page faults from user mode can be handled.
> In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
> must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
> EPERM.
>
> This enables administrators to reduce the likelihood that an attacker
> with access to userfaultfd can delay faulting kernel code to widen
> timing windows for other exploits.
>
> The default value of this knob is changed to 0. This is required for
> correct functioning of pipe mutex. However, this will fail postcopy
> live migration, which will be unnoticeable to the VM guests. To avoid
> this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.
>
> The main reason this change is desirable in the short term is that
> the Android userland will behave as with the sysctl set to zero. So
> without this commit, any Linux binary using userfaultfd to manage its
> memory would behave differently if run within the Android userland.
> For more details, refer to Andrea's reply [1].
>
> [1] https://lore.kernel.org/lkml/20200904033438.GI9411@xxxxxxxxxx/
>
> Signed-off-by: Lokesh Gidra <lokeshgidra@xxxxxxxxxx>
> Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> ---
> Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
> fs/userfaultfd.c | 10 ++++++++--
> 2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index f455fa00c00f..d06a98b2a4e7 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
> unprivileged_userfaultfd
> ========================
>
> -This flag controls whether unprivileged users can use the userfaultfd
> -system calls. Set this to 1 to allow unprivileged users to use the
> -userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
> -privileged users (with SYS_CAP_PTRACE capability).
> +This flag controls the mode in which unprivileged users can use the
> +userfaultfd system calls. Set this to 0 to restrict unprivileged users
> +to handle page faults in user mode only. In this case, users without
> +CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
> +succeed. Prohibiting use of userfaultfd for handling faults from kernel
> +mode may make certain vulnerabilities more difficult to exploit.
>
> -The default value is 1.
> +Set this to 1 to allow unprivileged users to use the userfaultfd system
> +calls without any restrictions.
> +
> +The default value is 0.
>
>
> user_reserve_kbytes
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 605599fde015..894cc28142e7 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -28,7 +28,7 @@
> #include <linux/security.h>
> #include <linux/hugetlb.h>
>
> -int sysctl_unprivileged_userfaultfd __read_mostly = 1;
> +int sysctl_unprivileged_userfaultfd __read_mostly;
>
> static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
>
> @@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
> struct userfaultfd_ctx *ctx;
> int fd;
>
> - if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
> + if (!sysctl_unprivileged_userfaultfd &&
> + (flags & UFFD_USER_MODE_ONLY) == 0 &&
> + !capable(CAP_SYS_PTRACE)) {
> + printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
> + "sysctl knob to 1 if kernel faults must be handled "
> + "without obtaining CAP_SYS_PTRACE capability\n");
> return -EPERM;
> + }
>
> BUG_ON(!current->mm);
>
> --
> 2.29.0.rc1.297.gfa9743e501-goog
>
Adding linux-mm@xxxxxxxxx list
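
For anyone joining from linux-mm, the permission check in the quoted
fs/userfaultfd.c hunk can be modelled in plain userspace C. This is only a
rough sketch and not kernel code: sketch_uffd_check(),
sketch_uffd_self_check() and SKETCH_UFFD_USER_MODE_ONLY are hypothetical
names made up for illustration, the flag value 1 is an assumption standing in
for the uapi definition, and the capability test is reduced to a boolean
argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption: stands in for UFFD_USER_MODE_ONLY from the uapi header. */
#define SKETCH_UFFD_USER_MODE_ONLY 1

/*
 * Hypothetical model of the check at the top of the userfaultfd syscall.
 * knob:     value of the unprivileged_userfaultfd sysctl (0 or 1)
 * flags:    flags argument passed to userfaultfd()
 * has_cap:  whether the caller holds CAP_SYS_PTRACE
 * Returns 0 when the call may proceed, -EPERM when it is refused.
 */
int sketch_uffd_check(int knob, int flags, bool has_cap)
{
	if (!knob &&
	    (flags & SKETCH_UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap)
		return -EPERM;

	return 0;
}

/* Walks the cases described in the commit message. */
void sketch_uffd_self_check(void)
{
	/* knob = 0, unprivileged, no flag: refused */
	assert(sketch_uffd_check(0, 0, false) == -EPERM);
	/* knob = 0, unprivileged, user-mode-only flag: allowed */
	assert(sketch_uffd_check(0, SKETCH_UFFD_USER_MODE_ONLY, false) == 0);
	/* knob = 0, caller has the capability: allowed without the flag */
	assert(sketch_uffd_check(0, 0, true) == 0);
	/* knob = 1: allowed without any restriction */
	assert(sketch_uffd_check(1, 0, false) == 0);
}
```

Calling sketch_uffd_self_check() from a small test program exercises the four
cases. Only the first one, knob 0 with neither the flag nor the capability,
yields -EPERM, which matches the behaviour the commit message describes.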