Re: [PATCH v2 4/5] memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
From: Jeff Xu
Date: Wed Aug 16 2023 - 01:14:40 EST
On Mon, Aug 14, 2023 at 1:41 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
>
> This sysctl has the very unusual behaviour of not allowing any user (even
> CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you
> were to set this sysctl to a more restrictive option in the host pidns
> you would need to reboot your machine in order to reset it.
>
> The justification given in [1] is that this is a security feature and
> thus it should not be possible to disable. Aside from the fact that we
> have plenty of security-related sysctls that can be disabled after being
> enabled (fs.protected_symlinks for instance), the protection provided by
> the sysctl is to stop users from being able to create a binary and then
> execute it. A user with CAP_SYS_ADMIN can trivially do this without
> memfd_create(2):
>
> % cat mount-memfd.c
> #define _GNU_SOURCE /* for asprintf() */
> #include <assert.h>
> #include <fcntl.h>
> #include <string.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <linux/mount.h>
>
> #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
>
> int main(void)
> {
> int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
> assert(fsfd >= 0);
> assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2));
>
> int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
> assert(dfd >= 0);
>
> int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0755);
> assert(execfd >= 0);
> assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
> assert(!close(execfd));
>
> char *execpath = NULL;
> char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
> execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
> assert(execfd >= 0);
> assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
> assert(!execve(execpath, argv, envp));
> }
> % ./mount-memfd
> this file was executed from this totally private tmpfs: /proc/self/fd/5
> %
>
> Given that it is possible for CAP_SYS_ADMIN users to create executable
> binaries without memfd_create(2) and without touching the host
> filesystem (not to mention the many other things a CAP_SYS_ADMIN process
> would be able to do that would be equivalent or worse), it seems strange
> to cause a fair amount of headache to admins when there doesn't appear
> to be an actual security benefit to blocking this. There appear to be
> concerns about confused-deputy-esque attacks[2] but a confused deputy that
> can write to arbitrary sysctls is a bigger security issue than
> executable memfds.
>
One thing to point out: the demo code might be enough to prove your
case on other distributions, but on ChromeOS you can't run this code.
The executables in ChromeOS all come from known sources and are
verified at boot.

If an attacker could run this code on ChromeOS, it would mean the
attacker had already acquired arbitrary code execution by other means.
At that point the attacker no longer needs to create or find an
executable memfd; they already have the vehicle. You can't use an
example of an attacker already running arbitrary code to prove that
disabling downgrading is useless.

I agree that it is a big problem if an attacker can already modify a
sysctl. Assuming this happens through control of the arguments passed
to sysctl, the attacker might not yet have full arbitrary code
execution at that time, which is why the original design is so
restrictive.
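
To make the "no downgrade" behaviour concrete, here is a minimal
user-space sketch of the ratchet semantics being discussed. It is only
an illustration, not the kernel implementation: ratchet_store(),
ratchet_demo(), the 0..2 value range and the error codes are all
assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical user-space model of a ratcheting setting; not kernel code. */
enum { NOEXEC_MIN = 0, NOEXEC_MAX = 2 };	/* assumed value range */

/*
 * Accept a new value only if it is in range and not less restrictive
 * than the current one. Returns 0 on success, -EINVAL for an
 * out-of-range value, -EPERM for an attempted downgrade.
 */
int ratchet_store(int *cur, int newval)
{
	if (newval < NOEXEC_MIN || newval > NOEXEC_MAX)
		return -EINVAL;
	if (newval < *cur)
		return -EPERM;	/* downgrade refused, whoever the caller is */
	*cur = newval;
	return 0;
}

/* Walk through the behaviour: tightening works, loosening does not. */
void ratchet_demo(void)
{
	int level = 0;

	assert(ratchet_store(&level, 1) == 0);		/* tightening is allowed */
	assert(ratchet_store(&level, 0) == -EPERM);	/* loosening is refused */
	assert(level == 1);				/* value is unchanged */
}
```

In this model, a caller that can only influence the value being written
can make the setting stricter but never looser, which is the property
the restrictive design is after.
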
Best regards,
-Jeff