Re: [PATCH v5 1/1] fs: Allow no_new_privs tasks to call chroot(2)

From: Mickaël Salaün
Date: Tue Mar 30 2021 - 13:02:29 EST


Hi,

Is there new comments on this patch? Could we move forward?

Regards,
Mickaël


On 16/03/2021 21:36, Mickaël Salaün wrote:
> From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>
> Being able to easily change root directories enables to ease some
> development workflow and can be used as a tool to strengthen
> unprivileged security sandboxes. chroot(2) is not an access-control
> mechanism per se, but it can be used to limit the absolute view of the
> filesystem, and then limit ways to access data and kernel interfaces
> (e.g. /proc, /sys, /dev, etc.).
>
> Users may not wish to expose namespace complexity to potentially
> malicious processes, or limit their use because of limited resources.
> The chroot feature is much more simple (and limited) than the mount
> namespace, but can still be useful. As for containers, users of
> chroot(2) should take care of file descriptors or data accessible by
> other means (e.g. current working directory, leaked FDs, passed FDs,
> devices, mount points, etc.). There is a lot of literature that discuss
> the limitations of chroot, and users of this feature should be aware of
> the multiple ways to bypass it. Using chroot(2) for security purposes
> can make sense if it is combined with other features (e.g. dedicated
> user, seccomp, LSM access-controls, etc.).
>
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc). However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
> not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
> provided for instance by development environments or tailored
> distributions.
> * Processes executed in a chroot may not require access to these special
> files (e.g. with minimal runtimes, or by emulating some special files
> with a LD_PRELOADed library or seccomp).
>
> Allowing a task to change its own root directory is not a threat to the
> system if we can prevent confused deputy attacks, which could be
> performed through execution of SUID-like binaries. This can be
> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
> prctl(2). To only affect this task, its filesystem information must not
> be shared with other tasks, which can be achieved by not passing
> CLONE_FS to clone(2). A similar no_new_privs check is already used by
> seccomp to avoid the same kind of security issues. Furthermore, because
> of its security use and to avoid giving a new way for attackers to get
> out of a chroot (e.g. using /proc/<pid>/root, or chroot/chdir), an
> unprivileged chroot is only allowed if the calling process is not
> already chrooted. This limitation is the same as for creating user
> namespaces.
>
> This change may not impact systems relying on other permission models
> than POSIX capabilities (e.g. Tomoyo). Being able to use chroot(2) on
> such systems may require to update their security policies.
>
> Only the chroot system call is relaxed with this no_new_privs check; the
> init_chroot() helper doesn't require such change.
>
> Allowing unprivileged users to use chroot(2) is one of the initial
> objectives of no_new_privs:
> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
> This patch is a follow-up of a previous one sent by Andy Lutomirski:
> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@xxxxxxxxxxxxxx/
>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: David Howells <dhowells@xxxxxxxxxx>
> Cc: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
> Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> Cc: James Morris <jmorris@xxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: John Johansen <john.johansen@xxxxxxxxxxxxx>
> Cc: Kentaro Takeda <takedakn@xxxxxxxxxxxxx>
> Cc: Serge Hallyn <serge@xxxxxxxxxx>
> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/20210316203633.424794-2-mic@xxxxxxxxxxx
> ---
>
> Changes since v4:
> * Use READ_ONCE(current->fs->users) (found by Jann Horn).
> * Remove ambiguous example in commit description.
> * Add Reviewed-by Kees Cook.
>
> Changes since v3:
> * Move the new permission checks to a dedicated helper
> current_chroot_allowed() to make the code easier to read and align
> with user_path_at(), path_permission() and security_path_chroot()
> calls (suggested by Kees Cook).
> * Remove now useless included file.
> * Extend commit description.
> * Rebase on v5.12-rc3 .
>
> Changes since v2:
> * Replace path_is_under() check with current_chrooted() to gain the same
> protection as create_user_ns() (suggested by Jann Horn). See commit
> 3151527ee007 ("userns: Don't allow creation if the user is chrooted")
>
> Changes since v1:
> * Replace custom is_path_beneath() with existing path_is_under().
> ---
> fs/open.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index e53af13b5835..480010a551b2 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -532,6 +532,24 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
> return error;
> }
>
> +static inline int current_chroot_allowed(void)
> +{
> + /*
> + * Changing the root directory for the calling task (and its future
> + * children) requires that this task has CAP_SYS_CHROOT in its
> + * namespace, or be running with no_new_privs and not sharing its
> + * fs_struct and not escaping its current root (cf. create_user_ns()).
> + * As for seccomp, checking no_new_privs avoids scenarios where
> + * unprivileged tasks can affect the behavior of privileged children.
> + */
> + if (task_no_new_privs(current) && READ_ONCE(current->fs->users) == 1 &&
> + !current_chrooted())
> + return 0;
> + if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> + return 0;
> + return -EPERM;
> +}
> +
> SYSCALL_DEFINE1(chroot, const char __user *, filename)
> {
> struct path path;
> @@ -546,9 +564,10 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
> if (error)
> goto dput_and_out;
>
> - error = -EPERM;
> - if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> + error = current_chroot_allowed();
> + if (error)
> goto dput_and_out;
> +
> error = security_path_chroot(&path);
> if (error)
> goto dput_and_out;
>