Re: [PATCH v5 1/1] fs: Allow no_new_privs tasks to call chroot(2)
From: Mickaël Salaün
Date: Tue Mar 30 2021 - 15:28:33 EST
On 30/03/2021 20:40, Casey Schaufler wrote:
> On 3/30/2021 11:11 AM, Mickaël Salaün wrote:
>> On 30/03/2021 19:19, Casey Schaufler wrote:
>>> On 3/30/2021 10:01 AM, Mickaël Salaün wrote:
>>>> Hi,
>>>>
>>>> Are there any new comments on this patch? Could we move forward?
>>> I don't see that new comments are necessary when I don't see
>>> that you've provided compelling counters to some of the old ones.
>> Which ones? I don't buy your argument about the beauty of CAP_SYS_CHROOT.
>
> CAP_SYS_CHROOT, namespaces. Bind mounts. The restrictions on
> "unprivileged" chroot being sufficiently onerous to make it
> unlikely to be usable.
There are multiple use cases for these features.
>
>>> It's possible to use minimal privilege with CAP_SYS_CHROOT.
>> CAP_SYS_CHROOT can lead to privilege escalation.
>
> Not when used in conjunction with the same set of
> restrictions you're requiring for "unprivileged" chroot.
I'm talking about security under the principle of least privilege: we
consider that a process may be(come) malicious but should still be
able to drop (more) accesses, e.g. with prctl(PR_SET_NO_NEW_PRIVS) *then*
chroot().
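A minimal user-space sketch of that sequence follows. It assumes a kernel
carrying this patch; on other kernels, chroot(2) without CAP_SYS_CHROOT
fails with EPERM. The helper name drop_fs_view() is hypothetical and only
serves as an illustration:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for the chroot() declaration with glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set no_new_privs, then try to chroot into new_root.
 * Returns 0 on success or a negative errno value.
 */
int drop_fs_view(const char *new_root)
{
	assert(new_root != NULL);

	/* Irreversible: execve() can no longer grant new privileges. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -errno;

	/*
	 * Without CAP_SYS_CHROOT, this can only succeed on a kernel with the
	 * patch, and only if the task does not share its fs_struct and is not
	 * already chrooted.
	 */
	if (chroot(new_root) != 0)
		return -errno;

	/* Do not keep a working directory outside the new root. */
	if (chdir("/") != 0)
		return -errno;

	return 0;
}
```

A caller would typically load its libraries and open the files it needs
first, then call drop_fs_view() on a prepared (possibly empty) directory.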
>
>>> It looks like namespaces provide alternatives for all your
>>> use cases.
>> I explained in the commit message why it is not the case. In a nutshell,
>> namespaces bring complexity which may not be required.
>
> So? I can use a Swiss Army Knife to cut a string even though it
> has a corkscrew.
Complexity leads to (security) issues. In secure systems, we want to
reduce the attack surface. There are some pointers here:
https://lwn.net/Articles/673597/
>
>> When designing a
>> secure system, we want to avoid giving access to such complexity to
>> untrusted processes (i.e. more complexity leads to more bugs).
>
> If you're *really* designing a secure system you can design it to
> use existing mechanisms, like CAP_SYS_CHROOT!
Not always. For instance, in the case of a web browser, we don't want to
give CAP_SYS_CHROOT to every user just because their browser could
(legitimately) use it as a security sandbox mechanism. The same
principle can be applied to a lot of use cases, e.g. network services,
file parsers, etc.
>
>> An
>> unprivileged chroot would make it possible to grant just the minimum
>> feature needed to
>> drop some accesses. Of course it is not enough on its own, but it can be
>> combined with existing (and future) security features.
>
> Like NO_NEW_PRIVS, namespaces and capabilities!
> You don't need anything new!
If a process is compromised before chrooting itself and dropping
CAP_SYS_CHROOT, then there is a bigger security issue than without
CAP_SYS_CHROOT.
>
>>> The constraints required to make this work are quite
>>> limiting. Where is the real value add?
>> As explained in the commit message, it is useful when hardening
>> applications (e.g. network services, browsers, parsers, etc.). We don't
>> want an untrusted (or compromised) application to have CAP_SYS_CHROOT
>> nor (complex) namespace access.
>
> If you can ensure that an unprivileged application is
> always run with NO_NEW_PRIVS you could also ensure that
> it runs with only CAP_SYS_CHROOT or in an appropriate
> namespace. I believe that it would be easier for your
> particular use case. I don't believe that is sufficient.
You can't always guarantee this, e.g. because a user may need to
run (legitimate) SETUID binaries…
For everyone following a defense in depth approach (i.e. multiple layers
of security), an unprivileged chroot is valuable.
>
>>>> Regards,
>>>> Mickaël
>>>>
>>>>
>>>> On 16/03/2021 21:36, Mickaël Salaün wrote:
>>>>> From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>>>>>
>>>>> Being able to easily change root directories can ease some
>>>>> development workflows and can be used as a tool to strengthen
>>>>> unprivileged security sandboxes. chroot(2) is not an access-control
>>>>> mechanism per se, but it can be used to limit the absolute view of the
>>>>> filesystem, and then limit ways to access data and kernel interfaces
>>>>> (e.g. /proc, /sys, /dev, etc.).
>>>>>
>>>>> Users may not wish to expose namespace complexity to potentially
>>>>> malicious processes, or limit their use because of limited resources.
>>>>> The chroot feature is much simpler (and more limited) than the mount
>>>>> namespace, but can still be useful. As for containers, users of
>>>>> chroot(2) should take care of file descriptors or data accessible by
>>>>> other means (e.g. current working directory, leaked FDs, passed FDs,
>>>>> devices, mount points, etc.). There is a lot of literature that discusses
>>>>> the limitations of chroot, and users of this feature should be aware of
>>>>> the multiple ways to bypass it. Using chroot(2) for security purposes
>>>>> can make sense if it is combined with other features (e.g. dedicated
>>>>> user, seccomp, LSM access-controls, etc.).
>>>>>
>>>>> One could argue that chroot(2) is useless without a properly populated
>>>>> root hierarchy (i.e. without /dev and /proc). However, there are
>>>>> multiple use cases that don't require the chrooting process to create
>>>>> file hierarchies with special files nor mount points, e.g.:
>>>>> * A process sandboxing itself, once all its libraries are loaded, may
>>>>> not need files other than regular files, or even no file at all.
>>>>> * Some pre-populated root hierarchies could be used to chroot into,
>>>>> provided for instance by development environments or tailored
>>>>> distributions.
>>>>> * Processes executed in a chroot may not require access to these special
>>>>> files (e.g. with minimal runtimes, or by emulating some special files
>>>>> with a LD_PRELOADed library or seccomp).
>>>>>
>>>>> Allowing a task to change its own root directory is not a threat to the
>>>>> system if we can prevent confused deputy attacks, which could be
>>>>> performed through execution of SUID-like binaries. This can be
>>>>> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
>>>>> prctl(2). To only affect this task, its filesystem information must not
>>>>> be shared with other tasks, which can be achieved by not passing
>>>>> CLONE_FS to clone(2). A similar no_new_privs check is already used by
>>>>> seccomp to avoid the same kind of security issues. Furthermore, because
>>>>> of its security use and to avoid giving a new way for attackers to get
>>>>> out of a chroot (e.g. using /proc/<pid>/root, or chroot/chdir), an
>>>>> unprivileged chroot is only allowed if the calling process is not
>>>>> already chrooted. This limitation is the same as for creating user
>>>>> namespaces.
>>>>>
>>>>> This change may not impact systems relying on other permission models
>>>>> than POSIX capabilities (e.g. Tomoyo). Being able to use chroot(2) on
>>>>> such systems may require updating their security policies.
>>>>>
>>>>> Only the chroot system call is relaxed with this no_new_privs check; the
>>>>> init_chroot() helper doesn't require such change.
>>>>>
>>>>> Allowing unprivileged users to use chroot(2) is one of the initial
>>>>> objectives of no_new_privs:
>>>>> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
>>>>> This patch is a follow-up of a previous one sent by Andy Lutomirski:
>>>>> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@xxxxxxxxxxxxxx/
>>>>>
>>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>>>> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>>>> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx>
>>>>> Cc: Christoph Hellwig <hch@xxxxxx>
>>>>> Cc: David Howells <dhowells@xxxxxxxxxx>
>>>>> Cc: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
>>>>> Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
>>>>> Cc: James Morris <jmorris@xxxxxxxxx>
>>>>> Cc: Jann Horn <jannh@xxxxxxxxxx>
>>>>> Cc: John Johansen <john.johansen@xxxxxxxxxxxxx>
>>>>> Cc: Kentaro Takeda <takedakn@xxxxxxxxxxxxx>
>>>>> Cc: Serge Hallyn <serge@xxxxxxxxxx>
>>>>> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
>>>>> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>>>>> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
>>>>> Link: https://lore.kernel.org/r/20210316203633.424794-2-mic@xxxxxxxxxxx
>>>>> ---
>>>>>
>>>>> Changes since v4:
>>>>> * Use READ_ONCE(current->fs->users) (found by Jann Horn).
>>>>> * Remove ambiguous example in commit description.
>>>>> * Add Reviewed-by Kees Cook.
>>>>>
>>>>> Changes since v3:
>>>>> * Move the new permission checks to a dedicated helper
>>>>> current_chroot_allowed() to make the code easier to read and align
>>>>> with user_path_at(), path_permission() and security_path_chroot()
>>>>> calls (suggested by Kees Cook).
>>>>> * Remove now useless included file.
>>>>> * Extend commit description.
>>>>> * Rebase on v5.12-rc3 .
>>>>>
>>>>> Changes since v2:
>>>>> * Replace path_is_under() check with current_chrooted() to gain the same
>>>>> protection as create_user_ns() (suggested by Jann Horn). See commit
>>>>> 3151527ee007 ("userns: Don't allow creation if the user is chrooted")
>>>>>
>>>>> Changes since v1:
>>>>> * Replace custom is_path_beneath() with existing path_is_under().
>>>>> ---
>>>>> fs/open.c | 23 +++++++++++++++++++++--
>>>>> 1 file changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/open.c b/fs/open.c
>>>>> index e53af13b5835..480010a551b2 100644
>>>>> --- a/fs/open.c
>>>>> +++ b/fs/open.c
>>>>> @@ -532,6 +532,24 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
>>>>> return error;
>>>>> }
>>>>>
>>>>> +static inline int current_chroot_allowed(void)
>>>>> +{
>>>>> + /*
>>>>> + * Changing the root directory for the calling task (and its future
>>>>> + * children) requires that this task has CAP_SYS_CHROOT in its
>>>>> + * namespace, or be running with no_new_privs and not sharing its
>>>>> + * fs_struct and not escaping its current root (cf. create_user_ns()).
>>>>> + * As for seccomp, checking no_new_privs avoids scenarios where
>>>>> + * unprivileged tasks can affect the behavior of privileged children.
>>>>> + */
>>>>> + if (task_no_new_privs(current) && READ_ONCE(current->fs->users) == 1 &&
>>>>> + !current_chrooted())
>>>>> + return 0;
>>>>> + if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>>>>> + return 0;
>>>>> + return -EPERM;
>>>>> +}
>>>>> +
>>>>> SYSCALL_DEFINE1(chroot, const char __user *, filename)
>>>>> {
>>>>> struct path path;
>>>>> @@ -546,9 +564,10 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
>>>>> if (error)
>>>>> goto dput_and_out;
>>>>>
>>>>> - error = -EPERM;
>>>>> - if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>>>>> + error = current_chroot_allowed();
>>>>> + if (error)
>>>>> goto dput_and_out;
>>>>> +
>>>>> error = security_path_chroot(&path);
>>>>> if (error)
>>>>> goto dput_and_out;
>>>>>
>
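For readers following the thread, the decision made by
current_chroot_allowed() in the quoted diff can be modelled in user space.
This is only a sketch: struct task_view and the function names are
hypothetical stand-ins for the kernel state and helpers named in the
comments, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the task state that the check looks at. */
struct task_view {
	bool no_new_privs;   /* task_no_new_privs(current) */
	int fs_users;        /* current->fs->users */
	bool chrooted;       /* current_chrooted() */
	bool cap_sys_chroot; /* ns_capable(current_user_ns(), CAP_SYS_CHROOT) */
};

/* Mirrors the order of the checks in the quoted patch. */
int chroot_allowed_model(const struct task_view *t)
{
	/* Unprivileged path: no_new_privs, unshared fs_struct, not chrooted. */
	if (t->no_new_privs && t->fs_users == 1 && !t->chrooted)
		return 0;
	/* Privileged path: unchanged from the existing behavior. */
	if (t->cap_sys_chroot)
		return 0;
	return -EPERM;
}

/* A few cases that follow directly from the conditions above. */
void chroot_allowed_examples(void)
{
	struct task_view sandboxed = { true, 1, false, false };
	struct task_view shared_fs = { true, 2, false, false };
	struct task_view nested    = { true, 1, true,  false };
	struct task_view capable   = { false, 2, true, true };

	assert(chroot_allowed_model(&sandboxed) == 0);
	assert(chroot_allowed_model(&shared_fs) == -EPERM);
	assert(chroot_allowed_model(&nested) == -EPERM);
	assert(chroot_allowed_model(&capable) == 0);
}
```

The model makes one property easy to see: a task holding CAP_SYS_CHROOT is
never affected by the new conditions, which only open an additional path.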