Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits

From: Jeff Xu
Date: Mon Jul 08 2024 - 13:54:39 EST


On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
>
> Hi
>
> On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> >
> > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> > their *_LOCKED counterparts are designed to be set by processes setting
> > up an execution environment, such as a user session, a container, or a
> > security sandbox. Like seccomp filters or Landlock domains, the
> > securebits are inherited across proceses.
> >
> > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> > check executable resources with execveat(2) + AT_CHECK (see previous
> > patch).
> >
> > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
> >
> Do we need both bits ?
> When CHECK is set and RESTRICT is not, the "check fail" executable
> will still get executed, so CHECK is for logging ?
> Does RESTRICT imply CHECK is set, e.g. What if CHECK=0 and RESTRICT = 1 ?
>
The intention might be "permissive mode"? if so, consider reuse
existing selinux's concept, and still with 2 bits:
SECBIT_SHOULD_EXEC_RESTRICT
SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE


-Jeff




> > For a secure environment, we might also want
> > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
> > to be set. For a test environment (e.g. testing on a fleet to identify
> > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
> > still be able to identify potential issues (e.g. with interpreters logs
> > or LSMs audit entries).
> >
> > It should be noted that unlike other security bits, the
> > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
> > dedicated to user space willing to restrict itself. Because of that,
> > they only make sense in the context of a trusted environment (e.g.
> > sandbox, container, user session, full system) where the process
> > changing its behavior (according to these bits) and all its parent
> > processes are trusted. Otherwise, any parent process could just execute
> > its own malicious code (interpreting a script or not), or even enforce a
> > seccomp filter to mask these bits.
> >
> > Such a secure environment can be achieved with an appropriate access
> > control policy (e.g. mount's noexec option, file access rights, LSM
> > configuration) and an enlighten ld.so checking that libraries are
> > allowed for execution e.g., to protect against illegitimate use of
> > LD_PRELOAD.
> >
> > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > environment variables), but that is outside the scope of the kernel.
> >
> > The only restriction enforced by the kernel is the right to ptrace
> > another process. Processes are denied to ptrace less restricted ones,
> > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to
> > avoid trivial privilege escalations e.g., by a debugging process being
> > abused with a confused deputy attack.
> >
> > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> > Cc: Christian Brauner <brauner@xxxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > Cc: Paul Moore <paul@xxxxxxxxxxxxxx>
> > Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxx>
> > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@xxxxxxxxxxx
> > ---
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@xxxxxxxxxxx
> > ---
> > include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++-
> > security/commoncap.c | 63 ++++++++++++++++++++++++++++-----
> > 2 files changed, 110 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
> > index d6d98877ff1a..3fdb0382718b 100644
> > --- a/include/uapi/linux/securebits.h
> > +++ b/include/uapi/linux/securebits.h
> > @@ -52,10 +52,64 @@
> > #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
> > (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
> >
> > +/*
> > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
> > + * files with execveat(2) + AT_CHECK. However, such check should only be
> > + * performed if all to-be-executed code only comes from regular files. For
> > + * instance, if a script interpreter is called with both a script snipped as
> > + * argument and a regular file, the interpreter should not check any file.
> > + * Doing otherwise would mislead the kernel to think that only the script file
> > + * is being executed, which could for instance lead to unexpected permission
> > + * change and break current use cases.
> > + *
> > + * This secure bit may be set by user session managers, service managers,
> > + * container runtimes, sandboxer tools... Except for test environments, the
> > + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set.
> > + *
> > + * Ptracing another process is deny if the tracer has SECBIT_SHOULD_EXEC_CHECK
> > + * but not the tracee. SECBIT_SHOULD_EXEC_CHECK_LOCKED also checked.
> > + */
> > +#define SECURE_SHOULD_EXEC_CHECK 8
> > +#define SECURE_SHOULD_EXEC_CHECK_LOCKED 9 /* make bit-8 immutable */
> > +
> > +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK))
> > +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \
> > + (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED))
> > +
> > +/*
> > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK). For
> > + * instance, script interpreters called with a script snippet as argument
> > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
> > + * However, if a script interpreter is called with both
> > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
> > + * interpret the provided script files if no unchecked code is also provided
> > + * (e.g. directly as argument).
> > + *
> > + * This secure bit may be set by user session managers, service managers,
> > + * container runtimes, sandboxer tools... Except for test environments, the
> > + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set.
> > + *
> > + * Ptracing another process is deny if the tracer has
> > + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee.
> > + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked.
> > + */
> > +#define SECURE_SHOULD_EXEC_RESTRICT 10
> > +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED 11 /* make bit-8 immutable */
> > +
> > +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> > +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \
> > + (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED))
> > +
> > #define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \
> > issecure_mask(SECURE_NO_SETUID_FIXUP) | \
> > issecure_mask(SECURE_KEEP_CAPS) | \
> > - issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> > + issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
> > + issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> > + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> > #define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1)
> >
> > +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> > + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> > +
> > #endif /* _UAPI_LINUX_SECUREBITS_H */
> > diff --git a/security/commoncap.c b/security/commoncap.c
> > index 162d96b3a676..34b4493e2a69 100644
> > --- a/security/commoncap.c
> > +++ b/security/commoncap.c
> > @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
> > return 0;
> > }
> >
> > +static bool ptrace_secbits_allowed(const struct cred *tracer,
> > + const struct cred *tracee)
> > +{
> > + const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED &
> > + tracer->securebits;
> > + const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED &
> > + tracee->securebits;
> > + /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */
> > + const unsigned long tracer_locked = (tracer_secbits << 1) &
> > + tracer->securebits;
> > + const unsigned long tracee_locked = (tracee_secbits << 1) &
> > + tracee->securebits;
> > +
> > + /* The tracee must not have less constraints than the tracer. */
> > + if ((tracer_secbits | tracee_secbits) != tracee_secbits)
> > + return false;
> > +
> > + /*
> > + * Makes sure that the tracer's locks for restrictions are the same for
> > + * the tracee.
> > + */
> > + if ((tracer_locked | tracee_locked) != tracee_locked)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > /**
> > * cap_ptrace_access_check - Determine whether the current process may access
> > * another
> > @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
> > else
> > caller_caps = &cred->cap_permitted;
> > if (cred->user_ns == child_cred->user_ns &&
> > - cap_issubset(child_cred->cap_permitted, *caller_caps))
> > + cap_issubset(child_cred->cap_permitted, *caller_caps) &&
> > + ptrace_secbits_allowed(cred, child_cred))
> > goto out;
> > if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
> > goto out;
> > @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent)
> > cred = __task_cred(parent);
> > child_cred = current_cred();
> > if (cred->user_ns == child_cred->user_ns &&
> > - cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
> > + cap_issubset(child_cred->cap_permitted, cred->cap_permitted) &&
> > + ptrace_secbits_allowed(cred, child_cred))
> > goto out;
> > if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
> > goto out;
> > @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> > & (old->securebits ^ arg2)) /*[1]*/
> > || ((old->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/
> > || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/
> > - || (cap_capable(current_cred(),
> > - current_cred()->user_ns,
> > - CAP_SETPCAP,
> > - CAP_OPT_NONE) != 0) /*[4]*/
> > /*
> > * [1] no changing of bits that are locked
> > * [2] no unlocking of locks
> > * [3] no setting of unsupported bits
> > - * [4] doing anything requires privilege (go read about
> > - * the "sendmail capabilities bug")
> > */
> > )
> > /* cannot change a locked bit */
> > return -EPERM;
> >
> > + /*
> > + * Doing anything requires privilege (go read about the
> > + * "sendmail capabilities bug"), except for unprivileged bits.
> > + * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
> > + * restrictions enforced by the kernel but by user space on
> > + * itself. The kernel is only in charge of protecting against
> > + * privilege escalation with ptrace protections.
> > + */
> > + if (cap_capable(current_cred(), current_cred()->user_ns,
> > + CAP_SETPCAP, CAP_OPT_NONE) != 0) {
> > + const unsigned long unpriv_and_locks =
> > + SECURE_ALL_UNPRIVILEGED |
> > + SECURE_ALL_UNPRIVILEGED << 1;
> > + const unsigned long changed = old->securebits ^ arg2;
> > +
> > + /* For legacy reason, denies non-change. */
> > + if (!changed)
> > + return -EPERM;
> > +
> > + /* Denies privileged changes. */
> > + if (changed & ~unpriv_and_locks)
> > + return -EPERM;
> > + }
> > +
> > new = prepare_creds();
> > if (!new)
> > return -ENOMEM;
> > --
> > 2.45.2
> >