Re: [PATCH v20 08/12] landlock: Add syscall implementations
From: Mickaël Salaün
Date: Thu Aug 13 2020 - 16:30:25 EST
Arnd and Michael, what do you think about these new syscalls?
On 02/08/2020 23:58, Mickaël Salaün wrote:
> These 4 system calls are designed to be used by unprivileged processes
> to sandbox themselves:
> * landlock_get_features(2): Gets the supported features (required for
> backward and forward compatibility, and best-effort security).
> * landlock_create_ruleset(2): Creates a ruleset and returns its file
> descriptor.
> * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
> ruleset, identified by the dedicated file descriptor.
> * landlock_enforce_ruleset(2): Enforces a ruleset on the current thread
> and its future children (similar to seccomp). This syscall has the
> same usage restrictions as seccomp(2): the caller must have the
> no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
> namespace.
>
> All these syscalls have an options argument (not currently used) to
> enable extensibility.
>
> Here are the motivations for these new syscalls:
> * A sandboxed process may not have access to file systems, including
> /dev, /sys or /proc, but it should still be able to add more
> restrictions to itself.
> * Neither prctl(2) nor seccomp(2) (which was used in a previous version)
> fit well with the current definition of a Landlock security policy.
> * It is quite easy to whitelist this syscall with seccomp-bpf to enable
> all processes to use it. It is also easy to filter specific commands
> or options to restrict a process to a subset of Landlock features.
>
> All variable attributes are checked at build time to ensure that they
> don't contain holes and that they are aligned the same way for each
> architecture. The struct landlock_attr_features contains __u32
> options_* fields which is enough to store 32-bits syscall arguments, and
> __u16 size_attr_* fields which is enough for the maximal struct size
> (i.e. page size) passed through the landlock syscall. The other fields
> can have __u64 type for flags and bitfields, and __s32 type for file
> descriptors.
>
> See the user and kernel documentation for more details (provided by a
> following commit): Documentation/security/landlock/
>
> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxx>
> Cc: Arnd Bergmann <arnd@xxxxxxxx>
> Cc: James Morris <jmorris@xxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Serge E. Hallyn <serge@xxxxxxxxxx>
> ---
>
> Changes since v19:
> * Replace the landlock(2) syscall with 4 syscalls (one for each
> command): landlock_get_features(2), landlock_create_ruleset(2),
> landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by
> Arnd Bergmann).
> https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23b4a@xxxxxxxxxxx/
> * Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled.
> * Add two new fields to landlock_attr_features to fit with the new
> syscalls: last_rule_type and last_target_type. This enable to easily
> identify which types are supported.
> * Pack landlock_attr_path_beneath struct because of the removed
> ruleset_fd.
> * Update documentation and fix spelling.
>
> Changes since v18:
> * Remove useless include.
> * Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit
> bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning").
>
> Changes since v17:
> * Synchronize syscall declaration.
> * Fix comment.
>
> Changes since v16:
> * Add a size_attr_features field to struct landlock_attr_features for
> self-introspection, and move the access_fs field to be more
> consistent.
> * Replace __aligned_u64 types of attribute fields with __u16, __s32,
> __u32 and __u64, and check at build time that these structures does
> not contain hole and that they are aligned the same way (8-bits) on
> all architectures. This shrinks the size of the userspace ABI, which
> may be appreciated especially for struct landlock_attr_features which
> could grow a lot in the future. For instance, struct
> landlock_attr_features shrinks from 72 bytes to 32 bytes. This change
> also enables to remove 64-bits to 32-bits conversion checks.
> * Switch syscall attribute pointer and size arguments to follow similar
> syscall argument order (e.g. bpf, clone3, openat2).
> * Set LANDLOCK_OPT_* types to 32-bits.
> * Allow enforcement of empty ruleset, which enables deny-all policies.
> * Fix documentation inconsistency.
>
> Changes since v15:
> * Do not add file descriptors referring to internal filesystems (e.g.
> nsfs) in a ruleset.
> * Replace is_user_mountable() with in-place clean checks.
> * Replace EBADR with EBADFD in get_ruleset_from_fd() and
> get_path_from_fd().
> * Remove ruleset's show_fdinfo() for now.
>
> Changes since v14:
> * Remove the security_file_open() check in get_path_from_fd(): an
> opened FD should not be restricted here, and even less with this hook.
> As a result, it is now allowed to add a path to a ruleset even if the
> access to this path is not allowed (without O_PATH). This doesn't
> change the fact that enforcing a ruleset can't grant any right, only
> remove some rights. The new layer levels add more consistent
> restrictions.
> * Check minimal landlock_attr_* size/content. This fix the case when
> no data was provided and e.g., FD 0 was interpreted as ruleset_fd.
> Now this leads to a returned -EINVAL.
> * Fix credential double-free error case.
> * Complete struct landlock_attr_size with size_attr_enforce.
> * Fix undefined reference to syscall when Landlock is not selected.
> * Remove f.file->f_path.mnt check (suggested by Al Viro).
> * Add build-time checks.
> * Move ABI checks from fs.c .
> * Constify variables.
> * Fix spelling.
> * Add comments.
>
> Changes since v13:
> * New implementation, replacing the dependency on seccomp(2) and bpf(2).
> ---
> include/linux/syscalls.h | 8 +
> include/uapi/linux/landlock.h | 166 ++++++++++
> kernel/sys_ni.c | 6 +
> security/landlock/Makefile | 2 +-
> security/landlock/syscall.c | 554 ++++++++++++++++++++++++++++++++++
> 5 files changed, 735 insertions(+), 1 deletion(-)
> create mode 100644 security/landlock/syscall.c
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b951a87da987..8373fb28eb44 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -1005,6 +1005,14 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
> siginfo_t __user *info,
> unsigned int flags);
> asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
> +asmlinkage long sys_landlock_get_features(struct landlock_attr_features __user *features_ptr,
> + size_t features_size, __u32 options);
> +asmlinkage long sys_landlock_create_ruleset(const struct landlock_attr_ruleset __user *ruleset_ptr,
> + size_t ruleset_size, __u32 options);
> +asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
> + const void __user *rule_ptr, size_t rule_size, __u32 options);
> +asmlinkage long sys_landlock_enforce_ruleset(int ruleset_fd, enum landlock_target_type target_type,
> + int target_fd, __u32 options);
>
> /*
> * Architecture-specific system calls
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 5141185e6487..eb2a5e0d61a4 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -9,6 +9,172 @@
> #ifndef _UAPI__LINUX_LANDLOCK_H__
> #define _UAPI__LINUX_LANDLOCK_H__
>
> +#include <linux/types.h>
> +
> +#if 0
> +/**
> + * DOC: options_intro
> + *
> + * These options may be used as second argument of sys_landlock(). Each
> + * command have a dedicated set of options, represented as bitmasks. For two
> + * different commands, their options may overlap. Each command have at least
> + * one option defining the used attribute type. This also enables to always
> + * have a usable &struct landlock_attr_features (i.e. filled with bits).
> + */
> +#endif
> +
> +/**
> + * enum landlock_rule_type - Landlock rule type
> + *
> + * Argument of sys_landlock_add_rule().
> + */
> +enum landlock_rule_type {
> + /**
> + * @LANDLOCK_RULE_PATH_BENEATH: Type of a &struct
> + * landlock_attr_path_beneath .
> + */
> + LANDLOCK_RULE_PATH_BENEATH = 1,
> +};
> +
> +/**
> + * enum landlock_target_type - Landlock target type
> + *
> + * Argument of sys_landlock_enforce_ruleset().
> + */
> +enum landlock_target_type {
> + /**
> + * @LANDLOCK_TARGET_CURRENT_THREAD: Enforce a ruleset on the thread
> + * asking for (i.e. seccomp-like).
> + */
> + LANDLOCK_TARGET_CURRENT_THREAD = 1,
> +};
> +
> +/**
> + * struct landlock_attr_features - Receives the supported features
> + *
> + * This struct should be allocated by user space but it will be filled by the
> + * kernel to indicate the subset of Landlock features effectively handled by
> + * the running kernel. This enables backward compatibility for applications
> + * which are developed on a newer kernel than the one running the application.
> + * This helps avoid hard errors that may entirely disable the use of Landlock
> + * features because some of them may not be supported. Indeed, because
> + * Landlock is a security feature, even if the kernel doesn't support all the
> + * requested features, user space applications should still use the subset
> + * which is supported by the running kernel. Indeed, a partial security policy
> + * can still improve the security of the application and better protect the
> + * user (i.e. best-effort approach). The %LANDLOCK_CMD_GET_FEATURES command
> + * and &struct landlock_attr_features are future-proof because the future
> + * unknown fields requested by user space (i.e. a larger &struct
> + * landlock_attr_features) can still be filled with zeros.
> + *
> + * The Landlock commands will fail if an unsupported option or access is
> + * requested. By firstly requesting the supported options and accesses, it is
> + * quite easy for the developer to binary AND these returned bitmasks with the
> + * used options and accesses from the attribute structs (e.g. &struct
> + * landlock_attr_ruleset), and even infer the supported Landlock commands.
> + * Indeed, because each command must support at least one option, the options_*
> + * fields are always filled if the related commands are supported. The
> + * supported attributes are also discoverable thanks to the size_* fields. All
> + * this data enable to create applications doing their best to sandbox
> + * themselves regardless of the running kernel.
> + */
> +struct landlock_attr_features {
> + /**
> + * @options_get_features: Options supported by
> + * sys_landlock_get_features().
> + */
> + __u32 options_get_features;
> + /**
> + * @options_create_ruleset: Options supported by
> + * sys_landlock_create_ruleset().
> + */
> + __u32 options_create_ruleset;
> + /**
> + * @options_add_rule: Options supported by sys_landlock_add_rule().
> + */
> + __u32 options_add_rule;
> + /**
> + * @options_enforce_ruleset: Options supported by
> + * sys_landlock_enforce_ruleset().
> + */
> + __u32 options_enforce_ruleset;
> + /**
> + * @access_fs: Subset of file system access supported by the running
> + * kernel, used in &landlock_attr_ruleset.handled_access_fs and
> + * &landlock_attr_path_beneath.allowed_access . Cf. `Filesystem
> + * flags`_.
> + */
> + __u64 access_fs;
> + /**
> + * @size_attr_features: Size of the &struct landlock_attr_features
> + * (current struct) as known by the kernel (i.e. ``sizeof(struct
> + * landlock_attr_features)``).
> + */
> + __u16 size_attr_features;
> + /**
> + * @size_attr_ruleset: Size of the &struct landlock_attr_ruleset as
> + * known by the kernel (i.e. ``sizeof(struct
> + * landlock_attr_ruleset)``).
> + */
> + __u16 size_attr_ruleset;
> + /**
> + * @size_attr_path_beneath: Size of the &struct
> + * landlock_attr_path_beneath as known by the kernel (i.e.
> + * ``sizeof(struct landlock_attr_path_beneath)``).
> + */
> + __u16 size_attr_path_beneath;
> + /**
> + * @last_rule_type: Indicate the last entry of &enum
> + * landlock_rule_type.
> + */
> + __u8 last_rule_type;
> + /**
> + * @last_target_type: Indicate the last entry of &enum
> + * landlock_target_type.
> + */
> + __u8 last_target_type;
> +};
> +
> +/**
> + * struct landlock_attr_ruleset- Defines a new ruleset
> + *
> + * Used as first attribute for the %LANDLOCK_CMD_CREATE_RULESET command and
> + * with the %LANDLOCK_OPT_CREATE_RULESET option.
> + */
> +struct landlock_attr_ruleset {
> + /**
> + * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_)
> + * that is handled by this ruleset and should then be forbidden if no
> + * rule explicitly allow them. This is needed for backward
> + * compatibility reasons. The user space code should check the
> + * effectively supported actions thanks to %LANDLOCK_CMD_GET_FEATURES
> + * and &struct landlock_attr_features, and then adjust the arguments of
> + * the next calls to sys_landlock() accordingly.
> + */
> + __u64 handled_access_fs;
> +};
> +
> +/**
> + * struct landlock_attr_path_beneath - Defines a path hierarchy
> + */
> +struct landlock_attr_path_beneath {
> + /**
> + * @allowed_access: Bitmask of allowed actions for this file hierarchy
> + * (cf. `Filesystem flags`_).
> + */
> + __u64 allowed_access;
> + /**
> + * @parent_fd: File descriptor, open with ``O_PATH``, which identify
> + * the parent directory of a file hierarchy, or just a file.
> + */
> + __s32 parent_fd;
> + /*
> + * This struct is packed to enable to append future members without
> + * requiring to have dummy reserved members.
> + * Cf. security/landlock/syscall.c:build_check_abi()
> + */
> +} __attribute__((packed));
> +
> /**
> * DOC: fs_access
> *
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 3b69a560a7ac..d0f3cf333d86 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -264,6 +264,12 @@ COND_SYSCALL(request_key);
> COND_SYSCALL(keyctl);
> COND_SYSCALL_COMPAT(keyctl);
>
> +/* security/landlock/syscall.c */
> +COND_SYSCALL(landlock_get_features);
> +COND_SYSCALL(landlock_create_ruleset);
> +COND_SYSCALL(landlock_add_rule);
> +COND_SYSCALL(landlock_enforce_ruleset);
> +
> /* arch/example/kernel/sys_example.c */
>
> /* mm/fadvise.c */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index 92e3d80ab8ed..4388494779ec 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -1,4 +1,4 @@
> obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>
> -landlock-y := setup.o object.o ruleset.o \
> +landlock-y := setup.o syscall.o object.o ruleset.o \
> cred.o ptrace.o fs.o
> diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c
> new file mode 100644
> index 000000000000..7bf4dc175dee
> --- /dev/null
> +++ b/security/landlock/syscall.c
> @@ -0,0 +1,554 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock LSM - System call and user space interfaces
> + *
> + * Copyright © 2016-2020 Mickaël Salaün <mic@xxxxxxxxxxx>
> + * Copyright © 2018-2020 ANSSI
> + */
> +
> +#include <asm/current.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/build_bug.h>
> +#include <linux/capability.h>
> +#include <linux/compiler_types.h>
> +#include <linux/dcache.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/fs.h>
> +#include <linux/limits.h>
> +#include <linux/mount.h>
> +#include <linux/path.h>
> +#include <linux/sched.h>
> +#include <linux/security.h>
> +#include <linux/stddef.h>
> +#include <linux/syscalls.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <uapi/linux/landlock.h>
> +
> +#include "cred.h"
> +#include "fs.h"
> +#include "ruleset.h"
> +#include "setup.h"
> +
> +/**
> + * copy_struct_if_any_from_user - Safe future-proof argument copying
> + *
> + * Extend copy_struct_from_user() to handle NULL @src, which allows for future
> + * use of @src even if it is not used right now.
> + *
> + * @dst: Kernel space pointer or NULL.
> + * @ksize: Actual size of the data pointed to by @dst.
> + * @ksize_min: Minimal required size to be copied.
> + * @src: User space pointer or NULL.
> + * @usize: (Alleged) size of the data pointed to by @src.
> + */
> +static int copy_struct_if_any_from_user(void *const dst, const size_t ksize,
> + const size_t ksize_min, const void __user *const src,
> + const size_t usize)
> +{
> + int ret;
> +
> + /* Checks kernel buffer size inconsistencies. */
> + if (dst) {
> + if (WARN_ON_ONCE(ksize == 0))
> + return -EFAULT;
> + } else {
> + if (WARN_ON_ONCE(ksize != 0))
> + return -EFAULT;
> + }
> +
> + /* Checks minimal size. */
> + if (WARN_ON_ONCE(ksize < ksize_min))
> + return -EFAULT;
> + if (usize < ksize_min)
> + return -EINVAL;
> +
> + /* Handles empty user buffer. */
> + if (!src) {
> + if (usize != 0)
> + return -EFAULT;
> + if (dst)
> + memset(dst, 0, ksize);
> + return 0;
> + }
> +
> + /* Checks user buffer size inconsistency and limit. */
> + if (usize == 0)
> + return -ENODATA;
> + if (usize > PAGE_SIZE)
> + return -E2BIG;
> +
> + /* Copies user buffer and fills with zeros. */
> + if (dst)
> + return copy_struct_from_user(dst, ksize, src, usize);
> +
> + /* Checks unknown user data. */
> + ret = check_zeroed_user(src, usize);
> + if (ret <= 0)
> + return ret ?: -E2BIG;
> + return 0;
> +}
> +
> +/* Features */
> +
> +/*
> + * This function only contains arithmetic operations with constants, leading to
> + * BUILD_BUG_ON(). The related code is evaluated and checked at build time,
> + * but it is then ignored thanks to compiler optimizations.
> + */
> +static void build_check_abi(void)
> +{
> + size_t size_features, size_ruleset, size_path_beneath;
> +
> + /*
> + * For each user space ABI structures, first checks that there is no
> + * hole in them, then checks that all architectures have the same
> + * struct size.
> + */
> + size_features = sizeof_field(struct landlock_attr_features, options_get_features);
> + size_features += sizeof_field(struct landlock_attr_features, options_create_ruleset);
> + size_features += sizeof_field(struct landlock_attr_features, options_add_rule);
> + size_features += sizeof_field(struct landlock_attr_features, options_enforce_ruleset);
> + size_features += sizeof_field(struct landlock_attr_features, access_fs);
> + size_features += sizeof_field(struct landlock_attr_features, size_attr_features);
> + size_features += sizeof_field(struct landlock_attr_features, size_attr_ruleset);
> + size_features += sizeof_field(struct landlock_attr_features, size_attr_path_beneath);
> + size_features += sizeof_field(struct landlock_attr_features, last_rule_type);
> + size_features += sizeof_field(struct landlock_attr_features, last_target_type);
> + BUILD_BUG_ON(sizeof(struct landlock_attr_features) != size_features);
> + BUILD_BUG_ON(sizeof(struct landlock_attr_features) != 32);
> +
> + size_ruleset = sizeof_field(struct landlock_attr_ruleset, handled_access_fs);
> + BUILD_BUG_ON(sizeof(struct landlock_attr_ruleset) != size_ruleset);
> + BUILD_BUG_ON(sizeof(struct landlock_attr_ruleset) != 8);
> +
> + size_path_beneath = sizeof_field(struct landlock_attr_path_beneath, allowed_access);
> + size_path_beneath += sizeof_field(struct landlock_attr_path_beneath, parent_fd);
> + BUILD_BUG_ON(sizeof(struct landlock_attr_path_beneath) != size_path_beneath);
> + BUILD_BUG_ON(sizeof(struct landlock_attr_path_beneath) != 12);
> +}
> +
> +/**
> + * sys_landlock_get_features - Identify the supported Landlock features
> + *
> + * @features_ptr: Pointer to a &struct landlock_attr_features to be filled by
> + * the supported features.
> + * @features_size: Size of the pointed &struct landlock_attr_features (needed
> + * for backward and forward compatibility).
> + * @options: Must be 0.
> + *
> + * This system call enables to ask the kernel for supported Landlock features.
> + * This is important to build user space code compatible with older and newer
> + * kernels.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0;
> + * - ENODATA, E2BIG or EFAULT: @features_ptr or @feature_size inconsistencies.
> + */
> +SYSCALL_DEFINE3(landlock_get_features,
> + struct landlock_attr_features __user *const, features_ptr,
> + const size_t, features_size, const __u32, options)
> +{
> + size_t data_size, fill_size;
> + const struct landlock_attr_features supported = {
> + .options_get_features = 0,
> + .options_create_ruleset = 0,
> + .options_add_rule = 0,
> + .options_enforce_ruleset = 0,
> + .access_fs = _LANDLOCK_ACCESS_FS_MASK,
> + .size_attr_features = sizeof(struct landlock_attr_features),
> + .size_attr_ruleset = sizeof(struct landlock_attr_ruleset),
> + .size_attr_path_beneath = sizeof(struct landlock_attr_path_beneath),
> + .last_rule_type = LANDLOCK_RULE_PATH_BENEATH,
> + .last_target_type = LANDLOCK_TARGET_CURRENT_THREAD,
> + };
> +
> + BUILD_BUG_ON(!__same_type(supported.access_fs,
> + ((struct landlock_attr_ruleset *)NULL)->handled_access_fs));
> + BUILD_BUG_ON(!__same_type(supported.access_fs,
> + ((struct landlock_attr_path_beneath *)NULL)->allowed_access));
> + build_check_abi();
> +
> + /*
> + * Enables user space to identify if Landlock is disabled, thanks to a
> + * specific error code.
> + */
> + if (!landlock_initialized)
> + return -EOPNOTSUPP;
> +
> + /* No option for now. */
> + if (options)
> + return -EINVAL;
> +
> + /* Checks argument consistency. */
> + if (features_size == 0)
> + return -ENODATA;
> + if (features_size > PAGE_SIZE)
> + return -E2BIG;
> +
> + /* Copy a subset of features to user space. */
> + data_size = min(sizeof(supported), features_size);
> + if (copy_to_user(features_ptr, &supported, data_size))
> + return -EFAULT;
> +
> + /* Fills with zeros. */
> + fill_size = features_size - data_size;
> + if (fill_size > 0 && clear_user((void __user *)features_ptr + data_size, fill_size))
> + return -EFAULT;
> + return 0;
> +}
> +
> +/* Ruleset handling */
> +
> +static int fop_ruleset_release(struct inode *const inode,
> + struct file *const filp)
> +{
> + struct landlock_ruleset *ruleset = filp->private_data;
> +
> + landlock_put_ruleset(ruleset);
> + return 0;
> +}
> +
> +static ssize_t fop_dummy_read(struct file *const filp, char __user *const buf,
> + const size_t size, loff_t *const ppos)
> +{
> + /* Dummy handler to enable FMODE_CAN_READ. */
> + return -EINVAL;
> +}
> +
> +static ssize_t fop_dummy_write(struct file *const filp,
> + const char __user *const buf, const size_t size,
> + loff_t *const ppos)
> +{
> + /* Dummy handler to enable FMODE_CAN_WRITE. */
> + return -EINVAL;
> +}
> +
> +/*
> + * A ruleset file descriptor enables to build a ruleset by adding (i.e.
> + * writing) rule after rule, without relying on the task's context. This
> + * reentrant design is also used in a read way to enforce the ruleset on the
> + * current task.
> + */
> +static const struct file_operations ruleset_fops = {
> + .release = fop_ruleset_release,
> + .read = fop_dummy_read,
> + .write = fop_dummy_write,
> +};
> +
> +/**
> + * sys_landlock_create_ruleset - Create a new ruleset
> + *
> + * @ruleset_ptr: Pointer to a &struct landlock_attr_ruleset identifying the
> + * scope of the new ruleset.
> + * @ruleset_size: Size of the pointed &struct landlock_attr_ruleset (needed for
> + * backward and forward compatibility).
> + * @options: Must be 0.
> + *
> + * This system call enables to create a new Landlock ruleset, and returns the
> + * related file descriptor on success.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0, or unknown access, or too small @ruleset_size;
> + * - ENODATA, E2BIG or EFAULT: @ruleset_ptr or @ruleset_size inconsistencies;
> + * - ENOMSG: empty &landlock_attr_ruleset.handled_access_fs.
> + */
> +SYSCALL_DEFINE3(landlock_create_ruleset,
> + const struct landlock_attr_ruleset __user *const, ruleset_ptr,
> + const size_t, ruleset_size, const __u32, options)
> +{
> + struct landlock_attr_ruleset attr_ruleset;
> + struct landlock_ruleset *ruleset;
> + int err, ruleset_fd;
> +
> + if (!landlock_initialized)
> + return -EOPNOTSUPP;
> +
> + /* No option for now. */
> + if (options)
> + return -EINVAL;
> +
> + /* Copies raw user space buffer. */
> + err = copy_struct_if_any_from_user(&attr_ruleset, sizeof(attr_ruleset),
> + offsetofend(typeof(attr_ruleset), handled_access_fs),
> + ruleset_ptr, ruleset_size);
> + if (err)
> + return err;
> +
> + /* Checks content (and 32-bits cast). */
> + if ((attr_ruleset.handled_access_fs | _LANDLOCK_ACCESS_FS_MASK) !=
> + _LANDLOCK_ACCESS_FS_MASK)
> + return -EINVAL;
> +
> + /* Checks arguments and transforms to kernel struct. */
> + ruleset = landlock_create_ruleset(attr_ruleset.handled_access_fs);
> + if (IS_ERR(ruleset))
> + return PTR_ERR(ruleset);
> +
> + /* Creates anonymous FD referring to the ruleset. */
> + ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops,
> + ruleset, O_RDWR | O_CLOEXEC);
> + if (ruleset_fd < 0)
> + landlock_put_ruleset(ruleset);
> + return ruleset_fd;
> +}
> +
> +/*
> + * Returns an owned ruleset from a FD. It is thus needed to call
> + * landlock_put_ruleset() on the return value.
> + */
> +static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
> + const fmode_t mode)
> +{
> + struct fd ruleset_f;
> + struct landlock_ruleset *ruleset;
> + int err;
> +
> + ruleset_f = fdget(fd);
> + if (!ruleset_f.file)
> + return ERR_PTR(-EBADF);
> +
> + /* Checks FD type and access right. */
> + err = 0;
> + if (ruleset_f.file->f_op != &ruleset_fops)
> + err = -EBADFD;
> + else if (!(ruleset_f.file->f_mode & mode))
> + err = -EPERM;
> + if (!err) {
> + ruleset = ruleset_f.file->private_data;
> + landlock_get_ruleset(ruleset);
> + }
> + fdput(ruleset_f);
> + return err ? ERR_PTR(err) : ruleset;
> +}
> +
> +/* Path handling */
> +
> +/*
> + * @path: Must call put_path(@path) after the call if it succeeded.
> + */
> +static int get_path_from_fd(const s32 fd, struct path *const path)
> +{
> + struct fd f;
> + int err = 0;
> +
> + BUILD_BUG_ON(!__same_type(fd,
> + ((struct landlock_attr_path_beneath *)NULL)->parent_fd));
> +
> + /* Handles O_PATH. */
> + f = fdget_raw(fd);
> + if (!f.file)
> + return -EBADF;
> + /*
> + * Only allows O_PATH file descriptor: enables to restrict ambient
> + * filesystem access without requiring to open and risk leaking or
> + * misusing a file descriptor. Forbid internal filesystems (e.g.
> + * nsfs), including pseudo filesystems that will never be mountable
> + * (e.g. sockfs, pipefs).
> + */
> + if (!(f.file->f_mode & FMODE_PATH) ||
> + (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
> + (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
> + d_is_negative(f.file->f_path.dentry) ||
> + IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
> + err = -EBADFD;
> + goto out_fdput;
> + }
> + path->mnt = f.file->f_path.mnt;
> + path->dentry = f.file->f_path.dentry;
> + path_get(path);
> +
> +out_fdput:
> + fdput(f);
> + return err;
> +}
> +
> +/**
> + * sys_landlock_add_rule - Add a new rule to a ruleset
> + *
> + * @ruleset_fd: File descriptor tied to the ruleset which should be extended
> + * with the new rule.
> + * @rule_type: Identify the structure type pointed to by @rule_ptr.
> + * @rule_ptr: Pointer to a rule (the currently only supported rule is &struct
> + * landlock_attr_path_beneath).
> + * @rule_size: Size of the struct pointed to by @rule_ptr.
> + * @options: Must be 0.
> + *
> + * This system call enables to define a new rule and add it to an existing
> + * ruleset.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0, or inconsistent access in the rule (i.e.
> + * &landlock_attr_path_beneath.allowed_access is not a subset of the rule's
> + * accesses), or too small @rule_size (according to the underlying rule
> + * type);
> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread;
> + * - EBADFD: @ruleset_fd is not a ruleset file descriptor;
> + * - EPERM: @ruleset_fd has no write access to the underlying ruleset;
> + * - ENODATA, E2BIG or EFAULT: @rule_ptr or @rule_size inconsistencies;
> + */
> +SYSCALL_DEFINE5(landlock_add_rule,
> + const int, ruleset_fd, const enum landlock_rule_type, rule_type,
> + const void __user *const, rule_ptr, const size_t, rule_size,
> + const __u32, options)
> +{
> + struct landlock_attr_path_beneath attr_path_beneath;
> + struct path path;
> + struct landlock_ruleset *ruleset;
> + int err;
> +
> + if (!landlock_initialized)
> + return -EOPNOTSUPP;
> +
> + /* No option for now. */
> + if (options)
> + return -EINVAL;
> +
> + if (rule_type != LANDLOCK_RULE_PATH_BENEATH)
> + return -EINVAL;
> +
> + /* Copies raw user space buffer. */
> + err = copy_struct_if_any_from_user(&attr_path_beneath,
> + sizeof(attr_path_beneath),
> + offsetofend(typeof(attr_path_beneath), allowed_access),
> + rule_ptr, rule_size);
> + if (err)
> + return err;
> +
> + /* Gets and checks the ruleset. */
> + ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_WRITE);
> + if (IS_ERR(ruleset))
> + return PTR_ERR(ruleset);
> +
> + /*
> + * Checks that allowed_access matches the @ruleset constraints
> + * (ruleset->fs_access_mask is automatically upgraded to 64-bits).
> + * Allows empty allowed_access i.e., deny @ruleset->fs_access_mask .
> + */
> + if ((attr_path_beneath.allowed_access | ruleset->fs_access_mask) !=
> + ruleset->fs_access_mask) {
> + err = -EINVAL;
> + goto out_put_ruleset;
> + }
> +
> + /* Gets and checks the new rule. */
> + err = get_path_from_fd(attr_path_beneath.parent_fd, &path);
> + if (err)
> + goto out_put_ruleset;
> +
> + /* Imports the new rule. */
> + err = landlock_append_fs_rule(ruleset, &path,
> + attr_path_beneath.allowed_access);
> + path_put(&path);
> +
> +out_put_ruleset:
> + landlock_put_ruleset(ruleset);
> + return err;
> +}
> +
> +/* Enforcement */
> +
> +/**
> + * sys_landlock_enforce_ruleset - Enforce a ruleset
> + *
> + * @ruleset_fd: File descriptor tied to the ruleset to merge with the target.
> + * @target_type: Identify which type of target to enforce the ruleset on,
> + * currently only the current thread is supported (i.e.
> + * seccomp-like).
> + * @target_fd: Must be -1.
> + * @options: Must be 0.
> + *
> + * This system call enables to enforce a Landlock ruleset on the current
> + * thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in its
> + * namespace or be running with no_new_privs. This avoids scenarios where
> + * unprivileged tasks can affect the behavior of privileged children.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0, or @target_type is not
> + * %LANDLOCK_TARGET_CURRENT_THREAD, or @target_fd is not -1;
> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread;
> + * - EBADFD: @ruleset_fd is not a ruleset file descriptor;
> + * - EPERM: @ruleset_fd has no read access to the underlying ruleset, or the
> + * current thread is not running with no_new_privs (or doesn't have
> + * CAP_SYS_ADMIN in its namespace).
> + */
> +SYSCALL_DEFINE4(landlock_enforce_ruleset,
> + const int, ruleset_fd, const enum landlock_target_type, target_type,
> + const int, target_fd, const __u32, options)
> +{
> + struct landlock_ruleset *new_dom, *ruleset;
> + struct cred *new_cred;
> + struct landlock_cred_security *new_llcred;
> + int err;
> +
> + if (!landlock_initialized)
> + return -EOPNOTSUPP;
> +
> + /* No option for now. */
> + if (options)
> + return -EINVAL;
> +
> + /* Only target the current thread for now. */
> + if (target_type != LANDLOCK_TARGET_CURRENT_THREAD)
> + return -EINVAL;
> + if (target_fd != -1)
> + return -EINVAL;
> +
> + /*
> + * Similar checks as for seccomp(2), except that an -EPERM may be
> + * returned.
> + */
> + if (!task_no_new_privs(current)) {
> + err = security_capable(current_cred(), current_user_ns(),
> + CAP_SYS_ADMIN, CAP_OPT_NOAUDIT);
> + if (err)
> + return err;
> + }
> +
> + /* Gets and checks the ruleset. */
> + ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_READ);
> + if (IS_ERR(ruleset))
> + return PTR_ERR(ruleset);
> +
> + /* Prepares new credentials. */
> + new_cred = prepare_creds();
> + if (!new_cred) {
> + err = -ENOMEM;
> + goto out_put_ruleset;
> + }
> + new_llcred = landlock_cred(new_cred);
> +
> + /*
> + * There is no possible race condition while copying and manipulating
> + * the current credentials because they are dedicated per thread.
> + */
> + new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset);
> + if (IS_ERR(new_dom)) {
> + err = PTR_ERR(new_dom);
> + goto out_put_creds;
> + }
> +
> + /* Replaces the old (prepared) domain. */
> + landlock_put_ruleset(new_llcred->domain);
> + new_llcred->domain = new_dom;
> +
> + landlock_put_ruleset(ruleset);
> + return commit_creds(new_cred);
> +
> +out_put_creds:
> + abort_creds(new_cred);
> + return err;
> +
> +out_put_ruleset:
> + landlock_put_ruleset(ruleset);
> + return err;
> +}
>