Re: [PATCH v24 07/12] landlock: Support filesystem access-control

From: Mickaël Salaün
Date: Sat Nov 21 2020 - 05:08:47 EST



On 21/11/2020 08:00, Jann Horn wrote:
> On Thu, Nov 12, 2020 at 9:52 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
>> Thanks to the Landlock objects and ruleset, it is possible to identify
>> inodes according to a process's domain. To enable an unprivileged
>> process to express a file hierarchy, it first needs to open a directory
>> (or a file) and pass this file descriptor to the kernel through
>> landlock_add_rule(2). When checking if a file access request is
>> allowed, we walk from the requested dentry to the real root, following
>> the different mount layers. The access to each "tagged" inodes are
>> collected according to their rule layer level, and ANDed to create
>> access to the requested file hierarchy. This makes possible to identify
>> a lot of files without tagging every inodes nor modifying the
>> filesystem, while still following the view and understanding the user
>> has from the filesystem.
>>
>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
>> keep the same struct inodes for the same inodes whereas these inodes are
>> in use.
>>
>> This commit adds a minimal set of supported filesystem access-control
>> which doesn't enable to restrict all file-related actions. This is the
>> result of multiple discussions to minimize the code of Landlock to ease
>> review. Thanks to the Landlock design, extending this access-control
>> without breaking user space will not be a problem. Moreover, seccomp
>> filters can be used to restrict the use of syscall families which may
>> not be currently handled by Landlock.
>>
>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>> Cc: Anton Ivanov <anton.ivanov@xxxxxxxxxxxxxxxxxx>
>> Cc: James Morris <jmorris@xxxxxxxxx>
>> Cc: Jann Horn <jannh@xxxxxxxxxx>
>> Cc: Jeff Dike <jdike@xxxxxxxxxxx>
>> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
>> Cc: Richard Weinberger <richard@xxxxxx>
>> Cc: Serge E. Hallyn <serge@xxxxxxxxxx>
>> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>> ---
>>
>> Changes since v23:
>> * Enforce deterministic interleaved path rules. To have consistent
>> layered rules, granting access to a path implies that all accesses
>> tied to inodes, from the requested file to the real root, must be
>> checked. Otherwise, stacked rules may result to overzealous
>> restrictions. By excluding the ability to add exceptions in the same
>> layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get
>> deterministic interleaved path rules. This removes an optimization
>
> I don't understand the "deterministic interleaved path rules" part.

I explain bellow.

>
>
> What if I have a policy like this?
>
> /home/user READ
> /home/user/Downloads READ+WRITE
>
> That's a reasonable policy, right?

Definitely, I forgot this, thanks for the outside perspective!

>
> If I then try to open /home/user/Downloads/foo in WRITE mode, the loop
> will first check against the READ+WRITE rule for /home/user, that
> check will pass, and then it will check against the READ rule for /,
> which will deny the access, right? That seems bad.

Yes that was the intent.

>
>
> The v22 code ensured that for each layer, the most specific rule (the
> first we encounter on the walk) always wins, right? What's the problem
> with that?

This can be explained with the interleaved_masked_accesses test:
https://github.com/landlock-lsm/linux/blob/landlock-v24/tools/testing/selftests/landlock/fs_test.c#L647

In this case there is 4 stacked layers:
layer 1: allows s1d1/s1d2/s1d3/file1
layer 2: allows s1d1/s1d2/s1d3
denies s1d1/s1d2
layer 3: allows s1d1
layer 4: allows s1d1/s1d2

In the v23, access to file1 would be allowed until layer 3, but layer 4
would merge a new rule for the s1d2 inode. Because we don't record where
exactly the access come from, we can't tell that layer 2 allowed access
thanks to s1d3 and that its s1d2 rule was ignored. I think this behavior
doesn't make sense from the user point of view.

In the v24, access to file1 would only be allowed with layer 1. The
layer 2, would deny access to file1 because of the s1d2 rule. This makes
the reasoning consistent and deterministic whatever the layers are,
while storing the same access and layer bits. But I agree that this may
not be desirable.

In a perfect v25, file1 should be allowed by all these layers. I didn't
find a simple solution to this while minimizing the memory allocated by
rule (cf. struct landlock_rule: mainly 32-bits for access rights and
64-bits for the layers that contributed to this ANDed accesses). I would
like to avoid storing 32-bits access rights per stacked layer. Do you
see another solution?

>
>> which could be replaced by a proper cache mechanism. This also
>> further simplifies and explain check_access_path_continue().
>
>>From the interdiff between v23 and v24 (git range-diff
> 99ade5d59b23~1..99ade5d59b23 faa8c09be9fd~1..faa8c09be9fd):
>
> @@ security/landlock/fs.c (new)
> + rcu_dereference(landlock_inode(inode)->object));
> + rcu_read_unlock();
> +
> -+ /* Checks for matching layers. */
> -+ if (rule && (rule->layers | *layer_mask)) {
> -+ if ((rule->access & access_request) == access_request) {
> -+ *layer_mask &= ~rule->layers;
> -+ return true;
> -+ } else {
> -+ return false;
> -+ }
> ++ if (!rule)
> ++ /* Continues to walk if there is no rule for this inode. */
> ++ return true;
> ++ /*
> ++ * We must check all layers for each inode because we may encounter
> ++ * multiple different accesses from the same layer in a walk. Each
> ++ * layer must at least allow the access request one time (i.e. with one
> ++ * inode). This enables to have a deterministic behavior whatever
> ++ * inode is tagged within interleaved layers.
> ++ */
> ++ if ((rule->access & access_request) == access_request) {
> ++ /* Validates layers for which all accesses are allowed. */
> ++ *layer_mask &= ~rule->layers;
> ++ /* Continues to walk until all layers are validated. */
> ++ return true;
> + }
> -+ return true;
> ++ /* Stops if a rule in the path don't allow all requested access. */
> ++ return false;
> +}
> +
> +static int check_access_path(const struct landlock_ruleset *const domain,
> @@ security/landlock/fs.c (new)
> + &layer_mask)) {
> + struct dentry *parent_dentry;
> +
> -+ /* Stops when a rule from each layer granted access. */
> -+ if (layer_mask == 0) {
> -+ allowed = true;
> -+ break;
> -+ }
> -+
>
> This change also made it so that disconnected paths aren't accessible
> unless they're internal, right? While previously, the access could be
> permitted if the walk stops before reaching the disconnected point? I
> guess that's fine, but it should probably be documented.

Right, I should add a test for this case anyway.

>
> +jump_up:
> + /*
> + * Does not work with orphaned/private mounts like overlayfs
> @@ security/landlock/fs.c (new)
> + goto jump_up;
> + } else {
> + /*
> -+ * Stops at the real root. Denies access
> -+ * because not all layers have granted access.
> ++ * Stops at the real root. Denies access if
> ++ * not all layers granted access.
> + */
> -+ allowed = false;
> ++ allowed = (layer_mask == 0);
> + break;
> + }
> + }
>