Re: [PATCH v3 00/25] user_namespace: introduce fsid mappings

From: Jann Horn
Date: Wed Feb 19 2020 - 10:33:32 EST

On Tue, Feb 18, 2020 at 3:35 PM Christian Brauner
<christian.brauner@xxxxxxxxxx> wrote:
> - Let the keyctl infrastructure only operate on kfsid which are always
> mapped/looked up in the id mappings similar to what we do for
> filesystems that have the same superblock visible in multiple user
> namespaces.
> This version also comes with minimal tests which I intend to expand in
> the future.
> From pings and off-list questions and discussions at Google Container
> Security Summit there seems to be quite a lot of interest in this
> patchset with use-cases ranging from layer sharing for app containers
> and k8s, as well as data sharing between containers with different id
> mappings. I haven't Cced all people because I don't have all the email
> adresses at hand but I've at least added Phil now. :)
> This is the implementation of shiftfs which was cooked up during lunch at
> Linux Plumbers 2019 the day after the container's microconference. The
> idea is a design-stew from StÃphane, Aleksa, Eric, and myself (and by
> now also Jann.
> Back then we all were quite busy with other work and couldn't really sit
> down and implement it. But I took a few days last week to do this work,
> including demos and performance testing.
> This implementation does not require us to touch the VFS substantially
> at all. Instead, we implement shiftfs via fsid mappings.
> With this patch, it took me 20 mins to port both LXD and LXC to support
> shiftfs via fsid mappings.

Can you please grep through the kernel for all uses of ->fsuid and
->fsgid and fix them up appropriately? Some cases I still see:

The SafeSetID LSM wants to enforce that you can only use CAP_SETUID to
gain the privileges of a specific set of IDs:

static int safesetid_task_fix_setuid(struct cred *new,
const struct cred *old,
int flags)

/* Do nothing if there are no setuid restrictions for our old RUID. */
if (setuid_policy_lookup(old->uid, INVALID_UID) == SIDPOL_DEFAULT)
return 0;

if (uid_permitted_for_cred(old, new->uid) &&
uid_permitted_for_cred(old, new->euid) &&
uid_permitted_for_cred(old, new->suid) &&
uid_permitted_for_cred(old, new->fsuid))
return 0;

* Kill this process to avoid potential security vulnerabilities
* that could arise from a missing whitelist entry preventing a
* privileged process from dropping to a lesser-privileged one.
return -EACCES;

This could theoretically be bypassed through setfsuid() if the kuid
based on the fsuid mappings is permitted but the kuid based on the
normal mappings is not.

fs/coredump.c in suid dump mode uses "cred->fsuid = GLOBAL_ROOT_UID";
this should probably also fix up the other uid, even if there is no
scenario in which it would actually be used at the moment?

The netfilter xt_owner stuff makes packet filtering decisions based on
the ->fsuid; it might be better to filter on the ->kfsuid so that you
can filter traffic from different user namespaces differently?

audit_log_task_info() is doing "from_kuid(&init_user_ns, cred->fsuid)".