Re: [PATCH] fs: Make /proc/sys inodes be owned by global root.

From: Luis Chamberlain
Date: Mon Nov 26 2018 - 20:16:37 EST


On Mon, Nov 26, 2018 at 06:26:07PM +0100, Radoslaw Burny wrote:
> Due to a recent commit (d151ddc00498 - fs: Update i_[ug]id_(read|write)
> to translate relative to s_user_ns),

Recent? This commit is from 2014 and has been upstream since v4.8.
And the commit ID you mentioned in your commit log seems to be
incorrect. I get:

81754357770ebd900801231e7bc8d151ddc00498a fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns

> inodes under /proc/sys have -1
> written to their i_uid/i_gid members if a containing userns does not
> have entries for root in the uid/gid_map.

Thanks for the description of how to run into the issue described but
is there also a practical use case today where this is happening? I ask
as it would be good to know the severity of the issue in the real world
today.

> This wouldn't normally matter, because these values are not used for
> access checks. However, a later change (0bd23d09b874 - Don't modify
> inodes with a uid or gid unknown to the vfs) changes the kernel to
> prevent opens for write if the i_uid/i_gid field in the inode is -1,
> even if the /proc/sys-specific access checks would otherwise pass.
>
> This causes a problem: in a userns without root mapping, even the
> namespace creator cannot write to e.g. /proc/sys/kernel/shmmax.
> This change fixes the problem by overriding i_uid/i_gid back to
> GLOBAL_ROOT_UID/GID.

We really need Seth and Eric to provide guidance here as they were
the ones who devised this long ago, but to me your solution seems
backward. Why allow any namespace to muck with /proc/sys/ settings?

Let's recall that this was a corner case, and writeback was the
biggest concern. For that it was decided that you'd simply not get
write access, so the inode is read-only. It's not clear to me if things
like proc were considered. For the regular file case the situation can
be addressed with chown, however we can't chown proc files.
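
To spell out the behaviour I mean, here is a rough userspace model of
that write check as I understand it. The toy_* names are made up for
illustration and this is not the actual VFS code, just a sketch of the
rule "an inode with an unmapped uid or gid is refused for write":

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Userspace stand-ins for the kernel's INVALID_UID / INVALID_GID (-1). */
#define TOY_INVALID_UID ((uid_t)-1)
#define TOY_INVALID_GID ((gid_t)-1)

/* Hypothetical, stripped-down inode: only the ownership fields matter. */
struct toy_inode {
	uid_t i_uid;
	gid_t i_gid;
};

/* True if either the owner or the group has no valid mapping. */
static bool toy_has_unmapped_id(const struct toy_inode *inode)
{
	return inode->i_uid == TOY_INVALID_UID ||
	       inode->i_gid == TOY_INVALID_GID;
}

/*
 * Model of the early write check: refuse with -EACCES before any
 * filesystem-specific permission handler is consulted.
 */
static int toy_may_write(const struct toy_inode *inode)
{
	if (toy_has_unmapped_id(inode))
		return -EACCES;
	return 0; /* the usual permission checks would follow here */
}
```

In this model the /proc/sys-specific checks never run for such an
inode, which matches the symptom you describe.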

> Tested: Used a repro program that creates a user namespace without any
> mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
> Before the change, it shows uid/gid of 65534,

I thought you said it would be uid/gid -1 without your patch?
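
Or is 65534 just the overflow uid that stat() reports for an id with no
mapping? If so the two statements would be consistent, but the commit
log should say that. A toy userspace model of what I have in mind,
loosely following make_kuid() and from_kuid_munged(); all toy_* names
and the extent layout are made up for illustration:

```c
#include <stddef.h>
#include <sys/types.h>

#define TOY_INVALID_UID  ((uid_t)-1)
#define TOY_OVERFLOW_UID ((uid_t)65534) /* default overflowuid */

/* One uid_map extent: ns ids [first, first + count) map to
 * kernel ids starting at lower_first. */
struct toy_extent {
	uid_t first;
	uid_t lower_first;
	uid_t count;
};

/* Hypothetical user namespace: just a list of extents. */
struct toy_userns {
	const struct toy_extent *extents;
	size_t nr_extents;
};

/* Namespace id -> kernel id, or TOY_INVALID_UID if unmapped. */
static uid_t toy_make_kuid(const struct toy_userns *ns, uid_t uid)
{
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct toy_extent *e = &ns->extents[i];

		if (uid >= e->first && uid - e->first < e->count)
			return e->lower_first + (uid - e->first);
	}
	return TOY_INVALID_UID;
}

/* Kernel id -> id shown to a caller in ns; unmapped ids are
 * reported as the overflow uid rather than -1. */
static uid_t toy_from_kuid_munged(const struct toy_userns *ns, uid_t kuid)
{
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct toy_extent *e = &ns->extents[i];

		if (kuid >= e->lower_first && kuid - e->lower_first < e->count)
			return e->first + (kuid - e->lower_first);
	}
	return TOY_OVERFLOW_UID;
}
```

With an empty map, toy_make_kuid(ns, 0) yields -1 for the stored
i_uid, and translating that -1 for an outside observer yields 65534.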

> with the change it's 0.

Note that a good way to also test issues is with the lib/test_sysctl.c
module and the tools/testing/selftests/sysctl/sysctl.sh script, so if
you can devise a test there, once we decide what to do, that would be
appreciated.
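
The ownership check itself could be as small as the following,
wherever it ends up living. This is only a userspace sketch, the helper
name is made up, and it is not wired into the selftest scripts:

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if path is owned by uid 0 and gid 0
 * as seen by the caller, 0 if it is owned by anyone else, and -1 if
 * stat() fails.
 */
static int toy_owned_by_root(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return st.st_uid == 0 && st.st_gid == 0;
}
```

Going by the numbers you report (65534 before, 0 after), calling it on
/proc/$PID/root/proc/sys/kernel/shmmax from outside the namespace
should return 0 without the patch and 1 with it.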

Luis

> Signed-off-by: Radoslaw Burny <rburny@xxxxxxxxxx>
> ---
> fs/proc/proc_sysctl.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index c5cbbdff3c3d..67379a389658 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
>
>  	if (root->set_ownership)
>  		root->set_ownership(head, table, &inode->i_uid, &inode->i_gid);
> +	else {
> +		inode->i_uid = GLOBAL_ROOT_UID;
> +		inode->i_gid = GLOBAL_ROOT_GID;
> +	}
>
>  out:
>  	return inode;
> --
> 2.20.0.rc0.387.gc7a69e6b6c-goog
>