Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.

From: Radoslaw Burny
Date: Fri Jul 05 2019 - 18:19:47 EST


On Fri, Jul 5, 2019 at 10:02 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
>
> Please re-state the main fix in the commit log, not just the subject.

Sure, I'll do this. Just to make sure - for every iteration on the
commit message, I need to increment the patch "version" and resend the
whole patch, right?

>
> Also, this does not explain why the current values are and the impact to
> systems / users. This would help in determining and evaluating if this
> deserves to be a stable fix.

This commit is a (much overdue) resend of https://lkml.org/lkml/2018/11/30/990
I think Eric's comment on the previous thread explained it best:

> We spoke about this at LPC. And this is the correct behavioral change.
>
> The problem is there is a default value for i_uid and i_gid that is
> correct in the general case. That default value is not correct for
> sysctl, because proc is weird. As the sysctl permission check in
> test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not
> notice that i_uid and i_gid were being set wrong.
>
> So all this patch does is fix the default values i_uid and i_gid.

If my new commit message is still not conveying this clearly, feel
free to suggest the specific wording (I'm new to the kernel patch
process, and I might not be explaining the problems well enough).

>
>
> On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote:
> > This also fixes a problem where, in a user namespace without root user
> > mapping, it is not possible to write to /proc/sys/kernel/shmmax.
>
> This does not explain why that should be possible and what impact this
> limitation has.

Writing to /proc/sys/kernel/shmmax allows setting a shared memory
limit for that container. Since this is usually part of a container's
initial configuration, one would expect the container's owner /
creator to be able to set the limit. Yet, due to the bug described here,
no process can write the container's shmmax if the container's user
namespace does not contain a root mapping.

Using a container with no root mapping seems to be a rare case, but we
do use this configuration at Google, which is how I found the issue.
Also, we use a generic tool to configure the container limits, and the
inability to write any of them causes a hard failure.

>
> > The problem was introduced by the combination of the two commits:
> > * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update
> > i_[ug]id_(read|write) to translate relative to s_user_ns
> > - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid
> > members of /proc/sys inodes if a containing userns does not have
> > entries for root in the uid/gid_map.
> This is 2014 commit merged as of v4.8.
>
> > * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes
> > with a uid or gid unknown to the vfs
> > - changed the kernel to prevent opens for write if the i_uid/i_gid
> > field in the inode is invalid
>
> This is a 2016 commit merged as of v4.8 as well...
>
> So regardless of the dates of the commits, are you saying this is a
> regression you can confirm did not exist prior to v4.8? Did you test
> v4.7 to confirm?

I assume no one has noticed this issue before because it requires such
a specific combination of triggers.
Yes, I've tested this with older kernel versions. I've additionally
tested a 4.8 build with just 0bd23d09b874 reverted, confirming that
the revert fixes the issue.

>
> > This commit fixes the issue by defaulting i_uid/i_gid to
> > GLOBAL_ROOT_UID/GID.
>
> Why is this right?

Quoting Eric: "the sysctl permission check in test_perm are all
against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID".
The values in the inode are not even read during test_perm, but
logically, the inode belongs to the root of the namespace.
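
To make the point above concrete, here is a small userspace model of the
check as I understand it. It is only a sketch, not the kernel code: the
model_* names and the two boolean parameters are made up for illustration
(they stand in for the comparisons against GLOBAL_ROOT_UID and
GLOBAL_ROOT_GID), and the real test_perm may differ in detail. What it
shows is that the decision depends only on the table mode and the caller's
credentials, never on the inode's i_uid/i_gid.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MAY_EXEC/MAY_WRITE/MAY_READ. */
#define MODEL_MAY_EXEC  1
#define MODEL_MAY_WRITE 2
#define MODEL_MAY_READ  4

/*
 * Simplified model of a sysctl permission check: pick the owner, group,
 * or "other" bits of the table mode based on whether the caller is global
 * root or in the global root group, then verify every requested operation
 * bit is allowed. No inode field is consulted.
 */
int model_test_perm(int mode, int op, bool euid_is_global_root,
                    bool in_global_root_group)
{
    if (euid_is_global_root)
        mode >>= 6;     /* use the owner permission bits */
    else if (in_global_root_group)
        mode >>= 3;     /* use the group permission bits */
    /* otherwise the low three "other" bits are used as-is */

    if ((op & ~mode & (MODEL_MAY_READ | MODEL_MAY_WRITE | MODEL_MAY_EXEC)) == 0)
        return 0;
    return -EACCES;
}
```

With a 0644 table entry, this model lets global root write and lets
everyone else read only, regardless of what the inode says its owner is.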

>
> > Note that these values are not used for /proc/sys
> > access checks, so the change does not otherwise affect /proc semantics.
> >
> > Tested: Used a repro program that creates a user namespace without any
> > mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
> > Before the change, it shows the overflow uid, with the change it's 0.
>
> Why is the overflow uid bad for user experience? Did you test prior to
> v4.8, ie on v4.7 to confirm this is indeed a regression?
>
> You'd want then to also amend in the commit log a Fixes: tag with both
> commits listed. If this is a stable fix (criteria yet to be determined),
> then we'd need a stable tag.

The overflow is technically correct; the uid in the inode is invalid,
hence it must be displayed as overflow uid. The fact that the uid is
invalid is the issue.
Logically, this commit fixes 81754357770e (as that commit first
introduced invalid uid/gid values). If you agree, I'll add this to my
updated commit.
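
In case it helps the discussion, below is a rough userspace model of how
the two commits interact and what the patch changes. It is a sketch under
assumptions, not kernel code: all model_* names and the uid_map layout are
invented for illustration, and 65534 is assumed as the overflow uid (the
usual default). The lookup mimics translating uid 0 through the
namespace's uid_map, the stat helper mimics how an invalid uid is reported
as the overflow uid, and the write check mimics the refusal to modify
inodes whose uid is unknown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_UID     ((uint32_t)-1)
#define MODEL_OVERFLOW_UID    65534u   /* assumed default overflow uid */
#define MODEL_GLOBAL_ROOT_UID 0u

/* One uid_map line: ids [first, first + count) map to lower_first + offset. */
struct model_uid_extent { uint32_t first, lower_first, count; };
struct model_uid_map { size_t nr; struct model_uid_extent ext[4]; };

/* Translate a namespace uid to a global uid; invalid if nothing maps it. */
uint32_t model_make_kuid(const struct model_uid_map *m, uint32_t uid)
{
    for (size_t i = 0; i < m->nr; i++) {
        const struct model_uid_extent *e = &m->ext[i];
        if (uid >= e->first && uid - e->first < e->count)
            return e->lower_first + (uid - e->first);
    }
    return MODEL_INVALID_UID;
}

/* Default i_uid of a /proc/sys inode, before and after the patch. */
uint32_t model_default_i_uid(const struct model_uid_map *m, bool patched)
{
    return patched ? MODEL_GLOBAL_ROOT_UID : model_make_kuid(m, 0);
}

/* What stat() from the initial namespace reports for that inode. */
uint32_t model_stat_uid(uint32_t i_uid)
{
    return i_uid == MODEL_INVALID_UID ? MODEL_OVERFLOW_UID : i_uid;
}

/* Opening for write is refused when the inode's uid is invalid. */
int model_may_write(uint32_t i_uid)
{
    return i_uid == MODEL_INVALID_UID ? -EACCES : 0;
}
```

With an empty map, the unpatched default is the invalid uid, so stat shows
the overflow uid and the write check fails. With the patched default, stat
shows 0 and the write check no longer rejects the open.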

>
> Luis
>
> > Signed-off-by: Radoslaw Burny <rburny@xxxxxxxxxx>
> > ---
> > Changelog since v1:
> > - Updated the commit title and description.
> >
> > fs/proc/proc_sysctl.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> > index c74570736b24..36ad1b0d6259 100644
> > --- a/fs/proc/proc_sysctl.c
> > +++ b/fs/proc/proc_sysctl.c
> > @@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
> >
> >  	if (root->set_ownership)
> >  		root->set_ownership(head, table, &inode->i_uid, &inode->i_gid);
> > +	else {
> > +		inode->i_uid = GLOBAL_ROOT_UID;
> > +		inode->i_gid = GLOBAL_ROOT_GID;
> > +	}
> >
> >  	return inode;
> >  }
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
> >
