Re: [PATCH] fs: Make /proc/sys inodes be owned by global root.
From: Radoslaw Burny
Date: Fri Nov 30 2018 - 08:46:40 EST
On Fri, Nov 30, 2018 at 2:09 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Mon, Nov 26, 2018 at 11:29:40PM -0600, Eric W. Biederman wrote:
> > Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:
> > > Thanks for the description of how to run into the issue described but
> > > is there also a practical use case today where this is happening? I ask
> > > as it would be good to know the severity of the issue in the real world
> > > today.
> >
> > People trying to run containers without a root user in the container.
> > It is atypical but something doable.
>
> My question was if there are generic tools / proprietary tools which are
> doing this widely *today*. Or is this just a custom setup some folks
> use?
We will soon start using this setup at Google to harden our usage of CRIU.
There are some more details in my LPC presentation:
https://linuxplumbersconf.org/event/2/contributions/210/
Although I don't know of specific tools using this setup, there was a
kernel patch in 2017 to support such a use case:
7c6d78148fa0 - prctl: Allow local CAP_SYS_ADMIN changing exe_file
So, perhaps Virtuozzo people use a similar setup too?
> > We spoke about this at LPC. And this is the correct behavioral change.
> >
> > The problem is there is a default value for i_uid and i_gid that is
> > correct in the general case. That default value is not correct for
> > sysctl, because proc is weird. As the sysctl permission checks in
> > test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not
> > notice that i_uid and i_gid were being set wrong.
> >
> > So all this patch does is fix the default values i_uid and i_gid.
> >
> > The commit comment seems worth cleaning up. But for the
> > content of the code.
>
> The logic seems sensible then, but are we implicating what a container
> does with its sysctl values onto the entire system? If so, sure, it
> seems you want this for networking purposes as there are a series of
> sysctl values a container may want to muck with, but are we sure we
> want the same for *all* sysctl entries?
The point is that these sysctls do not affect the whole system, just
the appropriate namespace.
For example, IPC-related files (e.g. shmmax) will always affect the
writing process's IPC namespace, regardless of the /proc mount point
that is used to access them:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/ipc_sysctl.c?h=v4.20-rc4#n24
I presume the net-related sysctls that Eric was referring to have a
similar behavior.
>
> Luis