Re: [PATCH] devpts: Make each mount of devpts an independent filesystem.

From: Eric W. Biederman
Date: Wed Apr 20 2016 - 11:07:37 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Tue, Apr 19, 2016 at 9:36 PM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
>> On Wed, Apr 20, 2016 at 6:04 AM, Eric W. Biederman
>>>
>>> The kernel.pty.reserve sysctl is neutered with no way currently
>>> implemented to be able to use the reserved ptys.
>>
>> I think we could convert this into reserve for init user namespace,
>> ssh in host will work even if containers eaten all ptys.
>
> Yes. That's basically how it effectively worked before (ie everything
> but the initial non-newinstance devpts mount would be limited to the
> non-reserved numbers).
>
> We required the non-init namespaces to do a newinstance mount, so the
> whole test for "newinstance" was effectively the same thing as just
> checking for the init namespace from a security standpoint.
>
> And in fact, rewriting it in that form (ie checking for init_ns) would
> just make it much more obvious what the intent is.

How does this sound?

When mounting a devpts filesystem, we look at the caller (aka current),
and if we are in the initial mount namespace we set a flag in fsi that
allows that instance of devpts to draw from the reserve pool.
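
Roughly, as a user-space model of that policy. This is only a sketch, not
the patch: pty_pool, devpts_instance, may_use_reserve, devpts_mount and
devpts_new_pty are made-up names for illustration, and the in_initial_mnt_ns
argument stands in for whatever check against current ends up being used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are real kernel identifiers. */
struct pty_pool {
	int max;      /* total ptys allowed (models kernel.pty.max) */
	int reserve;  /* ptys held back (models kernel.pty.reserve) */
	int in_use;   /* ptys currently allocated across all instances */
};

struct devpts_instance {
	struct pty_pool *pool;
	bool may_use_reserve;  /* the proposed per-instance flag */
};

/* Decide once, at mount time, whether this instance may use the reserve. */
static void devpts_mount(struct devpts_instance *fsi, struct pty_pool *pool,
			 bool in_initial_mnt_ns)
{
	assert(pool->reserve <= pool->max);
	fsi->pool = pool;
	fsi->may_use_reserve = in_initial_mnt_ns;
}

/* Returns 0 on success, -1 if this instance has hit its limit. */
static int devpts_new_pty(struct devpts_instance *fsi)
{
	struct pty_pool *pool = fsi->pool;
	int limit = pool->max - (fsi->may_use_reserve ? 0 : pool->reserve);

	if (pool->in_use >= limit)
		return -1;
	pool->in_use++;
	return 0;
}
```

With max = 4 and reserve = 2, an instance mounted outside the initial mount
namespace is refused once 2 ptys are in use, while an instance mounted in the
initial mount namespace can still allocate the remaining 2.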

That will still allow crazy pieces of code like xen-create-instance, run
by root, that mount a devpts filesystem in a chroot environment to draw
from the reserve pool, but any sane users that set up their own mount
namespace won't be able to use the reserve pool.

I believe that will give an almost identical policy to what we have
today, and it certainly makes a good default test for a container. Just
for cleanliness, containers (of anyone's definition) almost always use
mount namespaces instead of chroots.

Sigh, one last pass through all of the distros to confirm that this
works in practice.

Eric