Re: [PATCH] make TIOCSTI ioctl require CAP_SYS_ADMIN

From: Matt Brown
Date: Fri Apr 21 2017 - 01:10:46 EST


On 04/20/2017 01:41 PM, Serge E. Hallyn wrote:
Quoting matt@xxxxxxxxx (matt@xxxxxxxxx):
On 2017-04-20 11:19, Serge E. Hallyn wrote:
Quoting Matt Brown (matt@xxxxxxxxx):
On 04/19/2017 07:53 PM, Serge E. Hallyn wrote:
Quoting Matt Brown (matt@xxxxxxxxx):
On 04/19/2017 12:58 AM, Serge E. Hallyn wrote:
On Tue, Apr 18, 2017 at 11:45:26PM -0400, Matt Brown wrote:
This patch reproduces GRKERNSEC_HARDEN_TTY functionality from the grsecurity
project in-kernel.

This will create the Kconfig SECURITY_TIOCSTI_RESTRICT and the corresponding
sysctl kernel.tiocsti_restrict that, when activated, restrict all TIOCSTI
ioctl calls from non CAP_SYS_ADMIN users.

Possible effects on userland:

There could be a few user programs that would be effected by this
change.
See: <https://codesearch.debian.net/search?q=ioctl%5C%28.*TIOCSTI>
notable programs are: agetty, csh, xemacs and tcsh

However, I still believe that this change is worth it given that the
Kconfig defaults to n. This will be a feature that is turned on for the

It's not worthless, but note that for instance before this was fixed
in lxc, this patch would not have helped with escapes from privileged
containers.


I assume you are talking about this CVE:
https://bugzilla.redhat.com/show_bug.cgi?id=1411256

In retrospect, is there any way that an escape from a privileged
container with the this bug could have been prevented?

I don't know, that's what I was probing for. Detecting that the pgrp
or session - heck, the pid namespace - has changed would seem like a
good indicator that it shouldn't be able to push.


pgrp and session won't do because in the case we are discussing
current->signal->tty is the same as tty.

This is the current check that is already in place:
| if ((current->signal->tty != tty) && !capable(CAP_SYS_ADMIN))
| return -EPERM;

Yeah...

The only thing I could find to detect the tty message coming from a
container is as follows:
| task_active_pid_ns(current)->level

This will be zero when run on the host, but 1 when run inside a
container. However this is very much a hack and could probably break
some userland stuff where there are multiple levels of namespaces.

Yes. This is also however why I don't like the current patch, because
capable() will never be true in a container, so nested containers
break.


What do you mean by "capable() will never be true in a container"?
My understanding
is that if a container is given CAP_SYS_ADMIN then
capable(CAP_SYS_ADMIN) will return
true?

No, capable(X) checks for X with respect to the initial user namespace.
So for root-owned containers it will be true, but containers running in
non-initial user namespaces cannot pass that check.

To check for privilege with respect to another user namespace, you need
to use ns_capable. But for that you need a user_ns to target.


How about: ns_capable(current_user_ns(),CAP_SYS_ADMIN) ?

current_user_ns() was found in include/linux/cred.h

I agree the hack I mentioned above would be a bad idea because
it would break
nested containers, but the current patch would not IMO.

A better version of the hack could involve a config
CONFIG_TIOCSTI_MAX_NS_LEVEL where
a check would be performed to ensure that
task_active_pid_ns(current)->level is not
greater than the config value(an integer that is >= 0) .

Yeah. That would break a different set of cases than the capable
check, I assume. A smaller set, I think.

Again, I think we both would agree that this is not the best
solution. The clear
downside is that you could have multiple container layers where the
desired security
boundaries happened to fall at different levels. Just throwing ideas
around.

Yup, appreciated.