[PATCH 01/16] devpts: Attempting to get it right
From: Eric W. Biederman
Date: Fri Apr 15 2016 - 11:44:56 EST
To recap the situation for those who have not been following closely.
There are programs such as xen-create-image that run as root and setup
a chroot environment with:
"mknod dev/ptmx c 5 2"
"mkdir dev/pts"
"mount -t devpts none dev/pts"
Which mostly works but stomps the mount options of the system /dev/pts.
In particular the options of "gid=5,mode=620" are lost resulting in a
situation where creating a new pty by opening /dev/ptmx results in
that pty having the wrong permissions.
Some distributions have been working around this problem by continuing
to install a setuid root pt_chown binary that will be called by glibc
to fix the permissions.
Maintaining a setuid root pt_chown binary is not too scary until
multiple instances of devpts are considered at which point it becomes
possible to trick the setuid root pt_chown binary into operating on the
wrong files and directories. Leading to all of the things one might
fear when a setuid root binary goes wrong.
The following patchset digs us out of that hole by carefully devpts and
/dev/ptmx in a way that does not introduce any userspace regressions,
while making each mount of devpts distinct (so pt_chown is unnecessary)
and arranging things so that enough information is available so
that a secure pt_chown binary is possible to write if that is ever
needed.
The approach I have chosen to take is to first enhance the /dev/ptmx
device node to automount /dev/pts/ptmx on top of it. This leads to a
simple high performance solution that allows applications such as
xen-create-image (that call "mknod ptmx c 5 2" and mount devpts)
to continue to run as before even when they are given a non-system
instance of devpts.
Using automountic bind mounts of /dev/pts/ptmx results in no new
security cases to consider as this can already be done, and actually
results in a simplification of the analysis of the code. As now all
opens of ptmx are of /dev/pts/ptmx. /dev/ptmx is now just a magic
mountpoint that does the right thing.
Allowing each mount of devpts to be distinct is also a bit interesting
as there is a concept in the code of the primary system devpts instance.
/dev/ptmx automounts the primary system instance of devpts if can not
find an appropriate devpts instance by path lookup. The sysctl
sys.kernel.pty.max is a global maximum of the number of ptys in the
system with sys.kernel.pty.reserve the number of those ptys reserved
exclusively for the system instance of devpts. Overmounting the system
instance of devpts with itself is expected to fail but update the devpts
mount options anyway.
In my testing I have found pieces of code that depend or at least appear
to depend on all of these propeties.
The particular challenge in all of this have been distro's that mount
devpts in initial ram disks, and then mount devpts again during regular
system startup. It took a little bit of careful arranging to ensure
that it is the system instance of devpts that always winds up on
/dev/pts for all distros. CentOS5 and CentOS6 were particularly
challenging examples.
To look for surprising userspace behavior I have attempted to test this
patchset on a representative sample of linux distributions. The
distributions I managed to setup and test in vms are: on openwrt-15.05,
centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2,
ubuntu-14.04.3, ubuntu-15.10, fedora23, magia-5, mint-17.3,
opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?),
archlinux-2015-12-01.
I wanted to test Android (as it is one of the most unique linux
distributions) but I could not find a freely available image that was
easy to get going in a VM, so I audited the Android code instead.
Android has a daemon that is responsible for everything under /dev
that listens on for netlink device events, consultis it's policy data
base and if the Android policy allows creates the device node in a
tmpfs instance mounted on /dev with the attributes specified by policy.
Furthermore at system startup this daemon mounts devpts exactly once,
which thankfully presents no interesting challenges.
I have also run xen-create-image on debian 8.2 (where it was easily
installed with apt-get) and confirmed that without these changes it
stomps the mount options of devpts and with these changes it only uses
atypical mount options on a separate instance of devpts.
The current technique of automounting /dev/pts/ptmx onto /dev/ptmx
results in the best userspace semantics and the easiest to understand
and maintain kernel code that I have seen implemented or heard proposed
in this discussion, as semantically and in the implementation each piece
is tasked with doing one thing.
Eric W. Biederman (16):
devpts: Use the same default mode for both /dev/ptmx and dev/pts/ptmx
devpts: Set the proper fops for /dev/pts/ptmx
vfs: Implement vfs_loopback_mount
devpts: Teach /dev/ptmx to automount the appropriate devpts via path lookup
vfs: Allow unlink, and rename on expirable file mounts
devpts: More obvious check for the system devpts in pty allocation
devpts: Cleanup newinstance parsing
devpts: Stop rolling devpts_remount by hand in devpts_mount
devpts: Fail early (if appropriate) on overmount
devpts: Move parse_mount_options into fill_super
devpts: Make devpts_kill_sb safe if fsi is NULL
devpts: Move the creation of /dev/pts/ptmx into fill_super
devpts: Simplify devpts_mount by using mount_nodev
vfs: Implement mount_super_once
devpts: Always return a distinct instance when mounting
devpts: Kill the DEVPTS_MULTIPLE_INSTANCE config option
Documentation/filesystems/devpts.txt | 122 +++++++-----------
drivers/tty/Kconfig | 11 --
drivers/tty/pty.c | 2 +-
drivers/tty/tty_io.c | 5 +-
fs/devpts/inode.c | 234 +++++++++++++++++++----------------
fs/inode.c | 3 +
fs/namei.c | 83 +++++++++++--
fs/namespace.c | 25 +++-
fs/super.c | 34 +++++
include/linux/devpts_fs.h | 18 +++
include/linux/fs.h | 3 +
include/linux/mount.h | 1 +
include/linux/namei.h | 2 +
13 files changed, 330 insertions(+), 213 deletions(-)
This code is also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git devpts-for-testing
Eric