user namespace and fully visible proc and sys mounts

From: Serge E. Hallyn
Date: Sun Mar 06 2016 - 03:28:35 EST


Hi,

So we've been over this many times... but unfortunately there is more
breakage to report. Regular privileged and unprivileged containers
work all right for us. But running an unprivileged container inside a
privileged container is blocked.

When creating privileged containers, lxc by default does a few things.
It mounts some fuse.lxcfs files over proc files, including /proc/meminfo
and /proc/uptime. It mounts proc rw but makes /proc/sysrq-trigger ro.
It also moves /proc/sys/net out of the way, bind-mounts /proc/sys
read-only (because this container is not in a user namespace), then
moves /proc/sys/net back. Finally it mounts sys ro but bind-mounts
/sys/devices/virtual/net writable.
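
For anyone who wants to reproduce the layout, the mount sequence described
above can be sketched roughly as follows. This is only an illustration, not
lxc's actual code: the lxcfs path (/var/lib/lxcfs), the temporary parking
spot for /proc/sys/net (/proc/tty), the nosuid/nodev/noexec flags and the
helper names are all assumptions. The read-only bits are applied per mount
(remount+bind) so that the later bind mount can be made writable again.

```python
import ctypes
import os

# Linux mount(2) flag values, as defined in <sys/mount.h>.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8
MS_REMOUNT = 32
MS_BIND = 4096
MS_MOVE = 8192

LXCFS = "/var/lib/lxcfs"  # assumed lxcfs mount point on the host
STASH = "/proc/tty"       # hypothetical place to park /proc/sys/net


def privileged_mount_steps(root=""):
    """Return (source, target, fstype, flags) tuples in the order described."""
    def r(path):
        return root + path

    ro_bind = MS_REMOUNT | MS_BIND | MS_RDONLY
    return [
        # proc itself stays read-write.
        ("proc", r("/proc"), "proc", MS_NOSUID | MS_NODEV | MS_NOEXEC),
        # lxcfs files mounted over individual proc files.
        (LXCFS + "/proc/meminfo", r("/proc/meminfo"), None, MS_BIND),
        (LXCFS + "/proc/uptime", r("/proc/uptime"), None, MS_BIND),
        # /proc/sysrq-trigger becomes read-only.
        (r("/proc/sysrq-trigger"), r("/proc/sysrq-trigger"), None, MS_BIND),
        (None, r("/proc/sysrq-trigger"), None, ro_bind),
        # Park /proc/sys/net, make /proc/sys read-only, move net back.
        (r("/proc/sys/net"), r(STASH), None, MS_BIND),
        (r("/proc/sys"), r("/proc/sys"), None, MS_BIND),
        (None, r("/proc/sys"), None, ro_bind),
        (r(STASH), r("/proc/sys/net"), None, MS_MOVE),
        # sys is read-only except for /sys/devices/virtual/net.
        ("sysfs", r("/sys"), "sysfs", MS_NOSUID | MS_NODEV | MS_NOEXEC),
        (None, r("/sys"), None, ro_bind),
        (r("/sys/devices/virtual/net"), r("/sys/devices/virtual/net"),
         None, MS_BIND),
        (None, r("/sys/devices/virtual/net"), None, MS_REMOUNT | MS_BIND),
    ]


def libc_mount(source, target, fstype, flags):
    """Thin wrapper around mount(2); needs CAP_SYS_ADMIN to succeed."""
    libc = ctypes.CDLL(None, use_errno=True)

    def enc(s):
        return s.encode() if s is not None else None

    rc = libc.mount(enc(source), enc(target), enc(fstype),
                    ctypes.c_ulong(flags), None)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


def apply_steps(steps, mount_fn=libc_mount):
    """Run every step through mount_fn and return how many were applied."""
    for step in steps:
        mount_fn(*step)
    return len(steps)


if __name__ == "__main__":
    # Dry run: print the steps instead of calling mount(2).
    apply_steps(privileged_mount_steps(), lambda *step: print(step))
```

Passing a different mount_fn (as the dry run does) makes it easy to drop
individual steps and see which ones get in the way.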

If any of these are left enabled, unprivileged containers can't be
started inside the privileged container. If all are disabled, then
they can be.

Can we find a way to make these not block remounts in child user
namespaces? A boot flag, a procfs and sysfs mount option, a sysctl?

-serge