Re: [PATCH RFC] allow some kernel filesystems to be mounted in a user namespace

From: Serge E. Hallyn
Date: Tue Jul 16 2013 - 18:23:14 EST


Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> On Tue, Jul 16, 2013 at 3:03 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> > Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> >> On Tue, Jul 16, 2013 at 2:37 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> >> > Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> >> >> On 07/16/2013 12:50 PM, Serge E. Hallyn wrote:
> >> >> > Quoting Al Viro (viro@xxxxxxxxxxxxxxxxxx):
> >> >> >> On Tue, Jul 16, 2013 at 02:29:20PM -0500, Serge Hallyn wrote:
> >> >> >>> All the files will be owned by host root, so there's no security
> >> >> >>> concern in allowing this.
> >> >> >>
> >> >> >> Files owned by root != very bad things can't be done by non-root.
> >> >> >> Especially for debugfs, which is very much a "don't even think about
> >> >> >> mounting that on a production box" thing...
> >> >> >
> >> >> > I would prefer it not be mounted. But near as I can tell there
> >> >> > should be no regression security-wise whether an unprivileged
> >> >> > user on the host has access to it, or whether a user in a
> >> >> > non-init user ns is allowed to mount it. (Obviously I could very
> >> >> > well be wrong)
> >> >>
> >> >> I would argue that either (a) debugfs denies everything to non-root, so
> >> >> mounting it in a (rootless) userns is useless or (b) it doesn't, in
> >> >> which case it's dangerous.
> >> >>
> >> >> In neither case does it make sense to me to allow the mount.
> >> >
> >> > It makes sense from the POV of having sane user-space. I can obviously
> >> > work around this by tweaking a stock container rootfs to be different
> >> > from a stock host rootfs. It is undesirable.
> >> >
> >> > For debug and fusectl there is another option which I'm happy to
> >> > pursue, namely tweaking how mountall handles 'nofail' to ignore these
> >> > errors.
> >>
> >> I don't know enough about fuse to know whether it should work in a
> >> container, but presumably the fusectl FS needs to be aware of userns
> >
> > Again it's not about working - we actually don't (through LSM) allow
> > writes under any of them anyway. It's about containers and
> > non-containers having similar boot sequences when possible.
>
> I, and many other people, run kernel.org kernels with LSM disabled.
> userns defaults to on, and that configuration needs to be secure.

My point was just that not being able to write under those mounts will
not break the containers. I'm not saying it would be ok to push this
patch if it did require an LSM to be safe.

> >> mappings for it to work right. But ISTM it would be better for
> >> containers to be smart enough to keep going if debugfs fails to mount
> >
> > "smart enough" in this case means finding ways to figure out information
> > that it wouldn't otherwise need, and the form of which could at some point
> > change, and generally just increases the future potential fragility.
>
> Presumably this is as simple as making 'mountall' report success if
> nofail is set and mount returns -EPERM.
>
> That being said, it would probably be okay to modify debugfs to detect
> that it's in a nonroot userns and show up empty when mounted.

That'd obviously work for containers.
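
Andy's suggested mountall change (report success when an fstab entry
has nofail and mount(2) fails with EPERM) boils down to a small
decision rule. The sketch below is hypothetical and is not taken from
mountall; the helper name mount_result_ok is invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper, not mountall's actual code: decide whether the
 * result of a mount(2) call should count as success.
 *   rc          - return value of mount(2)
 *   saved_errno - errno captured immediately after that call
 *   nofail      - whether the fstab entry carries the 'nofail' option */
static bool mount_result_ok(int rc, int saved_errno, bool nofail)
{
    if (rc == 0)
        return true;            /* the mount actually succeeded */

    /* Tolerate only a permission refusal, and only on 'nofail' entries;
     * every other failure is still reported. */
    return nofail && saved_errno == EPERM;
}
```

Capture errno in a separate statement right after mount(2) (int rc =
mount(...); int err = errno;) rather than passing both as arguments of
one call, since C leaves argument evaluation order unspecified.
Restricting the rule to EPERM keeps other failures visible, though on a
real host it would still let a refused securityfs mount pass quietly.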

> > Well, to be fair that's again really referring to the securityfs one.
> > Basically solving that would require teaching mountall to parse
> > /proc/self/uid_map to decide its namespace.
>
> Huh?

I don't think it's going to be ok to have mountall proceed on
real hosts with /sys/kernel/security not mounted, risking the expected
security policy *quietly* not being set up on hosts.

That's why I consider it better and safer to simply allow the
securityfs mount.
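
For comparison, the /proc/self/uid_map check that mountall would
otherwise need might look roughly like this sketch. It assumes the
common heuristic that the initial user namespace shows exactly one
identity extent, "0 0 4294967295"; the helper uid_map_looks_initial is
hypothetical, and the check is not airtight because a privileged process
can give a child namespace the same map.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not mountall code): returns true if the text of
 * /proc/self/uid_map looks like the initial user namespace, i.e. exactly
 * one identity extent covering every 32-bit uid: "0 0 4294967295".
 * Heuristic only: a privileged process can give a child namespace the
 * same identity map. */
static bool uid_map_looks_initial(const char *contents)
{
    unsigned long inside, outside, count;
    int consumed = 0;

    /* Each uid_map line is: <id inside ns> <id outside ns> <length>. */
    if (sscanf(contents, "%lu %lu %lu%n", &inside, &outside, &count,
               &consumed) != 3)
        return false;

    /* Anything other than trailing whitespace means further extents. */
    for (const char *p = contents + consumed; *p != '\0'; p++)
        if (!isspace((unsigned char)*p))
            return false;

    return inside == 0 && outside == 0 && count == 4294967295UL;
}
```

Even this small amount of parsing depends on the file's format staying
stable, which is the fragility concern raised earlier in the thread.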

> >> -- this really seems like a userspace problem that ought to be fixed
> >> in userspace.
> >
> >> > But for /sys/kernel/security, the failure of which to mount on a
> >> > non-container can be a real problem, that is not good enough. So
> >> > at least I'd like securityfs to be mountable in a non-init userns.
> >> >
> >>
> >> Will the container work if /sys/kernel/security is inaccessible even to "root"?
> >
> > Yes. As it is they're actually not allowed to write under there (by
> > LSM). Containers start fine for me with these three mounted this way.
> >
>
> At least for securityfs, relying on LSM is legit.

I'm not "relying on LSM" to make these safe. I'm relying on the
uid mappings to make these safe.
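
A simplified model of what "relying on the uid mappings" means: a
uid_map translates parent-namespace uids (the host's, for a first-level
container) into namespace uids, and anything unmapped shows up as the
overflow uid. The struct, helper and the 65534 value (the usual default
of /proc/sys/kernel/overflowuid) are assumptions for illustration; the
kernel's own counterpart is from_kuid_munged(), not this code.

```c
#include <stddef.h>
#include <stdint.h>

/* One uid_map line: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) in the parent. */
struct uid_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Assumed default of /proc/sys/kernel/overflowuid. */
#define OVERFLOW_UID 65534u

/* Translate a parent-namespace uid into the uid seen inside the
 * namespace; uids with no mapping are reported as OVERFLOW_UID. */
static uint32_t host_uid_to_ns(const struct uid_extent *map, size_t n,
                               uint32_t host_uid)
{
    for (size_t i = 0; i < n; i++) {
        /* The >= check first avoids unsigned underflow in the subtraction. */
        if (host_uid >= map[i].lower_first &&
            host_uid - map[i].lower_first < map[i].count)
            return map[i].first + (host_uid - map[i].lower_first);
    }
    return OVERFLOW_UID;
}
```

With a typical container map of "0 100000 65536", files owned by host
uid 0 appear as 65534 inside the container, so root in the container is
not their owner.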

Nevertheless, I have at least some hope of working around the others
(debugfs and fusectl) in a distro-acceptable way, so if those are too
scary I'll pursue that workaround and see where I get. But I really
feel allowing the securityfs mount is the best solution.

thanks,
-serge