Re: [GIT PULL] namespace updates for v3.17-rc1
From: Eric W. Biederman
Date: Thu Aug 21 2014 - 00:58:43 EST
Richard Weinberger <richard.weinberger@xxxxxxxxx> writes:
> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> This commit breaks libvirt-lxc.
> libvirt does in lxcContainerMountBasicFS():
The bugs fixed are security issues, so if we have to break a small
number of userspace applications we will. Anything that we can
reasonably do to avoid regressions will be done.
Could you please look at my user-namespace.git#for-next branch I have a
fix for at least one regresion causing issue in there. I think it may
fix your issues but I am not fully certain more comments below.
> /*
> * We can't immediately set the MS_RDONLY flag when mounting filesystems
> * because (in at least some kernel versions) this will propagate back
> * to the original mount in the host OS, turning it readonly too. Thus
> * we mount the filesystem in read-write mode initially, and then do a
> * separate read-only bind mount on top of that.
> */
> bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>
> VIR_DEBUG("Mount %s on %s type=%s flags=%x",
> mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
> ~MS_RDONLY, NULL) < 0) {
>
> ^^^^ Here it fails for sysfs because with user namespaces we bind the
> existing /sys into the container
> and would have to read out all existing mount flags from the current /sys mount.
> Otherwise mount() fails with EPERM.
> On my test system /sys is mounted with
> "rw,nosuid,nodev,noexec,relatime" and libvirt
> misses the realtime...
Not specifying any atime flags to mount should be safe as that asks for
the default atime flags which for remount I have made the default atime
flags the existing atime flags. So I am scratching my head a little on
this one.
>
> virReportSystemError(errno,
> _("Failed to mount %s on %s type %s flags=%x"),
> mnt_src, mnt->dst, NULLSTR(mnt->type),
> mnt_mflags & ~MS_RDONLY);
> goto cleanup;
> }
>
> if (bindOverReadonly &&
> mount(mnt_src, mnt->dst, NULL,
> MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>
> ^^^ Here it fails because now we'd have to specify all flags as used
> for the first
> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
> See lxcBasicMounts[].
> In this case the fix is easy, add mnt_mflags to the mount flags.
That has always been a bug in general because remount has always
required specifying the complete set of mount flags you want to have.
That fact that flags such as nosuid are now properly locked so you can
not change them if you are not the global root user just makes this
obvious.
Andy Lutermorski has observed that statvfs will return the mount flags
making reading them simple.
Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/