Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
From: Serge Hallyn
Date: Fri Aug 15 2014 - 15:06:04 EST
Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.
Hi,
Sorry, I'm probably thinking stupidly, but I don't see this restriction
in effect:
serge@sl:~$ mount | grep tmp
[...]
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
serge@sl:~$ sudo mknod /run/kvm c 10 232
[sudo] password for serge:
serge@sl:~$ echo $?
0
serge@sl:~$ ls -l /run/kvm
crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
But you seem to be saying I shouldn't be allowed to create a device inside
a tmpfs. What am I overlooking?
> Implicitly adding nodev is problematic, though. It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
>
> This fixes a minor regression in:
>
> 9566d6742852 mnt: Correct permission checks in do_remount
>
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work. That commit broke code that does this.
>
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
>
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev. If they set nodev, that setting will
> be locked.
>
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
>
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems. All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
>
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b. That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
>
> Cc: Kenton Varda <kenton@xxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> ---
> fs/namespace.c | 16 ++++++++++++----
> fs/proc/root.c | 2 +-
> fs/ramfs/inode.c | 2 +-
> fs/sysfs/mount.c | 2 +-
> ipc/mqueue.c | 2 +-
> mm/shmem.c | 4 ++--
> 6 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
> put_filesystem(type);
> return -EPERM;
> }
> - /* Only in special cases allow devices from mounts
> - * created outside the initial user namespace.
> +
> + /*
> + * If a filesystem might allow the mounter to put
> + * device nodes on it without the checks in mknod,
> + * then require MS_NODEV to mount it.
> */
> if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> - flags |= MS_NODEV;
> - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> + if (!(mnt_flags & MNT_NODEV)) {
> + put_filesystem(type);
> + return -EPERM;
> + }
> +
> + /* Do not allow nodev to be cleared. */
> + mnt_flags |= MNT_LOCK_NODEV;
> }
> }
>
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
> .name = "proc",
> .mount = proc_mount,
> .kill_sb = proc_kill_sb,
> - .fs_flags = FS_USERNS_MOUNT,
> + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> };
>
> void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
> .name = "ramfs",
> .mount = ramfs_mount,
> .kill_sb = ramfs_kill_sb,
> - .fs_flags = FS_USERNS_MOUNT,
> + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> };
>
> int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
> .name = "sysfs",
> .mount = sysfs_mount,
> .kill_sb = sysfs_kill_sb,
> - .fs_flags = FS_USERNS_MOUNT,
> + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> };
>
> int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
> .name = "mqueue",
> .mount = mqueue_mount,
> .kill_sb = kill_litter_super,
> - .fs_flags = FS_USERNS_MOUNT,
> + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> };
>
> int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
> .name = "tmpfs",
> .mount = shmem_mount,
> .kill_sb = kill_litter_super,
> - .fs_flags = FS_USERNS_MOUNT,
> + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> };
>
> int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
> .name = "tmpfs",
> .mount = ramfs_mount,
> .kill_sb = kill_litter_super,
> - .fs_flags = FS_USERNS_MOUNT,
> + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> };
>
> int __init shmem_init(void)
> --
> 1.9.3
>