Re: [PATCH] netns: dangling pointer on netns bind mount point.

From: Al Viro
Date: Mon Apr 06 2020 - 23:06:56 EST


On Tue, Apr 07, 2020 at 11:35:46AM +0900, Levi wrote:
> When we try to bind mount on network namespace (ex) /proc/{pid}/ns/net,
> inode's private data can have dangling pointer to net_namespace that was
> already freed in below case.
>
> 1. Forking the process.
> 2. [PARENT] Waiting the Child till the end.
> 3. [CHILD] call unshare for creating new network namespace
> 4. [CHILD] Bind mount with /proc/self/ns/net to some mount point.
> 5. [CHILD] Exit child.
> 6. [PARENT] Try to setns with binded mount point
>
> In step 5, net_namespace made by child process'll be freed,
> But in bind mount point, it still held the pointer to net_namespace made
> by child process.
> In this situation, when parent try to call "setns" systemcall with the
> bind mount point, parent process try to access to freed memory, That
> makes memory corruption.
>
> This patch fix the above scenario by increaseing reference count.

This can't be the right fix.

> +#ifdef CONFIG_NET_NS
> + if (!(flag & CL_COPY_MNT_NS_FILE) && is_net_ns_file(root)) {
> + ns = get_proc_ns(d_inode(root));
> + if (ns == NULL || ns->ops->type != CLONE_NEWNET) {
> + err = -EINVAL;
> +
> + goto out_free;
> + }
> +
> + net = to_net_ns(ns);
> + net = get_net(net);

No. This is completely wrong. You have each struct mount pointing to
that sucker to grab an extra reference on an object; you do *NOT* drop
said reference when struct mount is destroyed. You are papering over
a dangling pointer of some sort by introducing a trivially exploitable
leak that happens to hit your scenario.

Hell, do (your step 4 + umount your mountpoint) in a loop, then watch what
happens to refcounts with that patch.

This is bollocks; the reference is *NOT* in struct mount. At all.
It's not even in struct dentry. What it's attached to is struct
inode and it should be pinned as long as that inode is alive -
it's dropped in nsfs_evict(). And if _that_ gets called while
dentry is still pinned (as ->mnt_root of something), you have
much worse problems.

Could you post a reproducer, preferably one that would trigger an oops
on mainline?