Re: [PATCH] vfs namespace: Don't assume mount namespace has validroot

From: Al Viro
Date: Wed Mar 14 2012 - 16:58:12 EST


On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote:
> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
> > This patch fixes the assumption that a mnt namespace will always have a valid
> > root object.
>
> It's not an assumption, it's an invariant that should hold unless you have
> run into a bug somewhere.
>
> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
> There are only two callers - dup_mnt_namespace() and create_mnt_ns().
> The latter will assign non-NULL vfsmount to ->root or die NULL pointer
> dereference in
> mnt->mnt_ns = new_ns;
> The former will either assign non-NULL to ->root or kfree() mnt_namespace
> before anyone can see it.
>
> And nothing should modify ->root after that assignment for as long as
> the instance of struct mnt_namespace is allocated.
>
> Mind explaining how have you managed to get mnt_namespace with NULL ->root
> passed to dup_mnt_ns()?

As the matter of fact, all of the above is easily verified; very few places
see the internals of struct mnt_namespace (defined in fs/mount.h, included
only in fs/dcache.c, fs/fhandle.c, fs/namei.c,
fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c},
and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and
fs/proc_namespace.c). Nothing else knows what size the damn thing is,
nevermind the actual layout.

The lifetime rules are also simple and easy to verify: it's a plain refcount.
No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns().
As far as the outside cares,
nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always
returns a new reference, either to exisiting instance or to freshly allocated
one. put_mnt_ns() needs to be called on the other side...
opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount
for as long as it's opened.

This is it; we actually end up acquiring one reference too many to initial
mnt_namespace, but that's not going to cause such effect.

The structure containing struct vfsmount has a pointer to struct mnt_namespace
in it. That one is protected by namespace_sem, does *not* contribute to
refcount and is cleared when vfsmount (OK, struct mount containing it)
gets removed from namespace's ->list and set when it gets placed there.
Anything still on mnt_ns->list will get kicked out of there by umount_tree()
call from put_mnt_ns(), so those references can't outlive the instance
they point to.

And that's all pointers to mnt_namespace that ever exist, aside of function
arguments and local variables. I'm not saying that I couldn't have possibly
fucked it up, but from rereading that code it doesn't look like we could
end up with dangling pointers to already freed instances...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/