All this mount stuff and how it really works.

From: Alexander Viro (viro@math.psu.edu)
Date: Sat Jun 03 2000 - 08:38:58 EST


[apologies for obese Cc:]

On Wed, 31 May 2000, Giuliano Pochini wrote:

> Olivier Galibert wrote:
> >
> > On Tue, May 30, 2000 at 09:46:13AM -0500, Chad Schwartz wrote:
> > > You can allow a person to mount the same fs in unlimited places. thats
> > > just fine.
> > >
> > > But do *NOT* allow a mount on top of an already mounted filesystem.
>
> Hmm, what happens if I
>
>
> mount /dev/sda1 /mnt
> mount /dev/sda1 /mnt/mnt
> umount /mnt

-EBUSY from umount(2), what did you expect? The picture looks like this:
1. You have a superblock + dentry tree for whatever you have as root (say,
v7fs on /dev/fd0 - just to make it weird).
2. You have a superblock + dentry tree for whatever you have on /dev/sda1
(say, ext2).
3. You have a vfsmount tree of (at least) 3 elements (more than 3 if you
have /proc, /tmp or something else mounted, indeed). Let's call them M1,
M2 and M3. Their contents:
                M1              M2              M3
mnt_parent      M1              M1              M2
mnt_sb          sb from fd0     sb from sda1    sb from sda1
mnt_root        root from fd0   root from sda1  root from sda1
mnt_mountpoint  root from fd0   mnt from fd0    mnt from sda1

dentry of root from fd0 has an empty d_vfsmnt
dentry of mnt from fd0 has a non-empty d_vfsmnt - it has one element and
that's M2.
dentry of root from sda1 has an empty d_vfsmnt
dentry of mnt from sda1 has a non-empty d_vfsmnt - it has one element and
that's M3.
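
A minimal sketch of the objects above in standalone C, assuming heavily
simplified structures. Only the mnt_* field names and d_vfsmnt come from
the description; mnt_next, d_name, s_dev, struct example, d_mountpoint()
and build_example() are hypothetical names used for illustration, and a
NULL-terminated singly linked list stands in for the real list.

```c
#include <assert.h>
#include <stddef.h>

struct vfsmount;

struct super_block {
    const char *s_dev;              /* e.g. "fd0" or "sda1" */
};

struct dentry {
    const char *d_name;
    struct vfsmount *d_vfsmnt;      /* vfsmounts mounted here; NULL if empty */
};

struct vfsmount {
    struct vfsmount *mnt_parent;    /* vfsmount we are attached to */
    struct super_block *mnt_sb;     /* filesystem shown by this mount */
    struct dentry *mnt_root;        /* top of the dentry subtree we show */
    struct dentry *mnt_mountpoint;  /* dentry in the parent we sit on */
    struct vfsmount *mnt_next;      /* hypothetical: next in d_vfsmnt list */
};

/* The objects from the example: two filesystems, three mounts. */
struct example {
    struct super_block fd0, sda1;
    struct dentry fd0_root, fd0_mnt, sda1_root, sda1_mnt;
    struct vfsmount M1, M2, M3;
};

/* Nonzero if something is mounted on this dentry somewhere in the tree. */
int d_mountpoint(const struct dentry *dentry)
{
    return dentry->d_vfsmnt != NULL;
}

/* Fill in the table for the two mount commands quoted above. */
void build_example(struct example *e)
{
    assert(e != NULL);
    e->fd0.s_dev = "fd0";
    e->sda1.s_dev = "sda1";
    e->fd0_root  = (struct dentry){ "/",   NULL };
    e->fd0_mnt   = (struct dentry){ "mnt", &e->M2 };
    e->sda1_root = (struct dentry){ "/",   NULL };
    e->sda1_mnt  = (struct dentry){ "mnt", &e->M3 };
    e->M1 = (struct vfsmount){ &e->M1, &e->fd0,
                               &e->fd0_root, &e->fd0_root, NULL };
    e->M2 = (struct vfsmount){ &e->M1, &e->sda1,
                               &e->sda1_root, &e->fd0_mnt, NULL };
    e->M3 = (struct vfsmount){ &e->M2, &e->sda1,
                               &e->sda1_root, &e->sda1_mnt, NULL };
}
```

Note that M2 and M3 share both mnt_sb and mnt_root; they differ only in
where they are attached.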

let's see what /mnt/mnt/foo is:
/ is (M1, root from fd0)
Let's go to /mnt.
        It would be (M1, mnt from fd0). It has a non-empty d_vfsmnt, so
        that's a mountpoint. Let's see if there is an element with
        ->mnt_parent == M1. Yup, there is and that's M2. We are in
        (M2,M2->mnt_root), aka. (M2, root from sda1). Since the root from
        sda1 has empty ->d_vfsmnt we are done -
/mnt is (M2, root from sda1)
Let's go to /mnt/mnt.
        It would be (M2, mnt from sda1). It has a non-empty d_vfsmnt, so
        that's a mountpoint. Let's see if there is an element with
        ->mnt_parent == M2. Yup, there is and that's M3. We are in
        (M3,M3->mnt_root), aka. (M3, root from sda1). Since the root from
        sda1 has empty ->d_vfsmnt we are done -
/mnt/mnt is (M3, root from sda1).
Let's go to /mnt/mnt/foo.
        It's (M3, foo from sda1). Assuming that we didn't mount anything
        there that's it - we are done.
/mnt/mnt/foo is (M3, foo from sda1).

Now, if you would go for /mnt/mnt/mnt everything would be the same except
the last step. There we would have
/mnt/mnt is (M3, root from sda1).
We go to /mnt/mnt/mnt.
        It's (M3, mnt from sda1). d_vfsmnt is non-empty, all right, but
        none of the elements has ->mnt_parent == M3. So we are done and
/mnt/mnt/mnt is (M3, mnt from sda1).
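
The mountpoint-crossing step used in the walk above can be sketched as
follows. This is an illustration under simplifying assumptions, not the
kernel's code: follow_down() here is a standalone function, mnt_next and
d_name are hypothetical, the list is NULL-terminated rather than cyclic,
and there is no locking or reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct vfsmount;

struct dentry {
    const char *d_name;
    struct vfsmount *d_vfsmnt;      /* vfsmounts mounted here; NULL if empty */
};

struct vfsmount {
    struct vfsmount *mnt_parent;
    struct dentry *mnt_root;
    struct dentry *mnt_mountpoint;
    struct vfsmount *mnt_next;      /* hypothetical link in the d_vfsmnt list */
};

/*
 * We have just stepped onto (*mnt, *dentry). While something is mounted
 * on that dentry in this part of the tree, jump to the root of the
 * mounted piece. Returns the number of mounts crossed.
 */
int follow_down(struct vfsmount **mnt, struct dentry **dentry)
{
    int crossed = 0;

    assert(mnt && *mnt && dentry && *dentry);
    for (;;) {
        struct vfsmount *m = (*dentry)->d_vfsmnt;

        /* Only a mount whose parent is our vfsmount applies here. */
        while (m != NULL && m->mnt_parent != *mnt)
            m = m->mnt_next;
        if (m == NULL)
            return crossed;         /* not a mountpoint in this chunk */
        *mnt = m;
        *dentry = m->mnt_root;
        crossed++;
    }
}
```

Called on (M1, mnt from fd0) it lands on (M2, root from sda1); called on
(M3, mnt from sda1) it finds a non-empty list but no matching parent and
leaves the pair alone.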

See how it works? vfsmount describes both the linkage (we are here, so we
ought to jump here) and the piece of unified tree. It _refers_ to the part
of relevant dentry tree. When you are resolving a name you watch for the
mountpoints (just as you would do normally) and when you reach one you
jump to the right place - it's stored in the vfsmount you've run
into. The only real difference is that we don't store that linkage in
dentry (otherwise we couldn't mount the same thing in several places at
once) and we have to keep in mind that the same dentry may be visible in
different parts of the unified tree, some of them used as mountpoints, some
not. Since all these places are distinguishable by vfsmount (they _are_ in
different chunks) we can determine the right place to go by checking the
->mnt_parent of potential candidates - vfsmounts with ->mnt_mountpoint
pointing to our dentry. So we keep them in the cyclic list anchored in
our dentry and normally (when non-empty at all) this list has only one
element, unless you have something like sda1 mounted on /foo and /bar
_and_ both /foo/baz and /bar/baz used as mountpoints.

When you umount the thing you check whether it's busy, indeed. And M2
(vfsmount of /mnt) definitely is busy - it's a parent of M3. So umount
will fail - nothing changes from the fact that you have the same fs
mounted under /mnt/mnt, it might as well be qnx4 from /dev/xda8.
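
A sketch of that busy check, assuming a single hypothetical counter
(mnt_users) that each child mount bumps in its parent. attach_mount(),
do_umount(), mnt_next and mnt_users are illustrative names, and the real
bookkeeping is more involved than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vfsmount;

struct dentry {
    const char *d_name;
    struct vfsmount *d_vfsmnt;      /* vfsmounts mounted here; NULL if empty */
};

struct vfsmount {
    struct vfsmount *mnt_parent;
    struct dentry *mnt_root;
    struct dentry *mnt_mountpoint;
    struct vfsmount *mnt_next;      /* hypothetical link in the d_vfsmnt list */
    int mnt_users;                  /* hypothetical: things pinning this mount */
};

/* Attach mnt on (parent, mountpoint); the child pins its parent. */
void attach_mount(struct vfsmount *mnt, struct vfsmount *parent,
                  struct dentry *mountpoint, struct dentry *root)
{
    assert(mnt && parent && mountpoint && root);
    mnt->mnt_parent = parent;
    mnt->mnt_root = root;
    mnt->mnt_mountpoint = mountpoint;
    mnt->mnt_users = 0;
    mnt->mnt_next = mountpoint->d_vfsmnt;
    mountpoint->d_vfsmnt = mnt;
    parent->mnt_users++;
}

/* Detach mnt, or refuse with -EBUSY if anything still depends on it. */
int do_umount(struct vfsmount *mnt)
{
    struct vfsmount **p;

    assert(mnt != NULL);
    /* The root mount is out of scope for this sketch: treat it as busy. */
    if (mnt->mnt_parent == mnt || mnt->mnt_users > 0)
        return -EBUSY;
    /* Unlink from the mountpoint's list of candidates. */
    for (p = &mnt->mnt_mountpoint->d_vfsmnt; *p != NULL; p = &(*p)->mnt_next) {
        if (*p == mnt) {
            *p = mnt->mnt_next;
            break;
        }
    }
    mnt->mnt_parent->mnt_users--;
    return 0;
}
```

With M2 and M3 attached as in the example, do_umount(&M2) returns -EBUSY
until M3 has been detached. Which filesystem M3 shows never enters into it.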

Notice that umount /mnt/mnt _will_ work unless you have something opened
under /mnt/mnt. If you open /mnt/foo you'll get a file with
->f_dentry equal to dentry of foo from sda1 and ->f_vfsmnt == M2. If you
open the same file as /mnt/mnt/foo you'll get the same ->f_dentry (so all
IO, etc. will work as it should), but ->f_vfsmnt will be M3. So, unlike
/mnt/foo, it will keep /mnt/mnt busy.
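
To illustrate: a sketch in which an open file records the (vfsmount,
dentry) pair it was reached through and pins only that vfsmount. The
f_dentry and f_vfsmnt fields are from the text; open_file(), close_file(),
mnt_busy() and the mnt_users counter are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

struct dentry {
    const char *d_name;
};

struct vfsmount {
    int mnt_users;                  /* hypothetical count of open files */
};

struct file {
    struct dentry *f_dentry;        /* what the file is */
    struct vfsmount *f_vfsmnt;      /* which mount it was opened through */
};

/* Record how the file was reached and pin that mount. */
void open_file(struct file *f, struct vfsmount *mnt, struct dentry *dentry)
{
    assert(f && mnt && dentry);
    f->f_dentry = dentry;
    f->f_vfsmnt = mnt;
    mnt->mnt_users++;
}

/* Drop the pin taken at open time. */
void close_file(struct file *f)
{
    assert(f && f->f_vfsmnt);
    f->f_vfsmnt->mnt_users--;
    f->f_vfsmnt = NULL;
    f->f_dentry = NULL;
}

/* Nonzero while at least one file opened through mnt is still open. */
int mnt_busy(const struct vfsmount *mnt)
{
    return mnt->mnt_users > 0;
}
```

Opening foo through M2 and again through M3 gives two files with equal
f_dentry and different f_vfsmnt; only the second one makes mnt_busy(&M3)
true.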

Crossing mountpoints in the opposite direction (to root) is even easier - you
want to go into the parent and notice that you stand in (mnt,mnt->mnt_root).
Fine, so you actually need the parent of mountpoint, i.e. parent of
(mnt->mnt_parent, mnt->mnt_mountpoint).
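
A sketch of that upward step, assuming a d_parent pointer in which a
filesystem root points at itself. The function name follow_dotdot() and
the d_name field are chosen for illustration. The loop is an assumption
that goes slightly beyond the single step described: it also covers one
mount sitting directly on the root of another.

```c
#include <assert.h>
#include <stddef.h>

struct vfsmount;

struct dentry {
    const char *d_name;
    struct dentry *d_parent;        /* a filesystem root is its own parent */
};

struct vfsmount {
    struct vfsmount *mnt_parent;    /* the root mount is its own parent */
    struct dentry *mnt_root;
    struct dentry *mnt_mountpoint;
};

/* Move (*mnt, *dentry) to its parent in the unified tree, i.e. "..". */
void follow_dotdot(struct vfsmount **mnt, struct dentry **dentry)
{
    assert(mnt && *mnt && dentry && *dentry);
    /*
     * Standing on the root of a mounted piece: step back onto the
     * mountpoint in the parent vfsmount before going up.
     */
    while (*dentry == (*mnt)->mnt_root && (*mnt)->mnt_parent != *mnt) {
        *dentry = (*mnt)->mnt_mountpoint;
        *mnt = (*mnt)->mnt_parent;
    }
    *dentry = (*dentry)->d_parent;
}
```

Starting from /mnt/mnt, i.e. (M3, root from sda1), one call gives
(M2, root from sda1), which is /mnt, and the next gives (M1, root from fd0).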

With this kind of data structure we can trivially afford both mounting
a filesystem in several places and mounting over the root - there is
nothing special about these cases. They _used_ to be different with the
old data structures, but that's it. See how the binding is done? We just
add a new vfsmount with (->mnt_parent,->mnt_mountpoint) pointing to
mountpoint and ->mnt_root pointing to the thing we are binding to that
place. It doesn't have to be a root of the dentry tree it belongs to -
anything will go.
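
As a rough sketch of binding, with the same simplifying assumptions as
before: bind_mount() and lookup_mount() are hypothetical helper names,
mnt_next and d_name are made up, and the list is NULL-terminated instead
of cyclic.

```c
#include <assert.h>
#include <stddef.h>

struct vfsmount;

struct dentry {
    const char *d_name;
    struct vfsmount *d_vfsmnt;      /* vfsmounts mounted here; NULL if empty */
};

struct vfsmount {
    struct vfsmount *mnt_parent;
    struct dentry *mnt_root;
    struct dentry *mnt_mountpoint;
    struct vfsmount *mnt_next;      /* hypothetical link in the d_vfsmnt list */
};

/*
 * Make `target` visible at (parent, mountpoint). The target may be any
 * dentry; it does not have to be the root of its filesystem.
 */
void bind_mount(struct vfsmount *new_mnt, struct vfsmount *parent,
                struct dentry *mountpoint, struct dentry *target)
{
    assert(new_mnt && parent && mountpoint && target);
    new_mnt->mnt_parent = parent;
    new_mnt->mnt_mountpoint = mountpoint;
    new_mnt->mnt_root = target;
    new_mnt->mnt_next = mountpoint->d_vfsmnt;
    mountpoint->d_vfsmnt = new_mnt;
}

/* Find the mount sitting on (parent, mountpoint), or NULL if none. */
struct vfsmount *lookup_mount(struct vfsmount *parent,
                              struct dentry *mountpoint)
{
    struct vfsmount *m;

    for (m = mountpoint->d_vfsmnt; m != NULL; m = m->mnt_next)
        if (m->mnt_parent == parent)
            return m;
    return NULL;
}
```

Binding onto the same dentry under two different parent vfsmounts leaves
two entries in its d_vfsmnt list, and lookup_mount() picks between them by
->mnt_parent alone.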

        That's what it is and that's how it works. It obviously avoids all
problems with dcache coherency, no matter how many times you mount the
same fs - there is only one dentry tree for that fs anyway. It also in
principle permits different namespaces for different processes - just let
them have separate vfsmount trees. Even if they mount the same filesystems
- no problem, from the kernel POV it's exactly the same as if that
filesystem had been mounted in several places.
                                                HTH.
                                                        Al
