Re: Duplicate inode number when mount --bind some directories to same mountpoint. (from Fedora18 to 4.10-rc3)

From: Al Viro
Date: Thu Jan 12 2017 - 22:27:03 EST


On Fri, Jan 13, 2017 at 10:40:08AM +0900, Nakajima Akira wrote:
> On 2017/01/12 19:24, Al Viro wrote:
> > On Thu, Jan 12, 2017 at 06:16:35PM +0900, Nakajima Akira wrote:
> > > Bug:
> > > Duplicate inode number when mount --bind some directories to same
> > > mountpoint. (from Fedora18 to 4.10-rc3)
> > > Fedora17 and earlier works correctly.
> >
> > Explain, please. "Duplicate inode number" between what and what?
>
> Duplicate inode number between mounted directories.
>
> Example)
> # cd /home
> # mkdir a b
> # ls -i
> 100 a 999 b
> # mount --bind a /mnt
> # mount --bind b /mnt
> # ls -i
> 999 a 999 b
>
> Inode number of directory "a" is changed to "b".
> Then we see directory "b" when ls "a".

61 0 252:1 / / rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered

Root, marked shared (peer group 1). /home is not a mountpoint, /mnt
wasn't one until your mounts (i.e. both are within the same mount as /).

Since /home/a is a subtree of a shared mount, any clone of it will, by
default, join the same peer group. Which means that binding it on /mnt
results in

116 61 252:1 /home/a /mnt rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered

i.e. ext4[vda1]home/a being mounted on /mnt and marked peer of root mount.
Accordingly, any mount/umount event in either will be duplicated to all
peers (provided that they contain a counterpart of affected mountpoint).
In particular, binding /home/b on /mnt (i.e. on top of ext4[vda1]home/a)
propagates to the corresponding points in all peers - including the root
mount, where it corresponds to /home/a. Result:

120 116 252:1 /home/b /mnt rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
121 61 252:1 /home/b /home/a rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered

The same tree (ext4[vda1]home/b) is mounted on root in mount 116
(i.e. the thing found on /mnt) and on /home/a in mount 61 (i.e. the
root mount).
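If you'd rather not read mountinfo by eye, something like the following
will do it. It's only a rough sketch: the helper names
(parse_mountinfo_line, where_is) are made up for illustration, and the
input is just the four lines quoted in this mail, not a live
/proc/self/mountinfo.

```python
from collections import namedtuple

Mount = namedtuple("Mount", "mount_id parent_id dev root mountpoint shared")

def parse_mountinfo_line(line):
    """Parse one mountinfo line (hypothetical helper, not a library API)."""
    fields = line.split()
    sep = fields.index("-")          # optional fields end at the "-" separator
    shared = None
    for opt in fields[6:sep]:        # optional fields, e.g. "shared:1"
        if opt.startswith("shared:"):
            shared = int(opt.split(":", 1)[1])
    return Mount(int(fields[0]), int(fields[1]), fields[2],
                 fields[3], fields[4], shared)

# The mountinfo lines quoted in this mail.
LINES = [
    "61 0 252:1 / / rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered",
    "116 61 252:1 /home/a /mnt rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered",
    "120 116 252:1 /home/b /mnt rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered",
    "121 61 252:1 /home/b /home/a rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered",
]

mounts = [parse_mountinfo_line(line) for line in LINES]

def where_is(mounts, dev, root):
    """List (mount id, parent id, mountpoint) of every mount of dev:root."""
    return sorted((m.mount_id, m.parent_id, m.mountpoint)
                  for m in mounts if m.dev == dev and m.root == root)

for mount_id, parent_id, mountpoint in where_is(mounts, "252:1", "/home/b"):
    print(f"mount {mount_id} (parent {parent_id}) puts /home/b on {mountpoint}")
```

For a real system, feed it the lines of /proc/self/mountinfo instead of
LINES; any root that shows up under more than one mountpoint is being
seen in more than one place.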

Since /home/b is on a shared mount, both clones are put in the same peer
group (i.e. the same group 1).
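
A toy model of the above, in case it helps. This is not how the kernel
implements propagation; it's a minimal sketch assuming one filesystem,
one namespace and every mount shared. It ignores slave/private/unbindable
mounts and everything else. All names in it (Namespace, bind, resolve)
are invented for the illustration, and the mount IDs are arbitrary.

```python
import posixpath
from dataclasses import dataclass
from itertools import count

@dataclass
class Mount:
    mount_id: int
    parent_id: int
    root: str        # directory of the filesystem shown at the mountpoint
    mountpoint: str  # where it is attached in the namespace
    group: int       # peer group (everything is shared in this toy)

def is_under(path, top):
    return path == top or path.startswith(top.rstrip("/") + "/")

def rebase(path, old, new):
    """Replace the prefix 'old' of 'path' with 'new'."""
    return posixpath.normpath(posixpath.join(new, posixpath.relpath(path, old)))

class Namespace:
    def __init__(self):
        self._ids = count(1)
        self.mounts = [Mount(next(self._ids), 0, "/", "/", 1)]  # shared root

    def lookup(self, path):
        """Topmost mount covering 'path' (deepest mountpoint, newest wins)."""
        covering = [m for m in self.mounts if is_under(path, m.mountpoint)]
        return max(covering, key=lambda m: (len(m.mountpoint), m.mount_id))

    def resolve(self, path):
        """Which directory of the filesystem does 'path' show?"""
        m = self.lookup(path)
        return rebase(path, m.mountpoint, m.root)

    def bind(self, src, dst):
        new_root = self.resolve(src)
        group = self.lookup(src).group      # clone joins the source's peer group
        dest = self.lookup(dst)
        where = rebase(dst, dest.mountpoint, dest.root)  # mountpoint inside the fs
        # The mount event is repeated in every peer that contains a
        # counterpart of the mountpoint.
        peers = [dest] + [m for m in self.mounts
                          if m.group == dest.group and m is not dest]
        for peer in peers:
            if is_under(where, peer.root):
                self.mounts.append(Mount(next(self._ids), peer.mount_id, new_root,
                                         rebase(where, peer.root, peer.mountpoint),
                                         group))

ns = Namespace()
ns.bind("/home/a", "/mnt")
ns.bind("/home/b", "/mnt")
for m in ns.mounts:
    print(m)
print(ns.resolve("/home/a"), ns.resolve("/home/b"))
```

The second bind ends up creating two mounts of /home/b, one on /mnt and
one on /home/a, and after that both /home/a and /home/b resolve to the
same directory, which is why ls -i shows the same inode number for both.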

You asked for it, you've got it... Well, fedora folks did, actually.
I'm none too fond of their default setup (root made shared), but that has
nothing to do with the kernel. Userland (systemd, as far as I can tell)
is setting the things up that way, and it's even documented in fedora
release notes... The kernel mechanisms involved have been there for
a long time and they are also documented (man 2 mount, look for MS_SHARED
and related flags in there).
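
For reference, the flags in question, with a rough sketch of changing the
propagation type of an existing mount through mount(2). The flag values
are the ones from <linux/mount.h>; the wrapper name (change_propagation)
is made up, it needs root, and whether you actually want to undo the
distribution's setup is a separate question.

```python
import ctypes
import ctypes.util
import os

# Values from <linux/mount.h>.
MS_BIND = 4096
MS_REC = 16384
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

def change_propagation(mountpoint, flags):
    """Change the propagation type of an existing mount (hypothetical wrapper).

    For this kind of mount(2) call only the target and the flags matter;
    source, filesystem type and data are ignored.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int
    if libc.mount(b"none", os.fsencode(mountpoint), None, flags, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), mountpoint)

# Example (as root), making /mnt private before stacking another bind on it:
#   change_propagation("/mnt", MS_PRIVATE)
```

With /mnt made private after the first bind, the second bind on /mnt
would no longer be propagated to /home/a.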

Take it up with fedora folks...