Re: Duplicate inode number when mount --bind some directories to same mountpoint. (from Fedora18 to 4.10-rc3)

From: Nakajima Akira
Date: Thu Jan 12 2017 - 20:40:14 EST


On 2017/01/12 19:24, Al Viro wrote:
> On Thu, Jan 12, 2017 at 06:16:35PM +0900, Nakajima Akira wrote:
>> Bug:
>> Duplicate inode number when bind-mounting several directories onto the
>> same mountpoint. (from Fedora18 to 4.10-rc3)
>> Fedora17 and earlier work correctly.
>
> Explain, please. "Duplicate inode number" between what and what?

The inode number is duplicated between the directories that were bind-mounted
("a" and "b" below).

Example:
# cd /home
# mkdir a b
# ls -i
100 a 999 b
# mount --bind a /mnt
# mount --bind b /mnt
# ls -i
999 a 999 b

The inode number of directory "a" changes to that of "b".
After that, running ls on "a" shows the contents of directory "b".
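
To check this from userspace without relying on the ls output alone, the device and inode numbers of the two names can be compared. The following is only a minimal sketch in Python; the helper name same_inode is hypothetical, and the paths are the ones assumed in the example above.

```python
import os


def same_inode(path_a, path_b):
    """Return True if both paths resolve to the same (st_dev, st_ino) pair."""
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)


if __name__ == "__main__":
    # Paths from the example above (assumed to exist on the test machine).
    try:
        print("/home/a", "/home/b", same_inode("/home/a", "/home/b"))
    except OSError as err:
        print("cannot stat:", err)
```

If this prints True after the second mount --bind, both names resolve to the same directory, which matches the ls -i output above.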


>> And,
>> kernel ver 3.6 and above (Fedora18 up to and including 4.10-rc3) creates
>> many more struct mount instances than ver 3.3 (Fedora17).
>> Is this the intended behavior?
>> It looks like the kernel creates many identical struct mount instances.
>
> alloc_vfsmnt() and clone_mnt() are internal functions, no promises of
> stability had ever been given... As for the differences between these
> setups... almost certainly an effect of changed shared-subtree settings.
> Userland, not kernel.


>> Systemtap script result on Fedora25:
>> The kernel creates many struct mount instances.
>> And the inode number of "a" changes to 547586, that of "b".
>
> What I would like to see is the contents of /proc/self/mountinfo -
> systemtap be damned, there is a sane interface for getting the
> mount tree setup. BTW, what's in that /root/mnt.stp thing?

/root/mnt.stp is shown below.

According to the script's output, the kernel creates many identical
struct mount instances, which looks like a waste of memory.
But I don't know whether this is the intended behavior or not.

================================================================
# cat /root/mnt.stp
#! /usr/bin/stap

# Log the address of every newly allocated struct mount.
probe kernel.function("alloc_vfsmnt").return {
    printf("%s() new_mnt:%p\n", probefunc(), $return);
}

# Log each cloned mount with the name and inode of its mountpoint dentry.
probe kernel.function("clone_mnt").return { // do_mount, copy_tree
    name = @cast($return, "mount")->mnt_mountpoint->d_iname;
    inode = @cast($return, "mount")->mnt_mountpoint->d_inode;
    ino = @cast($return, "mount")->mnt_mountpoint->d_inode->i_ino;
    printf("%s() mnt:%p d_iname:%s inode:%p ino:%u\n", probefunc(), $return, kernel_string(name), inode, ino);
}

================================================================
/proc/self/mountinfo is as follows:

17 61 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw
18 61 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
19 61 0:6 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=2013132k,nr_inodes=503283,mode=755
20 17 0:18 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
21 19 0:19 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw
22 19 0:20 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
23 61 0:21 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,mode=755
24 17 0:22 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,mode=755
25 24 0:23 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
26 17 0:24 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
27 24 0:25 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,cpu,cpuacct
28 24 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,blkio
29 24 0:27 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,perf_event
30 24 0:28 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,pids
31 24 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,freezer
32 24 0:30 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,net_cls,net_prio
33 24 0:31 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,cpuset
34 24 0:32 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,devices
35 24 0:33 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,hugetlb
36 24 0:34 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,memory
58 17 0:35 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
61 0 252:1 / / rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
16 19 0:15 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw
37 19 0:16 / /dev/hugepages rw,relatime shared:24 - hugetlbfs hugetlbfs rw
38 61 0:37 / /tmp rw,nosuid,nodev shared:25 - tmpfs tmpfs rw
39 18 0:38 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=38,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12648
40 17 0:7 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
72 61 0:39 / /var/lib/nfs/rpc_pipefs rw,relatime shared:28 - rpc_pipefs sunrpc rw
74 18 0:40 / /proc/fs/nfsd rw,relatime shared:29 - nfsd nfsd rw
111 23 0:41 / /run/user/0 rw,nosuid,nodev,relatime shared:64 - tmpfs tmpfs rw,size=404708k,mode=700
116 61 252:1 /home/a /mnt rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
120 116 252:1 /home/b /mnt rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
121 61 252:1 /home/b /home/a rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
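
To make the listing above easier to read, the mounts can be grouped by their shared peer group. The following is only a rough sketch in Python, assuming the field layout documented in proc(5) for /proc/self/mountinfo; the function names parse_mountinfo_line and group_by_peer_group are hypothetical.

```python
from collections import defaultdict


def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into a dict of named fields."""
    # The " - " separator ends the variable-length list of optional fields.
    left, _, right = line.strip().partition(" - ")
    head = left.split()
    tail = right.split()
    return {
        "mount_id": int(head[0]),
        "parent_id": int(head[1]),
        "dev": head[2],
        "root": head[3],
        "mount_point": head[4],
        "options": head[5],
        "optional": head[6:],  # e.g. ["shared:1"]; may be empty
        "fstype": tail[0],
        "source": tail[1],
    }


def group_by_peer_group(lines):
    """Map each shared peer group ID to the mounts tagged with it."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        mount = parse_mountinfo_line(line)
        for tag in mount["optional"]:
            if tag.startswith("shared:"):
                groups[int(tag.split(":", 1)[1])].append(mount)
    return dict(groups)


if __name__ == "__main__":
    try:
        with open("/proc/self/mountinfo") as f:
            groups = group_by_peer_group(f)
    except OSError as err:
        print("cannot read mountinfo:", err)
    else:
        # Print only peer groups that contain more than one mount.
        for peer, mounts in sorted(groups.items()):
            if len(mounts) > 1:
                for m in mounts:
                    print(peer, m["mount_id"], m["root"], "->", m["mount_point"])
```

Applied to the listing above, peer group 1 would contain mounts 61, 116, 120 and 121. Entry 121 shows /home/b mounted on /home/a, which is consistent with ls -i reporting the inode number of "b" for "a".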