Re: Improving documentation of parent-ID field in /proc/PID/mountinfo

From: Miklos Szeredi
Date: Tue Nov 14 2017 - 11:16:30 EST


On Tue, Nov 14, 2017 at 8:08 AM, Michael Kerrisk (man-pages)
<mtk.manpages@xxxxxxxxx> wrote:
> Hi Miklos, Ram
>
> Thanks for your comments. A question below.
>
> On 13 November 2017 at 09:11, Miklos Szeredi <mszeredi@xxxxxxxxxx> wrote:
>> On Mon, Nov 13, 2017 at 8:55 AM, Ram Pai <linuxram@xxxxxxxxxx> wrote:
>>> On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) wrote:
>>>> Hello Ram,
>>>>
>>>> Long ago (2.6.29) you added the /proc/PID/mountinfo file and
>>>> associated documentation in Documentation/filesystems/proc.txt. Later,
>>>> I pasted much of that documentation into the proc(5) manual page.
>>>>
>>>> That documentation says of the second field in the file:
>>>>
>>>> [[
>>>> This file contains lines of the form:
>>>>
>>>> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
>>>> (1)(2)(3)  (4)   (5)   (6)        (7)      (8) (9)  (10)      (11)
>>>>
>>>> (1) mount ID: unique identifier of the mount (may be reused after umount)
>>>> (2) parent ID: ID of parent (or of self for the top of the mount tree)
>>>> ...
>>>> ]]
>>>>
>>>> The last piece of the description of field (2) doesn't seem to be
>>>> correct, or is at least rather unclear. I take this to be saying that
>>>> for the root mount point, /, field (2) will have the same value
>>>> as field (1). I never actually looked at this detail closely, but
>>>> Alexander pointed out that this is obviously not so, as one can
>>>> immediately verify:
>>>>
>>>> $ grep '/ / ' /proc/$$/mountinfo
>>>> 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order
>>>>
>>>> I dug around in the kernel source for a bit. I do not have an exact
>>>> handle on the details, but I can see roughly what is going on.
>>>> Internally, there seems to be one ("hidden") mount ID reserved to each
>>>> mount namespace, and that ID is the parent of the root mount point.
>>>>
>>>> Looking through the (4.14) kernel source, mount IDs are allocated by
>>>> mnt_alloc_id() (in fs/namespace.c), which is in turn called by
>>>> alloc_vfsmnt() which is in turn called by clone_mnt().
>>>>
>>>> A new mount namespace is created by the kernel function copy_mnt_ns()
>>>> (in fs/namespace.c, called by create_new_namespaces() in
>>>> kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in
>>>> fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places.
>>>> The first of these is the call that creates the "hidden" mount ID that
>>>> becomes the parent of the root mount point. (I verified this by
>>>> instrumenting the kernel with a few printk() calls to display the
>>>> IDs.) The second place where copy_tree() calls clone_mnt() is in a
>>>> loop that replicates each of the mount points (including the root
>>>> mount point) in the source mount namespace.
>>>
>>> We used to report that mount, once upon a time. Something has changed
>>> the behavior since then and it's not reported any more, thus making it
>>> hidden.
>>
>> The hidden one is the initramfs, I believe. That's the root of the
>> mount namespace, and when a namespace is cloned, the tree is
>> copied from the namespace root.
>>
>> It is "hidden" because no process has its root there. Note the
>> difference between namespace root and process root: the first is the
>> real root of the mount tree and is unchangeable, the second is
>> pointing to some place in a mount tree and can be changed (chroot).
>>
>> So there's nothing special in this rootfs, it is just hidden because
>> it's not the root of any task.
>>
>> The description of field (2) is correct, it just does not make it
>> clear what it means by "root".
>
> Sorry -- do you mean the old description is correct, or my new
> description (below)?

Well, both are correct; yours just describes the same thing at a
higher level. But I think rootfs is an implementation detail, as is
the fact that it gets a zero mount ID, so I think the original
description better captures the essence of the interface. Except it
needs to clarify what "top of the mount tree" means. It doesn't mean
the current process's root; rather, it means the root of the mount tree
in the current mount namespace.
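
To make the reading of field (2) concrete, here is a minimal Python
sketch, not taken from the kernel or the man page. The function names
(parse_mountinfo_line, tree_tops) are made up for illustration, and
the second sample line is hypothetical; only the first one comes from
this thread. It treats a mount as the top of the visible tree when its
parent ID does not appear as a mount ID in the same file, or equals
its own ID, which is how the original description words it.

```python
def parse_mountinfo_line(line):
    """Parse the fixed fields of one /proc/PID/mountinfo line.

    The optional fields (e.g. shared:1) end at a lone "-" separator,
    so the line is split there before the fields are picked out.
    """
    left, _, right = line.partition(" - ")
    head = left.split()
    tail = right.split()
    return {
        "mount_id": int(head[0]),    # field (1)
        "parent_id": int(head[1]),   # field (2)
        "root": head[3],             # field (4)
        "mount_point": head[4],      # field (5)
        "fstype": tail[0] if tail else "",
    }


def tree_tops(lines):
    """Return the mounts whose parent is not itself listed.

    Assumption: a parent ID that matches no listed mount ID (or that
    equals the mount's own ID) marks the top of the visible tree.
    """
    mounts = [parse_mountinfo_line(l) for l in lines if l.strip()]
    ids = {m["mount_id"] for m in mounts}
    return [
        m for m in mounts
        if m["parent_id"] not in ids or m["parent_id"] == m["mount_id"]
    ]


# First line is the one quoted earlier in this thread; the second is a
# hypothetical child mount added only to give the tree some shape.
SAMPLE = """\
65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order
66 65 0:6 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw
"""

tops = tree_tops(SAMPLE.splitlines())
for m in tops:
    print(m["mount_id"], m["parent_id"], m["mount_point"])
```

On a live system the same function can be fed the lines of
/proc/self/mountinfo. A process running in a chroot may see more than
one such "top", since the file only lists mounts visible below the
process's root.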

Thanks,
Miklos