"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
Hello Eric,
A ping on my question below. Could you take a look please?
Thanks,
Michael
The concern from our conversation at the container mini-summit was that
there is a pathology if all of the mounts in your initial mount namespace
are marked MS_SHARED, as systemd does (and as is almost necessary if you
are going to use mount propagation): if new_root itself is MS_SHARED,
then unmounting the old_root could propagate.
So I believe the desired sequence is:
chdir(new_root);
+++ mount("", ".", MS_SLAVE | MS_REC, NULL);
pivot_root(".", ".");
umount2(".", MNT_DETACH);
The change to new_root could be either MS_SLAVE or MS_PRIVATE. So
long as it is not MS_SHARED the mount won't propagate back to the
parent mount namespace.
Thanks. I made that change.
For what it is worth, the sequence above without the change in mount
attributes will fail if it is necessary to change the mount attributes,
as "." is both put_old and new_root.
When I initially suggested the change I saw "." was new_root and forgot
"." was also put_old. So I thought there was a silent danger without
that sequence.
So, now I am a little confused by the comments you added here. Do you
now mean that the
mount("", ".", MS_SLAVE | MS_REC, NULL);
call is not actually necessary?
Apologies for being slow getting back to you.
To my knowledge there are two cases where pivot_root is used.
- In the initial mount namespace from a ramdisk when mounting root.
This is the original use case and somewhat historical as rootfs
(aka an initial ramfs) may not be unmounted.
- When setting up a new mount namespace to jettison all of the mounts
you don't need.
The sequence:
chdir(new_root);
pivot_root(".", ".");
umount2(".", MNT_DETACH);
is perfect for both use cases (as nothing needs to be known about the
directory layout of the new root filesystem).
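A minimal sketch of that three-call sequence in C, with error checking
added. The helper name switch_root is hypothetical. glibc provides no
pivot_root() wrapper, so the sketch uses syscall(2). It assumes the caller
has CAP_SYS_ADMIN in a suitable mount namespace and that new_root is
already a mount point.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): make new_root the root
 * mount of the caller's mount namespace and detach the old root.
 * Returns 0 on success, -1 if any step fails. */
int switch_root(const char *new_root)
{
    if (chdir(new_root) == -1) {
        perror("chdir");
        return -1;
    }

    /* "." is both new_root and put_old, so nothing needs to be known
     * about the directory layout of the new root filesystem. */
    if (syscall(SYS_pivot_root, ".", ".") == -1) {
        perror("pivot_root");
        return -1;
    }

    /* The old root is now stacked at "/" together with the new root;
     * lazily detach it. */
    if (umount2(".", MNT_DETACH) == -1) {
        perror("umount2");
        return -1;
    }

    return 0;
}
```

A caller would typically invoke switch_root() after creating a new mount
namespace and mounting (or bind mounting) the new root filesystem.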
In the case when you are setting up a new mount namespace, propagating
changes in the mount layout to another mount namespace is fatal. But
that is not a concern for using the pivot_root sequence above, because
pivot_root will fail deterministically if
'mount("", ".", MS_SLAVE | MS_REC, NULL)' is needed but not specified.
So I would document the above sequence of three system calls in the
man-page.
I would document that pivot_root will fail if propagation would occur.
I would document in pivot_root or under unshare(CLONE_NEWNS) that if
mount propagation is enabled (the default with systemd), you need to
call 'mount("", "/", MS_SLAVE | MS_REC, NULL);' or
'mount("", "/", MS_PRIVATE | MS_REC, NULL);' after creating a mount
namespace. Otherwise mounts will propagate backwards, which is usually
not what people want.
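As a rough sketch of that step in C: the helper name
enter_private_mount_ns is hypothetical, and the mount() call is written
with the full five-argument signature from mount(2) (source, target,
filesystemtype, mountflags, data) rather than the four-argument shorthand
used in this thread. The source, filesystemtype, and data arguments are
ignored when only the propagation type is changed, so NULL is passed. The
sketch assumes the caller has CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper (not from the thread): create a new mount
 * namespace and recursively change the propagation type of every mount
 * under "/" so that mount events do not propagate back to the parent
 * namespace. propagation must be MS_SLAVE or MS_PRIVATE.
 * Returns 0 on success, -1 on failure. */
int enter_private_mount_ns(unsigned long propagation)
{
    if (propagation != MS_SLAVE && propagation != MS_PRIVATE) {
        fprintf(stderr, "propagation must be MS_SLAVE or MS_PRIVATE\n");
        return -1;
    }

    if (unshare(CLONE_NEWNS) == -1) {
        perror("unshare");
        return -1;
    }

    /* MS_REC applies the change to "/" and all mounts beneath it. */
    if (mount(NULL, "/", NULL, propagation | MS_REC, NULL) == -1) {
        perror("mount");
        return -1;
    }

    return 0;
}
```

With MS_SLAVE, mounts made in the parent namespace still show up in the
new namespace; with MS_PRIVATE, propagation is cut in both directions.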
Creation of a mount namespace in a user namespace automatically does
'mount("", "/", MS_SLAVE | MS_REC, NULL);' if the starting mount
namespace was not created in that user namespace. AKA creating
a mount namespace in a user namespace does the unshare for you.