Re: pivot_root(".", ".") and the fchdir() dance
From: Michael Kerrisk (man-pages)
Date: Mon Sep 09 2019 - 10:48:17 EST
Hello Eric,
Thanks for chiming in; I should have thought to CC you at the start. I
have a question or two, below.
On Mon, 9 Sep 2019 at 12:40, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>
> > Hello Philipp,
> >
> > On Tue, 6 Aug 2019 at 10:12, Philipp Wendler <ml@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello Michael, hello Aleksa,
> >>
> >> Am 05.08.19 um 14:29 schrieb Michael Kerrisk (man-pages):
> >>
> >> > On 8/5/19 12:36 PM, Aleksa Sarai wrote:
> >> >> On 2019-08-01, Michael Kerrisk (man-pages) <mtk.manpages@xxxxxxxxx> wrote:
> >> >>> I'd like to add some documentation about the pivot_root(".", ".")
> >> >>> idea, but I have a doubt/question. In the lxc_pivot_root() code we
> >> >>> have these steps
> >> >>>
> >> >>> oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
> >> >>> newroot = open(rootfs, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
> >> >>>
> >> >>> fchdir(newroot);
> >> >>> pivot_root(".", ".");
> >> >>>
> >> >>> fchdir(oldroot); // ****
> >> >>>
> >> >>> mount("", ".", "", MS_SLAVE | MS_REC, NULL);
> >> >>> umount2(".", MNT_DETACH);
> >> >>
> >> >>> fchdir(newroot); // ****
> >> >>
> >> >> And this one is required because we are in @oldroot at this point, due
> >> >> to the first fchdir(2). If we don't have the first one, then switching
> >> >> from "." to "/" in the mount/umount2 calls should fix the issue.
> >> >
> >> > See my notes above for why I therefore think that the second fchdir()
> >> > is also not needed (and therefore why switching from "." to "/" in the
> >> > mount()/umount2() calls is unnecessary.
> >> >
> >> > Do you agree with my analysis?
> >>
> >> If both the second and third fchdir are not required,
> >> then we do not need to bother with file descriptors at all, right?
> >
> > Exactly.
> >
> >> Indeed, my tests show that the following seems to work fine:
> >>
> >> chdir(rootfs)
> >> pivot_root(".", ".")
> >> umount2(".", MNT_DETACH)
> >
> > Thanks for the confirmation, That's also exactly what I tested.
> >
> >> I tested that with my own tool[1] that uses user namespaces and marks
> >> everything MS_PRIVATE before, so I do not need the mount(MS_SLAVE) here.
> >>
> >> And it works the same with both umount2("/") and umount2(".").
> >
> > Yes.
> >
> >> Did I overlook something that makes the file descriptors required?
> >
> > No.
> >
> >> If not, wouldn't the above snippet make sense as example in the man page?
> >
> > I have exactly that snippet in a pending change for the manual page :-).
>
> I have just spotted this conversation and I expect if you are going
> to use this example it is probably good to document what is going
> on so that people can follow along.
(Sounds reasonable.)
> >> chdir(rootfs)
> >> pivot_root(".", ".")
>
> At this point the mount stack should be:
> old_root
> new_root
> rootfs
In this context, what is 'rootfs'? The initramfs? At least, when I
examine /proc/PID/mountinfo. When I look at the / mount point in
/proc/PID/mountinfo, I see just
old_root
new_root
But nothing below 'new_root'. So, I'm a little puzzled.
By the way, why is 'old_root' stacked above 'new_root', do you know? I
mean, in this scenario it turns out to be useful, but it's kind of the
opposite from what I would have expected. (And if this was a
deliverate design decision in pivot_root(), it was never made
explicit.)
> With "." and "/" pointing to new_root.
>
> >> umount2(".", MNT_DETACH)
>
> At this point resolving "." starts with new_root and follows up the
> mount stack to old-root.
Okay.
> Ordinarily if you unmount "/" as is happening above you then need to
> call chroot and possibly chdir to ensure neither "/" nor "." point to
> somewhere other than the unmounted root filesystem. In this specific
> case because "/" and "." resolve to new_root under the filesystem that is
> being unmounted that all is well.
s/that/then/ ?
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/