Re: pivot_root(".", ".") and the fchdir() dance

From: Eric W. Biederman
Date: Mon Sep 09 2019 - 19:41:02 EST


"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Hello Eric,
>
> Thanks for chiming in; I should have thought to CC you at the start. I
> have a question or two, below.
>
> On Mon, 9 Sep 2019 at 12:40, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>
>> > Hello Philipp,
>> >
>> > On Tue, 6 Aug 2019 at 10:12, Philipp Wendler <ml@xxxxxxxxxxxxxxxxx> wrote:
>> >>
>> >> Hello Michael, hello Aleksa,
>> >>
>> >> Am 05.08.19 um 14:29 schrieb Michael Kerrisk (man-pages):
>> >>
>> >> > On 8/5/19 12:36 PM, Aleksa Sarai wrote:
>> >> >> On 2019-08-01, Michael Kerrisk (man-pages) <mtk.manpages@xxxxxxxxx> wrote:
>> >> >>> I'd like to add some documentation about the pivot_root(".", ".")
>> >> >>> idea, but I have a doubt/question. In the lxc_pivot_root() code we
>> >> >>> have these steps
>> >> >>>
>> >> >>> oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
>> >> >>> newroot = open(rootfs, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
>> >> >>>
>> >> >>> fchdir(newroot);
>> >> >>> pivot_root(".", ".");
>> >> >>>
>> >> >>> fchdir(oldroot); // ****
>> >> >>>
>> >> >>> mount("", ".", "", MS_SLAVE | MS_REC, NULL);
>> >> >>> umount2(".", MNT_DETACH);
>> >> >>
>> >> >>> fchdir(newroot); // ****
>> >> >>
>> >> >> And this one is required because we are in @oldroot at this point, due
>> >> >> to the first fchdir(2). If we don't have the first one, then switching
>> >> >> from "." to "/" in the mount/umount2 calls should fix the issue.
>> >> >
>> >> > See my notes above for why I therefore think that the second fchdir()
>> >> > is also not needed (and therefore why switching from "." to "/" in the
>> >> > mount()/umount2() calls is unnecessary.
>> >> >
>> >> > Do you agree with my analysis?
>> >>
>> >> If both the second and third fchdir are not required,
>> >> then we do not need to bother with file descriptors at all, right?
>> >
>> > Exactly.
>> >
>> >> Indeed, my tests show that the following seems to work fine:
>> >>
>> >> chdir(rootfs)
>> >> pivot_root(".", ".")
>> >> umount2(".", MNT_DETACH)
>> >
>> > Thanks for the confirmation, That's also exactly what I tested.
>> >
>> >> I tested that with my own tool[1] that uses user namespaces and marks
>> >> everything MS_PRIVATE before, so I do not need the mount(MS_SLAVE) here.
>> >>
>> >> And it works the same with both umount2("/") and umount2(".").
>> >
>> > Yes.
>> >
>> >> Did I overlook something that makes the file descriptors required?
>> >
>> > No.
>> >
>> >> If not, wouldn't the above snippet make sense as example in the man page?
>> >
>> > I have exactly that snippet in a pending change for the manual page :-).
>>
>> I have just spotted this conversation and I expect if you are going
>> to use this example it is probably good to document what is going
>> on so that people can follow along.
>
> (Sounds reasonable.)
>
>> >> chdir(rootfs)
>> >> pivot_root(".", ".")
>>
>> At this point the mount stack should be:
>> old_root
>> new_root
>> rootfs
>
> In this context, what is 'rootfs'? The initramfs? At least, when I
> examine /proc/PID/mountinfo. When I look at the / mount point in
> /proc/PID/mountinfo, I see just
>
> old_root
> new_root
>
> But nothing below 'new_root'. So, I'm a little puzzled.

I think that is because Al changed /proc/mounts to not display mounts
that are outside of your current root. But yes there is typically
the initramfs of file system type rootfs on their. Even when it isn't
used you have one. Just to keep everything simple I presume.

I haven't double checked lately to be certain it is there but I expect
it is.

> By the way, why is 'old_root' stacked above 'new_root', do you know? I
> mean, in this scenario it turns out to be useful, but it's kind of the
> opposite from what I would have expected. (And if this was a
> deliverate design decision in pivot_root(), it was never made
> explicit.)

Oh. It is absolutely explicit and part of the design and it has nothing
to do with this case.

The pivot_root system calls takes two parameters: new_root and put_old.

In this case the old root is put on put_old (which is the new_root).
And new_root is made the current root.

The pivot_root code looks everything up before it moves anything. With
the result it is totally immaterrial which order the moves actually
happen in the code. Further it is pretty much necessary to look
everything up before things are moved because the definition of paths
change.

So it would actually be difficult to have pivot_root(.,.) to do anything
except what it does today.


>> With "." and "/" pointing to new_root.
>>
>> >> umount2(".", MNT_DETACH)
>>
>> At this point resolving "." starts with new_root and follows up the
>> mount stack to old-root.
>
> Okay.
>
>> Ordinarily if you unmount "/" as is happening above you then need to
>> call chroot and possibly chdir to ensure neither "/" nor "." point to
>> somewhere other than the unmounted root filesystem. In this specific
>> case because "/" and "." resolve to new_root under the filesystem that is
>> being unmounted that all is well.
>
> s/that/then/ ?

Yes.

Eric