Re: [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root

From: Menglong Dong
Date: Mon May 24 2021 - 20:56:07 EST


Hello!

On Tue, May 25, 2021 at 6:58 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> > In fact, the network namespace is just a case for the problem that the
> > 'mount leak' caused. And this kind modification is not friendly to
> > current docker users, it makes great changes to the usage of docker.
>
> You mean an upgrade of docker? If so... that does not seem like a
> definitive reason to do something new in the kernel *always*.

No, I means that in this solution, users have to create all the
containers first,
and create the network namespace for them later.

>
> However, if you introduce it as a kconfig option so that users
> who want to use this new feature can enable it, and then use it,
> the its sold as a new feature.
>
> Should this always be enabled, or done this way? Should we never have
> the option to revert back to the old behaviour? If not, why not?
>

This change seems transparent to users, which don't change the behavior
of initramfs. However, it seems more reasonable to make it a kconfig option.
I'll do it in the v2 of the three patches I sended.

>
> I don't see you using any context directly, where are you specifying the
> context directly?
>
> > so it is not needed any more
> > and I just removed it in init_rootfs().
> >
> > The initialization of 'user_root' in init_rootfs() is used in:
> > do_populate_rootfs -> mount_user_root, which set the file system(
> > ramfs or tmpfs) of the second mount.
> >
> > Seems it's not suitable to place it in init_rootfs()......
>
> OK I think I just need to understand how you added the context of the
> first mount explicitly now and where, as I don't see it.
>

......

>
> What do you mean? init_mount_tree() is always called, and it has
> statically:
>
> static void __init init_mount_tree(void)
> {
> struct vfsmount *mnt;
> ...
> mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
> ...
> }
>
> And as I noted, this is *always* called earlier than
> do_populate_rootfs(). Your changes did not remove the init_mount_tree()
> or modify it, and so why would the context of the above call always
> be OK to be used now with a ramfs context now?
>
> > So it makes no sense to make the file system of the first mount selectable.
>
> Why? I don't see why, nor is it explained, we're always caling
> vfs_kern_mount(&rootfs_fs_type, ...) and you have not changed that
> either.
>
> > To simplify the code here, I make it ramfs_init_fs_context directly. In fact,
> > it's fine to make it shmen_init_fs_context here too.
>
> So indeed you're suggesting its arbitrary now.... I don't see why.
>

So the biggest problem now seems to be the first mount I changed, maybe I didn't
make it clear before.

Let's call the first mount which is created in init_mount_tree() the
'init_mount'.
If the 'root' is a block fs, initrd or nfs, the 'init_mount' will be a
ramfs, that seems
clear, it can be seen from the enable of tmpfs:

void __init init_rootfs(void)
{
if (IS_ENABLED(CONFIG_TMPFS) && !saved_root_name[0] &&
(!root_fs_names || strstr(root_fs_names, "tmpfs")))
is_tmpfs = true;
}

That makes sense, because the 'init_mount' will not be the root file
system, as the
block fs or initrd or nfs that mounted later on '/root' will become
the root by a call of
init_chroot().

If the 'root' is initramfs, cpio will be unpacked init 'init_mount'.
In this scene, the
'init_mount' can be ramfs or tmpfs, and it is the root file system.

In my patch, I create a second mount which is mounted on the
'init_mount', let's call
it 'user_mount'.

cpio is unpacked into 'user_mount', and the 'user_mount' is made the
root by 'init_chroot()'
later. The 'user_mount' can be ramfs or tmpfs. So 'init_mount' is not
the root any more,
and it makes no sense to make it tmpfs, just like why it should be
ramfs when 'root'
is a block fs or initrd.

'init_mount' is created from 'rootfs_fs_type' in init_mount_tree():

static void __init init_mount_tree(void)
{
struct vfsmount *mnt;
...
mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
...
}

I make the 'init_mount' to be ramfs by making
'rootfs_fs_type->init_fs_context' with
'ramfs_init_fs_context':

struct file_system_type rootfs_fs_type = {
.name = "rootfs",
.init_fs_context = ramfs_init_fs_context,
.kill_sb = kill_litter_super,
};

I think the third patch that I sended has made it clear what I do to
the 'init_mount'.

Thanks!
Menglong Dong