Re: [PATCH 00/80] Kernel based checkpoint/restart [v18]

From: Serge E. Hallyn
Date: Mon Sep 28 2009 - 12:37:24 EST


Quoting Andrew Morton (akpm@xxxxxxxxxxxxxxxxxxxx):
> On Wed, 23 Sep 2009 19:50:40 -0400
> Oren Laadan <orenl@xxxxxxxxxxx> wrote:
> > Q: What about namespaces ?
> > A: Currently, UTS and IPC namespaces are restored. They demonstrate
> > how namespaces are handled. More to come.
>
> Will this new code muck up the kernel?

Actually, user namespaces are handled as well. Pid namespaces will
be named and recorded by the kernel at checkpoint, and re-created in
userspace using clone(CLONE_NEWPID). This shouldn't muck up the
kernel at all.

The handling of network and mount namespaces is undecided at this
point. Mount namespaces themselves are pretty simple, but
mountpoints are not. There it's mainly a question of how to predict
what a user wants to have automatically recreated. All mounts which
differ between the root checkpoint task and its parent? Or do we do
no mounts for the restarted init task at all, and only recreate
mounts in private child namespaces (i.e. if a task did
unshare(CLONE_NEWNS); mount --make-private /var;
mount --bind /container2/var/run /var/run)? A C sketch of that
last scenario is below.
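
For concreteness, here is roughly what that private-namespace
sequence looks like when done with the underlying mount(2) calls.
This is just a sketch of the scenario above, not code from the
patchset; the paths are the ones from the example:

#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <stdio.h>

int main(void)
{
	/* Detach from the parent's mount namespace. */
	if (unshare(CLONE_NEWNS) < 0) {
		perror("unshare(CLONE_NEWNS)");
		return 1;
	}

	/* Equivalent of: mount --make-private /var */
	if (mount(NULL, "/var", NULL, MS_PRIVATE, NULL) < 0) {
		perror("make-private /var");
		return 1;
	}

	/* Equivalent of: mount --bind /container2/var/run /var/run */
	if (mount("/container2/var/run", "/var/run", NULL,
		  MS_BIND, NULL) < 0) {
		perror("bind /var/run");
		return 1;
	}

	return 0;
}

Mounts like these, done after unshare(CLONE_NEWNS), are invisible to
the parent namespace, which is exactly what makes it hard for the
kernel to know what to recreate at restart.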

I hear a decision was made at Plumbers about how to begin
handling them, so I'll let someone (Oren? Dave?) fill in the details.

For network namespaces I think it's clearer that a wrapper
program should set up the network for the restarted init task,
while the userspace restart code recreates any private network
namespaces and veths which were created by the application.
But it still needs discussion.
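
To illustrate the wrapper idea, here is a hedged sketch: the wrapper
clones the would-be init task into a fresh network namespace, then
pushes one end of a veth pair into it. The iproute2 commands and the
synchronization pipe are illustrative, not from the patchset, and
"ip link set ... netns" needs a reasonably recent iproute2:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int pipefd[2];
static char stack[64 * 1024];

static int child(void *arg)
{
	char c;

	/* Wait until the parent has pushed veth1 into our namespace. */
	read(pipefd[0], &c, 1);

	/* Here the wrapper would kick off the restart of the container. */
	return 0;
}

int main(void)
{
	if (pipe(pipefd) < 0) {
		perror("pipe");
		return 1;
	}

	/* Child starts life in a brand-new network namespace. */
	pid_t pid = clone(child, stack + sizeof(stack),
			  CLONE_NEWNET | SIGCHLD, NULL);
	if (pid < 0) {
		perror("clone(CLONE_NEWNET)");
		return 1;
	}

	/* Create a veth pair and move one end into the child's namespace. */
	char cmd[128];
	snprintf(cmd, sizeof(cmd),
		 "ip link add veth0 type veth peer name veth1 && "
		 "ip link set veth1 netns %d", (int)pid);
	if (system(cmd) != 0)
		fprintf(stderr, "veth setup failed\n");

	write(pipefd[1], "x", 1);	/* let the child continue */
	waitpid(pid, NULL, 0);
	return 0;
}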

> > Q: What additional work needs to be done to it?
> > A: Fill in the gory details following the examples so far. Current WIP
> > includes inet sockets, event-poll, and early work on inotify, mount
> > namespace and mount-points, pseudo file systems
>
> Will this new code muck up the kernel, or will it be clean?
>
> > and x86_64 support.
>
> eh? You mean the code doesn't work on x86_64 at present?

There have been patches for it, but I think the main problem is
that no one involved has hardware to test on.

> What is the story on migration? Moving the process(es) to a different
> machine?

Since migration is basically checkpoint; recreate the container on
the remote machine; restart there; it will mainly be done by
userspace code exploiting the c/r kernel patches. A sketch of the
checkpoint half is below.

The main thing we may want to add is a way to initiate a pre-dump
of large amounts of VM while the container is still running.
I suspect Oren and Dave can say a lot more about that than I can
right now.

thanks,
-serge