Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Oren Laadan
Date: Sat Nov 20 2010 - 13:11:47 EST



To : Tejun Heo <tj@xxxxxxxxxx>
Cc : Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>,
Kapil Arya <kapil@xxxxxxxxxxx>,
Gene Cooperman <gene@xxxxxxxxxxx>,
linux-kernel@xxxxxxxxxxxxxxx,
xemul@xxxxx,
"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>,
Linux Containers <containers@xxxxxxxxxxxxxx>
----- Message Text -----
Hi,

[continuation of discussion of kernel vs userspace c/r approach]
part I: perspective on the types of scopes of c/r under discussion
part II: linux-cr design and objectives
part III: comparison kernel/userspace approaches


PART III: ==SOME TECHNICAL ASPECTS==

Some background on the userspace approach (using DMTCP as the example)
before comparing the kernel and userspace approaches:

DMTCP has two components: 1) c/r-engine to save/restore process state,
and 2) glue to restart processes out of their original context. They
are _orthogonal_: the glue can be used with other c/r-engines, like
linux-cr. This discussion refers to the c/r-engine _only_.

Focusing on the c/r-engine of DMTCP - it uses syscall interposition
for three reasons:

1) To take control of processes at checkpoint
2) To always track state of resources not visible to userspace
3) To virtualize identifiers after restart

#1 is needed because processes save their own state (and need to run
the checkpoint code for that).

#2 is needed because the kernel does not expose all state, and #3 is
needed because the kernel does not give ways to restore all state. So
these two mechanisms mirror, in userspace, functionality that already
exists in the kernel.
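
To make #3 concrete, here is a minimal sketch of the kind of table an
interposition layer has to keep so that an application continues to see
its original pid after restart. This is a toy model, not DMTCP's code:
the class and method names are hypothetical, and a real wrapper would
translate pids inside every interposed syscall that takes or returns one.

```python
class PidVirtualizer:
    """Toy model of a virtual<->real pid table kept by a userspace wrapper.

    Hypothetical names; illustrates the bookkeeping only.
    """

    def __init__(self):
        self._virt_to_real = {}
        self._real_to_virt = {}

    def on_create(self, real_pid):
        # Before any restart, the virtual pid is simply the real pid.
        self._virt_to_real[real_pid] = real_pid
        self._real_to_virt[real_pid] = real_pid
        return real_pid

    def on_restart(self, virt_pid, new_real_pid):
        # After restart the kernel hands out a different real pid;
        # the application must keep seeing the old (virtual) one.
        old_real = self._virt_to_real[virt_pid]
        del self._real_to_virt[old_real]
        self._virt_to_real[virt_pid] = new_real_pid
        self._real_to_virt[new_real_pid] = virt_pid

    def to_real(self, virt_pid):
        # Used on the way into a syscall such as kill(pid, sig).
        return self._virt_to_real[virt_pid]

    def to_virt(self, real_pid):
        # Used on the way out of a syscall such as getpid() or wait().
        return self._real_to_virt[real_pid]
```

Every identifier type that must survive a restart needs a table like
this one, and every syscall touching it needs a wrapper, which is the
duplication of kernel functionality referred to above.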

The main advantages of the approach: (a) portability to other systems
(like BSD), though with considerable effort; (b) it's "good enough" for
several use-cases, without kernel changes.

Putting the c/r-engine in the kernel provides many advantages, which I
summarize in the following table:

category        linux-cr                        userspace
--------------------------------------------------------------------------------
PERFORMANCE     has _zero_ runtime overhead     visible overhead due to syscalls
                                                interposition and state tracking
                                                even w/o checkpoints;

OPTIMIZATIONS   many optimizations possible     limited, less effective
                only in kernel, for downtime,   w/ much larger overhead.
                image size, live-migration

OPERATION       applications run unmodified     to do c/r, needs 'controller'
                                                task (launch and manage _entire_
                                                execution) - point of failure.
                                                restricts how a system is used.

PREEMPTIVE      checkpoint at any time, use     processes must be runnable and
                auxiliary task to save state;   "collaborate" for checkpoint;
                non-intrusive: failure does     long task coordination time
                not impact checkpointees.       with many tasks/threads. alters
                                                state of checkpointee if fails.
                                                e.g. cannot checkpoint when in
                                                vfork(), ptrace states, etc.

COVERAGE        save/restore _all_ task state;  needs new ABI for everything:
                identify shared resources; can  expose state, provide means to
                extend for new kernel features  restore state (e.g. TCP protocol
                easily                          options negotiated with peers)

RELIABILITY     checkpoint w/ single syscall;   non-atomic, cannot find leaks
                atomic operation. guaranteed    to determine restartability
                restartability for containers

USERSPACE GLUE  possible                        possible

SECURITY        root and non-root modes         root and non-root modes
                native support for LSM

MAINTENANCE     changes mainly for features     changes mainly for features;
                                                create new ABI for features
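
To illustrate the "leaks" in the RELIABILITY row: a resource is leaked
if something outside the checkpointed set also holds a reference to it,
so its state cannot be guaranteed at restart. One way such a check
might work, sketched below under my own assumptions, is to compare each
resource's total reference count with the number of references found
while walking the checkpointed tasks. The function name and the data
layout are hypothetical; the point is only that the total counts are
readily available inside the kernel and not from userspace.

```python
from collections import Counter


def find_leaks(tasks, refcounts):
    """Return the resources referenced from outside the checkpointed set.

    tasks:     mapping task name -> list of resource ids it references
    refcounts: mapping resource id -> total system-wide reference count
    Both inputs are hypothetical stand-ins for kernel bookkeeping.
    """
    seen = Counter()
    for resources in tasks.values():
        seen.update(resources)
    # A resource leaks if more references exist than we accounted for.
    return sorted(r for r, n in seen.items() if refcounts[r] > n)


# Two tasks share a pipe; one also holds a file that some task outside
# the set has open as well.
tasks = {"a": ["pipe1", "file1"], "b": ["pipe1"]}
refcounts = {"pipe1": 2, "file1": 2}
leaks = find_leaks(tasks, refcounts)
```

Here `leaks` contains only `file1`: both references to the pipe come
from inside the set, while the file has one reference unaccounted for.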

I'm not saying Gene's work isn't good - on the contrary, it's a fine
piece of engineering. However, the part of it that does c/r poses many
constraints that limit the generality, mode of use, and performance of
the whole. That may be enough for you, Tejun, for your cluster, but not
for other users of the technology.

And by all means, I intend to cooperate with Gene to see how to
make the other part of DMTCP, namely the userspace "glue", work on
top of linux-cr to have the benefits of all worlds!

All in all, kernel c/r is far more generic and less restrictive than
userspace, can provide nice guarantees, and has superior performance.
It can do everything that a userspace c/r can do, and much more - and
that "much more" is crucial for important use cases.

Last word about maintenance - once the core code is in mainline (which
means a code "spike"), experience (both kernel/userspace) shows that
both code and image format hardly change. The format is tied to the
specific set of features supported (i.e. kernel versions), so the
kernel does not need to maintain backward compatibility.
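
As a rough illustration of what "tied to a specific set of features"
could mean at restart time, the sketch below checks an image header
against what the running kernel supports and simply refuses anything
that does not match. The header layout, the magic value and the
function names are all made up for this example and are not the
linux-cr image format.

```python
import struct

# Hypothetical header: 4-byte magic, format version, feature bitmask.
MAGIC = b"CKPT"
HEADER = struct.Struct("<4sII")


def write_header(version, features):
    """Pack the header that would start a checkpoint image."""
    return HEADER.pack(MAGIC, version, features)


def can_restart(header_bytes, kernel_version, kernel_features):
    """Accept an image only if this kernel matches what the image needs."""
    if len(header_bytes) < HEADER.size:
        return False
    magic, version, features = HEADER.unpack(header_bytes[:HEADER.size])
    if magic != MAGIC:
        return False
    # No backward compatibility: the format version must match exactly.
    if version != kernel_version:
        return False
    # Every feature recorded in the image must be supported here.
    return (features & ~kernel_features) == 0


hdr = write_header(version=3, features=0b101)
```

With this header, `can_restart(hdr, 3, 0b111)` accepts the image, while
a kernel with a different format version or a missing feature bit
rejects it up front instead of trying to convert it.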

Thanks,

Oren

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/