Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Oren Laadan
Date: Fri Nov 05 2010 - 18:22:39 EST




On 11/04/2010 04:05 AM, Tejun Heo wrote:
Hello,

On 11/04/2010 04:40 AM, Kapil Arya wrote:
(Sorry for resending the message; the last message contained some html
tags and was rejected by server)

And please also don't top-post. Being the antisocial egomaniacs we
are, people on lkml prefer to dissect the messages we're replying to,
insert insulting comments right where they would be most effective and
remove the passages which can't yield effective insults. :-)

In our personal view, a key difference between in-kernel and userland
approaches is the issue of security. The Linux C/R developers state
the issue very well in their FAQ (question number 7):
https://ckpt.wiki.kernel.org/index.php/Faq :
7. Can non-root users checkpoint/restart an application ?

For now, only users with CAP_SYSADMIN privileges can C/R an
application. This is to ensure that the checkpoint image has not been
tampered with and will be treated like a loadable kernel-module.

That's an interesting point but I don't think it's a dealbreaker.
Kernel CR is gonna require userland agent anyway and access control
can be done there.

Indeed, this is a restriction on the new eclone() syscall, and can
be addressed with proper userspace tools (including crypto-signing the
checkpoint image). The core of the c/r code allows a user to
restore anything within the user's privilege level.
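
As a rough illustration of the kind of access check a userspace tool
could make before attempting a restart, the sketch below asks the kernel
whether the calling process holds CAP_SYS_ADMIN (spelled CAP_SYSADMIN in
the FAQ quoted above). The helper name has_cap_sys_admin() is
hypothetical and is not part of linux-cr; the sketch only uses the raw
capget(2) syscall and assumes nothing about how the c/r tools do it.

```c
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the calling process has
 * CAP_SYS_ADMIN in its effective set, 0 if not, -1 on error. */
int has_cap_sys_admin(void)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                       /* 0 means the calling process */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    /* Capabilities are stored as 32-bit words; pick the word and bit. */
    unsigned int word = CAP_SYS_ADMIN >> 5;
    unsigned int mask = 1u << (CAP_SYS_ADMIN & 31);

    return (data[word].effective & mask) != 0;
}
```

A tool could refuse early, with a clear message, when this returns 0
instead of letting the restart fail partway through.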

Being able to snapshot w/o root privilege
definitely is a plus but it's not like CR is gonna be deployed on
majority of desktops and servers (if so, let's talk about it then).

Why not? It has zero overhead when not in use, and a reasonable
code footprint (which can be reduced by modularizing some of it,
but that's beside the point).

Strategies like these are easily handled in userspace. We suspect
that while one may begin with a pure kernel approach, eventually,
one will still want to add a userland component to achieve this kind
of flexibility, just as BLCR has already done.

Yeap, agreed. There gotta be user agents which can monitor and
manipulate userland states. It's a fundamentally nasty job, that of

Are we talking about distributed checkpoint or "standalone" ?

DMTCP relies on user agents to allow distributed/remote execution
in a manner mostly transparent to the application. Many distributed
systems don't require (and do not use) user agents. Consider a
multi-tier system with a web server, an SQL server and some application
servers. These are not suited to DMTCP's mode of work.

(This is not to say DMTCP isn't useful - it's a clever piece of
software with specific goals and more geared towards HPC needs).

Now regarding "standalone" c/r, if you want to save/restore a single
process or a subset of the processes of a system without the rest of it,
then you will always need user agents, regardless of userspace/kernel
method. Likewise, work on those tools will be just as useful
regardless of which c/r 'engine' they use.

When you include all the relevant processes (e.g. an entire VNC
session, a web server, HPC and batch jobs), you generally don't
need the user agents. The checkpoint is self-contained, and linux-cr
can provide you that guarantee at checkpoint time.

collecting and applying application-specific workarounds. I've only
glanced at the dmtcp paper so my understanding is pretty superficial.
With that in mind, can you please answer some of my curiosities?

* As Oren pointed out in another message, there are some things which
could seem a bit too visible to the target application. Like the
manager thread (is it visible to the application or is it hidden by
the libc wrapper?) and reserved signal. Also, while it's true that
all programs should be ready to handle -EINTR failure from system
calls, it's something which is very difficult to verify and test and
could lead to once-in-a-blue-moon head scratchy kind of failures.
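
For readers unfamiliar with the -EINTR concern: a signal delivered
while a process is blocked in a system call (here, a reserved
checkpoint signal) can make the call fail with EINTR, and the
application is expected to retry. A minimal sketch of the usual retry
idiom follows; the helper name read_retry() is hypothetical and is not
taken from dmtcp or linux-cr.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call read() again whenever a signal
 * interrupts it before any data was transferred (EINTR). */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);

    return n;   /* bytes read, 0 at end of file, or -1 on a real error */
}
```

Code that calls read() directly and treats every -1 as fatal is the kind
that works for years and then fails once a signal lands at the wrong
moment, which is why such bugs are hard to catch in testing.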

If there is a will, there is (almost always) a way ;)

What MTCP does, IIUC, is wrap around the applications with a complete
pid-namespace (and more) in userspace. There are/were also commercial
products that do that. It's a tremendous effort and I'm impressed by
their (MTCP) work so far.

It is important to understand that it has a price tag: performance
and complexity. It's usually useful for HPC needs, but unsuitable
for the generic server/VPS space.


I think most of those issues can be tackled with minor narrow-scoped
changes to the kernel. Do you guys have things in mind which the
kernel can do to make these things more transparent or safer?

Hmmm... the kernel already does much of it - for instance, we have
neat pid-namespace infrastructure; does it make sense to go to
the trouble of adding interfaces to provide for pid-virtualization
in userspace? We should be past that ...

Moreover, your objection was based on the apparent complexity of
a badly presented aggregate diff (and I disagree: most of it
is simple refactoring and cleanup). However, that very set of
"narrow-scoped changes" to the kernel that you suggest will take
life in the form of kernel patches that will do more than these
and will achieve less.

* The feats dmtcp achieves with its set of workarounds are impressive
but at the same time look quite hairy. Christoph said that having a
standard userland C-R implementation would be quite useful and IMHO
it would be helpful in that direction if the implementation is
modularized enough so that the core functionality and the set of
workarounds can be easily separated. Is it already so?

From what I understand, the 'wrapper' functionality to support
distributed operation is said to be well modularized from the
actual c/r engine - which will allow it to use better c/r engines;
and coincidentally, I have one in mind... ;)

Oren.