Re: [RFC] [PATCH 00/13] Introduce task_pid api

From: Bernard Blackham
Date: Wed Nov 16 2005 - 12:54:40 EST

On Tue, Nov 15, 2005 at 06:24:44PM -0800, Hua Zhong (hzhong) wrote:
> I did some checkpoint/restart work on Linux about 5 years ago (you may
> still be able to google "CRAK"), so I'm jumping in with my 2 cents.

Ditto - CryoPID.

> > Personally, I think that these assumptions are incorrect for a
> > checkpoint/restart facility. I think that:
> >
> > (1) It is really only possible to checkpoint/restart a
> > cooperative process.

I agree that some processes are, but the majority are not.

> It's hard, but not impossible, at least theoretically.

Not that hard. Most of the information, if not exported by the
kernel through other means, can be ascertained from within the
process itself. For example, FD offsets can be obtained with lseek,
network connection endpoints with get{sock,peer}name, etc. With a
little help from ptrace, it's trivial.
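
As a rough illustration of the calls mentioned above, here is a minimal
sketch in C. It runs on the calling process's own descriptors; in a
checkpointer the same calls would have to execute inside the target
(for instance via ptrace). The helper names fd_current_offset and
socket_endpoints are made up for this example and are not taken from
CryoPID.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the current file offset of fd without moving it.
 * Returns (off_t)-1 on error, e.g. ESPIPE for pipes and sockets. */
off_t fd_current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Fetch the local and remote addresses of a connected socket.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int socket_endpoints(int sock,
                     struct sockaddr_storage *local,
                     struct sockaddr_storage *remote)
{
    socklen_t len = sizeof(*local);

    if (getsockname(sock, (struct sockaddr *)local, &len) < 0)
        return -1;

    len = sizeof(*remote);
    if (getpeername(sock, (struct sockaddr *)remote, &len) < 0)
        return -1;

    return 0;
}
```

On restore, the saved offset would presumably be put back with
lseek(fd, saved_offset, SEEK_SET) after reopening the file.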

> > For this to work with uncooperative processes you have to
> > figure out (for example) how to save and restore the file
> > system state. (e.g. how do you get the file position set
> > correctly for an open file in the restored program instance?)
> This is actually one of the simplest problems in checkpoint/restart.

Remote syscalls are how CryoPID does it: inject some code into the
target and execute it. CryoPID can also scrape out the contents of
unlinked (e.g. temporary) files (in the svn version). You can establish
what FIFOs joined which FDs of what processes through /proc, and
capture the in-flight buffers with some more ptracing.
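
A minimal sketch of the /proc lookup, assuming a Linux /proc mounted in
the usual place: stat() on /proc/<pid>/fd/<n> follows the link to the
underlying object, so two descriptors refer to the same pipe or FIFO
when both are FIFOs with the same device and inode. The function names
are hypothetical, and this only shows the matching step, not the
capture of in-flight data.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* stat() the object behind descriptor fd of process pid via /proc. */
static int proc_fd_stat(pid_t pid, int fd, struct stat *st)
{
    char path[64];

    snprintf(path, sizeof(path), "/proc/%ld/fd/%d", (long)pid, fd);
    return stat(path, st);
}

/* Returns 1 if both descriptors are ends of the same pipe/FIFO,
 * 0 if they are not, and -1 if either /proc entry cannot be read. */
int same_pipe(pid_t pid_a, int fd_a, pid_t pid_b, int fd_b)
{
    struct stat a, b;

    if (proc_fd_stat(pid_a, fd_a, &a) < 0 ||
        proc_fd_stat(pid_b, fd_b, &b) < 0)
        return -1;

    return S_ISFIFO(a.st_mode) && S_ISFIFO(b.st_mode) &&
           a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

Reading another user's /proc/<pid>/fd entries needs suitable
permissions, so a real tool would run as the same user or as root.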

> You'd need kernel support to save the state, and restart could be
> done entirely in user space to restore the file descriptors.

Just for file offsets? I disagree =)

> > And this doesn't even consider what to do with open network
> > connections.
> Right, this is really hard. I played with it 5 years ago and I had semi
> success on restoring network connections (with my limited understanding
> on Linux networking and some really ugly hacks). I could restart a
> killed remote Emacs X session with about 50% success rate.

TCP connections can be done with tcpcp ( and CryoPID
already supports it (although the patch hasn't been ported past

UDP sockets are not a hassle, being stateless (though CryoPID
doesn't handle them yet, because it's too easy :)

Unix sockets can be reconnected, but of course protocols might get
hopelessly confused. However, I am working with the Gtk+ display
migration code to freeze Gtk+ applications. gtk-demo freezes quite
happily as does gvim. The Gtk+ guys are working hard to squash some
remaining bugs to make more apps supported.

However with some prethought, you could hook your X app up to
something like Xmove and migrate any X application that way.

> > Similarly, what does one do about the content of System V shared
> > memory regions or the contents of System V semaphores?

The contents are not so much an issue, as the ids themselves. They
face much the same problem as PIDs - you attach/detach to a SHM
segment by its shmid. These are allocated by the kernel, but cached
in the process. Some method of requesting a particular shmid would
make life easier for checkpointing. Ditto semaphores and message
queues.
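
To make the shmid problem concrete, here is a small sketch in C using
the standard System V calls. Copying the contents into a fresh segment
is easy, but the kernel picks the new id, so any shmid the process has
cached is stale after a restore. create_segment and clone_segment are
illustrative names only, and the caller is assumed to pass a size no
larger than the original segment.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private segment; the kernel chooses the returned shmid. */
int create_segment(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Copy size bytes of segment old_id into a newly created segment.
 * Returns the new shmid (chosen by the kernel), or -1 on error. */
int clone_segment(int old_id, size_t size)
{
    void *src, *dst;
    int new_id;

    src = shmat(old_id, NULL, SHM_RDONLY);
    if (src == (void *)-1)
        return -1;

    new_id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (new_id < 0) {
        shmdt(src);
        return -1;
    }

    dst = shmat(new_id, NULL, 0);
    if (dst == (void *)-1) {
        shmctl(new_id, IPC_RMID, NULL);  /* do not leak the segment */
        shmdt(src);
        return -1;
    }

    memcpy(dst, src, size);
    shmdt(dst);
    shmdt(src);
    return new_id;
}
```

The contents survive the copy, but there is no argument to shmget()
for asking for a particular id, which is the gap described above.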

> > I'm sure there are many more such problems we can come up with a
> > careful study of the Linux/Unix API.

For many processes, there isn't all that much that can't be saved
from userspace. Help from kernel space would certainly make things
easier/faster/more reliable, this task_pid API being one example.

> > So, I guess my question is wrt the task_pid API is the
> > following: Given that there are a lot of other problems to
> > solve before transparent checkpointing of uncooperative
> > processes is possible, why should this partial solution be
> > accepted into the main line kernel and "set in stone" so to
> > speak?
> I agree with this. Before we see a mature checkpoint/restart solution
> already implemented, there is no point in doing the vpid thing.

Fair enough. I'm actually implementing multithreading support for
CryoPID at the moment. It currently resumes with the original PIDs
by editing last_pid in /dev/kmem and forking - a temporary, racy
hack until a better solution (such as this) appears. (It fails if
the PID is in use, or if anybody else on the system fork()s before
you do.) I'll give the task_pid patches a try and see how much
easier life is.
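
For the "PID is in use" failure case, a pre-check can be sketched with
the usual kill(pid, 0) idiom. This is only an illustration with a
made-up helper name, not what CryoPID does, and it is just as racy as
the hack itself: the answer can be out of date by the time you fork.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if pid currently names a process, 0 if it is free,
 * and -1 on an unexpected error. Signal 0 performs only the
 * existence and permission checks; nothing is delivered. */
int pid_in_use(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)   /* exists, but we may not signal it */
        return 1;
    if (errno == ESRCH)   /* no such process */
        return 0;
    return -1;
}
```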

I'm delighted to see Serge and the vserver guys putting the time
into this! Thanks!


