Re: Remote fork() and Parallel Programming

Alan Cox (alan@lxorguk.ukuu.org.uk)
Fri, 12 Jun 1998 10:50:26 +0100 (BST)


> On Thu, Jun 11, 1998 at 09:15:32AM +0800, Michael O'Reilly wrote:
> >
> > Nonsense. You can implement process migration by using
> > checkpoint/restart, but there's no way you can use process migration
> > to implement checkpoint/restarting.
>
> You have to do some pretty cunning stuff for any open FDs, and sysv ipc
> objects, etc. when you checkpoint. (Assuming the application isn't
> explicitly written to receive a signal and do this for you.... if the
> application is aware of this, I think you can pretty much do it all in user
> space).

A proper implementation of checkpointing checkpoints a group of processes and
all their interprocess state (even things like draining and recording pending
pipe data). You can then continue the group on any system where the equivalent
resources are free.
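
As an illustration of the "draining and recording pending pipe data" step,
here is a minimal sketch in C. The name drain_pipe() is hypothetical, not taken
from any existing checkpoint library. It assumes every process in the group has
already been stopped, so nothing else is reading from or writing to the pipe
while it is drained.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy everything currently buffered in a pipe's
 * read end into a malloc'd buffer so it can be stored in the checkpoint.
 * Returns 0 on success, -1 on error. The caller frees *out. */
int drain_pipe(int fd, char **out, size_t *out_len)
{
    assert(out != NULL && out_len != NULL);

    /* Switch to non-blocking mode so an empty pipe ends the loop. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    int rc = buf ? 0 : -1;

    while (rc == 0) {
        if (len == cap) {                      /* grow the buffer */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) { rc = -1; break; }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n > 0)
            len += (size_t)n;
        else if (n == 0)
            break;                             /* all writers have closed */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                             /* nothing left in the pipe */
        else if (errno != EINTR)
            rc = -1;                           /* real read error */
    }

    fcntl(fd, F_SETFL, flags);                 /* restore the original mode */
    if (rc != 0) { free(buf); return -1; }
    *out = buf;
    *out_len = len;
    return 0;
}
```

On restart, the recorded bytes would be written back into a freshly created
pipe before the group is allowed to continue.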

You also don't need rfork in the kernel - it's a library-level problem, since
rfork is just

    pid = fork();
    if (pid == 0) {                     /* child */
        checkpoint("file");
        remote_exec("restart file");
    }

(you do want signal propagation, but Don Becker & co already have this for
Beowulf in about 20 lines of actual code and a daemon).
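
To make the library-level shape concrete, here is a sketch of rfork() as an
ordinary C function. The checkpoint() and remote_exec() bodies are placeholder
stubs, and the return convention for checkpoint() (0 after writing the image,
1 when resumed from it, similar to setjmp) is an assumption of this example,
not something specified above. Choosing the target node and propagating
signals are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stub: a real version would write the caller's state to
 * `path`. Assumed convention: returns 0 once the image is written,
 * 1 when execution resumes from that image, -1 on error. */
static int checkpoint(const char *path)
{
    fprintf(stderr, "checkpoint -> %s\n", path);
    return 0;
}

/* Hypothetical stub: a real version would ask a daemon on another node
 * to run `cmd`. Returns 0 on success. */
static int remote_exec(const char *cmd)
{
    fprintf(stderr, "remote_exec: %s\n", cmd);
    return 0;
}

/* Library-level rfork(): returns the local child's pid in the parent,
 * -1 if fork() fails, and 0 in the copy that was restarted remotely. */
pid_t rfork(const char *image)
{
    assert(image != NULL);

    pid_t pid = fork();
    if (pid != 0)
        return pid;             /* parent (pid > 0) or fork failure (-1) */

    /* Local child: it exists only long enough to be checkpointed. */
    int rc = checkpoint(image);
    if (rc == 1)
        return 0;               /* resumed remotely: act like fork()'s child */
    if (rc == 0) {
        char cmd[512];
        snprintf(cmd, sizeof cmd, "restart %s", image);
        rc = remote_exec(cmd);
    }
    _exit(rc == 0 ? 0 : 1);     /* the local copy is no longer needed */
}
```

With these stubs the parent can call waitpid() on the returned pid to learn
whether the hand-off succeeded; nothing in the sketch needs kernel support
beyond fork() itself.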

This is where I think some of the argument is coming from. The assertions that
you want the minimum in the kernel are true, as are those that the end user
needs a nice interface. However, you can push the interfaces into libraries
rather than into the kernel.

Bits of DIPC basically do need to be in the kernel. It's certainly possible to
implement user space shared semaphores/message queues, albeit somewhat more
slowly, but the semantics are very hard to get right if you want to preserve
the guarantees SYS5 operations have. I can't see a way to do SYS5 shm
distribution without kernel help.
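
One example of the guarantees in question: semop() applies a whole array of
operations atomically, and SEM_UNDO has the kernel revert them if the process
exits. The sketch below shows both on a single machine, using only standard
System V calls. The helper names make_pair() and take_both() are made up for
this example, and it says nothing about how DIPC itself handles this.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Same layout as the traditional `union semun`; named differently so it
 * does not clash with C libraries that already define that union. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set of two semaphores with initial values a and b.
 * Returns the set id, or -1 on error. */
int make_pair(unsigned short a, unsigned short b)
{
    int id = semget(IPC_PRIVATE, 2, 0600);
    if (id < 0)
        return -1;

    unsigned short init[2] = { a, b };
    union sem_arg arg;
    arg.array = init;
    if (semctl(id, 0, SETALL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Decrement both semaphores in one semop() call. Either both are taken
 * or neither is. IPC_NOWAIT makes the call fail instead of blocking;
 * SEM_UNDO asks the kernel to undo the decrements if we exit. */
int take_both(int id)
{
    assert(id >= 0);
    struct sembuf ops[2] = {
        { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO | IPC_NOWAIT },
        { .sem_num = 1, .sem_op = -1, .sem_flg = SEM_UNDO | IPC_NOWAIT },
    };
    return semop(id, ops, 2);
}
```

If the set starts as (1, 0), take_both() fails and the first semaphore is left
at 1. A user space implementation spread over several machines has to
reproduce that all-or-nothing behaviour, plus the cleanup on exit, itself.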

Alan

