Re: Process Migration on Linux - Impossible?

Michael K. Johnson (johnsonm@redhat.com)
Wed, 01 Oct 1997 11:13:53 -0400


Ketil Z Malde writes:
>lm@cobaltmicro.com (Larry McVoy) writes:
>> The answer is: you don't move the processes very often.
>
>Right. But that is very different from not moving it. The use I see
>for this is to have a cluster in my basement that provides services,
>including CPU. Now, in order to provide high availability, it is
>necessary that processes can be migrated off a node being brought down
>for maintenance. Similarly for network connections, etc.

That's checkpoint-restart you want, not clustered process migration.
Note that the process migration that's been talked about here redirects
stateful system calls (file I/O, getpid(), etc.) to the originating
processor, and that won't survive a reboot of the originating processor
very well.
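
To make that concrete, here is a toy model of the forwarding, not taken
from any real implementation. The names HomeNode, MigratedProcess and
HomeNodeDown are made up for illustration, and the "network" is just a
method call inside one Python process.

```python
class HomeNodeDown(Exception):
    """Raised when a forwarded call cannot reach the originating node."""


class HomeNode:
    """Hypothetical originating node; it still owns the process's kernel state."""

    def __init__(self, pid):
        self.pid = pid
        self.files = {}   # fd -> bytearray of everything written to that fd
        self.up = True    # set to False to simulate a shutdown or reboot

    def handle(self, call, *args):
        # Stand-in for a network request from the remote node.
        if not self.up:
            raise HomeNodeDown(call)
        return getattr(self, "sys_" + call)(*args)

    def sys_getpid(self):
        return self.pid

    def sys_write(self, fd, data):
        self.files.setdefault(fd, bytearray()).extend(data)
        return len(data)


class MigratedProcess:
    """Runs on a remote node: computes locally, forwards stateful calls home."""

    def __init__(self, home):
        self.home = home

    def compute(self, n):
        # Pure CPU work needs nothing from the originating node.
        return sum(i * i for i in range(n))

    def getpid(self):
        return self.home.handle("getpid")

    def write(self, fd, data):
        return self.home.handle("write", fd, data)
```

With home.up set to False, compute() still runs but getpid() and write()
raise HomeNodeDown, which is the situation you would be in after taking
the originating node down for maintenance.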

Condor already does checkpoint-restart within certain constraints.

It's very hard to do checkpoint-restart in the completely general case:
how do you maintain existing network connections when you transfer a
process to another CPU or reboot your system? There's a lot of state
to save...

>Most of this could be accomplished in user space, I'd think.

Arbitrary checkpoint-restart needs kernel help. Limited checkpoint-restart
already exists in user space.
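
As a rough sketch of what "limited" can mean, the simplest user-space form
is a program that saves its own state and reloads it on restart. This is
not how Condor does it; the function names and the state layout below are
assumptions made for the example. Note what is missing: open file
descriptors, sockets, the pid and anything else the kernel holds.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename is atomic, so a crash mid-write leaves the old checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, limit, stop_at=None):
    """Sum 0..limit-1, checkpointing after every step.

    stop_at simulates the node going down when the counter reaches that value.
    """
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < limit:
        if stop_at is not None and state["i"] == stop_at:
            return None  # "killed" before finishing
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
    return state["total"]
```

Calling run(path, 10, stop_at=4) and then run(path, 10) with the same path
resumes from the saved counter instead of starting over. The checkpoint
file could be read from another machine over a shared filesystem, but only
because this job keeps all of its state in one dictionary.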

>> As has been repeatedly proven, moving an already started process is a lose
>> almost 100% of the time.
>
>Right. Unless somebody just typed shutdown -h now at the #-prompt.

Not even then.

michaelkjohnson

"Magazines all too frequently lead to books and should be regarded by the
prudent as the heavy petting of literature." -- Fran Lebowitz