Process Migration on Linux - Impossible?

Fabio Olive Leite (leitinho@akira.ucpel.tche.br)
Fri, 26 Sep 1997 21:09:17 -0300 (GRNLNDST)


Hi there,

Just out of curiosity (nevertheless I'll shoot myself if it can't be
done), I was wondering how it would be possible to implement a process
migration facility on Linux, so that idle time can be put to better use
on a net of Linux stations.

The real fact is that I proposed just the above as my graduation project,
so I gotta get it working somehow, and a couple of suggestions would be
nice. I've been doing my homework (I obviously hadn't done it when I
proposed such a gigantic work), and the more I study the Linux kernel, the
more I worry about not graduating :). But I can see a light at the end of
the tunnel (maybe it's the train called _failure_).

The model I'll use to attempt it will have some of the work done by the
kernel, and some more by a user-level daemon that will communicate with
the kernel part in a kerneld fashion. The kernel will never take an active
part in the migration, like selecting processes for it (distributed
scheduling would be a big bloat on a centralized system), or actually
transferring the virtual memory of a process over the network.
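
To make the kernel/daemon conversation a bit more concrete, here is a toy
sketch in Python of how a request could be framed. It is only an
illustration, not kerneld's real protocol: the header layout, the message
type numbers and the names pack_message/unpack_message are all made up.

```python
import struct

# Hypothetical wire format (invented for this sketch):
# 4-byte message type, 4-byte pid, 4-byte payload length, then the payload.
HEADER = struct.Struct("!III")

MSG_PAGE_REQUEST = 1   # "send me page N of process pid"
MSG_PAGE_REPLY = 2     # "here is that page"


def pack_message(msg_type, pid, payload=b""):
    """Build one framed message: fixed header followed by the payload."""
    return HEADER.pack(msg_type, pid, len(payload)) + payload


def unpack_message(data):
    """Split a framed message back into (type, pid, payload)."""
    msg_type, pid, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return msg_type, pid, payload
```

A page request for page 7 of pid 1234 would then be
pack_message(MSG_PAGE_REQUEST, 1234, struct.pack("!I", 7)), and the daemon
on the other end would recover the same three fields with unpack_message().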

The idea is to have a kernel that _supports_ process migration and _helps_
in the migration, but doesn't actually hold any opinion of its own,
delegating all decisions to the user-level daemon. I'm not concerned with
speed, since it's gonna crawl anyway. I just want to get a process
transferred to another machine and have it keep running (and you keep
reading, I'm not crazy).

Say we want to migrate process X. The daemon talks to the daemon at the
target machine (I'm not concerned with a wonderful selection of a target,
as that's load _balancing_, not just load _sharing_), and the daemon at
the target convinces the kernel to create a new process, which will have
special needs, as all its files _and_ code will be on another machine.

I intend to alter all the f_ops, mm_ops, whatever_ops structures of this
process, so that whenever it faults on a page, the routines in my kernel
module get called. The module talks to the local daemon, which talks to
the daemon on the source machine, which talks to the kernel on the source,
which will find the damn page and send it back all the way through.
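
As a rough picture of that fault-and-fetch chain, here is a toy user-space
model in Python. It is nothing like real kernel code: SourceNode stands in
for the source kernel plus its daemon, MigratedProcess for the process on
the target, and every name and the 4096-byte page size are assumptions
made up for the sketch.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class SourceNode:
    """Stands in for the source machine, which still holds all the pages."""

    def __init__(self, memory):
        self.memory = memory      # page number -> bytes of length PAGE_SIZE
        self.requests = 0         # how many pages were asked for so far

    def fetch_page(self, page_no):
        self.requests += 1
        return self.memory[page_no]


class MigratedProcess:
    """Stands in for the process on the target: it starts with no pages."""

    def __init__(self, source):
        self.source = source
        self.resident = {}        # pages already pulled over the "network"

    def read_byte(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        if page_no not in self.resident:
            # "Page fault": the page is not here yet, so ask the source.
            self.resident[page_no] = self.source.fetch_page(page_no)
        return self.resident[page_no][offset]
```

Each page crosses over only once, on first touch; later reads of the same
page are served locally, which is the whole point of doing it lazily.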

File ops will get a similar treatment, but then there are other
concerns, like IPC. Rerouting signals is simple, and so are messages, but
shared memory, for instance, would be a major pain in the ass, if not
outright impossible. A simple tweak like making all processes involved
fault on the shared page when it's accessed, and then distributing the
page (if someone faulted on it) to the other machine on the next
schedule(), might work, but looks horrible.

Anyway, I'm not terribly concerned with such "advanced" processes. If a
simple RC5 cracker, which is CPU-bound, gets to continue working, I'll
consider it great. :)

A lot of work would also have to go into redirecting all syscalls that
are not as simple as getpid(). Maybe they can be redirected in some way
similar to the page faults. It's also obvious that this migrated process
will have to be in a state where it won't get on the run queue, or things
will get funny. I don't know if it's better to create a new state, or to
reuse a similar existing one such as UNINTERRUPTIBLE.
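
The syscall redirection could look something like the toy Python model
below. Again this is only a sketch under assumptions: HomeNode and
MigratedTask are invented names, and which calls can really be answered on
the target without asking the source is a guess (only getpid here).

```python
class HomeNode:
    """Stands in for the source machine, which executes forwarded calls."""

    def __init__(self):
        self.log = []             # record of every call forwarded to us

    def execute(self, name, args):
        self.log.append((name, args))
        return "home:" + name     # placeholder result for the sketch


class MigratedTask:
    """Stands in for the migrated process and its syscall entry point."""

    def __init__(self, pid, home):
        self.pid = pid
        self.home = home
        # Calls simple enough to answer locally (assumed set).
        self.local = {"getpid": lambda: self.pid}

    def syscall(self, name, *args):
        handler = self.local.get(name)
        if handler is not None:
            return handler(*args)                 # answered on the target
        return self.home.execute(name, args)      # shipped back to the source
```

So task.syscall("getpid") never leaves the target machine, while something
like task.syscall("write", 1, b"hi") takes the same round trip as a page
fault.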

For the record, I was reading about process migration on Sprite when I
decided to take on such a beast. Maybe I went insane for a few days, I
don't know. The fact is that I have to do it.

Now for the real question: does anyone consider this project so improbable
that not even a very simple "Hello World!" could be migrated?

I'm not on the list, as I can't cope with the traffic. Answers and
suggestions directly (or CC'ed) to me, please!

Now the sentence all mails to linux-kernel should have: Keep up the good
work, guys! Linux is the best thing since microwave popcorn!

[]!
Fabio
( Fabio Olive Leite leitinho@akira.ucpel.tche.br )
( Computing Science Student http://akira.ucpel.tche.br/~leitinho/ )
( )
( LOADLIN.EXE: The best Windows95 application. [Debian GNU/Linux] )