The checkpoint / continue capability is a prerequisite to process
migration, since to be able to migrate a process, one must be able to
checkpoint and continue it first, transparency or no transparency. Also,
you may have noticed that a checkpoint / continue capability makes a
remote fork capability quite trivial, since a copy of the process that
has been checkpointed can be continued on as many (remote) nodes as
desired. (So much for flexibility, transparency and having a hard time
duplicating the run-time state of a process.)
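To make that concrete, here is a minimal sketch (mine, not any existing
interface) of how a remote fork falls out of checkpoint / continue; the
checkpoint_to() call and the 'continue_from' command are purely
hypothetical names for whatever checkpoint mechanism is available, and
rcp/rsh merely stand for "copy the image and continue it elsewhere":

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical call: dump this process' run-time state to a file. */
    extern int checkpoint_to(const char *path);

    /* Remote fork as a corollary of checkpoint / continue: checkpoint
     * once, copy the image to as many nodes as desired, and continue
     * each copy from where the original left off. */
    int remote_fork(const char **nodes, int n)
    {
        char cmd[512];
        int i;

        if (checkpoint_to("/tmp/proc.ckpt") < 0)
            return -1;

        for (i = 0; i < n; i++) {
            snprintf(cmd, sizeof(cmd),
                     "rcp /tmp/proc.ckpt %s:/tmp/proc.ckpt && "
                     "rsh %s continue_from /tmp/proc.ckpt",
                     nodes[i], nodes[i]);
            if (system(cmd) != 0)
                fprintf(stderr, "continue on %s failed\n", nodes[i]);
        }
        return 0;
    }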
Also, I have to inform you that any dynamic load balancing decision is
at best a speculative one, since a node _cannot_ know how the load will
change in the future on the node we want to migrate to (nor can it know
that for the originating node). This means we may find ourselves filling
the network with process data copied from node to node as the load on
the nodes goes up and down. If you want to do dynamic load balancing,
you want to minimize its cost, so you want to avoid expensive operations
such as copying a process over the network.
Instead of balancing the load by process migration, one can do it much
more efficiently by stopping objects on overloaded nodes and restarting
or continuing others on idle ones (if they are continued, they pick up
from where they were stopped; if they are restarted, they are reused but
start from the beginning, usually with a different set of data). Dynamic
load balancing can be done simply by observing the
progress that individual (remote and local) objects of the application
make and making a balancing decision based on that. And who is more
competent in determining the progress of a part of an application than
the application itself?
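As an illustration only (stop_object(), continue_object() and the
progress counter below are my own invented names, not anything that
exists), application-level balancing based on observed progress can be
as simple as:

    /* Hypothetical application-level balancer: the application tracks
     * the progress its own objects report and shifts the slowest one
     * to an idle node, instead of asking the OS to migrate a process. */
    struct app_object {
        int  node;        /* node the object currently runs on */
        long progress;    /* work units reported by the object so far */
    };

    extern void stop_object(struct app_object *o);                /* hypothetical */
    extern void continue_object(struct app_object *o, int node);  /* hypothetical */

    void rebalance(struct app_object *objs, int nobjs, int idle_node)
    {
        int slowest = 0;
        int i;

        for (i = 1; i < nobjs; i++)
            if (objs[i].progress < objs[slowest].progress)
                slowest = i;

        /* The slow object is stopped and continued elsewhere; it picks
         * up where it was stopped and no address space is copied. */
        stop_object(&objs[slowest]);
        continue_object(&objs[slowest], idle_node);
        objs[slowest].node = idle_node;
    }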
Also, the all-knowing authority in the system that you refer to (even if
such a god-like object could exist) is a very bad concept security-wise,
since such an object directly violates the principle of least authority.
(So much for 'awareness', 'predicting the future' and an 'all-knowing
OS'.)
To get to more practical points: even if we put aside the obvious
performance advantage, namely that a message that starts an object is
much more efficient than copying a process over the network (if you page
it in on demand, you still copy it), remote forking is still not a very
good idea, especially in a heterogeneous CPU environment. It is much
better to 'start' an equivalent object on the remote node, one that has
been built and optimized for that node's architecture. One should
remember that by connecting to the Internet you are essentially
connecting to a heterogeneous supercomputer, so we might as well do
things the right way.
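Again purely as an illustration (the message layout below is made up for
the example), the 'start an equivalent object' message only has to name
the object and its arguments; the remote node runs a binary built for
its own architecture, whereas a remote fork would have to ship the
sender's entire address space:

    /* Hypothetical wire format for starting an equivalent object on a
     * remote node.  A few hundred bytes at most, versus megabytes of
     * process image for a remote fork, and the remote node executes
     * code compiled and optimized for its own architecture. */
    struct start_object_msg {
        char object_name[64];    /* e.g. "fft_worker", resolved on the node */
        int  argc;
        char argv[8][64];        /* start-up arguments for the object */
    };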
> I hope Mr. Linus Torvalds will have a positive attitude to such changes.
> Without this, any effort to bring such kernel-level functionalities (like
> DIPC) to Linux will have little impact. And I believe that if Linux is to
> compete with the likes of Windows NT, then it better be more equipped with
> these more-advanced mechanisms.
Say what?
Andrej
-- Andrej Presern, andrejp@luz.fe.uni-lj.si