Re: Remote fork() and Parallel Programming

mshar@vax.ipm.ac.ir
Sat, 13 Jun 1998 03:40:00 +0330


Hi,

lm@bitmover.com (Larry McVoy) wrote:

>: As I see things, distributed programming should not be (very) different from
>: programming a single computer. [etc].
>
>Again, please be more specific. I've asked over and over for specific
>examples of how your system would be used and you always reply with
>"it should work this way". I'm not interested in how it should work
>in your mind. I'm interested in how your model works for real applications.
>You are assuming that there are such applications. Yes, there are. But
>no real applications fit your model even slightly [...]

I'll try explaining once more. This time I'll get a bit into the
implementation, as it seems to make understanding my intentions easier.

The most important thing is to see that I have no special category of
applications as my target. ___ANY___ application can benefit from a
remote fork and process migration. An application program that spawns
its children on a single computer can do the same thing on different
machines, with minimal changes in the source code, as I'll explain below.

The ordinary fork() system call can be extended to also act as rfork().
The application program just calls fork(), and the extended version
decides whether the new process should be created on the local machine
or elsewhere.

In such a system it is a good idea to allow the application to give "hints"
to the system. For example, an application that spawns many short-lived
children can ask the system not to do a remote fork(), and thus save system
resources. Some points arise here, such as:

1) One may need to run older programs (possibly with no available source
code), and they may not work very well in the new environment.

2) The implementation of the new extensions may be gradual. For example, we
may not have solved the sockets problem (I'll come to this later).

To solve these problems, we can define a "default behavior" for our
extensions, such as NO remote fork(), unless the application doing the fork()
has told the system to do so. The same can apply to process migration.
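
The default-behavior rule above can be sketched roughly as follows. This is
only a model of the placement decision, not of fork() itself; the names
(Hint, choose_node, the load table) are hypothetical and do not belong to
any real kernel or DIPC interface.

```python
from enum import Enum


class Hint(Enum):
    """Hypothetical hints an application could attach to fork()."""
    NONE = 0          # application said nothing: old programs land here
    ALLOW_REMOTE = 1  # application explicitly opted in to remote fork
    LOCAL_ONLY = 2    # e.g. a program spawning many short-lived children


def choose_node(local, loads, hint=Hint.NONE):
    """Return the node on which the child should be created.

    `loads` maps node name -> current load (lower is better).
    The default is conservative: stay local unless the application
    has told the system that a remote fork is acceptable.
    """
    if hint is not Hint.ALLOW_REMOTE:
        return local
    # Pick the least loaded node; on a tie, prefer the local machine.
    return min(loads, key=lambda node: (loads[node], node != local))


loads = {"a": 3, "b": 1, "c": 1}
print(choose_node("a", loads))                     # no hint: stays on "a"
print(choose_node("a", loads, Hint.ALLOW_REMOTE))  # opted in: least loaded
```

An old binary never passes a hint, so under this sketch it keeps running
exactly as it did before the extension existed.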

Hints are not something new to UNIX: System V shared memory segments can be
"locked" in memory to prevent them from being swapped out. The application
can "unlock" a segment at will.

If possible, the hint mechanism should be implemented in such a way that it
can be used in kernels without the extension (where it is simply ignored).
In DIPC, this is done by using a new flag that is "or"ed with the older flags.
This new flag is ignored by kernels with no DIPC support. Doing this is not
absolutely necessary, but if done, it gives executable-code compatibility
between older and newer kernels. That is useful while the newer kernel
has not yet completely replaced the older one.
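
A minimal sketch of the flag idea, assuming a spare bit is available.
IPC_CREAT has the value Linux uses; HINT_DISTRIBUTED and both "kernel"
functions are hypothetical stand-ins for illustration, not DIPC's actual
flag or code.

```python
IPC_CREAT = 0o1000          # value from Linux's <sys/ipc.h>
HINT_DISTRIBUTED = 0o10000  # hypothetical extension bit (an assumption)


def old_kernel_get(flags):
    """Kernel without the extension: it only examines bits it knows."""
    return {
        "create": bool(flags & IPC_CREAT),
        "mode": flags & 0o777,
        "distributed": False,  # the extra bit is never looked at
    }


def new_kernel_get(flags):
    """Kernel with the extension: same result, plus the new bit."""
    result = old_kernel_get(flags)
    result["distributed"] = bool(flags & HINT_DISTRIBUTED)
    return result


# The same call works unchanged against both kernels.
flags = IPC_CREAT | 0o644 | HINT_DISTRIBUTED
print(old_kernel_get(flags))
print(new_kernel_get(flags))
```

The application is compiled once; only the kernel underneath decides
whether the hint has any effect.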

If at some point all the transparency problems have been solved, then one
can consider changing the default behavior. (For sockets, for example, we
can assign a single address to the cluster and make one computer responsible
for handling incoming data. This computer can keep track of where a process
is currently executing and deliver its data there.)
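
The bookkeeping such a front-end computer would need can be sketched as
below. The class and method names are hypothetical, and the sketch leaves
out everything hard (connection state, data in flight during a migration);
it only shows the location table.

```python
class FrontEnd:
    """Hypothetical node that owns the cluster's single address."""

    def __init__(self):
        self.location = {}   # pid -> node currently running the process
        self.delivered = []  # (node, pid, data) records, for illustration

    def register(self, pid, node):
        self.location[pid] = node

    def migrate(self, pid, new_node):
        # Called when a process moves; later data follows it.
        self.location[pid] = new_node

    def deliver(self, pid, data):
        """Forward incoming data to wherever the process is now."""
        node = self.location[pid]
        self.delivered.append((node, pid, data))
        return node


fe = FrontEnd()
fe.register(42, "node1")
print(fe.deliver(42, b"hello"))  # goes to node1
fe.migrate(42, "node3")
print(fe.deliver(42, b"again"))  # now goes to node3
```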

Nothing is forcing us to add all the code implementing these features to
the kernel. One can put most of the code in user-space programs or libraries
and let the kernel use them as needed. The kernel is used as a service
point, resulting in minimal change to programming methods and source code.
One might even consider allowing applications to use the services of such
user-space entities without going through the kernel (for example, if a
checkpoint/restart mechanism is used for process migration, an application
may use it to create a checkpoint of itself on the local hard drive).

By short-circuiting the extensions when the user-space programs and
libraries are not running or present, we can make sure the user can
use the newer kernels in the old ways.
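
A rough sketch of that short circuit, assuming the user-space helper is
represented by a callable that is simply missing when nothing is installed.
All names here are hypothetical.

```python
def extended_fork(local_fork, remote_helper=None, remote_ok=False):
    """Model of a fork() that falls back to the old behaviour.

    `local_fork` and `remote_helper` are hypothetical callables that
    create the child and return a description of where it went.
    """
    # Helper absent, or the application never asked: take the old path.
    if remote_helper is None or not remote_ok:
        return local_fork()
    return remote_helper()


def local():
    return "local"


def remote():
    return "remote"


print(extended_fork(local))                          # no helper installed
print(extended_fork(local, remote))                  # helper, but no hint
print(extended_fork(local, remote, remote_ok=True))  # helper and hint
```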

DIPC ( http://wallybox.cei.net/dipc ) provides an example of this design.

As you see, from my point of view the application should be able to regard
a cluster as a single computer. I consider distributed programming a
natural extension of ordinary programming. There should not be anything
special about distributed programs: developing them should be as simple as
developing single-computer applications. I think the only people who might
not like this goal are those who rely on the difficulties of distributed
application development to make a living by developing such programs.

> [...] Go look at the Vax Cluster
>documentation, at Parallel Oracle, Sabre, MPI, PVM, Locus, Pratt & Whitney
>simulation cluster, all the national labs, the financial trading systems,
>to mention a few of the more important ones.

I'm not usually impressed by what the majority of other researchers have
done. They might have their own priorities when designing their systems.
As a general rule, in science the majority should not always be considered
right (simply because it is hard to define what is "right").

>Not one of 'em gives a damn about remote fork. Not one. If you were to
>stop and think about it for a while, you might realize what a horrible
>failure model remote fork() would carry with it. What are you going to
>do when your cluster full of rforked() processes loses a node? And that
>node happens to have one page of data that they are all sharing.

I have been mentioning remote fork() because it would be a natural extension
of an already-available mechanism. Fault tolerance is a subject that should
be discussed separately, but process migration can be of use here. For
example, copying (migrating) a process to more than one computer from time
to time and holding the copies from running will allow us to switch to
another copy as soon as a computer goes down.
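
The standby-copy idea can be modelled in a few lines. This is a sketch
under strong assumptions: process state is just a Python dict, and failure
detection is someone calling node_failed(). The class is hypothetical.

```python
import copy


class ReplicatedProcess:
    """Hypothetical model: one running copy plus held (frozen) copies."""

    def __init__(self, state, primary, standbys):
        self.state = state
        self.primary = primary
        self.copies = {node: copy.deepcopy(state) for node in standbys}

    def checkpoint(self):
        # "From time to time": refresh every held copy.
        for node in self.copies:
            self.copies[node] = copy.deepcopy(self.state)

    def node_failed(self, node):
        if node != self.primary:
            self.copies.pop(node, None)  # just lost a standby
            return
        if not self.copies:
            raise RuntimeError("no standby copy left")
        # Resume from the first remaining standby.
        new_primary = next(iter(self.copies))
        self.state = self.copies.pop(new_primary)
        self.primary = new_primary


p = ReplicatedProcess({"step": 0}, "a", ["b", "c"])
p.state["step"] = 5
p.checkpoint()
p.state["step"] = 7  # progress made after the last checkpoint
p.node_failed("a")
print(p.primary, p.state)
```

In this model the work done after the last checkpoint is lost on failover,
so how often to copy is a trade-off against how much may be repeated.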

>For clusters, shared memory sucks, fork() sucks. Clusters need strong
>boundaries between the processes, and strong failure models, since the
>probability of a node failing goes to 1 as the number of nodes goes up.
>It's absolutely unacceptable to have your application crash because one
>node went away. How does remote fork() work on that system? How does
>it make sense?

That is your opinion. From my point of view, a cluster is something like a
virtual SMP computer. No strong boundaries at all. As I said before, one can
use replication to bring fault tolerance to the system, as the probability
of two computers going down simultaneously is lower than that of a single
computer going down.
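
Both probability claims in this exchange can be put side by side, assuming
(and this is an assumption) that nodes fail independently, each with the
same probability p over some period. The function names are illustrative.

```python
def p_any_fails(p, n):
    """Probability that at least one of n nodes fails."""
    return 1 - (1 - p) ** n


def p_all_fail(p, k):
    """Probability that all k replicas fail in the same period."""
    return p ** k


p = 0.01
for n in (1, 10, 100, 1000):
    print(n, p_any_fails(p, n))  # grows toward 1 as the cluster grows
for k in (1, 2, 3):
    print(k, p_all_fail(p, k))   # shrinks as replicas are added
```

The first function is the quoted objection (some node will fail in a large
cluster); the second is the replication argument (losing every copy of one
process at once is much less likely).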

>Your only answer when questioned is to repeatedly insist it will be a good
>thing and you never answer any specific questions. How do you expect to
>lead people down a path if you don't have good answers for even the obvious
>questions?

Up to now I thought we had a common base of assumptions for our
discussion. I prefer to discuss ideas, and usually don't get very much into
implementation details. I really hope my additional explanations above have
helped.

> "In theory, theory and practice are the same.
> In practice, they are different."
>
> --me circa 1993 or so
>
>Text books are great for theory. I'm much more interested in practice.

Of course not. All those good, practical things you see in the computer
software world are built on theoretical grounds, even if we are not familiar
with them (or don't like them). Please remember that everything was a theory
before being implemented for the first time.

>: It is very hard to define simplicity and complexity. For example, I have a
>: hard time considering a monolithic OS design as simple. Most UNIX variants
>: are just a jungle of code (they could have been simple once). They are
>: complicated by design.
>
>No, they aren't. An operating system is made up of a collection of objects,
>i.e., devices, files, file systems, processes, sockets. You can think of
>these objects as C++ classes with all virtual methods; otherwise known as
>a set of interfaces where each instance of the class implements all the
>methods of that class. All operating systems work this way. The generic
>portion of the code that lives above the objects is very small and simple,
>go look at Linux. The generic process and memory management code is
>14K lines. Tiny. All the code is in the instances of the various objects,
>mostly in drivers and then networking and file systems making up the rest.

I thought you liked what the majority do? Monolithic kernels are a thing of
the past. New OS's are being developed every day, and very few of them are
monolithic. Do you want to say that the majority are wrong??

>But the generic part is very, very small. And as simple as it can be.

I'll keep that in mind the next time I change a global variable in the Linux
kernel sources and manage to crash it due to a problem in a seemingly
unrelated part of the kernel.

>What you have been proposing to add is all code that would have to live
>in the generic part of the system. The things you are contemplating would
>completely dwarf the rest of the generic code.

By "system" I also mean user-space programs and libraries.

> "An architect is someone who understands
> the difference between what could be done
> and what should be done"
>
> --me again circa 1992, in the midst of cluster debates
>
>I applaud your ability to realize what could be done. I am fully aware
>that what you are proposing could be done. I think it is a bad idea, I
>think ideas like yours have been proposed, tried, and failed repeatedly.
>I think you got the first half right, now you need to spend some time
>wondering why that pain-in-the-ass-McVoy is so insistent that this is
>something that should not be done.

Thanks for the compliments. You seem to be an assertive and self-confident
person (generally a good thing). Unfortunately, such attributes can cause
problems in a scientific debate. I am as sure about my views as you are about
yours, but I always consider it possible that I am wrong.

>By the way - I have zero control over what you or anyone else does. You
>can win the whole debate by showing up with the code. But so far, you are
>just suggesting this as an idea. If we're just talking ideas, I'm quite
>happy to match my experience and other people's experiences, much of
>it documented, against your ideas [...]

One gains experience while working to reach some goals. I am not sure if
what you have been trying to reach in distributed computing has been the
same thing as what I explained in this email.

Unfortunately I am currently not in a position to actively work on these
problems. But I hope there are other people who have either started working,
or are considering doing so.

> [...] The reason for that is that there are
>a lot of people that read this list quietly, think about what is
>discussed, and act accordingly. I'm hoping that they will think about
>this topic and try a new approach, not the same old thing that has
>failed over and over in the past.

The point is, doing things the way I have been saying might need the active
cooperation of the Linux heavy-weights (who are more familiar with the kernel
internals) and also Linus's blessing (to actually add it to Linux). Can
someone wanting to start such work rely on that??

-Kamran Karimi
