Re: Remote fork() and Parallel programming

mshar@vax.ipm.ac.ir
Mon, 15 Jun 1998 03:54:09 +0330


Hi,

yodaiken@chelm.cs.nmt.edu wrote:

>> What wrong opinion about DSM? Please refer to DIPC to know of my opinions
>> about DSM.
>
>It's inherently slow. To paraphrase St. Seymour:
> Ok(slow) -> !NeedComputer

First, I give higher priority to ease of programming than to speed: programmers
learn to use shared memory from the very beginning. Shared memory, in the form
of _global_and_stack_variables_, is used to exchange data between different
parts of applications. Shared memory _is_ the natural way of exchanging data
for most programmers. How many people feel the same about message passing?

For me, it is:
Ok(TooHard) -> MakeItEasier

Distributed programming should be a natural extension of the single-computer
programming model.

Second, why do you think DSM is that much slower than message passing? The major
"problem" (if you can call it that) with DSM is that the programmer has
little control over when, and how much, data is transferred by a DSM system
(after all, this is done transparently). There are two points:

1) Connection setup time (latency) is becoming the dominant factor in
network operations, so on a fast network, once a connection is established,
the amount of data transferred does not matter much. And networks are becoming
faster and faster.

2) By restricting the DSM parts to some well-known address ranges of a
process, the programmer knows when a network operation might take place. One
example of such a system is DIPC, in which System V shared memory segments are
distributed. This gives the programmer some control.

>> For process migration, a simple load measurement will do for the first
>> implementations. All computers are polled periodically, and the jobs are
>> migrated if some thresholds are exceeded on a machine. Because of the hint
>> mechanism, the application programmer can inform the system not to migrate
>> processes that are short-lived, or that use many local resources.
>
>A) Data shows this does not work. See thirty years of literature on
> process migration.

So you claim process migration is not practical.

>B) Why are "hints" good, but user space directives bad?

How could you compare them? A hint can simply be a new system call, or even
some flags OR'ed with other flags in an already-available system call. The
programmer gives the hint (maybe at the start of the program), and then
forgets about it. There is no "proper time to call the directives" or "proper
sequence in which to call the directives".

>> It works perfectly when using DSM with strict consistency, but it could be
>> slow. Like many OS text books, I'd tell the programmer to use semaphores
>
>So the user notes that mysterious changes in performance happen under
>the process migration/DSM system, while the programmer in the MPI world
>get predictable speedups. MPI wins.

After trying to develop and debug his application, the MPI programmer may be
too tired to see anything :-) One should also consider the time and cost of
developing message-passing programs (done by specialists), versus those of
DSM programs (which can be written with no need for special training).

I am not trying to say that DSM should or will replace message passing, but I
do say that to bring distributed programming to more people, we should make it
easier. If someone thinks the performance is not what he needs for his
special application, then he is free to do whatever he likes.

>And most OS textbooks are nonsense. If you can use semaphore synchronization,
>why do you need shared memory? Just send data in messages.

Simple: using shared memory to exchange data is easier for programmers.

>> Yes, synchronizing via DSM can kill a program. A bit of programming
>> discipline is all that is needed to mitigate such problems. I believe the
>> advantages of DSM by far outweigh the disadvantages.
>
>But DSM hides the difference. And DSM with process migration makes it
>impossible to predict performance.

Given that in a dynamic migration system the resource usage in the cluster
will most probably be better than what the programmer can achieve by guessing,
it may well be that this unpredictable performance is better than the
predictable performance of message-passing systems.

>If the program can run locally, why use the network at all?

Because it might benefit from doing so (faster computation), or because it
is actually designed to run on a network (a multi-player game, say) but can
also work on a single computer.

>Elementary OS textbooks, as a rule, are hand waving gibberish.

Interesting rule. Is it only applicable to "elementary" OS text books?

>Consider, for example, a circular linked list containing live measurement
>data that is collected by process A, displayed by process B, and
>where stale data is simply overwritten by A. Trivial with shared memory.
>No busy waiting at all.

No synchronization (not even via busy waiting)?? Of course this will not work.
Why? The answer is in those OS text books: What if B is faster than A?
It displays old data. What if B is slower than A? A overwrites data before
B has displayed it.

One way to make this program work (and it might be your assumption) is to
ensure that A's execution is interleaved with B's execution (with A
_always_ starting first), but those #$@! OS designers won't let us interfere
with the scheduling ;-)

BTW, even if we could control the scheduling, problems like A's data source
being too slow (relative to B), or B's display device being too slow
(relative to A), would ensure that this will not work.

Isn't this obvious??

-Kamran Karimi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu