Re: Remote fork() and Parallel Programming

yodaiken@chelm.cs.nmt.edu
Sun, 14 Jun 1998 10:49:06 -0600


On Sun, Jun 14, 1998 at 04:58:35PM +0330, mshar@vax.ipm.ac.ir wrote:
> I hope by "grey haired architects" you didn't mean me. I never said your

Don't think so.

> *) Programs should employ the new services with minimal change in the source
> code. They should not care about things like process migration. I wonder

So this is a fundamental problem in distributed system design and explains
why you have the wrong opinion about DSM as well. The OS has a simple
algorithm to schedule tasks on a single machine, but nobody has
convincingly proposed a good general purpose algorithm for distributed
task scheduling. Similarly for virtual memory and DSM. Furthermore,
time tradeoffs are different between distributed and unified systems.
The cost of a couple of extra context switches can be a significant
factor in a unified system, but DSM/process migration has a forced
communication overhead that can easily absorb an extra context switch.
Where is the gain from incorporating these in the kernel? As a general
rule, something should be in the kernel only if there is a
compelling reason.

> how you would like it if an operation like simply opening a file required
> the involvement of the programmer in some kernel-level cache management
> decision makings. The programmer should not care about such things: One
> should learn a bit from past experience.

Obviously, one can have libraries that take care of the details.

> >Things which are bad:
> > . Distributed shared memory. It has no failure model [...]
>
> Replication.

x[0] = 1;
if (remote_fork())
    while (x[0] == 1); /* where x points to distributed shared memory */
else
    x[0] = 0;

Works one way for real shared memory, another way for DSM. How do
you fix it?
Answer: memory channel. DSM on standard networks cannot work.
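
For comparison, here is a minimal single-machine sketch of the real shared
memory case. It assumes POSIX fork() and an anonymous MAP_SHARED mapping as a
local stand-in for remote_fork(), which is not a real call. The name
spin_on_shared_flag() is made up for this illustration, and C11 atomics are
used so the compiler cannot hoist the load out of the loop. It shows nothing
about how any particular DSM behaves.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 once the parent has seen the child's
 * store, or -1 if the mapping or the fork could not be set up. */
int spin_on_shared_flag(void)
{
    /* Real shared memory: both processes see the same physical page. */
    atomic_int *x = mmap(NULL, sizeof *x, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (x == MAP_FAILED)
        return -1;

    atomic_store(&x[0], 1);

    pid_t pid = fork();              /* local stand-in for remote_fork() */
    if (pid < 0) {
        munmap(x, sizeof *x);
        return -1;
    }
    if (pid == 0) {                  /* child: the "else x[0] = 0" branch */
        atomic_store(&x[0], 0);
        _exit(0);
    }

    while (atomic_load(&x[0]) == 1)
        ;                            /* parent: spin until the store is visible */

    waitpid(pid, NULL, 0);
    assert(atomic_load(&x[0]) == 0);
    munmap(x, sizeof *x);
    return 0;
}
```

Here the parent leaves the loop because the hardware keeps the two views of
the page coherent. Nothing in the program asks for the update to be sent
anywhere, which is exactly the part a network has to supply somehow.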

Or worse:
P1:
look through buffer for structures marked free
put new data in free structures

P2:
look through buffer for structures marked full
consume and mark free

This is an excellent mechanism on real shared memory and works terribly
on DSM. The problem is that you want to advertise something you can't
deliver.
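
A rough sketch of that P1/P2 scheme on real shared memory, assuming one
producer and one consumer, POSIX fork() plus an anonymous MAP_SHARED mapping,
and C11 atomics for the free/full marks. The names (struct slot, NSLOTS,
produce, consume, run_slot_buffer) are invented for the example. Every probe
of a slot is a plain memory read here; the sketch does not try to model what
each probe would cost on a DSM.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NSLOTS 4
enum { SLOT_FREE = 0, SLOT_FULL = 1 };

struct slot {
    atomic_int state;   /* SLOT_FREE or SLOT_FULL */
    int data;
};

/* P1: look through the buffer for a free slot, fill it, mark it full. */
static void produce(struct slot *buf, int value)
{
    for (;;) {
        for (int i = 0; i < NSLOTS; i++) {
            if (atomic_load_explicit(&buf[i].state,
                                     memory_order_acquire) == SLOT_FREE) {
                buf[i].data = value;
                atomic_store_explicit(&buf[i].state, SLOT_FULL,
                                      memory_order_release);
                return;
            }
        }
    }
}

/* P2: look through the buffer for a full slot, consume it, mark it free. */
static int consume(struct slot *buf)
{
    for (;;) {
        for (int i = 0; i < NSLOTS; i++) {
            if (atomic_load_explicit(&buf[i].state,
                                     memory_order_acquire) == SLOT_FULL) {
                int value = buf[i].data;
                atomic_store_explicit(&buf[i].state, SLOT_FREE,
                                      memory_order_release);
                return value;
            }
        }
    }
}

/* Child produces 1..n, parent consumes n items. Returns their sum,
 * or -1 if the mapping or the fork fails. */
long run_slot_buffer(int n)
{
    assert(n >= 0);
    struct slot *buf = mmap(NULL, NSLOTS * sizeof *buf,
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    /* Anonymous mappings are zero-filled, so every slot starts SLOT_FREE. */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, NSLOTS * sizeof *buf);
        return -1;
    }
    if (pid == 0) {
        for (int v = 1; v <= n; v++)
            produce(buf, v);
        _exit(0);
    }

    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += consume(buf);

    waitpid(pid, NULL, 0);
    munmap(buf, NSLOTS * sizeof *buf);
    return sum;
}
```

Items can come out in a different order than they went in, so the sketch
only checks the sum: run_slot_buffer(100) should give 5050. Both sides poll
the marks continuously, which is cheap only because a poll is a local load.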

> > . Remote procedure calls. RPCs block. [...]
>
> RPC is useful for increasing the transparency. Considering that latency will
> probably be the most important factor in performance, I wonder how you say
> RPC is any slower. You have to setup a network connection _no_matter_ the
> programming model and mechanism. RPC will at most add a bit of computational
> penalty for handling arguments, but RPC is much easier to use for the
> programmer, as they preserve most of the syntax and semantics of ordinary
> procedure calls.

The question is whether it is good to give the programmer an illusion that
the OS cannot sustain.
>
> > . Process migration. Sounds good, but is way too costly to be of any
> > use. [...]
>
> Disagree. I am so happy many OS designers did not listen to those (like
> Mac-OS people) who said pre-emptive process scheduling is just an overhead
> that only makes the OS more complicated without being of any use to most
> programs.

Come on, that's a weak argument. Try virtual machines as a counterexample,
and one that is much closer to what you propose.

>
> Process migration is the best way for efficient use of a cluster's
> resources. Isn't this efficient resource-usage one of the main goals of

Only if the OS can properly make the cost tradeoff calculation. How
does it do that?

> > Messages also work well in a cluster because they give you positive
> > notification that the work is done. If you try and use distributed
> > shared memory, you end up using messages anyway. Consider a producer
> > consumer problem in DSM. The producer puts the data in memory.
> > How does the consumer know that the data is ready? Well, it has to
> > get notified. What is that? That's a message. How do you do a
> > message in DSM?
>
> Synchronization problems have been investigated by "grey haired architects"
> before you and I were born. Read some OS books.

You are avoiding the question. DSM turns out to be messages with some
extra junk on top because most networks support messages, not shared
memory.
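
For contrast, a minimal sketch of the message version of the hand-off,
assuming a POSIX pipe between a forked producer and its parent. The function
name message_handoff() is made up, and -1 is reserved as the failure value,
so the payload is assumed non-negative. The point is only that the blocking
read() is both the data transfer and the notification that the data is ready.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child sends one int, the parent blocks until
 * it arrives. Returns the received value, or -1 on any failure. */
int message_handoff(int value)
{
    assert(value >= 0);              /* -1 is used to signal failure */

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                  /* producer */
        close(fds[0]);
        ssize_t w = write(fds[1], &value, sizeof value);
        _exit(w == (ssize_t)sizeof value ? 0 : 1);
    }

    close(fds[1]);                   /* consumer */
    int received = -1;
    /* Blocks until the message is there: no polling, no separate flag. */
    ssize_t r = read(fds[0], &received, sizeof received);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    return r == (ssize_t)sizeof received ? received : -1;
}
```

message_handoff(42) should return 42. The consumer never inspects shared
state to decide whether the producer is done; arrival of the message is the
answer.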

-- 

---------------------------------
Victor Yodaiken
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro NM 87801
Homepage http://www.cs.nmt.edu/~yodaiken
PowerPC Linux page http://linuxppc.cs.nmt.edu
Real-Time Page http://rtlinux.org
