Re: Help in DSM design

From: Larry McVoy (lm@bitmover.com)
Date: Sat Mar 04 2000 - 02:52:11 EST


: > On Sat, 4 Mar 2000, Richard Gooch wrote:
: > > Ach! Not another DSM project :-( Don't do it. There's already a DSM
: >
: > [dsm horror story]
:
: I swore I would avoid this argument but ...
:
: look, there's DSMs and there's DSMs. And there's different types of
: applications. I've done a fair number of parallel apps and there are cases
: where you can use DSM, and cases where you can't and you use message
: passing. I can point to failures with both models.

Ron and I have been having this argument for years. I used to be a
believer, enough that I got him a pile of Sun hardware years ago to work
on this.

I'm not a believer in DSM anymore, through no fault on Ron's part.
Ron did some really cool stuff in the DSM area and he has pushed the
limits of what you can do there (though I'm not real clear if anything
ever happened as a result of all those discussions about release
consistency; Ron?)

Anyway, the big problem with DSM is performance, and the real problem is
that there isn't a good answer. The DSM advocates all love to tell you
about how it solves problems, porting is easier, computers are cheaper
than people, etc. But that's all nonsense if the model takes a 200
CPU system and makes it perform like a 20 CPU SMP. Obviously, if that
were the case, people would just buy the SMP; it's cheaper and faster.
So while the DSM crowd will tell you otherwise, there are definite,
quantifiable limits on where it stops making sense.

As far as I can tell, those limits are all in the performance area.
If that's true, DSM is really only useful if it lives up to both the
ease-of-use promise and the performance promise. Not only does it not,
but it can't. Here's why: in order to get good performance, contended
memory needs to be released as soon as possible so the next guy can use
it. There's the bummer. Only the application knows when it is done with
a chunk of memory; the OS can't know, it has no way of getting that
knowledge.
So the obvious answer is to make the application tell the OS (or DSM
manager, whatever) that it is done. An operation becomes

        lock();
        /* do some work */
        unlock();

and we need to modify the unlock to tell the OS that the memory that was
just worked on is available. This is where release consistency comes
in: you could have lock primitives like

        cookie = lock(address, length);
        /* do some work */
        unlock(cookie);

and the unlock could tell the DSM manager. Seems pretty nice, eh? At
first blush, that's what I thought too. But it all falls apart in the
real world, where granularity kills this model. What if what you want
to lock is a linked list, with chunks of the list all over memory?
Doesn't work so well, does it?
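
To make the granularity problem concrete, here is a sketch in C. The
lock()/unlock() primitives and the cookie type are made up (stubbed out
so the example compiles, they're not a real API); the point is just the
shape the application code gets forced into:

        #include <stddef.h>

        typedef long cookie_t;  /* made-up cookie type, for illustration */

        /* Stubs for the hypothetical DSM primitives sketched above;
         * a real DSM manager would do the release bookkeeping here. */
        static cookie_t lock(void *addr, size_t len)
        { (void)addr; (void)len; return 0; }
        static void unlock(cookie_t c) { (void)c; }

        struct node {
                int          value;
                struct node *next;      /* next node can live anywhere */
        };

        void bump_all(struct node *head)
        {
                /* No single (address, length) range covers the list,
                 * so every node needs its own lock/unlock round trip;
                 * the bookkeeping grows with the data structure. */
                for (struct node *n = head; n != NULL; n = n->next) {
                        cookie_t c = lock(n, sizeof(*n));
                        n->value++;             /* do some work */
                        unlock(c);              /* release this chunk */
                }
        }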

In the end, the crucial observation is that a DSM depends 100% on the
application telling it that the memory is "released" and someone else
can use it. It's a boundary that only the app knows.

The problem is that this breaks the ease-of-use claim. You now have
to go through all your apps and teach them about the new memory model.
Yuck. It's much easier to have them just pass messages; the boundaries
are unambiguous.
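
For contrast, here is what the same sort of update looks like with
message passing. The send_msg()/recv_msg() calls are made up and
stubbed (not a real API); the point is that ownership travels with the
message, so the release boundary is simply the send:

        #include <stddef.h>
        #include <string.h>

        struct work { int value; };

        /* Stubs for made-up message primitives, illustration only. */
        static void send_msg(int peer, const void *buf, size_t len)
        { (void)peer; (void)buf; (void)len; }
        static void recv_msg(int peer, void *buf, size_t len)
        { (void)peer; memset(buf, 0, len); }

        void worker(int peer)
        {
                struct work w;

                recv_msg(peer, &w, sizeof(w)); /* data arrives; we own it */
                w.value++;                     /* do some work */
                send_msg(peer, &w, sizeof(w)); /* done; ownership passes back */
        }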

All the DSM people can yell all they want, but we have 10-20 years of
research in this area, and so far nothing has shown any significant
results.

Conclusion: if you like DSM, that's fine, but be aware that many have
tried and all have failed. And above all, don't expect your DSM stuff
to be a welcome addition to the kernel until it has moved out of the
research project stage. Linux, God bless it, is not a research project,
people run their businesses on it.
