Very true: but the fact is that the server maintained compatibility while
using less than 40% of the dynamic memory for its internal data structures,
while getting much faster. Certainly it did not seem to hurt performance
then to use a lot less memory. How much of this was due to the more compact
representation, and how much to recoding, is an interesting question, but
one we'll never be able to answer. Machines of that day were roughly
10 MIPS-class systems.
More interesting may be Keith's recent results: while executing more
instructions, in a much more compact code base, he's seeing speedups of
10% or more on the frame buffer code (excluding text, where he's not
worrying about speed; even so, dumb simple code is getting him
50K characters/second or more). Both the old code and the new code
should be I/O bus cycle bound. Part of the speedup may be that he's
able to use 64-bit constructs in the compiler (a sketch of what that can
buy follows this paragraph), but part may be due to
better instruction cache behavior. Even more interesting may be real
applications, rather than small micro-benchmarks, as the footprint in
the I cache for the X server is very much smaller, so the context
switch overhead due to cache refill between client and server should
be much less. But right now, we only have micro-benchmark data, so
I don't know if this intuition is correct.
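As for what "64-bit constructs" can buy, here is a hypothetical sketch of
my own (not Keith's actual code, and the names are made up): a solid fill
that stores eight 8-bit pixels per write instead of one, cutting both the
loop overhead and the number of bus transactions.

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative only: solid-fill a span of 8-bit pixels using
     * 64-bit stores for the bulk of the span.  Fewer iterations and
     * fewer (wider) bus transactions, which is what counts when the
     * code is bus cycle bound. */
    void fill_span64(uint8_t *dst, size_t nbytes, uint8_t pixel)
    {
        uint64_t pat = (uint64_t)pixel * 0x0101010101010101ULL; /* byte x 8 */

        /* byte stores until dst is 8-byte aligned */
        while (nbytes > 0 && ((uintptr_t)dst & 7) != 0) {
            *dst++ = pixel;
            nbytes--;
        }
        /* 64-bit stores for the bulk of the span
         * (framebuffer-style type punning through the cast) */
        while (nbytes >= 8) {
            *(uint64_t *)dst = pat;
            dst += 8;
            nbytes -= 8;
        }
        /* byte stores for the tail */
        while (nbytes > 0) {
            *dst++ = pixel;
            nbytes--;
        }
    }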
It is clearly a great example of why one must now think much more about
overall system behavior, rather than focusing on instructions executed.
Better performance tools are needed to help redirect programmers'
intuition about performance, which was often formed on what are now
antique systems. Trading memory for fewer instructions will often be a
loser on current systems.
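As a toy illustration of that last point (purely hypothetical, nothing
from the server): reversing the bits of a byte with a 256-entry lookup
table executes fewer instructions than computing it with shifts and
masks, but the table has to compete for data cache with everything else;
on a machine where memory is slow relative to the CPU, the "dumber"
computed version can be the faster one.

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t rev_table[256];

    /* ~9 register operations, no memory traffic */
    static uint8_t reverse_compute(uint8_t b)
    {
        b = (uint8_t)((b & 0xF0) >> 4 | (b & 0x0F) << 4);
        b = (uint8_t)((b & 0xCC) >> 2 | (b & 0x33) << 2);
        b = (uint8_t)((b & 0xAA) >> 1 | (b & 0x55) << 1);
        return b;
    }

    static void init_table(void)
    {
        int i;
        for (i = 0; i < 256; i++)
            rev_table[i] = reverse_compute((uint8_t)i);
    }

    /* one load; may well miss in the data cache */
    static uint8_t reverse_table(uint8_t b)
    {
        return rev_table[b];
    }

    int main(void)
    {
        init_table();
        printf("%02x %02x\n", reverse_table(0x1e), reverse_compute(0x1e));
        return 0;
    }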
- Jim