Very true: but the fact is that the server maintained compatibility while
using less than 40% of the dynamic memory for its internal data structures,
while getting much faster. Certainly it did not seem to hurt performance
then to use a lot less memory. How much of this was due to the more compact
representation, and how much to recoding, is an interesting question, but
one we'll never be able to answer. Machines of that day were roughly
10 MIPS-class systems.
More interesting may be Keith's recent results: while executing more
instructions, in a much more compact code base, he's seeing speedups of
10% or more on the frame buffer code (excluding text, where he's not
worrying about speed; even so, dumb simple code is getting him
50K characters/second or more). Both the old code and the new code
should be I/O bus cycle bound. Part of the speedup may be that he's
able to use 64-bit constructs in the compiler (a sketch of what that can
buy follows this paragraph), but part may be due to
better instruction cache behavior. Even more interesting may be real
applications, rather than small micro-benchmarks, as the footprint in
the I cache for the X server is very much smaller, so the context
switch overhead due to cache refill between client and server should
be much less. But right now, we only have micro-benchmark data, so
I don't know if this intuition is correct.
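As for what "64-bit constructs" can buy, here is a hypothetical sketch of
my own (not Keith's actual code, and the names are made up): a solid fill
that stores eight 8-bit pixels per write instead of one, cutting both the
loop overhead and the number of bus transactions.

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative only: solid-fill a span of 8-bit pixels using
     * 64-bit stores for the bulk of the span.  Fewer iterations and
     * fewer (wider) bus transactions, which is what counts when the
     * code is bus cycle bound. */
    void fill_span64(uint8_t *dst, size_t nbytes, uint8_t pixel)
    {
        uint64_t pat = (uint64_t)pixel * 0x0101010101010101ULL; /* byte x 8 */

        /* byte stores until dst is 8-byte aligned */
        while (nbytes > 0 && ((uintptr_t)dst & 7) != 0) {
            *dst++ = pixel;
            nbytes--;
        }
        /* 64-bit stores for the bulk of the span
         * (framebuffer-style type punning through the cast) */
        while (nbytes >= 8) {
            *(uint64_t *)dst = pat;
            dst += 8;
            nbytes -= 8;
        }
        /* byte stores for the tail */
        while (nbytes > 0) {
            *dst++ = pixel;
            nbytes--;
        }
    }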
It is clearly a great example of why one must now think much more about
overall system behavior, rather than focusing on instructions executed.
Better performance tools are needed to help redirect programmers'
intuition about performance, which was often formed on what are now
antique systems. Trading memory for fewer instructions will often be a
loser on current systems.
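As a toy illustration of that last point (purely hypothetical, nothing
from the server): reversing the bits of a byte with a 256-entry lookup
table executes fewer instructions than computing it with shifts and
masks, but the table has to compete for data cache with everything else;
on a machine where memory is slow relative to the CPU, the "dumber"
computed version can be the faster one.

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t rev_table[256];

    /* ~9 register operations, no memory traffic */
    static uint8_t reverse_compute(uint8_t b)
    {
        b = (uint8_t)((b & 0xF0) >> 4 | (b & 0x0F) << 4);
        b = (uint8_t)((b & 0xCC) >> 2 | (b & 0x33) << 2);
        b = (uint8_t)((b & 0xAA) >> 1 | (b & 0x55) << 1);
        return b;
    }

    static void init_table(void)
    {
        int i;
        for (i = 0; i < 256; i++)
            rev_table[i] = reverse_compute((uint8_t)i);
    }

    /* one load; may well miss in the data cache */
    static uint8_t reverse_table(uint8_t b)
    {
        return rev_table[b];
    }

    int main(void)
    {
        init_table();
        printf("%02x %02x\n", reverse_table(0x1e), reverse_compute(0x1e));
        return 0;
    }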
- Jim