Page Colouring (was: 2.6.0 Huge pages not working as expected)

From: Anton Ertl
Date: Fri Dec 26 2003 - 18:07:04 EST


Linus Torvalds <torvalds@xxxxxxxx> writes:
>And the thing is, using huge pages will mean that the pages are 1:1
>mapped, and thus get "perfectly" cache-coloured, while the anonymous mmap
>will give you random placement.
>
>And what you are seeing is likely the fact that random placement is
>guaranteed to not have any worst-case behaviour.

You probably just put the "not" in the wrong place, but just in case
you meant it: Random replacement does not give such a guarantee. You
can get the same worst-case behaviour as with page colouring, since
you can get the same mapping. It's just unlikely.

>In particular, using a pure power-of-two stride means that you are
>limiting your cache to a certain subset of the full result with the
>perfect coloring.
>
>This, btw, is why I don't like page coloring: it does give nicely
>reproducible results, but it does not necessarily improve performance.

Well, even if, on average, it has no performance impact,
reproducibility is a good reason to like it. Is it good enough to
implement it? I'll leave that to you.

However, the main question I want to look at here is: Does it improve
performance, on average? I think it does, because of spatial
locality.

I.e., it is more frequent that you access stuff spatially close to a
recent access (where page colouring has a 0 chance of conflicting,
whereas random mapping has a non-zero chance of conflicting), than to
access stuff that is exactly a multiple of the cache-size away (which
is the worst case for page colouring). Fortunately, set-associative
caches in the machines I use most of the time reduce the impact of the
missing page colouring in Linux.

The most frequent case where random mapping gives better performance
than page colouring is having several sequential passes over a block
that is larger than the cache; but that's just a case where caches
perform badly on principle, and cache designs that are usually
considered better (higher associativity, LRU replacement) perform
worse in this case.

OTOH, for cases where the block does barely fits in the cache, page
colouring performs quite a bit better. This particular access pattern
can be more frequent than one might expect from other statistics, due
to software optimizations like cache blocking.

One additional mechanism in which page colouring can help performance
is by providing a predictable and understandable performance model to
programmers. Caches are bad enough to analyse, one need not
complicate the issue with unpredictable effects of random
virtual-to-physical translation.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@xxxxxxxxxxxxxxxxxxxxxxxxxx Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/