Re: slab needs more aggressive trimming

Colin Plumb (colin@nyx.net)
Sun, 21 Sep 1997 18:22:22 -0600 (MDT)


The Lord Linus, the Warrior of the Wastelands, the Ayatollah of Rock-and-Roll-a
wrote:
> The whole idea of trying to keep unused cached objects around only to
> avoid initializing them when we're low of memory is just extremely
> broken, imnsho. We want to avoid having to do IO in order to free
> memory, and that definitely means that any kmem caches need to flush out
> completely - there is no point in trickling them out as far as I can
> see.

Indeed, the benefit to not freeing them immediately is very very small.
Such unused slab pages are essentially *free*, and should be treated
as such. They are a cache, but to a very minor degree; the cost of
recreating the cached data is a very small number of clock cycles and
no I/O, so they should be the first to go under any circumstances.

There is a secondary benefit to per-slab free lists in avoiding the
global page allocation lock, but it's still not unreasonable for a page
colouring scheme to hunt among the slabs looking for a suitably
coloured page to grab.

(I just wish there was some way to predict how important a given process
is. How much use is this page going to get? If it's the strerror
table used just once before a program exits, it's not very interesting.)

Mini-tutorial:

Page colouring is a system which tries to use knowledge of the way that
different parts of memory collide in the processor cache to choose the
part of (physical) memory to allocate to a page request so that it
will not conflict with other useful data in the cache.

In a direct-mapped cache, the cheap and crappy sort of secondary cache
that's universal in '486 systems and I think most Pentiums, each byte
in memory has precisely one slot in cache where it can go. There's
more memory than cache, so there are many bytes in memory for each slot
in the cache.
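
As a rough sketch of that mapping, assume a 256K direct-mapped cache
with 32-byte lines. Both sizes are made up for illustration, as is the
helper name cache_slot(); real boards vary.

```python
CACHE_SIZE = 256 * 1024   # assumed cache size in bytes (illustrative)
LINE_SIZE = 32            # assumed cache line size in bytes (illustrative)
NUM_SLOTS = CACHE_SIZE // LINE_SIZE


def cache_slot(phys_addr):
    """Return the one slot a physical address can occupy in the cache."""
    return (phys_addr // LINE_SIZE) % NUM_SLOTS


# Two addresses exactly one cache size apart compete for the same slot.
a = 0x12340
b = a + CACHE_SIZE
print(cache_slot(a), cache_slot(b))
```

Any two addresses that differ by a multiple of the cache size land in
the same slot, and only one of them can be cached at a time.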

Now, imagine that your process is running some code, that works on some
data, and both code and data happen to want to use the same slot in
the cache. You're going to have cache misses all over the place,
and the system will perform terribly.
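
To put a number on "all over the place", here is a toy model of a
direct-mapped cache that just counts misses. The cache geometry and
the addresses are assumptions picked for the example, and a real
cache has more going on than this.

```python
def count_misses(addresses, cache_size=256 * 1024, line_size=32):
    """Count misses for a sequence of accesses to a direct-mapped cache."""
    num_slots = cache_size // line_size
    slots = {}                      # slot index -> memory line held there
    misses = 0
    for addr in addresses:
        line = addr // line_size
        slot = line % num_slots
        if slots.get(slot) != line:
            misses += 1             # wrong line (or nothing) in the slot
            slots[slot] = line      # evict whatever was there
    return misses


code = 0x10000
colliding_data = code + 256 * 1024  # same slot as the code
friendly_data = code + 4096         # a different slot

# Alternate between the code and the data 1000 times each.
colliding = count_misses([code, colliding_data] * 1000)
friendly = count_misses([code, friendly_data] * 1000)
print(colliding, friendly)
```

In the colliding case every single access evicts the other line; in
the friendly case only the first touch of each line misses.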

Now, the user chooses where in a page their code and data will go,
but their process takes up a certain number of pages of virtual memory,
and which physical pages will be allocated to provide storage for those
addresses is up to the OS.

Now, imagine painting a rainbow across the cache, and echoing these colours
in the memory that contends for that part of the cache. You want your
program to use as many different colours as possible. To a lesser
degree, you want different programs to use different colours.
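
In code, the colour of a physical page is just its page frame number
modulo the number of pages that fit in the cache. A sketch, assuming
4K pages and the same illustrative 256K direct-mapped cache (the names
here are mine, not anything in the kernel):

```python
PAGE_SIZE = 4096                        # assumed page size in bytes
CACHE_SIZE = 256 * 1024                 # assumed direct-mapped cache size
NUM_COLOURS = CACHE_SIZE // PAGE_SIZE   # pages that fit in the cache


def page_colour(page_frame_number):
    """Colour of a physical page: which page-sized stripe of cache it uses."""
    return page_frame_number % NUM_COLOURS


# Frames 0 and NUM_COLOURS share a colour, so they fight over the cache.
print(NUM_COLOURS, page_colour(0), page_colour(NUM_COLOURS), page_colour(65))
```

Two pages of the same colour contend for exactly the same cache slots;
pages of different colours never collide with each other.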

The system currently chooses arbitrarily, and the result is pretty random.
But every now and then it'll have bad luck in choosing the colour for
a particularly popular page and a particular process's performance will
be lousy. And if the page is popular, it's never going to get swapped
out or otherwise moved, so it'll *stay* lousy as long as the process
continues to run.

People who leave big compute-intensive jobs running overnight and
find that occasionally, for inexplicable reasons, the job takes
24 hours instead of 12 get unhappy.

There is a fix, which is to choose the page colours more carefully so
as to do better than random. Here's a random series of digits
(generated by xxd /dev/urandom | cut -b 10-50 | tr -d 'a-f '):
480907408160940803631620559180611543916104440390685041452454306268463278

You notice how it starts out with a lot of 0s, and it's 24 characters
before a 5 appears. A page colouring scheme attempts to produce
012345678901234567890123456789, or something similar. Different processes
start at different positions in the cycle, so they tend to avoid each other,
too. This is more uniform than random.
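
The difference is easy to see in a small experiment. This sketch uses
10 colours to match the digits above; the function names and the
"worst crowding" measure are just my own illustration of the idea.

```python
import random
from collections import Counter

NUM_COLOURS = 10   # ten colours, to match the decimal digits above


def random_colours(n_pages, rng):
    """What an arbitrary allocator gives you: any colour, any time."""
    return [rng.randrange(NUM_COLOURS) for _ in range(n_pages)]


def cycled_colours(n_pages, start):
    """Page colouring: walk the colours in order from a per-process start."""
    return [(start + i) % NUM_COLOURS for i in range(n_pages)]


def worst_crowding(colours):
    """Largest number of pages that ended up sharing one colour."""
    return max(Counter(colours).values())


rng = random.Random()
print(worst_crowding(random_colours(30, rng)))   # 3 at best, often more
print(worst_crowding(cycled_colours(30, 7)))     # always exactly 3
```

With 30 pages and 10 colours, the cycle puts exactly 3 pages on each
colour. Random choice can never do better than that, and usually does
worse.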

Since this is a performance-enhancing tweak, you have to steer a careful
course between being too sloppy to benefit and spending too much time
being perfectionist about getting the right pages. It requires careful
tuning, but has the advantage that every piece of software run on the
machine starts running faster.
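
One way to picture that trade-off is an allocator that looks at only
a few colours near the one it wants before giving up and taking any
free page. This is a hypothetical sketch, not how any real kernel
does it; alloc_page(), its max_tries knob and the per-colour free
lists are all invented for the example.

```python
def alloc_page(free_by_colour, wanted, max_tries=4):
    """Pick a free page, preferring colour `wanted`.

    free_by_colour is a list of lists of free page frame numbers,
    indexed by colour.  Returns a frame number, or None if memory is
    exhausted.
    """
    n = len(free_by_colour)
    # Bounded search: the wanted colour first, then its neighbours.
    for step in range(min(max_tries, n)):
        colour = (wanted + step) % n
        if free_by_colour[colour]:
            return free_by_colour[colour].pop()
    # Too picky already; stop being perfectionist and take anything.
    for pages in free_by_colour:
        if pages:
            return pages.pop()
    return None


free = [[0, 4], [], [2], [3]]     # four colours, a few free frames
print(alloc_page(free, 0))        # colour 0 has a free frame
print(alloc_page(free, 1))        # colour 1 is empty, so a neighbour is used
```

Raising max_tries gets better colours at the cost of a longer search;
setting it to 1 degenerates to "exact colour or whatever is handy".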

In fact, the lack of page-colouring in most operating systems makes
benchmarking a nightmare. While it sometimes cuts the performance
in half, 10% is a very common fluctuation from one run of a program
to another.

That much fluctuation makes it very hard to tell if your attempted
performance tweak to some software made a 5% difference or not.
Annoyingly, because the system leaves a program's pages in memory after
it has run, they stay in the same places with the same colours, and so
re-running the same software immediately will usually give very similar
results.
(Only the colouring of the data pages changes.) You end up having
to copy the executable around to get the page colouring to change.

-- 
	-Colin