[RFC 0/4] Object reclaim via the slab allocator V1
From: Christoph Lameter
Date: Mon Jun 19 2006 - 14:47:05 EST
We currently have a set of problems with slab reclaim:
1. On a NUMA system there are excessive cross node accesses.
Locks are taken remotely which leads to a significant slowdown
if concurrent reclaim is performed on the dcache/icache from
a number of nodes.
2. We need to free an excessive number of elements from the LRU
in order to reclaim enough pages from the slab.
3. After reclaim we have a large number of sparsely used slab
objects.
The fundamental problem in the current reclaim approaches with the
icache/dcache is that the reclaim is LRU and object based. Multiple
objects can share the same slab. So removing one object from a slab
just removes some cache information that may have been useful later
but may not give us what we want: More free pages.
I propose that we replace the LRU based object management by using
the slab oriented lists in the slab allocator itself for reclaim.
The slab allocator already has references to all pages used by the
dcache and icache. It has awareness of which objects are located
in a certain slab and therefore would be able to free specific
objects that would make pages freeable if the slab knew
their state.
In order to allow the slab allocator to know about an object's
state we add another flag SLAB_RECLAIM. SLAB_RECLAIM means that
the following is true of a slab:
1. Each object contains a reference counter of type atomic_t as
its first element, with the following conventions:
Counter = 0 -> Object is being freed or is free.
Counter = 1 -> Object is not used and contains cache information.
The object may be freed.
Counter > 1 -> Object is in use and cannot be freed.
2. A destructor was provided during kmem_cache_create().
If SLAB_DTOR_FREE is passed in the flags of the destructor
then a best effort attempt will be made to free that object.
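To make the counter convention above concrete, here is a minimal userspace sketch. It is not taken from the patch: the names struct cached_obj, obj_state() and obj_try_claim() are hypothetical, and C11 atomic_int stands in for the kernel's atomic_t. The claim step (moving the counter from 1 to 0 with a compare-and-swap) is an assumption about how a reclaimer could safely take an unused object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object layout: the reference counter is the first member. */
struct cached_obj {
	atomic_int refcount;	/* stand-in for the kernel's atomic_t */
	int payload;		/* hypothetical cached data */
};

enum obj_state { OBJ_FREE, OBJ_RECLAIMABLE, OBJ_IN_USE };

/* Map the counter onto the three states of the convention. */
static enum obj_state obj_state(struct cached_obj *obj)
{
	int c = atomic_load(&obj->refcount);

	if (c == 0)
		return OBJ_FREE;	/* free or being freed */
	if (c == 1)
		return OBJ_RECLAIMABLE;	/* unused, only holds cached data */
	return OBJ_IN_USE;		/* counter > 1, must not be freed */
}

/*
 * Assumed claim step: move the counter from 1 to 0 atomically.
 * Fails if the object is in use or is already being freed.
 */
static bool obj_try_claim(struct cached_obj *obj)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&obj->refcount, &expected, 0);
}
```

With such a layout the allocator could inspect any object in a SLAB_RECLAIM cache without knowing its type, because only the first word is interpreted.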
Memory can then be reclaimed from a slab by calling
kmem_cache_reclaim(kmem_cache_t *cachep, unsigned long page)
kmem_cache_reclaim returns an error code or the number of pages reclaimed.
The reclaim works by walking through the lists of full and partially
allocated slabs. We begin at the end of the fully allocated slabs because
these slabs have been around for a long time (this basically preserves the LRU
lists to some extent).
For each slab we check all the objects in the slab. If all objects have
a refcount of one then we free all the objects and return the pages of the
slab to the page allocator.
So we could just free all the truly freeable slabs and leave the rest alone
in order to preserve the cached information.
However, with that approach a single object with a higher reference count would prevent
the reclaim of all other unused objects in that slab. So we free a couple of
those unused elements in each slab that is fully allocated.
This leads to a slow circulation of unused elements in fully allocated slabs.
So over time it is likely that continually active elements accumulate in slabs
that are in long term use which increases the allocation density of the slab
objects.
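The scan described above can be modelled roughly as follows. This is a toy userspace model under stated assumptions, not the code from patch 4: struct toy_slab, toy_reclaim_slab(), toy_reclaim(), the slab capacity and the batch size of two are all hypothetical, plain ints replace atomic_t, and only fully allocated slabs are modelled.

```c
#include <assert.h>
#include <stddef.h>

#define OBJS_PER_SLAB 4	/* hypothetical slab capacity */
#define PARTIAL_BATCH 2	/* hypothetical "couple of" objects per busy slab */

/* Toy model of a slab: only the per-object reference counters. */
struct toy_slab {
	int refcount[OBJS_PER_SLAB];
};

/*
 * Scan one slab. Returns 1 if every object was unused (refcount == 1)
 * and the slab was emptied, so its pages could be handed back.
 * Otherwise frees at most PARTIAL_BATCH unused objects and returns 0.
 */
static int toy_reclaim_slab(struct toy_slab *slab)
{
	size_t i;
	int unused = 0;
	int freed = 0;

	assert(slab != NULL);

	for (i = 0; i < OBJS_PER_SLAB; i++)
		if (slab->refcount[i] == 1)
			unused++;

	if (unused == OBJS_PER_SLAB) {
		/* Truly freeable slab: release every object. */
		for (i = 0; i < OBJS_PER_SLAB; i++)
			slab->refcount[i] = 0;
		return 1;
	}

	/* Busy slab: trickle out a few unused objects only. */
	for (i = 0; i < OBJS_PER_SLAB && freed < PARTIAL_BATCH; i++) {
		if (slab->refcount[i] == 1) {
			slab->refcount[i] = 0;
			freed++;
		}
	}
	return 0;
}

/* Walk the slabs starting at the end of the list; count emptied slabs. */
static int toy_reclaim(struct toy_slab *slabs, size_t nr_slabs)
{
	int emptied = 0;
	size_t i;

	for (i = nr_slabs; i-- > 0; )
		emptied += toy_reclaim_slab(&slabs[i]);
	return emptied;
}
```

In this model a slab holding one busy object gives up at most two unused objects per pass, which is the slow circulation mentioned above, while a slab with only unused objects is emptied in one go.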
The idea of how to do this reclaim was developed a couple of months ago in a
meeting of Nick Piggin, Dave Chinner and me in Melbourne.
Note that this patch is only at the idea stage for now. It has not been tested
and no slab uses the new reclaim functionality yet. In order for the new
scheme to be useful we need to remove the LRU lists from the dcache and icache
and replace their reclaim algorithm.
That in turn will allow a substantial reduction of complexity in both. However, I
am not that familiar with either and so I thought it may be best to first discuss
the idea and maybe get some help with them.
The approach could be extended to allow the freeing of slab pages with
one single object by allowing the use of SLAB_DTOR_FREE on objects with higher
refcount. If one thinks further along those lines then we will get the ability
to migrate slab objects.
The current patchset contains 4 patches (the first two could be reviewed
for kernel inclusion):
1. Freelist code consolidation
We will need to drain freelists repeatedly so consolidate existing code.
2. Mon destructor removal.
We will change the nature of destructors. USB MON uses an empty
destructor. Get that out of the way.
3. Make destructors check for SLAB_DTOR_DESTROY
Only run the existing destructors if SLAB_DTOR_DESTROY is specified.
4. The slab reclaim functionality