Re: [PATCH v3 3/4] limit nr_dentries per superblock

From: Dave Chinner
Date: Mon Aug 15 2011 - 22:13:12 EST

On Mon, Aug 15, 2011 at 02:14:39PM +0300, Pekka Enberg wrote:
> Hi Pavel,
> On Mon, Aug 15, 2011 at 2:05 PM, Pavel Emelyanov <xemul@xxxxxxxxxxxxx> wrote:
> > This will make sense, since the kernel memory management per-cgroup is one of the
> > things we'd love to have, but this particular idea will definitely not work in case
> > we keep the containers' files on one partition keeping each container in its own
> > chroot environment.
> And you want a per-container dcache limit? Will the containers share
> the same superblock?

Yes, and that's one of the problems with the "arbitrary container"
approach to controlling the dentry cache size. Arbitrary containers
don't map easily to predictable and scalable LRU and reclaim
implementations. Hence right now the container scope is limited to

> Couldn't you simply do per-container "struct
> kmem_accounted_cache" in struct superblock?

Probably could do it that way, but it's still not really an
integrated solution. What we'll end up with is this LRU structure:

struct lru_node {
	struct list_head	lru;
	spinlock_t		lock;
	long			nr_items;
} ____cacheline_aligned_in_smp;

struct lru {
	struct kmem_accounted_cache *cache;
	struct lru_node		lru_node[MAX_NUMNODES];
	nodemask_t		active_nodes;
	int (*isolate_item)(struct list_head *item);
	int (*dispose)(struct list_head *list);
};

Where the only thing that the lru->cache is used for is getting the
number of items allocated to the cache. Seems relatively pointless
to make that statistic abstraction for just a single value that we
can get via a simple per-cpu counter...
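
To make that concrete, here is a rough user-space sketch of the same LRU
with the cache pointer replaced by a plain counter. Everything in it is
an assumption for illustration: the array nr_allocated[] stands in for a
real per-cpu counter, an unsigned long bitmask stands in for nodemask_t,
the small MAX_NUMNODES/NR_CPUS values and the lru_*() helper names are
made up, and the per-node spinlock is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 4	/* assumption: tiny fixed sizes for the sketch */
#define NR_CPUS      4

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del_init(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	list_init(item);
}

struct lru_node {
	struct list_head lru;
	long nr_items;		/* no lock: single-threaded sketch */
};

struct lru {
	long nr_allocated[NR_CPUS];	/* stand-in for a per-cpu counter */
	struct lru_node lru_node[MAX_NUMNODES];
	unsigned long active_nodes;	/* stand-in for nodemask_t */
};

static void lru_init(struct lru *l)
{
	int i;

	for (i = 0; i < NR_CPUS; i++)
		l->nr_allocated[i] = 0;
	for (i = 0; i < MAX_NUMNODES; i++) {
		list_init(&l->lru_node[i].lru);
		l->lru_node[i].nr_items = 0;
	}
	l->active_nodes = 0;
}

/* Each CPU only touches its own slot; a slot may go negative. */
static void lru_count_alloc(struct lru *l, int cpu) { l->nr_allocated[cpu]++; }
static void lru_count_free(struct lru *l, int cpu)  { l->nr_allocated[cpu]--; }

/* The total is only meaningful as the sum over all slots. */
static long lru_nr_allocated(const struct lru *l)
{
	long sum = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		sum += l->nr_allocated[i];
	return sum;
}

static void lru_add(struct lru *l, int nid, struct list_head *item)
{
	struct lru_node *n = &l->lru_node[nid];

	list_add_tail(item, &n->lru);
	if (n->nr_items++ == 0)
		l->active_nodes |= 1UL << nid;	/* node now has work */
}

static void lru_del(struct lru *l, int nid, struct list_head *item)
{
	struct lru_node *n = &l->lru_node[nid];

	assert(n->nr_items > 0);
	list_del_init(item);
	if (--n->nr_items == 0)
		l->active_nodes &= ~(1UL << nid);	/* nothing left to scan */
}
```

The point of the sketch is only that the allocated-object count needs
nothing more than the counter array; nothing else in the LRU has to
reach back into the cache.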

Then, when you consider SLUB has this structure for every individual
slab cache:

struct kmem_cache_node {
	spinlock_t list_lock;	/* Protect partial list and nr_partial */
	unsigned long nr_partial;
	struct list_head partial;
	atomic_long_t nr_slabs;
	atomic_long_t total_objects;
	struct list_head full;
};

you can see why tight integration of the per-node LRU infrastructure
is appealing - there's no unnecessary duplication and the accounting
is done in the right spot. It also means there is only one shrinker
implementation for all slabs, with a couple of simple per-slab
callbacks for isolating objects for disposal and then to dispose of
them. This would mean that most slab caches that use shrinkers would
no longer need to implement their own LRU, get LRU scalability and
node-aware reclaim for free, have built in size limits, etc.
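
As a rough illustration of the "one shrinker, two callbacks" idea, the
user-space sketch below walks a single node's LRU, asks the cache whether
each item can be isolated, and hands the isolated items to the cache's
dispose callback in one batch. The callback return conventions (0 from
isolate_item means "take it", dispose returns the number of objects
freed), the lru_ops and demo_obj types and the lru_shrink_node() name are
all assumptions made for the example, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del_init(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	list_init(item);
}

struct lru_node {
	struct list_head lru;
	long nr_items;
};

struct lru_ops {
	int (*isolate_item)(struct list_head *item);	/* 0: ok to reclaim (assumed) */
	int (*dispose)(struct list_head *list);		/* returns nr freed (assumed) */
};

/* Generic part: scan up to nr_to_scan items from the cold end of one node. */
static long lru_shrink_node(struct lru_node *n, const struct lru_ops *ops,
			    long nr_to_scan)
{
	struct list_head dispose, *pos, *next;

	assert(ops->isolate_item && ops->dispose);
	list_init(&dispose);
	for (pos = n->lru.next; pos != &n->lru && nr_to_scan > 0;
	     pos = next, nr_to_scan--) {
		next = pos->next;	/* pos may be unlinked below */
		if (ops->isolate_item(pos) != 0)
			continue;	/* in use: leave it on the LRU */
		list_del_init(pos);
		list_add_tail(pos, &dispose);
		n->nr_items--;
	}
	return list_empty(&dispose) ? 0 : ops->dispose(&dispose);
}

/* Per-cache part: a made-up object type with the LRU link as first member. */
struct demo_obj {
	struct list_head lru;
	int busy;
};

static int demo_isolate(struct list_head *item)
{
	return ((struct demo_obj *)item)->busy;	/* valid: lru is first member */
}

static int demo_dispose(struct list_head *list)
{
	int freed = 0;

	while (!list_empty(list)) {
		list_del_init(list->next);	/* a real cache would free here */
		freed++;
	}
	return freed;
}
```

A cache using this would supply only the two demo_*() style callbacks;
the scan loop, the per-node accounting and the batching are shared.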

And FWIW, integrating the LRU shrinker mechanism into the slab cache
also provides the mechanisms needed for capping the size of the
cache as well as slab defragmentation. Much smarter things can be
done when you know both the age and the locality of objects. e.g.
there's no point preventing allocation from a slab due to maximum
object count limitations if there are partial pages in the slab
cache because the allocation can be done without increasing memory


Dave Chinner