Re: Fw: [PATCH] NUMA Slab Allocator

From: Martin J. Bligh
Date: Wed Mar 16 2005 - 14:03:40 EST


> Do you have profile data from your modification? Which percentage of the allocations is node-local, which percentage is from foreign nodes? Preferably per-cache. It shouldn't be difficult to add statistics counters to your patch.
> And: Can you estaimate which percentage is really accessed node-local and which percentage are long-living structures that are accessed from all cpus in the system?
> I had discussions with guys from IBM and SGI regarding a numa allocator, and we decided that we need profile data before we can decide if we need one:
> - A node-local allocator reduces the inter-node traffic, because the callers get node-local memory
> - A node-local allocator increases the inter-node traffic, because objects that are kfree'd on the wrong node must be returned to their home node.

One of the big problems is that much of the slab data really is more global
(ie dentry, inodes, etc). Some of it is more localized (typically the
kmalloc style stuff). I can't really generate any data easily, as most
of my NUMA boxes are either small Opterons / midsized PPC64, which have
a fairly low NUMA factor, or large ia32, which only has kernel mem on
node 0 ;-(

> IIRC the conclusion from our discussion was, that there are at least four possible implementations:
> - your version
> - Add a second per-cpu array for off-node allocations. __cache_free batches, free_block then returns. Global spinlock or per-node spinlock. A patch with a global spinlock is in
> http://www.colorfullife.com/~manfred/Linux-kernel/slab/patch-slab-numa-2.5.66
> per-node spinlocks would require a restructuring of free_block.
> - Add per-node array for each cpu for wrong node allocations. Allows very fast batch return: each array contains memory just from one node, usefull if per-node spinlocks are used.
> - do nothing. Least overhead within slab.
>
> I'm fairly certains that "do nothing" is the right answer for some caches.
> For example the dentry-cache: The object lifetime is seconds to minutes,
> the objects are stored in a global hashtable. They will be touched from
> all cpus in the system, thus guaranteeing that kmem_cache_alloc returns
> node-local memory won't help. But the added overhead within slab.c will hurt.

That'd be my inclination .... but OTOH, we do that for pagecache OK. Dunno,
I'm torn. Depends if there's locality on the file access or not, I guess.
Is there any *harm* in doing it node local .... perhaps creating a node
mem pressure imbalance (OTOH, there's loads of stuff that does that anyway ;-))

The other thing that needs serious thought is how we balance reclaim pressure.

M.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/