Re: [patch] SLQB slab allocator (try 2)

From: Christoph Lameter
Date: Wed Feb 04 2009 - 11:14:19 EST


On Tue, 3 Feb 2009, Pekka Enberg wrote:

> Anyway, even if we did end up going forward with SLQB, it would sure as hell
> be less painful if we understood the reasons behind it.

The reasons may depend on hardware contingencies like TLB handling
overhead and various inefficiencies that depend on the exact processor
model. They also depend on the type of applications you want to run. Some
of the IA64 heritage of SLUB may be seen in the results of these tests.
Note that PPC and IA64 have larger page sizes (which lets SLUB put more
objects into an order 0 page) and higher penalties for TLB handling.
The initial justification for SLUB was Mel's results on IA64, which showed
a 5-10% performance increase with SLUB.

In my current position we need to run extremely low latency code in user
space and want to avoid any disturbance by kernel code interrupting user
space. My main concern in my current work context is that switching to
SLQB will bring back the old cache cleaning problems and introduce
latencies for our user space applications. Otherwise, I am on x86 now, so
the TLB issues are less of a concern for me.

In general it may be better to have a larger selection of slab allocators.
I think this is no problem as long as we have motivated people that
maintain these. Nick seems to be very motivated at this point. So let's
merge SLQB as soon as we can and expose it to a wider audience so that it
can mature. And people can have more fun running them against each other
and refining them further.

There are still two major things that I hope will happen soon to clean up stuff in
the slab allocators:

1. The introduction of a per cpu allocator.

This is important to optimize the fastpaths. The cpu allocator will allow
us to get rid of the arrays indexed by NR_CPUS and allow operations that
are atomic wrt. interrupts. The lookup of the kmem_cache_cpu struct
address will no longer be necessary.

2. Alloc/free without disabling interrupts.

Mathieu has written an early implementation of allocation functions that
do not require interrupt disable/enable. It seems that these are right now
the major cause of latency in the fast paths. Andi has stated that
interrupt enable/disable has been optimized in recent processors. The
overhead may be due to the flags being pushed onto the stack and retrieved
later. Mathieu's implementation can be made more elegant if atomic per cpu
ops are available. This could significantly increase the speed of the fast
paths in the allocators (this may be a challenge for SLAB and SLQB since
they need to update a counter and a pointer, but it's straightforward in
SLUB).
