Re: [patch 1/2] mm, mempool: poison elements backed by slab allocator

From: David Rientjes
Date: Fri Mar 13 2015 - 20:06:29 EST


On Thu, 12 Mar 2015, Andrew Morton wrote:

> > Mempools keep elements in a reserved pool for contexts in which
> > allocation may not be possible. When an element is allocated from the
> > reserved pool, its memory contents are the same as when it was added to
> > the reserved pool.
> >
> > Because of this, elements lack any free poisoning to detect
> > use-after-free errors.
> >
> > This patch adds free poisoning for elements backed by the slab allocator.
> > This is possible because the mempool layer knows the object size of each
> > element.
> >
> > When an element is added to the reserved pool, it is poisoned with
> > POISON_FREE. When it is removed from the reserved pool, the contents are
> > checked for POISON_FREE. If there is a mismatch, a warning is emitted to
> > the kernel log.
> >
> > This is only effective for configs with CONFIG_DEBUG_VM.
>
> At present CONFIG_DEBUG_VM is pretty lightweight (I hope) and using it
> for mempool poisoning might be inappropriately costly. Would it be
> better to tie this to something else? Either standalone or reuse some
> slab debug option, perhaps.
>

Ok, I agree. I'll tie it to CONFIG_DEBUG_SLAB and CONFIG_SLUB_DEBUG_ON, and
also allow it to be enabled when slub debugging is enabled at runtime. It
probably doesn't make much sense to do mempool poisoning without slab
poisoning.
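
Roughly, something like this gating (exact placement still to be decided; the
function name is from the hunk quoted further down):

#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB_DEBUG_ON)
static void check_slab_element(mempool_t *pool, void *element)
{
	/* ... poison check as in the hunk below ... */
}
#else /* !CONFIG_DEBUG_SLAB && !CONFIG_SLUB_DEBUG_ON */
static inline void check_slab_element(mempool_t *pool, void *element)
{
}
#endif

That way the mempool poisoning compiles away unless slab poisoning could also
be active.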

> Did you measure the overhead btw? It might be significant with fast
> devices.
>

It's certainly costly: with a new 128-byte slab cache, allocating 64 objects
took about 480 cycles longer per object for the poison checking and in-use
poisoning on one of my 2.2GHz machines (~90 cycles/object without
CONFIG_DEBUG_VM). The free poisoning added about 130 cycles per object
(~140 cycles/object without CONFIG_DEBUG_VM).

For cache-cold pages from the page allocator it's more expensive: over a run
of 64 pages, allocation is ~620 cycles longer per page and freeing adds
another ~60 cycles/page.
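
For reference, a rough sketch of one way to measure this (names like
mempool_poison_bench are just illustrative): mempool_create() poisons every
reserved element via add_element(), and mempool_destroy() pulls them back out
through remove_element(), where the check happens, so timing those two calls
approximates the per-element overhead.

#include <linux/mempool.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/timex.h>

#define NR_ELEMENTS 64

static void mempool_poison_bench(void)
{
	struct kmem_cache *cachep;
	mempool_t *pool;
	cycles_t t0, t1, t2;

	cachep = kmem_cache_create("mempool-bench", 128, 0, 0, NULL);
	if (!cachep)
		return;

	t0 = get_cycles();
	pool = mempool_create_slab_pool(NR_ELEMENTS, cachep);	/* add_element() x64 */
	t1 = get_cycles();
	if (pool)
		mempool_destroy(pool);				/* remove_element() x64 */
	t2 = get_cycles();

	pr_info("add: ~%lu cycles/object, remove: ~%lu cycles/object\n",
		(unsigned long)(t1 - t0) / NR_ELEMENTS,
		(unsigned long)(t2 - t1) / NR_ELEMENTS);

	kmem_cache_destroy(cachep);
}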

Keep in mind that this overhead is only incurred when the mempool alloc
function fails to allocate memory directly from the slab allocator or page
allocator in the given context, and in mempool_create() when the reserved
pool is first populated.
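
To make that concrete, here is the relevant control flow, abridged from
mm/mempool.c (gfp-mask adjustment and the wait/retry logic are omitted); the
poison check in remove_element() only runs on the slow path, after the
underlying allocator has already failed:

void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
{
	unsigned long flags;
	void *element;

	element = pool->alloc(gfp_mask, pool->pool_data);
	if (likely(element != NULL))
		return element;		/* fast path: reserved pool untouched */

	spin_lock_irqsave(&pool->lock, flags);
	if (likely(pool->curr_nr)) {
		element = remove_element(pool);	/* poison checked here */
		spin_unlock_irqrestore(&pool->lock, flags);
		return element;
	}
	spin_unlock_irqrestore(&pool->lock, flags);

	/* Otherwise the real code waits for mempool_free() to return an
	 * element to the pool and retries; elided here. */
	return NULL;
}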

I didn't benchmark high-order page poisoning, but that's only used by
bcache and I'm looking at that separately: allocating high-order pages
from a mempool sucks.

> > --- a/mm/mempool.c
> > +++ b/mm/mempool.c
> > @@ -16,16 +16,77 @@
> > #include <linux/blkdev.h>
> > #include <linux/writeback.h>
> >
> > +#ifdef CONFIG_DEBUG_VM
> > +static void poison_error(mempool_t *pool, void *element, size_t size,
> > + size_t byte)
> > +{
> > + const int nr = pool->curr_nr;
> > + const int start = max_t(int, byte - (BITS_PER_LONG / 8), 0);
> > + const int end = min_t(int, byte + (BITS_PER_LONG / 8), size);
> > + int i;
> > +
> > + pr_err("BUG: mempool element poison mismatch\n");
> > + pr_err("Mempool %p size %ld\n", pool, size);
> > + pr_err(" nr=%d @ %p: %s0x", nr, element, start > 0 ? "... " : "");
> > + for (i = start; i < end; i++)
> > + pr_cont("%x ", *(u8 *)(element + i));
> > + pr_cont("%s\n", end < size ? "..." : "");
> > + dump_stack();
> > +}
>
> "byte" wasn't a very useful identifier, and it's called "i" in
> check_slab_element(). Rename it to "offset" in both places?
>
> > +static void check_slab_element(mempool_t *pool, void *element)
> > +{
> > + if (pool->free == mempool_free_slab || pool->free == mempool_kfree) {
> > + size_t size = ksize(element);
> > + u8 *obj = element;
> > + size_t i;
> > +
> > + for (i = 0; i < size; i++) {
> > + u8 exp = (i < size - 1) ? POISON_FREE : POISON_END;
> > +
> > + if (obj[i] != exp) {
> > + poison_error(pool, element, size, i);
> > + return;
> > + }
> > + }
> > + memset(obj, POISON_INUSE, size);
> > + }
> > +}
>
> I question the reuse of POISON_FREE/POISON_INUSE. If this thing
> triggers, it may be hard to tell if it was due to a slab thing or to a
> mempool thing. Using a distinct poison pattern for mempool would clear
> that up?
>

Hmm, I think that would actually make it more confusing: mempools only
allocate from the reserved pool (the elements poisoned by this patchset) when
kmalloc() or kmem_cache_alloc() fails in the given context. Normally the
reserved pool isn't used because there are free objects sitting on slab free
or partial slabs, and the allocation context is irrelevant. If slab poisoning
is enabled, those objects are already POISON_FREE, as expected. We only fall
back to the reserved pool when a new slab needs to be allocated and that
fails in the given context, so with a distinct mempool pattern the poison
value would differ depending on where the object came from.
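
For reference, the poison bytes involved, as defined in include/linux/poison.h
(the same values the slab allocators use, which is the point):

#define POISON_INUSE	0x5a	/* for use-uninitialised poisoning */
#define POISON_FREE	0x6b	/* for use-after-free poisoning */
#define POISON_END	0xa5	/* end-byte of poisoning */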