Re: SLUB regression in current Linus

From: Linus Torvalds
Date: Tue May 24 2011 - 19:04:03 EST


On Tue, May 24, 2011 at 4:52 AM, James Morris <jmorris@xxxxxxxxx> wrote:
>
> Reverting the patch appears to fix the hang for me, although I'm not sure
> what the actual problem is.
>
> This is on a quad-core Opteron (1352). Let me know if you need any further
> info.

That whole "deactivate_slab()" + "c->page = NULL" that that patch does
looks bogus.

Look at __slab_alloc: we have:

	page = c->page;
	if (!page)
		goto new_slab;

	slab_lock(page);
	if (unlikely(!node_match(c, node)))
		goto another_slab;

and let's assume we have two users racing on that "c->page". The
"slab_lock()" is going to work for one of them, right?

Ok, so the one it works for will then hit

	if (kmem_cache_debug(s))
		goto debug;

and thus get to the new "deactivate_slab(s,c) + c->page = NULL" and
then unlock the page.

In the meantime, the one that wasn't able to lock the page will now go
forward, but will not have "node_match()" any more, so it does that
"goto another_slab".

Which does "deactivate_slab(s,c)" again, and now c->page is NULL, so
that totally breaks.
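To make the interleaving above concrete, here is a minimal single-threaded C replay of it, run one step at a time so the outcome is deterministic. Everything in it is hypothetical: page_model, cpu_slab_model and the *_model helpers are simplified stand-ins, not the kernel's struct page, struct kmem_cache_cpu, slab_lock(), node_match() or deactivate_slab(). The step that makes node_match() fail for the second caller is simply assumed, as the mail asserts, rather than derived. A return value of -1 means the losing caller reached its deactivate with c->page already NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct page or
 * struct kmem_cache_cpu. */
struct page_model {
	int locked;               /* models the slab_lock() bit */
};

struct cpu_slab_model {
	struct page_model *page;  /* models c->page */
	int node;                 /* models c->node */
};

static void slab_lock_model(struct page_model *p)
{
	assert(!p->locked);       /* sequential replay: never contended */
	p->locked = 1;
}

static void slab_unlock_model(struct page_model *p)
{
	assert(p->locked);
	p->locked = 0;
}

/* Models node_match(): a negative node means "any node". */
static int node_match_model(const struct cpu_slab_model *c, int node)
{
	return node < 0 || c->node == node;
}

/* Models deactivate_slab(s, c), which works on c->page. Returns -1
 * where the real code would dereference a NULL c->page. */
static int deactivate_model(struct cpu_slab_model *c)
{
	if (c->page == NULL)
		return -1;
	return 0;
}

/* Replays the interleaving on one thread. Returns what caller B's
 * deactivate sees. */
static int replay_race(void)
{
	struct page_model slab = { 0 };
	struct cpu_slab_model c = { &slab, 0 };
	int ret = 0;

	/* Both callers load c->page before either takes the lock. */
	struct page_model *page_a = c.page;
	struct page_model *page_b = c.page;

	/* Caller A wins the lock and takes the debug path added by the
	 * patch: deactivate_slab(s, c), then c->page = NULL. */
	slab_lock_model(page_a);
	deactivate_model(&c);
	c.page = NULL;
	c.node = 1;  /* assumption: node_match() now fails for B */
	slab_unlock_model(page_a);

	/* Caller B locks its stale page pointer, fails node_match() and
	 * goes to another_slab, which deactivates c again. */
	slab_lock_model(page_b);
	if (!node_match_model(&c, 0))
		ret = deactivate_model(&c);
	slab_unlock_model(page_b);
	return ret;
}
```

In this model, deleting the "c.page = NULL" step makes replay_race() return 0, which isolates that assignment as the step the second caller trips over.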

What am I missing?

That patch seems to be just broken piece-of-s%^!

Christoph, Pekka, please tell me why I shouldn't immediately revert
it. What am I missing?

Linus