Re: Major regression on hackbench with SLUB

From: Linus Torvalds
Date: Fri Dec 07 2007 - 13:37:44 EST




On Fri, 7 Dec 2007, Linus Torvalds wrote:
>
> Can you do one run with oprofile, and see exactly where the cost is? It
> should hopefully be pretty darn obvious, considering your timing.

So I checked hackbench on my machine with oprofile, and while I'm not
seeing anything that says "ten times slower", I do see it hitting the slow
path all the time, and __slab_alloc() is 15% of the profile.

With __slab_free() being another 8-9%.

I assume that with many CPU's, those get horrendously worse, with cache
trashing.

The biggest cost of __slab_alloc() in my profile is the "slab_lock()", but
that may not be the one that causes problems in a 64-cpu setup, so it
would be good to have that verified.

Anyway, it's unclear whether the reason it goes into the slow-path because
the freelist is just always empty, or whether it hits the

... || !node_match(c, node)

case which can trigger on NUMA. That's another potential explanation of
why you'd see such a *huge* slowdown (ie if the whole node-match thing
doesn't work out, slub just never gets the fast-case at all). That said,
the number of places that actually pass a specific node to slub is very
limited, so I suspect it's not the node matching. But just disabling that
test in slab_alloc() might be one thing to test.

[ The whole node match thing is likely totally bogus. I suspect we pay
*more* in trying to match nodes than we'd ever likely pay in just
returning the wrong node for an allocation, but that's neither here nor
there ]

But yeah, I'm not entirely sure SLUB is working out.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/