Ok, I found this.
Once more, it was the slab stuff that broke badly. I'm going to
consider just throwing out the slabs for v2.2 unless somebody is willing
to stand up and fix it - the multi-page allocation stuff just breaks too
horribly.
In this case, TCP wanted to allocate a single skb, and due to slabs this
got turned into a multi-page request even though it fit perfectly fine
into one page. Thus a critical allocation could fail, and the TCP layer
started looping - and kswapd could never even try to fix it up because
the TCP code held the kernel lock.
I'm going to fix this particular case even with slabs, but this isn't
the first time the slabs "optimizations" have just broken code that used
to work fine by making it a high-order allocation. Essentially, the
slabsified kmalloc() is just a lot more fragile than the original
kmalloc() was.
(This also shows a particularly nasty inefficiency - the TCP code
explicitly tries to have a "good" MTU for loopback, one that is meant to
fit in a single page. The slab code makes it fail miserably at that
objective).
All sane architectures are moving to at least 2-way caches, and the good
ones are 4-way or more. As such, slabs optimize for the wrong case.
Linus