Re: commit 4066c33d0308f8 breaks booting under KVM

From: Christoph Lameter
Date: Mon Jun 29 2015 - 10:01:46 EST


On Fri, 26 Jun 2015, Linus Torvalds wrote:

> Does reverting it fix everything? I'll give people another day or so to see
> if they can see what's wrong, but I guess I'll just revert if no fix end up
> being forthcoming..

Reversion will only affect a special slub debug kernel option and we have
run without the static names for years. So ok to revert.


the problem here is the sequencing of SLAB bootstrap. SLAB bootstrap
requires certain kmalloc caches to be initialized in a specific sequence
and that has always caused fragility. Slab cache creation may
generate a "off slab" configuration which means a kmem_cache
will then refer to a kmem_cache from the kmalloc array that may not be
there.

Gavin: Please check your logic here. The two small caches are not always
created and they often lead to off slab configurations. It depends on the
minimum aligment required by the architecture. These also need to be
created in the proper sequence.

Note that you removed the comment that said:

/*
- * Caches that are not of the two-to-the-power-of size.
- * These have to be created immediately after the
- * earlier power of two caches
+ * "i == 2" is the "kmalloc-192" case which is the last special
+ * case for initialization and it's the point to jump to

The modification you have made does create the non power of two caches not
in sequence anymore but before all others!

Here is the new code.

void __init create_kmalloc_caches(unsigned long flags)
{
int i;

for (i = KMALLOC_LOOP_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
if (!kmalloc_caches[i]) {
kmalloc_caches[i] = create_kmalloc_cache(
kmalloc_info[i].name,
kmalloc_info[i].size,
flags);
}

/*
* "i == 2" is the "kmalloc-192" case which is the last special
* case for initialization and it's the point to jump to
* allocate the minimize size of the object. In slab allocator,
* the KMALLOC_SHIFT_LOW = 5. So, it needs to skip 2^3 and 2^4
* and go straight to allocate 2^5. If the ARCH_DMA_MINALIGN is
* defined, it may be larger than 2^5 and here is also the
* trick to skip the empty gap.
*/
if (i == 2)
i = (KMALLOC_SHIFT_LOW - 1);
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/