2.1.125: Patch for kmem_cache_alloc infinite looping

storner@image.dk
Wed, 21 Oct 1998 14:47:08 +0200


A couple of days ago I reported that I was seeing sporadic lockups
on 2.1.125 with a network driver that was allocating buffers with
GFP_DMA|GFP_ATOMIC.

I've now tracked the lockups to __kmem_cache_alloc(), and I believe
there is a bug there. Let me explain what I see:

alloc_skb() calls kmem_cache_alloc(), passing the GFP_DMA|GFP_ATOMIC
flags. This in turn calls __kmem_cache_alloc with the same flags,
although now they are called SLAB_DMA and SLAB_ATOMIC. Here, around
line 1370 (the printk's are mine - see explanation below):

try_again:
	/* if (flags & SLAB_DMA) printk("b"); */
	/* Get slab alloc is to come from. */
	slabp = cachep->c_freep;

	/* Magic is a sanity check _and_ says if we need a new slab. */
	if (slabp->s_magic != SLAB_MAGIC_ALLOC)
		goto alloc_new_slab;
	/* DMA requests are 'rare' - keep out of the critical path. */
	if (flags & SLAB_DMA)
		goto search_dma;
[snip]
search_dma:
	/* if (flags & SLAB_DMA) printk("g"); */
	if (slabp->s_dma || (slabp = kmem_cache_search_dma(cachep))!=kmem_slab_end(cachep))
		goto try_again_dma;
alloc_new_slab:
	/* if (flags & SLAB_DMA) printk("h"); */
	/* Either out of slabs, or magic number corruption. */
	if (slabp == kmem_slab_end(cachep)) {
		/* Need a new slab. Release the lock before calling kmem_cache_grow().
		 * This allows objs to be released back into the cache while growing.
		 */
		spin_unlock_irqrestore(&cachep->c_spinlock, save_flags);
		/* if (flags & SLAB_DMA) printk("i"); */
		if (kmem_cache_grow(cachep, flags)) {
			/* if (flags & SLAB_DMA) printk("j"); */
			/* Someone may have stolen our objs. Doesn't matter, we'll
			 * just come back here again.
			 */
			spin_lock_irq(&cachep->c_spinlock);
			goto try_again;
		}
		/* Couldn't grow, but some objs may have been freed. */
		spin_lock_irq(&cachep->c_spinlock);
		if (cachep->c_freep != kmem_slab_end(cachep)) goto try_again;
	} else {
		/* Very serious error - maybe panic() here? */
		kmem_report_alloc_err("Bad slab magic (corrupt)", cachep);
	}
	spin_unlock_irqrestore(&cachep->c_spinlock, save_flags);
err_exit:

When the lockup happens, I get an infinite stream of "bghi". So what
happens is that __kmem_cache_alloc() figures out this is a request for
DMA memory; it eventually determines it needs a new slab to fulfill
the request, but kmem_cache_grow() fails (returns 0). So we go into the
code following point "i" above:

	/* Couldn't grow, but some objs may have been freed. */
	if (cachep->c_freep != kmem_slab_end(cachep)) goto try_again;

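And then it starts the routine all over again, but NOTHING has changed:
c_freep still points at a slab with free objects (otherwise we would
never reach "g"), that slab is evidently not DMA-capable, and we're in
an interrupt handler, so we cannot swap things out or do other stuff to
get free memory. Result: infinite loop.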
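
The loop described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions, not kernel code: toy_cache, toy_grow(), toy_alloc() and the TOY_* flag values are hypothetical names invented for illustration, the cache state is reduced to two booleans, and a pass cap is added so the model terminates where the real code would spin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SLAB_DMA / SLAB_ATOMIC; values are illustrative. */
#define TOY_DMA    0x1u
#define TOY_ATOMIC 0x2u

enum toy_result { TOY_OK, TOY_FAILED, TOY_STUCK };

/* Toy view of the cache state that matters for the retry decision. */
struct toy_cache {
	bool has_free_slab;     /* models c_freep != kmem_slab_end(cachep) */
	bool has_free_dma_slab; /* models s_dma set, or the DMA search succeeding */
};

/* Models kmem_cache_grow() in the reported situation: it always fails. */
static bool toy_grow(unsigned flags)
{
	(void)flags;
	return false;
}

/* Walks the unpatched control flow; gives up after max_passes. */
static enum toy_result toy_alloc(const struct toy_cache *c, unsigned flags,
				 int max_passes)
{
	assert(max_passes > 0);
	assert((flags & ~(TOY_DMA | TOY_ATOMIC)) == 0);

	for (int pass = 0; pass < max_passes; pass++) {
		/* try_again: can the request be served from an existing slab? */
		if ((flags & TOY_DMA) ? c->has_free_dma_slab : c->has_free_slab)
			return TOY_OK;
		/* alloc_new_slab: try to grow the cache. */
		if (toy_grow(flags))
			continue;
		/* Couldn't grow: the original check ignores the request flags. */
		if (c->has_free_slab)
			continue;	/* goto try_again, with nothing changed */
		return TOY_FAILED;	/* err_exit */
	}
	return TOY_STUCK;	/* the real code has no such cap and spins */
}
```

In this model, a cache holding only a free non-DMA slab (has_free_slab set, has_free_dma_slab clear) hits the pass cap for a TOY_DMA | TOY_ATOMIC request, while a cache with no free slabs at all fails cleanly.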

I propose this patch should go into 2.1.126:

--- linux/mm/slab.c.orig Wed Oct 21 14:26:12 1998
+++ linux/mm/slab.c Wed Oct 21 14:26:58 1998
@@ -1444,8 +1444,10 @@
 		}
 		/* Couldn't grow, but some objs may have been freed. */
 		spin_lock_irq(&cachep->c_spinlock);
-		if (cachep->c_freep != kmem_slab_end(cachep))
-			goto try_again;
+		if (cachep->c_freep != kmem_slab_end(cachep)) {
+			if ((flags & SLAB_ATOMIC) == 0)
+				goto try_again;
+		}
 	} else {
 		/* Very serious error - maybe panic() here? */
 		kmem_report_alloc_err("Bad slab magic (corrupt)", cachep);
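
The effect of the patched check can be stated as a small predicate. This is a hedged user-space sketch, not the kernel code itself: toy_retry_after_failed_grow() and the TOY_* flag values are hypothetical names made up for illustration. It only models the decision taken after kmem_cache_grow() has failed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SLAB_DMA / SLAB_ATOMIC; values are illustrative. */
#define TOY_DMA    0x1u
#define TOY_ATOMIC 0x2u

/*
 * Models the patched test: after a failed grow, jump back to try_again
 * only if the cache has a free slab AND the caller is not atomic.
 */
static bool toy_retry_after_failed_grow(bool has_free_slab, unsigned flags)
{
	assert((flags & ~(TOY_DMA | TOY_ATOMIC)) == 0);

	if (!has_free_slab)
		return false;		/* nothing to retry with: err_exit */
	return (flags & TOY_ATOMIC) == 0;	/* atomic callers give up */
}
```

Under this model an atomic request falls through to err_exit instead of retrying, while a non-atomic request keeps the old retry behaviour, presumably on the assumption that such a caller can make progress by waiting for memory to be freed.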

-- 
Henrik Storner <storner@image.dk>
