Re: kernel BUG at include/linux/list.h:164!

From: Hugh Dickins
Date: Sun May 02 2004 - 03:24:17 EST


On Sat, 1 May 2004, Konstantin Kletschke wrote:
>
> May 1 12:03:07 kermit kernel: kernel BUG at include/linux/list.h:164!
> May 1 12:03:07 kermit kernel: invalid operand: 0000 [#1]
> May 1 12:03:07 kermit kernel: PREEMPT
> May 1 12:03:07 kermit kernel: CPU: 0
> May 1 12:03:07 kermit kernel: EIP: 0060:[exit_rmap+237/272] Not tainted VLI
> May 1 12:03:07 kermit kernel: EFLAGS: 00010283 (2.6.6-rc2-mm1)
....
> 2.6.6-rc2-mm1 with devmapper udm1 patch
>
> Is this dump due to udma1 or not and the error is fixed now?

Probably not - I don't know this udma1 patch, but the error above
definitely occurs in my rmap area. Could be due to slab corruption
from somewhere else, but that would be a poor assumption to make -
though neither my testing nor anyone else has seen the same (yet).

I imagine you're limited in the experiments you can do on that server,
and wouldn't want to run it with CONFIG_SLAB_DEBUG=y, which might show
up unrelated problems - interesting for us, but troublesome for you.

You're UP but PREEMPT on. Hmm, there's a suspicious atomic_read
in exit_rmap: once upon a time I convinced myself it was good, but
it looks unsafe to me now - could cause a double free of an anonmm,
which would lead to your BUG if reallocated in between.

Not so likely for me to feel this is definitely the answer, but
likely enough to be well worth trying. Please try patch below -
simple enough not to make your case worse anyway - thanks.

Hugh

--- 2.6.6-rc2-mm1/mm/rmap.c 2004-04-26 12:39:46.000000000 +0100
+++ linux/mm/rmap.c 2004-05-02 08:43:38.088319696 +0100
@@ -103,6 +103,7 @@ void exit_rmap(struct mm_struct *mm)
{
struct anonmm *anonmm = mm->anonmm;
struct anonmm *anonhd = anonmm->head;
+ int anonhd_count;

mm->anonmm = NULL;
spin_lock(&anonhd->lock);
@@ -114,8 +115,9 @@ void exit_rmap(struct mm_struct *mm)
if (atomic_dec_and_test(&anonhd->count))
BUG();
}
+ anonhd_count = atomic_read(&anonhd->count);
spin_unlock(&anonhd->lock);
- if (atomic_read(&anonhd->count) == 1) {
+ if (anonhd_count == 1) {
BUG_ON(anonhd->mm);
BUG_ON(!list_empty(&anonhd->list));
kmem_cache_free(anonmm_cachep, anonhd);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/