Re: Errors and later panics in 2.6.0-test11.

From: Linus Torvalds
Date: Wed Dec 03 2003 - 12:26:50 EST




On Wed, 3 Dec 2003, David Martínez Moreno wrote:
>
> I've just rebooted about six hours ago, and it's giving panics elsewhere:
>
> [...]
> Ending XFS recovery on filesystem: md0 (dev: md0)
> b44: eth0: Link is down.
> b44: eth0: Link is up at 100 Mbps, full duplex.
> b44: eth0: Flow control is on for TX and on for RX.
> eth0: no IPv6 routers present
> Unable to handle kernel paging request at virtual address 00100104

That's the LIST_POISON stuff: 00100100 is the "bad list pointer". Somebody
tried to remove a page twice.

Doesn't mean a lot - if your "struct page" got corrupted, anything can
happen. Quite possibly it's a double free.

> I can rebuild the Debian mirror for not using the RAID and using the SATA
> disks separately, but will be tomorrow, it's a lot of space to move, and I
> need remote intervention.
>
> Anyway I'd love to know before doing if it will be useful, looking at what
> Jens has said just ten minutes ago about RAIDs 0/5. Will it help to you? Say
> so and I'll go for it.

It might be more useful to leave it as RAID0, if you're willing to try out
patches to try to debug this. The slab-debugging thing I sent out earlier
is one such patch (but may well cause out-of-memory problems under load),
and possibly the atomic-decrement checker patch (appended). And maybe Jens
and Neil can come up with something..

Linus

----
===== arch/i386/lib/dec_and_lock.c 1.1 vs edited =====
--- 1.1/arch/i386/lib/dec_and_lock.c Tue Feb 5 09:40:21 2002
+++ edited/arch/i386/lib/dec_and_lock.c Sun Nov 2 09:07:53 2003
@@ -19,7 +19,7 @@
counter = atomic_read(atomic);
newcount = counter-1;

- if (!newcount)
+ if (newcount <= 0)
goto slow_path;

asm volatile("lock; cmpxchgl %1,%2"
===== include/asm-i386/atomic.h 1.5 vs edited =====
--- 1.5/include/asm-i386/atomic.h Mon Aug 18 19:46:23 2003
+++ edited/include/asm-i386/atomic.h Sun Nov 2 09:40:42 2003
@@ -2,6 +2,8 @@
#define __ARCH_I386_ATOMIC__

#include <linux/config.h>
+#include <linux/kernel.h>
+#include <asm/bug.h>

/*
* Atomic operations that C can't guarantee us. Useful for
@@ -136,12 +138,17 @@
*/
static __inline__ int atomic_dec_and_test(atomic_t *v)
{
- unsigned char c;
+ static int count = 2;
+ unsigned char c, neg;

__asm__ __volatile__(
- LOCK "decl %0; sete %1"
- :"=m" (v->counter), "=qm" (c)
+ LOCK "decl %0; sete %1; sets %2"
+ :"=m" (v->counter), "=qm" (c), "=qm" (neg)
:"m" (v->counter) : "memory");
+ if (count && neg) {
+ count--;
+ WARN_ON(neg);
+ }
return c != 0;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/