Re: Bug report : reproducible memory bug (hardware failure, sorry)

From: Mathieu Desnoyers
Date: Mon Jan 29 2007 - 22:33:24 EST


* Martin J. Bligh (mbligh@xxxxxxxxxx) wrote:
> Mathieu Desnoyers wrote:
> >Hi,
> >
> >Trying to build cross-compilers (or kernels) on a 2-way x86_64 (amd64) with
> >make -j3 triggers the following OOPS after about 30 minutes on
> >2.6.19.2. Due to the amount of time and the heavy load it takes before it
> >happens, I suspect a race condition. Memtest86 tests passed ok. The
> >amount of swap used when the condition happens is about 52k and stable
> >(only ~800MB/1GB are used).
> >
> >I am going to give it a look, but I suspect you might help narrow it
> >down more quickly. Any insight would be appreciated.
>
> Mmm, that's going to be messy to debug... but didn't we already know
> that kernel was racy? Or does 2.6.19.2 already include that fix? Does 20-rc6
> still break?

Hi Martin,

I finally re-ran memtest86 on the machine because it had started showing
too many different kinds of errors (GPF, invalid instruction...). It
turned out that one of the memory modules was bad. I guess my brand new
list_debug race condition debugger will be useful in the future, but not
now. :)

Next time, I'll remember to let memtest86 run for a few more hours on my
new machines.

Mathieu

--
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68