Re: kernel cache mem bug(?)

From: Valdis . Kletnieks
Date: Thu Mar 16 2006 - 15:39:53 EST


On Thu, 16 Mar 2006 18:41:02 +0100, kernel@xxxxxxxxxxx said:
> [X.] Other notes, patches, fixes, workarounds:

> Workaround: When we disable HyperThreading in BIOS, this
> problem goes away. When we re-enable HT, it comes back...

Have you ruled out marginal memory, or overclocking/overheating?

I'm guessing something is barely within tolerance when one CPU is
beating up on it, and it falls over when the HT adds to the mix.

For that matter, this looks racy:

    while [ 0$MD5LOOPS -gt 0 ]; do
        md5sum cache.*-*-* >> md5log.$PID.lis
        MD5LOOPS=`expr 0$MD5LOOPS - 1`
    done &

AMNT=`awk '$1!="91b82dcc83230890dbcdfc6b80571ddd"' md5log.$PID.lis | wc -l`

As quoted, the loop runs in the background, so the awk can end up reading
the log while md5sum is still writing it. If md5sum uses stdio to write the
output, and it writes more than 4K or so, you can get a partial line right
at the buffer boundary, which will then come up as a mismatch according to
the awk. You might want to actually output the mismatched line in its
entirety, and make sure you're looking at a complete line and not a partial
one.
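
Something along these lines would show what is actually mismatching. It is
only a rough sketch, not a drop-in fix: check_log is a made-up helper name,
the expected checksum and log name are taken from the snippet above, and it
assumes md5sum's usual "hash  filename" output with one entry per line.

```shell
#!/bin/sh
# Sketch only. EXPECTED is the checksum from the quoted script;
# check_log is a hypothetical helper, not part of the original test.
EXPECTED=91b82dcc83230890dbcdfc6b80571ddd

check_log() {
    log=$1
    # Command substitution strips a trailing newline, so a non-empty
    # result means the last byte is not a newline, i.e. the file ends
    # in a partial line.
    if [ -s "$log" ] && [ -n "$(tail -c 1 "$log")" ]; then
        echo "incomplete last line in $log" >&2
    fi
    # Print every mismatching line in full instead of only counting them.
    awk -v want="$EXPECTED" '$1 != want { print FILENAME ": " $0 }' "$log"
}
```

To use it, put a plain "wait" after the backgrounded loop so the loop has
finished before anything reads its log, then run
"check_log md5log.$PID.lis". If the only thing it ever prints is a
truncated last line, the mismatches are coming from the test script and
not from the page cache.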
