Re: Possible dcache BUG

From: Andi Kleen
Date: Thu Aug 05 2004 - 15:30:39 EST


On Thu, 5 Aug 2004 11:50:33 -0700 (PDT)
Linus Torvalds <torvalds@xxxxxxxx> wrote:

>
>
> On Thu, 5 Aug 2004, Ingo Molnar wrote:
> >
> > * Linus Torvalds <torvalds@xxxxxxxx> wrote:
> >
> > > Anyway, one other thing that makes me worry is the fact that Gene
> > > apparently has a K7. One of the things AMD has gotten wrong several
> > > times is prefetching, and it so happens that the dcache code is one of
> > > the users of the prefetch instruction. prune_dcache() in particular.
> >
> > hm, i too happen to have an Athlon64 box (running the x86 kernel) where
> > i can reproduce dcache pruning crashes after a few hours of testing
> > using a near-vanilla kernel.
>
> Very interesting.
>
> The K8 core (aka Opteron or Athlon64) has exactly the same prefetch page
> fault bugs that the K7 core has. This, coupled with your observation

Yep, but they should be handled. Of course in theory it could be a subtle
bug in the prefetch handler. But normally even when that goes wrong you
just get an obvious oops on the prefetch instruction itself.
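
For readers unfamiliar with the workaround: the handling mentioned above
amounts to checking, in the page fault path, whether the faulting
instruction is a prefetch, and ignoring the fault if so. What follows is a
simplified user-space sketch of such a check, not the kernel's code. The
name looks_like_prefetch() is hypothetical, and the prefix list assumes
32-bit code (no REX prefixes). The two opcode pairs are 0x0F 0x0D (3DNow!
prefetch/prefetchw) and 0x0F 0x18 (SSE prefetchnta/prefetcht*).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if the bytes at insn decode as an x86
 * prefetch instruction, 0 otherwise. Only legacy prefixes are skipped;
 * this is an illustration, not a full instruction decoder.
 */
static int looks_like_prefetch(const uint8_t *insn, size_t len)
{
	size_t i = 0;

	/* Skip segment, operand/address-size, and lock/rep prefixes. */
	while (i < len && i < 15) {
		uint8_t b = insn[i];

		if (b == 0x26 || b == 0x2E || b == 0x36 || b == 0x3E ||
		    b == 0x64 || b == 0x65 || b == 0x66 || b == 0x67 ||
		    b == 0xF0 || b == 0xF2 || b == 0xF3)
			i++;
		else
			break;
	}

	/* A prefetch needs the 0x0F escape byte plus one opcode byte. */
	if (i + 1 >= len)
		return 0;
	if (insn[i] != 0x0F)
		return 0;

	return insn[i + 1] == 0x0D || insn[i + 1] == 0x18;
}
```

A fault handler built on a check like this would return without raising
an oops when the check succeeds, since a prefetch is only a hint and must
never fault architecturally.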

When you disable the use of prefetch does it still happen?

diff -u linux-2.6.8rc2-update/include/asm-i386/processor.h-o linux-2.6.8rc2-update/include/asm-i386/processor.h
--- linux-2.6.8rc2-update/include/asm-i386/processor.h-o 2004-07-28 02:23:44.000000000 +0200
+++ linux-2.6.8rc2-update/include/asm-i386/processor.h 2004-08-05 22:25:46.000000000 +0200
@@ -612,6 +612,7 @@

#define ASM_NOP_MAX 8

+#if 0
/* Prefetch instructions for Pentium III and AMD Athlon */
/* It's not worth to care about 3dnow! prefetches for the K6
because they are microcoded there and very slow.
@@ -640,6 +641,7 @@
"r" (x));
}
#define spin_lock_prefetch(x) prefetchw(x)
+#endif

extern void select_idle_routine(const struct cpuinfo_x86 *c);
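
What the #if 0 is meant to achieve, sketched in user space: with the
arch-specific definitions compiled out, callers should pick up generic
do-nothing versions of prefetch() and prefetchw(), so no prefetch
instruction is emitted at all. The sketch assumes the fallbacks are
selected by ARCH_HAS_* guards in the style of include/linux/prefetch.h;
struct node and sum_list() are made up for illustration.

```c
#include <stddef.h>

/*
 * Assumption: the arch header would normally define ARCH_HAS_PREFETCH
 * and friends. With that block under #if 0 they stay undefined, so the
 * generic no-op versions below are used instead.
 */
#ifndef ARCH_HAS_PREFETCH
static inline void prefetch(const void *x) { (void)x; }
#endif

#ifndef ARCH_HAS_PREFETCHW
static inline void prefetchw(const void *x) { (void)x; }
#endif

#ifndef ARCH_HAS_SPINLOCK_PREFETCH
#define spin_lock_prefetch(x) prefetchw(x)
#endif

/* Hypothetical list type standing in for a kernel list walk. */
struct node {
	struct node *next;
	int val;
};

/* Walk the list, hinting the next element before using the current one. */
static int sum_list(const struct node *head)
{
	const struct node *n;
	int sum = 0;

	for (n = head; n != NULL; n = n->next) {
		prefetch(n->next);	/* compiles to nothing here */
		sum += n->val;
	}
	return sum;
}
```

The list walk behaves identically with or without the hint, which is why
compiling the prefetches out is a cheap way to test whether they are
involved in the crashes.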



> > NOTE2: i tried hard but couldnt reproduce the problem using the very
> > same kernel and the same workload on a PIII box. Once i ran it overnight
> > to check. Only the Athlon64 box does it. It could also be a hardware
> > problem - albeit the box withstood days of memtest86.

Both K8 and K7 are usually a lot faster and a lot more aggressive in
out-of-order execution than the P3. A P4 would be a better comparison.

> Andi, I think you were the contact for the AMD prefetch bug. Can you ask
> around the same people whether there might be other problems in this area?
> No point in putting a lot of effort into it, but just as one thing to
> check for..

First, a bigger sample size showing that it really only happens on AMD
would be useful.

-Andi