Re: Caches and DMA with PPro

Linus Torvalds (torvalds@cs.helsinki.fi)
Tue, 16 Apr 1996 09:27:20 +0300 (EET DST)


On Mon, 15 Apr 1996, Bob Felderman wrote:
>
> I'm running 1.3.88 on a Micron 180MHz PentiumPro machine.
> It appears that our network board is DMAing stale data
> when transmitting a packet and/or the host is reading stale
> data after a packet is received.

This _should_ be impossible. It sounds like maybe the motherboard doesn't
correctly keep the caches in sync, because even though the PPro does some
"flexible" memory accesses, they should never result in this kind of
behaviour (essentially, the hardware should make sure memory is coherent).

> I've looked at the flush_cache_xxx() code in pgtable.h and it
> has the following code.
>
> /* Caches aren't brain-dead on the intel. */
> #define flush_cache_all() do { } while (0)
> #define flush_cache_mm(mm) do { } while (0)
> #define flush_cache_range(mm, start, end) do { } while (0)
> #define flush_cache_page(vma, vmaddr) do { } while (0)

No, this is for the user-level memory management, not for device level
cache flushing. Essentially, it's for architectures that have virtual
caches and don't invalidate them correctly when the page translations
change.

There is a "mb()" macro in the header files that stands for "memory
barrier", and which is used to make sure that CPU writes have actually gone
out to the memory subsystem. On the x86 this is an empty asm statement (set
up in a way that makes sure that gcc doesn't optimize things around it and
thus make the barrier useless).
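
To make that concrete, here is a minimal sketch of such a barrier,
assuming gcc-style inline asm. The names sketch_mb(), sketch_desc and
sketch_publish() are made up for illustration and are not the kernel's
actual definitions. The empty asm with a "memory" clobber emits no
instruction; it only stops the compiler from moving or caching memory
accesses across it:

```c
#include <assert.h>

/* Hypothetical stand-in for the mb() described above: an empty asm
 * statement. The "memory" clobber tells gcc that memory may have
 * changed, so it must not reorder loads/stores across this point. */
#define sketch_mb() __asm__ __volatile__("" : : : "memory")

/* Made-up descriptor shared with an imaginary device. */
struct sketch_desc {
    unsigned int len;    /* payload length, written first */
    unsigned int ready;  /* "go" flag, written last */
};

/* Fill in the descriptor, then mark it ready. The barrier keeps the
 * compiler from emitting the store to ->ready before the store to ->len. */
void sketch_publish(struct sketch_desc *d, unsigned int len)
{
    assert(d != 0);
    d->len = len;
    sketch_mb();
    d->ready = 1;
}
```

Note that this only constrains the compiler. Whether the CPU itself needs
an ordering instruction depends on the architecture.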

Not very many drivers use "mb()", because it's not usually needed even
on hardware that has write buffers and/or out-of-order reads (IO
operations are also written so that they do the same memory
synchronization).

> I've tried disabling the caches from the BIOS setup, but the
> performance of the system and the behavior is unchanged, so I
> suspect the BIOS isn't really turning off the caches.

It may be that it disables any external caches, and with a PPro you
probably don't even have that (and even if you do, you probably wouldn't
notice the speed difference because the internal caches are good enough
for most things).

Note that the intel architecture doesn't even _have_ any cache flush
operations for reads (well, it has a "wbinvd" instruction, but
nobody uses it because it should never be needed and it's slow as h*ll,
especially in the unlikely situation that the external interfaces actually
honour it).

Instead, the PPro has a few so-called "serializing instructions", and
any speculative reads (or delayed writes) will _not_ pass those
instructions. Which is why you should _not_ see the behaviour you see
unless the external hardware is broken wrt cache coherency. I quote:

The I/O instructions, locking instructions, the LOCK prefix, and
serializing instructions force strong ordering on the processor.
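
As an illustration only, here is a minimal sketch of using a serializing
instruction before reading data a device has written. It assumes
gcc-style inline asm and picks CPUID, which Intel documents as
serializing. All the names (sketch_serialize, sketch_rx, sketch_rx_len)
are hypothetical. With coherent hardware none of this should be needed;
it just shows where the ordering point would sit:

```c
#include <assert.h>

/* Execute a serializing instruction. CPUID is used here as an example
 * on x86; on other architectures this sketch falls back to a plain
 * compiler barrier, which is NOT a hardware serialization. */
void sketch_serialize(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
#else
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* Made-up receive slot: the imaginary device writes len, then done. */
struct sketch_rx {
    volatile unsigned int done;
    unsigned int len;
};

/* Return the packet length, or 0 if the device has not finished yet.
 * The read of ->len is issued only after the serializing point. */
unsigned int sketch_rx_len(struct sketch_rx *rx)
{
    assert(rx != 0);
    if (!rx->done)
        return 0;
    sketch_serialize();
    return rx->len;
}
```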

Note that the low-level interrupt code always does a few IO instructions,
so that the hardware interrupt action itself will always serialize the
pentium (I suspect the actual interrupt also serializes the CPU, but I
can't find that in the documentation).

Now, if the hardware sends out the interrupt _before_ having completely
written the packet to memory, that might result in problems, but I assume
that goes without saying ;-)

Linus