Re: How to invoke burst-read on PCI mapped memory area

Gabriel Paubert (paubert@iram.es)
Tue, 13 Oct 1998 13:15:59 +0100 (MET)


On Tue, 13 Oct 1998, Hiroshi Kawashima wrote:

> Thank you for your suggestion.
>
> > Even if it is possible, what mechanism will maintain coherency between the
> > cache and the remote memory which is seen through the system that has been
> > described in the initial mail?
>
> Of course, I understand the coherency issue.
>
> Since this system is at an early experimental stage, we impose a strong
> restriction on the application programmer (i.e. before reading a buffer
> mapped to a remote node's memory, the application must explicitly
> invalidate the associated cache lines).
> Eventually, we will implement a coherency mechanism on the NIC (hopefully...).

How do you explicitly invalidate the cache on some architectures without
breaking anything (I'm thinking of exotic and little-known architectures
like Intel ;)) ? And for coherency between PCI and processor caches, good
luck (some of the transactions that keep memory and caches coherent in SMP
systems probably do not go across the PCI bridge).

> Also, I understand that a PCI master capable device is the appropriate
> solution for these performance issues, but I'd like to try to squeeze as
> much performance as I can out of the current H/W implementation.
>
> Before the Linux implementation, my colleague wrote a very basic device
> driver on Windows/NT. On Windows/NT, he could map such a PCI address space
> (ultimately mapped to the remote node's memory) as cacheable,
> and observed 'Read Multiple' (or 'Read Line', not sure, sorry) transactions
> on the PCI bus (monitored with a PCI analyzer).
>
> So, I'd like to implement the same feature (access method) on Linux (if possible).

Cache handling on Intel has become a mess because of the interaction
between MTRR and PTE attributes. And Intel has recently added a new
feature, called programmable translation attributes or something like
that, which complicates matters still more. However, when memory is
declared non-cacheable (one way or another), the processor will perform a
single bus cycle for each memory access. If the memory is cacheable, the
processor will attempt to perform a cache line fill burst, which may or
may not be translated to a PCI Read Multiple/Read Line transaction. This
will not work on Pentium systems, because there it is the chipset which
decides which memory areas are cacheable (and uses 2 pins to communicate
it to the processor, turning attempted burst reads from the PCI bus into
single-beat cycles).
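
To put rough numbers on the difference, here is a small user-space sketch
that just counts bus transactions for reading a buffer either way. It
assumes a 32-byte cache line, a line-aligned buffer and a cold cache; the
function names are made up for illustration, and it says nothing about
whether the bridge actually turns a line fill into a PCI burst.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 32-byte cache lines. */
#define CACHE_LINE_BYTES 32u

/*
 * Non-cacheable mapping: each load of access_bytes is its own
 * single-beat bus cycle.
 */
static size_t uncached_bus_cycles(size_t nbytes, size_t access_bytes)
{
    assert(access_bytes > 0);
    return (nbytes + access_bytes - 1) / access_bytes;
}

/*
 * Cacheable mapping: one line-fill burst per cache line touched,
 * assuming the buffer starts on a line boundary and nothing is
 * cached yet.
 */
static size_t cached_line_fills(size_t nbytes)
{
    return (nbytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Under these assumptions, reading a 4096-byte buffer with 4-byte loads costs
uncached_bus_cycles(4096, 4) single cycles against cached_line_fills(4096)
bursts.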

If you are using ioremap to access the device, you may want to consider
__ioremap, which has an additional parameter describing the desired cache
attributes. But the MTRR may still get in the way.
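
For reference, the page-level cache attributes on x86 come down to two PTE
bits, PWT (bit 3) and PCD (bit 4). The sketch below is plain user-space C
that only decodes what a given flags value asks for; the macro, enum and
function names are hypothetical, it does not call __ioremap, and it assumes
the extra parameter is a set of page flags of this kind. It also ignores
the MTRR, which can still override the result.

```c
#include <stdint.h>

/* x86 page table entry cache-control bits (names are illustrative). */
#define PTE_PWT 0x008u  /* bit 3: page-level write-through */
#define PTE_PCD 0x010u  /* bit 4: page-level cache disable */

enum pte_cache_mode {
    PTE_WRITE_BACK,     /* neither bit set: normal cacheable mapping */
    PTE_WRITE_THROUGH,  /* PWT only */
    PTE_UNCACHED        /* PCD set, with or without PWT */
};

/*
 * Report what the PTE bits alone request. The effective memory type
 * also depends on the MTRR covering the address, which is not
 * modelled here.
 */
static enum pte_cache_mode pte_requested_mode(uint32_t pte_flags)
{
    if (pte_flags & PTE_PCD)
        return PTE_UNCACHED;
    if (pte_flags & PTE_PWT)
        return PTE_WRITE_THROUGH;
    return PTE_WRITE_BACK;
}
```

So a flags value of 0 requests a cacheable mapping, and anything with the
PCD bit set requests an uncached one.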

Finally, do not forget that the PII can only cache 512 MB without
aliasing. If the address allocated to your adapter falls above that line,
you're probably screwed (and for sure if you happen to have 512 MB of RAM).
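
A quick sanity check you could run on the address the BIOS assigned to the
adapter, as a sketch only: it takes the 512 MB figure above at face value,
and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption taken from the text above: 512 MB cacheable range. */
#define PII_CACHEABLE_LIMIT (512ull << 20)

/*
 * Return 1 if the window [base, base + len) lies entirely below the
 * cacheable limit, 0 otherwise. Written so that base + len cannot
 * overflow.
 */
static int below_cacheable_limit(uint64_t base, uint64_t len)
{
    return len <= PII_CACHEABLE_LIMIT &&
           base <= PII_CACHEABLE_LIMIT - len;
}
```

A PCI window placed high in the 32-bit address space, as is common, fails
this check.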

Gabriel.
