Re: Cache flushing...

Tim Olson (tim@ibmoto.com)
Sun, 2 Jul 1995 09:22:36 -0500 (CDT)


Thanks for the comprehensive cache/TLB management document. I'd like
to add some comments as they pertain to PowerPC, in particular on
explicit coherency management between the instruction and data caches
in processors with split I/D caches.

The PowerPC architecture is defined so that implementations which have
split I/D caches don't need to maintain instruction cache coherency in
hardware (the icache doesn't have to snoop bus transactions, nor does
it have to watch data transactions on its own processor). This means
that I/D coherency must be maintained by software. Any time
instructions are "generated" (fork, exec), the icache must be made
coherent with the dcache and memory.

| 4) Physical caches
|
| Here, the lines are indexed by physical addresses, and the
| tags are based upon the physical address too. Usually these caches
| are external or are part of a multi-level caching architecture.
|
| a) Context switch - No flushing.

Correct.

| b) Fork - No flushing.

Since fork() copies instruction pages to new physical pages, at the
end of the fork() the icache may hold stale but still valid
instructions, and the dcache may hold modified data that has not yet
reached memory. To make the two coherent, the copied instruction pages
must be written back from the dcache to memory and invalidated in the
icache. In the PowerPC architecture, this can be done with the
following instruction sequence:

        ;# assumed on entry: r3 = address of first block, CTR = block count
loop:   ;# loop for each cache block in each of the instruction page ranges
        dcbst   r0, r3      ;# store any modified data from this block to memory
        sync                ;# ensure all transactions have been performed
        icbi    r0, r3      ;# invalidate icache for this block
        addi    r3, r3, 32  ;# 32-byte cache blocks
        bdnz    loop        ;# repeat for rest of blocks
        isync               ;# discard any prefetched instructions that may
                            ;# have been fetched from the icache before the
                            ;# invalidation
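
The loop needs a block-aligned starting address in r3 and a block
count in CTR. As a rough sketch of how those two values might be
derived from a byte range, here is some portable C. The names
block_range_for and struct block_range are made up for illustration
and do not come from any kernel; the 32-byte block size is taken from
the loop above.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK_SIZE 32u  /* 32-byte cache blocks, as in the loop above */

/* Hypothetical helper type: the setup values for one address range. */
struct block_range {
    uintptr_t first_block;  /* block-aligned start address (value for r3) */
    size_t    nblocks;      /* number of blocks to visit (value for CTR)  */
};

/* Round the start down to a block boundary and count every block that
 * contains at least one byte of [start, start + len). */
struct block_range block_range_for(uintptr_t start, size_t len)
{
    struct block_range r = { 0, 0 };

    if (len == 0)
        return r;           /* nothing to do; caller must skip the loop */

    uintptr_t mask  = (uintptr_t)CACHE_BLOCK_SIZE - 1;
    uintptr_t first = start & ~mask;
    uintptr_t last  = (start + len - 1) & ~mask;  /* block of the final byte */

    r.first_block = first;
    r.nblocks     = (size_t)((last - first) / CACHE_BLOCK_SIZE) + 1;
    return r;
}
```

For a block-aligned 4 KB page this gives 128 blocks. Note that a range
which starts or ends mid-block still covers the partial blocks, and
that the caller has to skip the loop entirely when nblocks is 0, since
bdnz decrements CTR before testing it and a zero count would wrap.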

| c) Exec - No flushing.

exec() requires the same flushing as above.

| Lastly, bounding cache flushes by the size of the cache can
| greatly reduce the cache flushing overhead. If a region that needs to
| be flushed from the cache exceeds the total size of the cache, only
| an entire cache flush is necessary. This is pretty straight forward.

In the PowerPC architecture, there currently is no
architecturally-defined method for flushing the entire data cache
contents to memory (at least not efficiently). The data cache block store
and data cache block flush instructions use effective addresses, and
must match the tag in the data cache exactly. Thus, if you don't know
the range of effective addresses that might exist in the cache, you
would have to sequence through all of the effective address range to
guarantee a flush.
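
To put a number on that, here is a small arithmetic sketch in C. It
assumes a 32-bit effective address space and the 32-byte blocks used
earlier; the function name blocks_in_address_space is made up for
illustration.

```c
#include <stdint.h>

/* Number of per-block cache operations needed to cover an address
 * space of 2^addr_bits bytes using blocks of block_size bytes.
 * Assumes addr_bits < 64 and block_size is a nonzero power of two. */
uint64_t blocks_in_address_space(unsigned addr_bits, uint64_t block_size)
{
    return (UINT64_C(1) << addr_bits) / block_size;
}
```

With 32 address bits and 32-byte blocks this comes to 2^27, or about
134 million, block operations for a full sweep, against only 512
blocks actually present in a 16 KB cache.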

Instead of doing that ;-) most OSes will use a processor
implementation-defined method to flush the data cache. For the 601,
603, and 604, this consists of loading from an otherwise unused,
contiguous region of memory equal in size to the data cache. This
forces out any modified data present in the cache. N.B. this must be
done with interrupts off, as it depends upon being the only thing
accessing the data cache to ensure that the LRU bits sequence
correctly and that all ways of each set are accessed.
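
A minimal sketch of that load loop in C, under some loud assumptions:
the function name displacement_flush is made up, the buffer is assumed
to be cacheable and not otherwise in use, and the interrupt
disable/restore steps are shown only as comments because they are
platform-specific. Whether one pass of loads is sufficient depends on
the particular implementation's replacement policy.

```c
#include <stddef.h>

/* Load one byte from every cache block of a buffer that is as large as
 * the data cache, so that each load displaces a resident block.
 * Returns the number of loads issued. */
size_t displacement_flush(const volatile unsigned char *buf,
                          size_t cache_size, size_t block_size)
{
    size_t loads = 0;

    if (block_size == 0)
        return 0;           /* guard against a loop that never advances */

    /* A real kernel would disable interrupts here (not shown). */
    for (size_t off = 0; off < cache_size; off += block_size) {
        (void)buf[off];     /* volatile read: the load cannot be elided */
        loads++;
    }
    /* ...and restore the previous interrupt state here (not shown). */

    return loads;
}
```

For a 16 KB data cache with 32-byte blocks this issues 512 loads, one
per block.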

TLB Coherency
-------------

TLBs are another set of caches on the processor that must be managed
in software. In particular, TLB entries must be flushed when the
virtual->physical translation changes in the external page tables.

Tim Olson
Apple Computer / Somerset
(tim@apple.com)