Re: Ideas for reducing memory copying and zeroing times

David S. Miller (davem@caip.rutgers.edu)
Thu, 18 Apr 1996 23:22:01 -0400


From: ralf@mailhost.uni-koblenz.de (Ralf Baechle)
Date: 17 Apr 1996 13:38:31 GMT

Some CPUs waste bandwidth by reading data from memory (-> lots of
wait states) which would later be overwritten anyway. Some CPUs,
like the R4000, have instructions that tell the cache to create
a dirty cacheline. These cachelines then get completely overwritten
at full speed (that's >1 GB/s for a plain R4600) and then written
back into memory. With the copying/filling loop running in the cache
and the bus only used for writing back complete cachelines into
memory, this results in a significant speedup. This is how
Linux/MIPS copies/fills pages for R6000 and newer.
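A minimal C sketch of that page-fill pattern; create_dirty_line() is a
hypothetical stand-in for the MIPS cache op (a no-op here, since the real
thing is inline assembly), and the 32-byte line size is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the R4000 "create dirty cacheline" op:
 * it marks a line dirty in the cache without reading it from memory.
 * In this sketch it compiles to nothing; real code is MIPS asm. */
static inline void create_dirty_line(void *addr)
{
    (void)addr;
}

#define LINE_SIZE 32  /* assumed cacheline size */

/* Fill a buffer line by line: first claim each destination line in
 * the cache, then overwrite it completely, so the bus is only used
 * for writing back whole lines -- never for reading the dest. */
static void fill_page(uint32_t *dst, uint32_t val, size_t bytes)
{
    size_t words_per_line = LINE_SIZE / sizeof(uint32_t);

    for (size_t i = 0; i < bytes / sizeof(uint32_t); i += words_per_line) {
        create_dirty_line(&dst[i]);
        for (size_t j = 0; j < words_per_line; j++)
            dst[i + j] = val;
    }
}
```

The same structure works for page copies: claim the destination line,
then copy a full line's worth of words into it.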

Nice. Someone else mentioned that the Pentium cache won't
write-allocate (which makes a lot of sense for small caches and
produces a better hit pattern; the PC memory subsystem is pretty
fast anyway, though...)

I figure if you do something like:

load source             ! source enters cache
do checksum calculation ! fill the pipeline
null load from dest     ! dest enters cache even if no write-allocate
store to destination

If both source and dest keep the cache streaming data in, _and_ the
cache still holds the destination by the time the store happens, you
get a really nice copy bandwidth streaming effect (1 GB/s, as you
mentioned, on nice cache architectures.)
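The four-step pattern above can be sketched in C; the function name is
mine, and the volatile null load is a portable stand-in for whatever
touch-the-destination trick the architecture actually needs:

```c
#include <stddef.h>
#include <stdint.h>

/* Combined checksum-and-copy following the load/checksum/touch/store
 * pattern.  The volatile read of dst[i] pulls the destination line
 * into the cache even on CPUs that do not write-allocate on stores
 * (e.g. the Pentium), so the following store hits the cache. */
static uint32_t csum_copy(uint32_t *dst, const uint32_t *src, size_t words)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < words; i++) {
        uint32_t v = src[i];                  /* source enters cache   */
        sum += v;                             /* fill the pipeline     */
        (void)*(volatile uint32_t *)&dst[i];  /* null load of dest     */
        dst[i] = v;                           /* store hits the cache  */
    }
    return sum;
}
```

(A simple word sum stands in for the real Internet checksum here; the
point is the memory-access ordering, not the arithmetic.)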

Though these pains may only really pay off on decently sized caches
(MIPS, SuperSparc, HyperSparc, UltraSparc, maybe some others) and not
so much on small caches. But if the optimization is not too difficult,
doing it on small caches is indeed worthwhile.

Later,
David S. Miller
davem@caip.rutgers.edu