Re: Ideas for reducing memory copying and zeroing times

Ralf Baechle (ralf@mailhost.uni-koblenz.de)
17 Apr 1996 13:38:31 GMT


In article <199604162340.TAA06104@huahaga.rutgers.edu>, "David S. Miller" <davem@caip.rutgers.edu> writes:
|>
|> Just as a point of reference, Sparc's previous to the recent
|> UltraSparc V9's could not do bcopy/bzero using floating point as
|> someone suggested could be done on the intel. The reason being is
|> that the fpu can only use the same speed load/store double-word
|> instructions to move values in and out of the fpu regs as the integer
|> unit can, so there would be no advantage. So what they did was add
|> special stream copy/fill hardware onto the Sparc which could do burst
|> copy/fill at full bus burst chunk size.
|>
|> With the UltraSparc they have the same similar bcopy hardware
|> available but the FPU can be used efficiently now for such purposes
|> because there is a special load/store which can throw the entire
|> floating point register set to/from memory (64 64-bit registers ==
|> 512k at a time).

Some CPUs waste bandwidth by first reading data from memory (-> lots of
wait states) that would later be overwritten anyway. Some CPUs, like
the R4000, have instructions that tell the cache to create a dirty
cacheline without fetching it. These cachelines then get completely
overwritten at full speed (that's >1 GB/s for a plain R4600) and are then
written back to memory. With the copying/filling loop running in the cache
and the bus used only for writing back complete cachelines, this results
in a significant speedup. This is how Linux/MIPS copies/fills pages for
R6000 and newer.
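
To make the loop structure concrete, here is a minimal portable sketch in C.
It is not the actual Linux/MIPS code. The names clear_page_sketch,
copy_page_sketch and create_dirty_line, and the 4096-byte page and 32-byte
line sizes, are assumptions made up for illustration. The
create_dirty_line hook is a no-op here; on an R4000-class CPU in kernel
mode it would stand for the CACHE instruction's "Create Dirty Exclusive"
operation on the data cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u   /* assumed page size in bytes */
#define LINE_SIZE  32u     /* assumed data cache line size in bytes */
#define LINE_WORDS (LINE_SIZE / sizeof(uint32_t))
#define PAGE_WORDS (PAGE_SIZE / sizeof(uint32_t))

/*
 * Hypothetical hook: stands in for "allocate this line in the cache as
 * dirty without reading it from memory". It is a no-op in this portable
 * sketch, so the code below only demonstrates the loop structure.
 */
static void create_dirty_line(uint32_t *line)
{
    (void)line;
}

/* Zero one page, one cache line at a time. */
void clear_page_sketch(uint32_t *page)
{
    assert(page != NULL);
    for (size_t w = 0; w < PAGE_WORDS; w += LINE_WORDS) {
        create_dirty_line(&page[w]);
        /*
         * Every word of the line must be written: with a real
         * create-dirty operation, unwritten words would hold stale
         * cache contents that get written back to memory.
         */
        for (size_t i = 0; i < LINE_WORDS; i++)
            page[w + i] = 0;
    }
}

/* Copy one page; only the source lines need to be read from memory. */
void copy_page_sketch(uint32_t *dst, const uint32_t *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t w = 0; w < PAGE_WORDS; w += LINE_WORDS) {
        create_dirty_line(&dst[w]);
        for (size_t i = 0; i < LINE_WORDS; i++)
            dst[w + i] = src[w + i];
    }
}
```

The point of the per-line structure is that the destination line is never
fetched: the only bus traffic for the destination is the writeback of
complete lines. A real version would also need the page to be aligned to
the line size and the hook to be issued only for lines that are
overwritten in full.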

Ralf