Re: oh, that clever bypassing..

David Mosberger-Tang (davidm@AZStarNet.com)
Fri, 1 Sep 1995 11:45:59 -0700


>>>>> On Fri, 1 Sep 95 20:27:39 MET DST, "Neal." <crook@rdgeng.enet.dec.com> said:

Neal> Tsk. If only I'd read the Wise One's words more carefully:

Neal> a load which misses the dcache and hits the write buffer
Neal> causes the buffer to flush, and the load gets the data from
Neal> off-chip.

Neal> - in the case of this loop, the load *hits* in the
Neal> Dcache. That's why:

Neal> -- you see a speedup
Neal> -- you get the same performance between 21066 (noname)
Neal> and 21064 (Cabriolet)

Neal> So, the process is:

Neal> -- the store hits in the Dcache, and updates the Dcache
Neal> value. Since the Dcache can't be dirty, the store data posts
Neal> an entry into the write buffer.

Neal> -- the load hits in the Dcache, so doesn't force the write
Neal> buffer flush.

Neal> Phew! Thanks, Anthony, for setting me straight on that
Neal> one. Saved me having to pore over DavidMT's code at the
Neal> weekend.

Neal> So, it seems like the problem with the store/load
Neal> 'optimisation' is to decide whether the data is likely to be
Neal> Dcached.
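
The store/load process Neal describes can be sketched as a toy model in C.
This is an illustration only, not a description of the real 21064/21066
pipeline: it assumes a single address, a one-entry write buffer, a
write-through Dcache that fills on load misses but not on store misses, and
the names toy_cpu, toy_store and toy_load are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): one Dcache line and one write-buffer slot,
   both for the same single address. */
struct toy_cpu {
    bool     line_valid;  /* is the address resident in the Dcache? */
    uint64_t line_data;   /* Dcache copy */
    bool     wb_pending;  /* write buffer holds a store not yet off-chip */
    uint64_t wb_data;     /* write-buffer copy */
    uint64_t memory;      /* off-chip copy */
    int      wb_flushes;  /* write-buffer flushes forced by loads */
};

/* Store: on a Dcache hit the cached value is updated. The Dcache is never
   dirty, so the data is always posted to the write buffer as well. */
void toy_store(struct toy_cpu *c, uint64_t value)
{
    assert(c);
    if (c->line_valid)
        c->line_data = value;      /* store hit updates the Dcache */
    if (c->wb_pending)
        c->memory = c->wb_data;    /* retire the older entry to make room */
    c->wb_data = value;
    c->wb_pending = true;
}

/* Load: a Dcache hit returns at once and leaves the write buffer alone.
   A Dcache miss that hits the write buffer forces a flush, then fetches
   the data from off-chip. */
uint64_t toy_load(struct toy_cpu *c)
{
    assert(c);
    if (c->line_valid)
        return c->line_data;       /* fast path: no flush */
    if (c->wb_pending) {
        c->memory = c->wb_data;    /* forced write-buffer flush */
        c->wb_pending = false;
        c->wb_flushes++;
    }
    c->line_data = c->memory;      /* fill the line from off-chip */
    c->line_valid = true;
    return c->line_data;
}
```

Starting with line_valid set, a toy_store followed by a toy_load leaves
wb_flushes at 0 (the fast case in the loop discussed above); starting with
the line absent, the same pair costs one forced flush.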

Oh, I see! I didn't know that stores *do* update the d-cache
(provided the line is in the cache already). That's cool, because it
means that fp<->integer conversions will usually run at CPU speeds
(because the top of the stack is normally in the d-cache already).
Interesting!
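
For readers wondering why fp<->integer conversions involve a store and a
load at all: as far as I know, these chips have no direct move between the
floating-point and integer register files, so the value goes out to a stack
slot and comes back in. A minimal C sketch of that round trip follows. The
function names are hypothetical, it assumes IEEE 754 doubles, and whether a
given compiler really emits a store/load pair for it depends on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a double as a 64-bit
   integer. The local variable stands in for the stack slot that the
   store/load pair would go through. */
uint64_t fp_bits_to_int(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);   /* fp store, then integer load */
    return bits;
}

/* Hypothetical helper: the reverse direction. */
double int_bits_to_fp(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);      /* integer store, then fp load */
    return d;
}

/* Value conversion (truncates toward zero). The conversion itself happens
   in the fp unit; the result still has to reach an integer register. */
int64_t fp_to_int_value(double d)
{
    return (int64_t)d;
}
```

Because the slot was written a moment earlier and the top of the stack is
usually resident, the reload would normally be a Dcache hit.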

Thanks much for digging into this!

Have a great weekend (no more mails---promised! :)

--david