On Wed, 20 Oct 2010, Gleb O. Raiko wrote:
I meant the latter.

I'm not sure what you mean: whether the processor will snoop the value to
be read from the store buffer, or whether it will stall until the buffer
has drained and then issue the load on the external bus?

I can't see the behaviour of uncached loads wrt uncached stores clearly
documented anywhere for the R4400 processor (DEC used the SC variation,
BTW). There's no mention of uncached loads having SYNC properties.

I agree the docs are unclear here. They contain an example of cached and
uncached stores (which Ralf has already pointed to), but no clear
explanation of a mix of loads and stores. Sure, it's safer to keep both
the sync and the uncached load.

Therefore, in the context of one or more pending uncached stores, I can
assume one of three behaviours for an uncached load:
1. If the addresses match, then the value loaded is snooped in (retrieved
   from) the store buffer and no external cycle is seen on the bus. This
   is what the R2020 WB did.
2. The load bypasses the stores and therefore reaches the external bus
   before them. This is what the R3220 MB did, and I believe the R2020 WB
   defaulted to it when there was no address match.
3. The load stalls until the outstanding stores have completed and only
   then appears on the external bus.
There's no harm in using SYNC here, and its semantics make it clear it
enforces case #3 above even if that is not otherwise guaranteed.
Otherwise I think case #2 would be a reasonable default (i.e. the one I'd
recommend to a processor designer), as draining the store buffer on every
uncached load, whether needed or not, is a waste of performance.

There is no such thing as performance in the case of uncached loads.