Hi Suzuki,
On Thu, Nov 14, 2019 at 02:59:13PM +0000, Suzuki K Poulose wrote:
> This series adds a workaround for Arm erratum 1542418 which affects
> Cortex-A77 cores (r0p0 - r1p0). Affected cores may execute stale
> instructions from the L0 macro-op cache violating the
> prefetch-speculation-protection guaranteed by the architecture.
> This happens when the branch predictor bases its predictions
> on a branch at this address on the stale history due to ASID or VMID
> reuse.

Searching for that erratum number doesn't find me a description :(

Yes, but it hurts performance.

Two immediate questions:

  1. Can we disable the L0 MOP cache?

  2. Can we invalidate the branch predictor? If Spectre-v2 taught us
     anything it's that removing those instructions was a mistake!

Moving on...

Have you reproduced this at top-level? If I recall correctly, the
prefetch-speculation-protection is designed to protect against the
case where you have a direct branch:

    addr:   B foo

and another CPU writes out a new function:

    bar:
            insn0
            ...
            insnN

before doing any necessary maintenance and then patches the original
branch to:

    addr:   B bar

The idea is that a concurrently executing CPU could mispredict the original
branch to point at 'bar', fetch the instructions before they've been written
out and then confirm the prediction by looking at the newly written branch
instruction. Even without the prefetch-speculation-protection, that's
fairly difficult to achieve in practice: you'd need to be doing something
like reusing memory to hold the instructions so that the initial
misprediction occurs.

How does A77 stop this from occurring when the ASID is not reallocated (e.g.
the example above)? Is the MOP cache flushed somehow?

With this erratum, it sounds like you have to end up reusing an ASID from
a task that had a branch at 'addr' in its address space that branched to
the address of 'bar' (again, in its address space). Is that right? That
sounds super rare to me, particularly with ASLR: not only does the aliasing
branch need to exist, but it needs to be held in the branch predictor while
we cycle through 64k ASIDs *and* the race with the writer needs to happen
so that we get stale instructions from the MOP cache.
Is there something I'm missing that makes this remotely plausible?