Re: [PATCH next] x86: mask_user_address() return base of guard page for kernel addresses
From: Linus Torvalds
Date: Wed Dec 04 2024 - 13:49:57 EST
On Sun, 1 Dec 2024 at 14:24, David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> Agner's tables pretty much show that Intel implemented it as
> x = cond ? y : x
> so it suffers from being a 2 u-op instruction (the same as sbb)
> on older core-2 cpu.
So I don't worry about a 2-cycle latency here; you'll find the same
for 'sbb' too, and there you have the additional 'or' operation that
then adds another cycle.
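
To make the comparison concrete (this is just a sketch for illustration,
not the actual mask_user_address() code or David's patch - 'max' and
'guard' are stand-in names for the user-address limit and the clamp
target):

/* sbb version: cmp + sbb builds an all-ones mask, then 'or' applies it */
static inline unsigned long clamp_sbb(unsigned long ptr, unsigned long max)
{
        unsigned long mask;

        asm("cmp %[p], %[max]\n\t"      /* CF = (ptr > max)         */
            "sbb %[mask], %[mask]"      /* mask = CF ? ~0UL : 0     */
            : [mask] "=&r" (mask)
            : [p] "r" (ptr), [max] "r" (max)
            : "cc");
        return ptr | mask;              /* the extra 'or' */
}

/*
 * cmov version: the select is folded into the conditional move itself.
 * (The real code may want the opposite condition, e.g. cmovbe, depending
 * on which operand order it compares with.)
 */
static inline unsigned long clamp_cmov(unsigned long ptr, unsigned long max,
                                       unsigned long guard)
{
        unsigned long ret = ptr;

        asm("cmp %[max], %[ret]\n\t"    /* flags from (ptr - max)     */
            "cmova %[g], %[ret]"        /* if (ptr > max) ret = guard */
            : [ret] "+r" (ret)
            : [max] "r" (max), [g] "r" (guard)
            : "cc");
        return ret;
}

The point is that in the sbb version the 'or' is a third instruction on
the critical path, while the cmov version needs only the cmp and the
conditional move.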
And Intel has documented that cmov is a data dependency, so it's
mainly just AMD that I'd worry about:
> OTOH AMD have it as '4 per clock' (the same as mov) so it could be
> a 'mov' with the write disabled (but I'm not sure how that
> would work if 'mov' is a register rename).
So that's the part that really worried me. "4 per clock, just like
'mov'" makes me worry it's a clever predicted mov instruction.
However, it looks like Agner is actually wrong here. Going to
https://uops.info/table.html
and looking up 'cmovbe' (which I think is the op we'd want) shows that
Zen 4 does 2 per cycle (I hate how they call that 0.5 "throughput" - at
least Agner correctly calls it the "reciprocal throughput").
So that actually looks ok.
I'd still be happier if I could find some official AMD doc that says
that cmov is a data dependency and is not predicted, but at least now
the numbers line up for it.
Linus