Re: [PATCH next] x86: mask_user_address() return base of guard page for kernel addresses

From: Linus Torvalds
Date: Wed Dec 04 2024 - 13:49:57 EST


On Sun, 1 Dec 2024 at 14:24, David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> Agner's tables pretty much show that Intel implemented it as
> x = cond ? y : x
> so it suffers from being a 2 u-op instruction (the same as sbb)
> on older Core 2 CPUs.

So I don't worry about a 2-cycle latency here; you'll find the same
for 'sbb' too, and there you have the additional 'or' operation that
then adds another cycle.
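
To make that concrete, here is a rough sketch of the two dependency
chains in AT&T syntax (hand-written, not the actual mask_user_address()
code; the register allocation, the user-space limit and the guard-page
base are all just placeholders):

	# sbb-based masking: three dependent ops once the pointer is ready
	# (%rax = ptr, %rdx = user-space limit)
	cmp	%rax,%rdx	# %rdx - %rax: CF set when ptr is above the limit
	sbb	%rdx,%rdx	# %rdx = CF ? -1 : 0
	or	%rdx,%rax	# kernel ptrs become all-ones, user ptrs unchanged

	# cmov-based masking: the cmp feeds a single cmov, no 'or' in the chain
	# (%rax = ptr, %rcx = limit, %rdx preloaded with the guard-page base)
	cmp	%rcx,%rax	# %rax - %rcx: "below or equal" means ptr <= limit
	cmovbe	%rax,%rdx	# keep the pointer only if it is a user address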

And Intel has documented that cmov is a data dependency, so it's
mainly just AMD that I'd worry about:

> OTOH AMD have it as '4 per clock' (the same as mov), so it could be
> a 'mov' with the write disabled (but I'm not sure how that
> would work if 'mov' is a register rename).

So that's the part that really worried me. "4 per clock, just like
'mov'" makes me worry it's a clever predicted mov instruction.

However, it looks like Agner is actually wrong here. Going to

https://uops.info/table.html

and looking up 'cmovbe' (which I think is the op we'd want) shows that
Zen 4 does 2 per cycle (I hate how they call that 0.5 "throughput" - at
least Agner correctly calls it the "reciprocal throughput").

So that actually looks ok.

I'd still be happier if I could find some official AMD doc that says
that cmov is a data dependency and is not predicted, but at least now
the numbers line up for it.
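
For the record, a cmov-based helper would have roughly this shape. This
is a hypothetical sketch only, not the patch itself; USER_PTR_MAX_SKETCH
and GUARD_BASE_SKETCH are made-up placeholders for whatever limit and
guard-page base the real code uses:

	/* made-up placeholder values, not the kernel's real constants */
	#define USER_PTR_MAX_SKETCH	0x00007ffffffff000UL
	#define GUARD_BASE_SKETCH	USER_PTR_MAX_SKETCH

	static inline unsigned long mask_user_address_sketch(unsigned long ptr)
	{
		unsigned long ret = GUARD_BASE_SKETCH;

		/* cmp feeds cmovbe directly: two ops on the pointer's dep chain */
		asm("cmp %2,%1\n\t"
		    "cmovbe %1,%0"
		    : "+r" (ret)
		    : "r" (ptr), "r" (USER_PTR_MAX_SKETCH)
		    : "cc");
		return ret;
	}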

Linus