Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
From: Linus Torvalds
Date: Sat Oct 05 2024 - 20:00:30 EST
On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> Sadly, that is not correct; neither gcc nor clang uses lea:
Looking around, this may be intentional. At least according to Agner,
several cores do better at "mov immediate" compared to "lea".
Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
"mov i,r". Haswell is 1:4).
Of course, Agner's tables are good, but not necessarily always the
whole story. There are other instruction tables on the internet (eg
uops.info) with possibly more info.
And in reality, I would expect it to be a complete non-issue with any
OoO engine and real code, because you are very seldom ALU limited
particularly when there aren't any data dependencies.
But a RIP-relative LEA does seem to put a *bit* more pressure on the
core resources, so the compilers are may be right to pick a "mov".
Linus