Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel

From: Linus Torvalds
Date: Sat Oct 05 2024 - 20:00:30 EST

Next message: kernel test robot: "Re: [PATCH] v4l2-subdev: Return -EOPNOTSUPP for unsupported pad type in call_get_frame_desc()"
Previous message: Kent Overstreet: "Re: [GIT PULL] bcachefs fixes for 6.12-rc2"
In reply to: H. Peter Anvin: "Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel"
Next in thread: Uros Bizjak: "Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> Sadly, that is not correct; neither gcc nor clang uses lea:

Looking around, this may be intentional. At least according to Agner,
several cores do better at "mov immediate" compared to "lea".

Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
"mov i,r". Haswell is 1:4).
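
To put those ratios in perspective, here is a rough back-of-the-envelope sketch in Python. It only turns the per-cycle figures quoted above into a best-case cycle count for a run of independent instructions; nothing is measured. The names (`QUOTED_PER_CYCLE`, `min_cycles`, `compare`) and the instruction count of 1200 are made up for illustration, and the model assumes nothing else is competing for the same execution ports, which is rarely true of real code.

```python
from math import ceil

# Peak instructions-per-cycle figures as quoted in the message above
# (attributed there to Agner's tables). They are copied, not measured.
QUOTED_PER_CYCLE = {
    "Zen 2": {"lea_rip": 2, "mov_imm": 4},
    "Ivy Bridge": {"lea_rip": 1, "mov_imm": 3},
    "Haswell": {"lea_rip": 1, "mov_imm": 4},
}


def min_cycles(count, per_cycle):
    """Lower bound on cycles needed to issue `count` independent instructions."""
    return ceil(count / per_cycle)


def compare(core, count=1200):
    """Best-case cycle counts for `count` instructions of each kind on `core`."""
    rates = QUOTED_PER_CYCLE[core]
    return {name: min_cycles(count, rate) for name, rate in rates.items()}


if __name__ == "__main__":
    for core in QUOTED_PER_CYCLE:
        print(core, compare(core))
```

Even in this idealized model the gap only shows up when the code does nothing but materialize addresses back to back.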

Of course, Agner's tables are good, but not necessarily always the
whole story. There are other instruction tables on the internet (eg
uops.info) with possibly more info.

And in reality, I would expect it to be a complete non-issue with any
OoO engine and real code, because you are very seldom ALU limited
particularly when there aren't any data dependencies.

But a RIP-relative LEA does seem to put a *bit* more pressure on the
core resources, so the compilers may be right to pick a "mov".

Linus
