Re: 8aeb879baf12 - significant system call latency regression, bisected

From: Peter Zijlstra

Date: Tue Jun 16 2026 - 04:31:33 EST

On Sat, Jun 13, 2026 at 06:50:24PM -0700, H. Peter Anvin wrote:

> OK, I have, I believe, root-caused this.
>
> It is a padding issue; removing the code changes __pfx_x64_sys_call to be
> 32-byte aligned, with the result that x64_sys_call gets *mis*aligned.
>
> Reverting the patch but adding an alignment statement to x64_sys_call
> re-introduces the performance regression.
>
> I am concerned because this could mean that the __pfx stubs add substantial
> overhead elsewhere, unless this just happens to be a particularly sensitive
> case...

So what is the actual alignment requirement these days then? We're
building the (x86_64) kernel with 16 byte function and 1 byte jump
alignment.

ISTR the Intel I-fetch window was 16 bytes, so the above settings would
make sense. However, Gemini, or whatever AI sits in Google search, is
trying to tell me Intel moved to 32-byte I-fetch with Alder Lake.

That same thing is saying AMD switched to 32 byte I-fetch with Zen (1)
and later.

This all seems to suggest we do something like so, hmm?

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b9f5a4a3cc2a..65fff65271d0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -329,7 +329,9 @@ config X86
 	select HAVE_ARCH_KCSAN if X86_64
 	select PROC_PID_ARCH_STATUS if PROC_FS
 	select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
-	select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
+	# AMD-Zen+ and Intel-Alderlake+ moved to 32 byte I-fetch
+	select FUNCTION_ALIGNMENT_32B if X86_64
+	select FUNCTION_ALIGNMENT_16B if X86_ALIGNMENT_16
 	select FUNCTION_ALIGNMENT_4B
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
 	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
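
For anyone wanting to check the arithmetic behind the alignment question, here is a minimal sketch in plain user-space C. It is not kernel code. The helper names are hypothetical, and the 16-byte prefix length and 32-byte fetch window used in the usage note are assumptions for illustration, not measured values. It computes where a function entry lands inside an aligned window and how many windows a run of code bytes touches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of addr inside an aligned block of
 * `window` bytes. `window` must be a power of two. */
static inline uint64_t offset_in_window(uint64_t addr, uint64_t window)
{
	assert(window && (window & (window - 1)) == 0);
	return addr & (window - 1);
}

/* Hypothetical helper: number of aligned `window`-byte blocks touched
 * by the byte range [addr, addr + len). */
static inline uint64_t windows_spanned(uint64_t addr, uint64_t len,
				       uint64_t window)
{
	assert(len > 0);
	assert(window && (window & (window - 1)) == 0);
	return (addr + len - 1) / window - addr / window + 1;
}

/* Hypothetical helper: entry address of a function whose prefix stub
 * starts at pfx_addr and occupies pfx_len bytes (length is an
 * assumption supplied by the caller). */
static inline uint64_t entry_after_prefix(uint64_t pfx_addr, uint64_t pfx_len)
{
	return pfx_addr + pfx_len;
}
```

With an assumed 16-byte prefix starting at a 32-byte boundary, `entry_after_prefix()` puts the function entry at offset 16 in a 32-byte window, so the first 32 bytes of the function body touch two windows rather than one. That is the kind of misalignment described in the quoted message. Whether it costs anything on a given CPU depends on the real fetch width, which is the open question here.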