Re: 8aeb879baf12 - significant system call latency regression, bisected
From: H. Peter Anvin
Date: Thu Jun 18 2026 - 18:56:33 EST
On 2026-06-17 05:37, Peter Zijlstra wrote:
>
> This builds with kcfi on and seems to more or less do what is expected.
>
> I've not actually tried performance measurements on my IDT based system.
>
I'm going to run this through its paces.
I'm still confused, though, by the claim that changing the
patchable_function_entry() breaks the kCFI ABI. When I do a symbol check on my
system, the __pfx symbols are still at an offset of -16, and the additional
NOPs are located *before* them. Isn't this completely consistent with the
existing ABI? What am I missing here?
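
For illustration only, here is a minimal user-space sketch of the attribute
under discussion. It assumes GCC or clang on x86-64; the function name and the
16,16 counts are assumptions made for the example, not the kernel's build
settings. The first argument is the total number of NOPs and the second is how
many of them are placed before the function's entry label.

```c
/* Hypothetical demo function; not kernel code. */

/*
 * patchable_function_entry(N, M): emit N NOPs in total, M of them
 * before the entry label. With (16, 16) all of the padding sits in
 * front of the symbol and the function body starts at the label.
 */
__attribute__((patchable_function_entry(16, 16), noinline))
int demo_entry(int x)
{
	return x + 1;
}
```

Disassembling the object file (e.g. with objdump -d) should show the NOPs
immediately ahead of demo_entry, which is the region the question above is
about.
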
That being said, I like the idea of entry points being noendbr: they are very
high value targets and making them even a little bit harder to access I think
is a very good thing.
As far as getting the compiler people to address this: this very clearly would
have to be something explicitly opted into, e.g. by adding a third (alignment)
argument to the patchable_function_entry attribute and option.
> Obviously this would want splitting into a few patches, but it does:
>
> - makes -fno-jump-tables unconditional
> - removes array_index_nospec() from the syscall dispatch
> - makes x{32,64}_sys_call() 'static noinstr'
Note: I have found that merging x32_sys_call() into x64_sys_call() generates
considerably better code, because both gcc and clang will re-use x64
sub-branches for the x32 code. Specifically, the way to make it generate good
code is to explicitly remove the x32 bit before a second switch (merging the
two switch statements will *not* give good code; neither compiler is clever
enough to detect the common-but-offset code branches.)
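
A minimal user-space sketch of the two-switch shape described above, not the
actual patch: the handler functions, the syscall numbers used and the name
demo_sys_call() are all hypothetical, and the sketch ignores the fact that some
native numbers are not valid for x32. Only the 0x40000000 value is meant to
mirror __X32_SYSCALL_BIT.

```c
#include <errno.h>

#define DEMO_X32_BIT 0x40000000u	/* same value as __X32_SYSCALL_BIT */

/* Hypothetical stand-ins for real syscall handlers. */
static long demo_sys_a(void) { return 100; }
static long demo_sys_b(void) { return 200; }
static long demo_x32_only(void) { return 300; }

long demo_sys_call(unsigned int nr)
{
	/* First switch: entries that exist only with the x32 bit set. */
	switch (nr) {
	case DEMO_X32_BIT | 512:
		return demo_x32_only();
	default:
		break;
	}

	/*
	 * Explicitly strip the x32 bit so that the shared syscalls hit
	 * the very same case labels as the native x64 numbers.
	 */
	nr &= ~DEMO_X32_BIT;

	/* Second switch: the common table, used by both ABIs. */
	switch (nr) {
	case 0:
		return demo_sys_a();
	case 1:
		return demo_sys_b();
	default:
		return -ENOSYS;
	}
}
```

With this shape, demo_sys_call(1) and demo_sys_call(DEMO_X32_BIT | 1) reach the
same case label, which is what lets the compiler share the code.
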
> - adds align_entry attribute that aligns on cacheline boundaries
> and disallows taking address
> - sprinkles align_entry on the noinstr syscall path
-hpa