Re: 8aeb879baf12 - significant system call latency regression, bisected

From: Calvin Owens

Date: Sat Jun 13 2026 - 22:11:24 EST


On Saturday 06/13 at 16:52 -0700, H. Peter Anvin wrote:
> On 2026-06-13 13:34, H. Peter Anvin wrote:
> > On 2026-06-13 01:59, Peter Zijlstra wrote:
> > > On Fri, Jun 12, 2026 at 06:45:06PM -0700, "H. Peter Anvin" (Intel) wrote:
> > > > So I was trying to figure out a significant -- about 13% -- increase
> > > > in system call latency between v7.0 and the current master, and it
> > > > bisects down to:
> > > >
> > > > 8aeb879baf12 x86/kvm/vmx: Fix x86_64 CFI build
> > > >
> > > > This is on Panther Lake (Core Ultra X7 358H) with FRED enabled. This
> > > > is a bare metal boot, no KVM.
> > > >
> > > > I'm personally extremely puzzled how this could possibly be related,
> > > > and I will be investigating the possibility that this is a false
> > > > bisect, but it is not a Heisenbug in any way; it has been extremely
> > > > reproducible, and the difference is statistically valid by close to 10
> > > > sigma. Furthermore, the bisection at least gave the appearance of
> > > > stability.
> > > >
> > > > Given how late in the cycle this is I wanted to send an alert sooner
> > > > rather than later; I will update as I get more data.
> > >
> > > Uhm, massive WTF indeed. I don't immediately see how this could possibly
> > > affect a FRED host either, except perhaps in code layout.
> > >
> > > I don't actually have a FRED capable machine, but have you tried running
> > > one of those top-down perf things on it, to see where its hurting?
> >
> > Not yet, but I'm investigating right now (I have some family obligations this weekend, so my duty cycle is somewhat limited.)
> >
> > I reverted the patch on top of rc7, and it did, in fact, fix the regression,
> > but I'm doing a clean from-scratch rebuild of both trees to make sure
> > there isn't anything in my test setup that could introduce any kind of
> > "memory" between builds...
>
> Nope, even with the clean rebuild it is 100% reproducible. It is in fact
> worse than I originally stated: the average with 7.1rc7 is 478±6 cycles
> (with the top and bottom octiles removed as outlier protection); with 7.1rc7
> with the above patch reverted it is 397.5±0.4. This is in fact a 20%
> increase in latency, not 13%...

It has to be the .text layout, doesn't it?

I notice we're now splitting a cache line here with the prefix symbol;
7.1-rc7 has:

ffffffff812175f0 <__pfx_x64_sys_call>:
ffffffff81217600 <x64_sys_call>:

If I revert 8aeb879baf12, I get:

ffffffff812175c0 <__pfx_x64_sys_call>:
ffffffff812175d0 <x64_sys_call>:

Could that be it?
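
To spell out the arithmetic behind that guess, here is a quick sketch. It
assumes 64-byte cache lines, which I have not verified on Panther Lake, and
the helper names are made up for illustration. It only checks whether the
prefix symbol and the function entry fall in the same line; it says nothing
about whether that actually explains the latency.

```python
CACHE_LINE = 64  # assumption: 64-byte lines, not verified on this hardware


def cache_line(addr: int) -> int:
    """Index of the cache line that contains addr."""
    return addr // CACHE_LINE


def same_line(pfx: int, func: int) -> bool:
    """True if the prefix symbol and the function entry share a cache line."""
    return cache_line(pfx) == cache_line(func)


# Addresses copied from the two symbol listings above.
layouts = {
    "with 8aeb879baf12": (0xFFFFFFFF812175F0, 0xFFFFFFFF81217600),
    "reverted": (0xFFFFFFFF812175C0, 0xFFFFFFFF812175D0),
}

for name, (pfx, func) in layouts.items():
    print(
        f"{name}: __pfx offset {pfx % CACHE_LINE}, "
        f"entry offset {func % CACHE_LINE}, "
        f"same line: {same_line(pfx, func)}"
    )
```

With the commit applied, the prefix sits in the last 16 bytes of one line and
x64_sys_call starts exactly on the next line boundary; with the revert, both
are in the same line.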

Unfortunately I don't have any hardware new enough to poke at it myself.

Cheers,
Calvin