Re: 8aeb879baf12 - significant system call latency regression, bisected

From: Calvin Owens

Date: Sat Jun 13 2026 - 22:14:17 EST


On Saturday 06/13 at 19:11 -0700, Calvin Owens wrote:
> On Saturday 06/13 at 16:52 -0700, H. Peter Anvin wrote:
> > On 2026-06-13 13:34, H. Peter Anvin wrote:
> > > On 2026-06-13 01:59, Peter Zijlstra wrote:
> > > > On Fri, Jun 12, 2026 at 06:45:06PM -0700, "H. Peter Anvin" (Intel) wrote:
> > > > > So I was trying to figure out a significant -- about 13% -- increase
> > > > > in system call latency between v7.0 and the current master, and it
> > > > > bisects down to:
> > > > >
> > > > > 8aeb879baf12 x86/kvm/vmx: Fix x86_64 CFI build
> > > > >
> > > > > This is on Panther Lake (Core Ultra X7 358H) with FRED enabled. This
> > > > > is a bare metal boot, no KVM.
> > > > >
> > > > > I'm personally extremely puzzled how this could possibly be related,
> > > > > and I will be investigating the possibility that this is a false
> > > > > bisect, but it is not a Heisenbug in any way; it has been extremely
> > > > > reproducible, and the difference is statistically valid by close to 10
> > > > > sigma. Furthermore, the bisection at least gave the appearance of
> > > > > stability.
> > > > >
> > > > > Given how late in the cycle this is I wanted to send an alert sooner
> > > > > rather than later; I will update as I get more data.
> > > >
> > > > Uhm, massive WTF indeed. I don't immediately see how this could possibly
> > > > affect a FRED host either, except perhaps in code layout.
> > > >
> > > > I don't actually have a FRED capable machine, but have you tried running
> > > > one of those top-down perf things on it, to see where its hurting?
> > >
> > > Not yet, but I'm investigating right now (I have some family obligations this weekend, so my duty cycle is somewhat limited.)
> > >
> > > I reverted the patch on top of rc7, and it did, in fact, fix the regression,
> > > but I'm doing a clean from-scratch rebuild of both trees to make sure
> > > there isn't anything in my test setup that could introduce any kind of
> > > "memory" between builds...
> >
> > Nope, even with the clean rebuild it is 100% reproducible. It is in fact
> > worse than I originally stated: the average with 7.1-rc7 is 478±6 cycles
> > (with the top and bottom octiles removed as outlier protection); with 7.1-rc7
> > with the above patch reverted it is 397.5±0.4; this is in fact a 20%
> > increase in latency, not 13%...
>
> It has to be the .text layout, doesn't it?
>
> I notice we're splitting a cache line here now with the prefix symbol,
> 7.0-rc7 has:

Whoops, I meant 7.1-rc7.

But seeing your other mail, sounds like this is it :)
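The thread does not say how the raw cycle counts were reduced. The sketch below assumes "top and bottom octiles removed" means sorting the samples, discarding the lowest and highest eighth, and averaging the rest. `octile_trimmed_mean` and `relative_increase` are hypothetical helper names, not part of any existing benchmark. The second helper checks the 20% figure from the means quoted above: (478 - 397.5) / 397.5 ≈ 0.20.

```python
# Hypothetical helpers, NOT the harness used in the thread: an octile-trimmed
# mean (assumed reading of "top and bottom octiles removed") and the relative
# increase between two mean latencies.
import statistics


def octile_trimmed_mean(samples):
    """Mean of the samples after dropping the lowest and highest eighth."""
    data = sorted(samples)
    k = len(data) // 8              # samples discarded from each end
    kept = data[k:len(data) - k]
    return statistics.fmean(kept)


def relative_increase(new, old):
    """Fractional increase of new over old, e.g. 0.20 for a 20% increase."""
    return (new - old) / old


if __name__ == "__main__":
    # Means quoted in the thread: 7.1-rc7 vs 7.1-rc7 with the patch reverted.
    print(f"{relative_increase(478, 397.5):.1%}")   # 20.3%
    # Made-up samples: two outliers are trimmed away, leaving only the 10s.
    print(octile_trimmed_mean([1000, 100] + [10] * 14))  # 10.0
```

Trimming a fixed fraction from each end keeps occasional interrupts or cache-cold outliers from dragging the mean, which is presumably why the ± on the reverted build is so tight.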

> ffffffff812175f0 <__pfx_x64_sys_call>:
> ffffffff81217600 <x64_sys_call>:
>
> If I revert 8aeb879baf12, I get:
>
> ffffffff812175c0 <__pfx_x64_sys_call>:
> ffffffff812175d0 <x64_sys_call>:
>
> Could that be it?
>
> Unfortunately I don't have any hardware new enough to poke at it myself.
>
> Cheers,
> Calvin
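One quick way to sanity-check the layout difference in the listing is to compute which cache line each address falls in. This sketch assumes 64-byte cache lines and that `__pfx_x64_sys_call` is the 16 bytes of padding directly before `x64_sys_call`, which is what the addresses above show. The helper names are made up for illustration. It only restates the layout; it does not show that the split causes the latency.

```python
# Hypothetical helpers for checking where the prefix padding and the function
# entry land. ASSUMPTIONS: 64-byte cache lines, 16-byte __pfx_ padding placed
# immediately before the function entry.
CACHE_LINE = 64
PFX_SIZE = 16


def line_of(addr, line=CACHE_LINE):
    """Start address of the cache line containing addr."""
    return addr & ~(line - 1)


def pfx_split_from_entry(func_addr, line=CACHE_LINE, pfx=PFX_SIZE):
    """True if the prefix padding and the function entry are in different lines."""
    return line_of(func_addr - pfx, line) != line_of(func_addr, line)


# x64_sys_call entry addresses from the objdump output above.
LAYOUTS = {
    "7.1-rc7": 0xffffffff81217600,
    "7.1-rc7 with 8aeb879baf12 reverted": 0xffffffff812175d0,
}

if __name__ == "__main__":
    for name, entry in LAYOUTS.items():
        print(f"{name}: entry at line offset {entry % CACHE_LINE:#x}, "
              f"prefix split from entry: {pfx_split_from_entry(entry)}")
```

With the patch applied, `x64_sys_call` starts exactly on a line boundary and its prefix sits at the tail of the previous line. With the patch reverted, the prefix starts on a line boundary and the entry point is 0x10 bytes into that same line.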