Re: 8aeb879baf12 - significant system call latency regression, bisected

In reply to: David Laight: "Re: 8aeb879baf12 - significant system call latency regression, bisected"

From: H. Peter Anvin

Date: Thu Jun 18 2026 - 19:03:41 EST

On 2026-06-16 06:53, David Laight wrote:
>
> Basically you can't win.
> I was looking at why a patch didn't give the expected performance gain
> on a different base kernel build.
> It seems to depend on whether the function (actually strlen) was aligned
> to an odd or even 16 byte boundary.
> If aligned to an even boundary the loop inside the function crossed a
> 'significant' boundary and the code ran measurably slower.
> If you start aligning loop tops and labels in general you probably lose
> due to code bloat.
> (Here the loop didn't need aligning, it just needed not to contain
> the relevant boundary.)
>
> In this case the extra padding will change the alignment of everything that
> follows - and some of those might make a difference as well.
>
> You'd need to add extra code further down the function to keep the size
> the same (and hope the compiler keeps the functions in the same order).
>

This is true, but this is why we want to at least be selective about it.

Padding every single function generates code bloat, *and* it is a compile-time
option which means that only people using a kernel built for that target will
benefit.

Hence this is better confined to specific ultra-hot entry points.
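
As a rough illustration of the selective approach (a sketch only, not code
from the patch under discussion): GCC and Clang accept a per-function
aligned attribute, so a single hot function can be placed on a chosen
boundary without a global -falign-functions setting. The names hot_entry,
cold_helper, hot_entry_is_aligned and the value HOT_ALIGN = 64 are
hypothetical, and the address check assumes a target such as x86-64 where
a function pointer is the plain code address.

```c
#include <stdint.h>

/* Hypothetical alignment for the hot path; 64 is an assumed cache-line size. */
#define HOT_ALIGN 64

/* Only this function is forced onto a HOT_ALIGN boundary. */
__attribute__((aligned(HOT_ALIGN), noinline))
long hot_entry(long x)
{
	return x + 1;
}

/* Everything else keeps the compiler's default alignment, so no global bloat. */
long cold_helper(long x)
{
	return x - 1;
}

/* Returns nonzero if hot_entry starts on a HOT_ALIGN boundary. */
int hot_entry_is_aligned(void)
{
	return ((uintptr_t)&hot_entry % HOT_ALIGN) == 0;
}
```

This only pins the function's start address. It does not control where a
loop inside the function falls relative to a boundary, which is the effect
described in the quoted text.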

-hpa