Re: 8aeb879baf12 - significant system call latency regression, bisected
From: Linus Torvalds
Date: Tue Jun 16 2026 - 04:49:17 EST
On Tue, 16 Jun 2026 at 13:58, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> So ISTR the Intel I-fetch window was 16 bytes, so the above things would
> make sense. However, Gemini, or whatever AI sits in google search, is
> trying to tell me Intel moved to 32 byte I-fetch with Alderlake.
Even with 16-byte fetch, the cacheline size is 64 bytes, so it hurts
to not be 64-byte aligned - simply because you may need to fetch more
cachelines (assuming fairly linear code).
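To make the "more cachelines" point concrete, here is a minimal sketch that counts how many aligned blocks (here 64-byte cachelines) a straight-line byte range touches. It assumes linear code that runs from start to start + len. The helper name blocks_touched and the CACHELINE constant are illustrative, not taken from any kernel header.

```c
#include <assert.h>
#include <stdint.h>

/* 64-byte cachelines, as on the x86 cores discussed in this thread. */
#define CACHELINE 64u

/*
 * Count the distinct 'size'-byte aligned blocks (e.g. cachelines)
 * touched by the byte range [start, start + len). Assumes the code in
 * that range runs straight through ("fairly linear code").
 * 'size' must be a power of two and 'len' must be non-zero.
 */
static unsigned blocks_touched(uintptr_t start, uintptr_t len, uintptr_t size)
{
	assert(len > 0 && size != 0 && (size & (size - 1)) == 0);
	uintptr_t first = start & ~(size - 1);
	uintptr_t last = (start + len - 1) & ~(size - 1);
	return (unsigned)((last - first) / size + 1);
}
```

With this model a 64-byte hot path that starts on a line boundary (0x1000) touches one cacheline, while the same 64 bytes starting at 0x1030 touch two.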
And afaik, some of the newer ones aren't 32-byte wide, but can do 48
bytes as three 16-byte fetches.
But I don't know if they can do the old "split line access" that older
cores could do, where a Pentium would do two 8-byte accesses at the
same time, and they didn't have to be in the same cache line.
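A toy model of the fetch-width discussion, assuming aligned 16-byte fetch blocks and a front end that grabs up to per_step of them at a time (1 for a plain 16-byte fetcher, 3 for the "48 bytes as three 16-byte fetches" case). This is an illustration under stated assumptions, not a description of any real core. In particular it ignores whether one step may pull blocks from two different cachelines, which is exactly the open split-line question above. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy front-end model, not any specific microarchitecture. */
#define FETCH_BLOCK 16u

/* Aligned 16-byte fetch blocks covered by [start, start + len). */
static unsigned fetch_blocks(uintptr_t start, uintptr_t len)
{
	assert(len > 0);
	uintptr_t first = start / FETCH_BLOCK;
	uintptr_t last = (start + len - 1) / FETCH_BLOCK;
	return (unsigned)(last - first + 1);
}

/*
 * Fetch steps needed if up to 'per_step' blocks are fetched at once.
 * Assumes linear code; deliberately ignores any restriction on one
 * step spanning two cachelines (the unknown "split line" behaviour).
 */
static unsigned fetch_steps(uintptr_t start, uintptr_t len, unsigned per_step)
{
	assert(per_step > 0);
	unsigned blocks = fetch_blocks(start, len);
	return (blocks + per_step - 1) / per_step;
}
```

Under this model a 48-byte chunk starting at 0x1000 covers three blocks and fits in one 3-wide step, while the same chunk starting at 0x1008 covers four blocks and needs two.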
So 64-byte alignment would always be the best option if you only look
at a *particular* piece of code.
But it obviously is very wasteful and hurts when there is code around
it that could be loaded into the cache at the same time.
So almost certainly not a good idea in general.
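The cost side can be sketched the same way: a minimal model, assuming functions are laid out back to back in link order and each one is padded up to the next multiple of align. Real linker layouts differ; pad_to and total_padding are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Padding bytes needed to move 'addr' up to the next multiple of
 * 'align' (a power of two). Zero if 'addr' is already aligned. */
static size_t pad_to(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size_t)((align - (addr & (align - 1))) & (align - 1));
}

/* Total padding if every function in a hypothetical layout starting
 * at address 0 is aligned to 'align'; 'sizes' is in link order. */
static size_t total_padding(const size_t *sizes, size_t n, uintptr_t align)
{
	uintptr_t addr = 0;
	size_t pad = 0;

	for (size_t i = 0; i < n; i++) {
		size_t p = pad_to(addr, align);
		pad += p;
		addr += p + sizes[i];
	}
	return pad;
}
```

For four small functions of 20, 40, 70 and 10 bytes (140 bytes of code), 64-byte alignment adds 126 bytes of padding, so the group spans five 64-byte lines instead of three; 16-byte alignment adds only 30 bytes.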
But 64-byte alignment is probably what things like interrupt and
system call entrypoints should use, because those things would make
sense to look at as isolated things, not part of a bigger load. And
they are quite likely to start from a fairly cold-cache situation.
So *not* some general compiler option in a config file, but maybe a
special "entry point alignment" macro?
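A rough sketch of what such a per-function macro could look like at the C level, using the GCC/Clang function aligned attribute rather than a global -falign-functions=64. The macro name ENTRY_POINT_ALIGN and the demo_entry function are hypothetical, not existing kernel interfaces. The real x86 entry points are written in assembly, where the equivalent would be an alignment directive such as .balign 64 before the entry label. The alignment check assumes a target like x86-64 where a function pointer is the code address.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL macro, not an existing kernel interface: request
 * 64-byte alignment for one function via the GCC/Clang 'aligned'
 * function attribute, instead of aligning every function globally.
 */
#define ENTRY_POINT_ALIGN_BYTES 64
#define ENTRY_POINT_ALIGN __attribute__((aligned(ENTRY_POINT_ALIGN_BYTES)))

static_assert((ENTRY_POINT_ALIGN_BYTES & (ENTRY_POINT_ALIGN_BYTES - 1)) == 0,
	      "entry alignment must be a power of two");

/* Hypothetical stand-in for a system call or interrupt entry point. */
ENTRY_POINT_ALIGN __attribute__((noinline)) long demo_entry(long nr)
{
	return nr + 1;
}

/*
 * Non-zero if 'fn' starts on an ENTRY_POINT_ALIGN_BYTES boundary.
 * Assumes a function pointer holds the code address (true on x86-64).
 */
static int entry_is_aligned(long (*fn)(long))
{
	return ((uintptr_t)fn & (ENTRY_POINT_ALIGN_BYTES - 1)) == 0;
}
```

Only the functions tagged this way pay the padding cost; everything else keeps the default packing.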
Linus