Re: 8aeb879baf12 - significant system call latency regression, bisected
From: Ingo Molnar
Date: Wed Jun 17 2026 - 06:05:28 EST
* Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> I'd exclude the L0D, L1DTLB, the RSB and the load/store queues
> as well, because code alignment of a single symbol should have
> a minimal effect on them, which leaves:
>
> - uOP Queue - 192 entries
> - uOP Cache (Micro-op Cache) - ~5,250 uOPs, ~64 sets x 10-12 way
> - Reorder Buffer (ROB) - 576 entries
>
> And I think of these the main suspect would be the uOP cache,
> because its (estimated...) ~10-12 deep associativity limit
> of uop-sets may be something this benchmark is hitting on
> Panther Lake?
>
> Could it be that the extra alignment adds +1 to the maximum number
> of uOP cache 'ways' this execution hits in the uOP cache, moving
> it from, say, 12 (still fits) to 13 (misses), so that this particular
> uOP cache set starts thrashing? But I'm really just
> guessing wildly here...
>
> ( The extra statistical noise of the regressed figures does suggest
> some sort of thrashing mechanism behind the scenes though, and the
> regular caches seem large enough to not actually thrash for such
> a cache-hot benchmark. )
>
> Or am I missing something obvious?
>
> Any perf stat uOP related counter measurements might be illuminating.
The relevant uOP cache (Intel DSB) perf stat counters would be:

  starship:~/tip> git grep DSB_ tools/perf/pmu-events/arch/x86/pantherlake/
  tools/perf/pmu-events/arch/x86/pantherlake/frontend.json: "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS",
  tools/perf/pmu-events/arch/x86/pantherlake/frontend.json: "EventName": "FRONTEND_RETIRED.DSB_MISS",
  tools/perf/pmu-events/arch/x86/pantherlake/frontend.json: "EventName": "IDQ.DSB_CYCLES_ANY",
  tools/perf/pmu-events/arch/x86/pantherlake/frontend.json: "EventName": "IDQ.DSB_CYCLES_OK",
  tools/perf/pmu-events/arch/x86/pantherlake/frontend.json: "EventName": "IDQ.DSB_UOPS",
In particular, could you post the before/after counts for
FRONTEND_RETIRED.ANY_DSB_MISS and FRONTEND_RETIRED.DSB_MISS?
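
To make the "12 fits, 13 misses" guess quoted above concrete, here is a toy
model of a single cache set. It is only a sketch: it assumes strict LRU
replacement and a 12-way set, neither of which is confirmed for the real
uOP cache, and lru_set_misses() is a hypothetical helper, not anything from
perf or the kernel. It shows why one extra line competing for the same set
can turn a near-zero miss rate into a miss on every access when the lines
are touched cyclically:

```python
from collections import OrderedDict


def lru_set_misses(ways, working_set, rounds):
    """Count misses in ONE cache set under strict LRU (an assumption).

    `working_set` distinct lines, all mapping to the same set, are
    touched in the same cyclic order for `rounds` iterations.
    """
    cache = OrderedDict()  # oldest entry first, most recently used last
    misses = 0
    for _ in range(rounds):
        for line in range(working_set):
            if line in cache:
                cache.move_to_end(line)  # hit: mark as most recently used
            else:
                misses += 1
                if len(cache) >= ways:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return misses


# 12 lines in a 12-way set: only the initial cold misses.
fits = lru_set_misses(ways=12, working_set=12, rounds=1000)
# 13 lines in a 12-way set: each access evicts the line needed next.
thrashes = lru_set_misses(ways=12, working_set=13, rounds=1000)
print(fits, thrashes)
```

In this model the 12-line case only takes its 12 cold misses, while the
13-line case misses on every single access. Real hardware with a
non-LRU policy would degrade less sharply, so treat this as an
illustration of the mechanism, not a prediction of the counter values.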
Thanks,
Ingo