Re: [PATCH v9 00/10] VMSCAPE optimization for BHI variant
From: David Laight
Date: Sat Apr 04 2026 - 11:21:18 EST
On Thu, 2 Apr 2026 17:30:32 -0700
Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> v9:
> - Use global variables for BHB loop counters instead of ALTERNATIVE-based
> approach. (Dave & others)
> - Use 32-bit registers (%eax/%ecx) for loop counters, loaded via movzbl
> from 8-bit globals. 8-bit registers (e.g. %ah in the inner loop) caused
> performance regression on certain CPUs due to partial-register stalls. (David Laight)
> - Let BPF save/restore %rax/%rcx as in the original implementation, since
> it is the only caller that needs these registers preserved across the
> BHB clearing sequence.
That is as dangerous as hell...
Does BPF even save %rcx? I'm sure I checked that a long time ago
and found it didn't.
(I'm mostly AFK over Easter and can't check.)
At least there should be a bloody great big comment that BPF calls this code
and only saves specific registers.
But given the number of mispredicted branches and other pipeline stalls
in this code, a couple of register saves to the stack are unlikely to make
any difference.
David