Re: [PATCH v9 00/10] VMSCAPE optimization for BHI variant
From: Pawan Gupta
Date: Sun Apr 05 2026 - 03:24:17 EST
On Sat, Apr 04, 2026 at 04:20:59PM +0100, David Laight wrote:
> On Thu, 2 Apr 2026 17:30:32 -0700
> Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
>
> > v9:
> > - Use global variables for BHB loop counters instead of ALTERNATIVE-based
> > approach. (Dave & others)
> > - Use 32-bit registers (%eax/%ecx) for loop counters, loaded via movzbl
> > from 8-bit globals. 8-bit registers (e.g. %ah in the inner loop) caused
> > performance regression on certain CPUs due to partial-register stalls. (David Laight)
> > - Let BPF save/restore %rax/%rcx as in the original implementation, since
> > it is the only caller that needs these registers preserved across the
> > BHB clearing sequence.
>
> That is as dangerous as hell...
> Does BPF even save %rcx - I'm sure I checked that a long time ago
> and found it didn't.
The code below injects a save/restore of %rax and %rcx into BPF programs:
arch/x86/net/bpf_jit_comp.c

emit_spectre_bhb_barrier()
{
	u8 *prog = *pprog;
	u8 *func;

	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
		/* The clearing sequence clobbers eax and ecx. */
		EMIT1(0x50); /* push rax */
		EMIT1(0x51); /* push rcx */
		ip += 2;

		func = (u8 *)clear_bhb_loop_nofence;
		ip += x86_call_depth_emit_accounting(&prog, func, ip);

		if (emit_call(&prog, func, ip))
			return -EINVAL;
		/* Don't speculate past this until BHB is cleared */
		EMIT_LFENCE();
		EMIT1(0x59); /* pop rcx */
		EMIT1(0x58); /* pop rax */
	}
	...
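
For illustration only, here is a rough user-space sketch of the byte sequence that excerpt produces around the call. It is not kernel code: emit1() and emit_bhb_sequence() are hypothetical names, the call displacement is passed in rather than computed from ip, and whatever x86_call_depth_emit_accounting() may emit before the call is left out. The encodings themselves are the standard ones (push r64 is 0x50+reg, pop r64 is 0x58+reg, call rel32 is 0xe8, lfence is 0f ae e8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86-64 register numbers used by the one-byte push/pop encodings. */
enum { REG_RAX = 0, REG_RCX = 1 };

/* Hypothetical helper (not a kernel API): append one byte to buf. */
static size_t emit1(uint8_t *buf, size_t pos, size_t cap, uint8_t b)
{
	assert(pos < cap);
	buf[pos] = b;
	return pos + 1;
}

/*
 * Hypothetical model of the emitted sequence:
 *   push rax; push rcx; call rel32; lfence; pop rcx; pop rax
 * Returns the number of bytes written to buf.
 */
static size_t emit_bhb_sequence(uint8_t *buf, size_t cap, int32_t rel32)
{
	size_t pos = 0;

	pos = emit1(buf, pos, cap, 0x50 + REG_RAX);	/* push rax */
	pos = emit1(buf, pos, cap, 0x50 + REG_RCX);	/* push rcx */

	pos = emit1(buf, pos, cap, 0xE8);		/* call rel32 */
	assert(pos + sizeof(rel32) <= cap);
	memcpy(buf + pos, &rel32, sizeof(rel32));	/* assumes little-endian host */
	pos += sizeof(rel32);

	pos = emit1(buf, pos, cap, 0x0F);		/* lfence: 0f ae e8 */
	pos = emit1(buf, pos, cap, 0xAE);
	pos = emit1(buf, pos, cap, 0xE8);

	/* Restore in reverse (LIFO) order of the pushes. */
	pos = emit1(buf, pos, cap, 0x58 + REG_RCX);	/* pop rcx */
	pos = emit1(buf, pos, cap, 0x58 + REG_RAX);	/* pop rax */
	return pos;
}
```

Only %rax and %rcx are bracketed this way, so anything else the clearing sequence might touch in the future would not be preserved for BPF.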
> (I'm mostly AFK over Easter and can't check.)
> At least there should be a bloody great big comment that BPF calls this code
> and only saves specific registers.
Sure, will add.
> But given the number of mispredicted branches and other pipeline stalls
> in this code a couple of register saves to stack are unlikely to make
> any difference.
BPF programs have been saving and restoring these registers for a long time
now. What problem are you anticipating?