Re: [PATCH v8 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta
Date: Thu Mar 26 2026 - 04:45:49 EST
On Wed, Mar 25, 2026 at 09:37:59PM +0100, Borislav Petkov wrote:
> On Tue, Mar 24, 2026 at 03:13:08PM -0700, Pawan Gupta wrote:
> > This is cleaner. A few things to consider are, CLEAR_BRANCH_HISTORY that
> > calls clear_bhb_loop() would be calling into C code very early during the
> > kernel entry. The code generated here may vary based on the compiler. Any
> > indirect branch here would be security risk. This needs to be noinstr so
> > that it can't be hijacked by probes and ftraces.
> >
> > At kernel entry, calling into C before mitigations are applied is risky.
>
> You can write the above function in asm if you prefer - should still be
> easier.
I believe the asm equivalent of cpu_feature_enabled() is ALTERNATIVE.
Please let me know if I am missing something.
Regarding your intent to move the loop count selection out of the BHB
sequence, below is what I could come up with. It is not as pretty as the C
version, but it tries to achieve something similar:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ecae3cef9d8c..54c65b0a3f65 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1494,6 +1494,20 @@ SYM_CODE_START_NOALIGN(rewind_stack_and_make_dead)
SYM_CODE_END(rewind_stack_and_make_dead)
.popsection
+/*
+ * Between the long and short version of BHB clear sequence, just the
+ * loop count differs based on BHI_CTRL, see Intel's BHI guidance.
+ */
+#define BHB_SHORT_LOOP_OUTER 5
+#define BHB_SHORT_LOOP_INNER 5
+
+#define BHB_LONG_LOOP_OUTER 12
+#define BHB_LONG_LOOP_INNER 7
+
+#define BHB_MOVB(type, reg) \
+ ALTERNATIVE __stringify(movb $BHB_SHORT_LOOP_##type, reg), \
+ __stringify(movb $BHB_LONG_LOOP_##type, reg), X86_FEATURE_BHI_CTRL
+
/*
* This sequence executes branches in order to remove user branch information
* from the branch history tracker in the Branch Predictor, therefore removing
@@ -1540,12 +1554,7 @@ SYM_FUNC_START(clear_bhb_loop_nofence)
/* BPF caller may require all registers to be preserved */
push %rax
- /*
- * Between the long and short version of BHB clear sequence, just the
- * loop count differs based on BHI_CTRL, see Intel's BHI guidance.
- */
- ALTERNATIVE "movb $5, %al", \
- "movb $12, %al", X86_FEATURE_BHI_CTRL
+ BHB_MOVB(OUTER, %al)
ANNOTATE_INTRA_FUNCTION_CALL
call 1f
@@ -1567,8 +1576,7 @@ SYM_FUNC_START(clear_bhb_loop_nofence)
* but some Clang versions (e.g. 18) don't like this.
*/
.skip 32 - 14, 0xcc
-2: ALTERNATIVE "movb $5, %ah", \
- "movb $7, %ah", X86_FEATURE_BHI_CTRL
+2: BHB_MOVB(INNER, %ah)
3: jmp 4f
nop
4: sub $1, %ah
Below is what the disassembly looks like:
clear_bhb_loop_nofence:
...
call 1f
jmp 5f
// BHB_MOVB(OUTER, %al)
mov $0x5,%al