Re: [PATCH v4 04/11] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Nikolay Borisov
Date: Fri Nov 21 2025 - 11:45:58 EST
On 11/21/25 18:40, Dave Hansen wrote:
On 11/19/25 22:18, Pawan Gupta wrote:
- CLEAR_BHB_LOOP_SEQ 5, 5
+ /* loop count differs based on CPU-gen, see Intel's BHI guidance */
+ ALTERNATIVE (CLEAR_BHB_LOOP_SEQ 5, 5), \
+ __stringify(CLEAR_BHB_LOOP_SEQ 12, 7), X86_FEATURE_BHI_CTRL
There are a million ways to skin this cat. But I'm not sure I really
like the end result here. It seems a little overkill to use ALTERNATIVE
to rewrite a whole sequence just to patch two constants in there.
What if the CLEAR_BHB_LOOP_SEQ just took its inner and outer loop counts
as register arguments? Then this would look more like:
ALTERNATIVE "mov $5, %rdi; mov $5, %rsi",
"mov $12, %rdi; mov $7, %rsi",
...
CLEAR_BHB_LOOP_SEQ
Or, even global variables:
mov outer_loop_count(%rip), %rdi
mov inner_loop_count(%rip), %rsi
nit: FWIW I find passing the counts in registers rather tacky: although it follows the x86-64 calling convention, the way the registers are used is hidden inside the macro itself.
and then have some C code somewhere that does:
if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
	outer_loop_count = 12;
	inner_loop_count = 7;
} else {
	outer_loop_count = 5;
	inner_loop_count = 5;
}
OTOH, the global-variable approach seems saner: the macro would reference the variables directly, so it would be more obvious how things are set up.
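
To make the comparison concrete, here is a minimal userspace C model of the two variants. It is only a sketch under stated assumptions: bhb_select_loop_counts(), bhb_loop_model_args() and bhb_loop_model_globals() are hypothetical names, the nested loops merely count iterations rather than execute the real branch sequence, and the 5/5 and 12/7 values are taken from the patch hunk quoted at the top, where X86_FEATURE_BHI_CTRL selects the larger counts.

```c
#include <stdbool.h>

/* Hypothetical globals, named after the ones in the quoted asm. */
static unsigned int outer_loop_count = 5;
static unsigned int inner_loop_count = 5;

/*
 * Hypothetical boot-time setup. In the kernel the flag would come from
 * cpu_feature_enabled(X86_FEATURE_BHI_CTRL); a plain bool keeps this
 * sketch buildable in userspace.
 */
static void bhb_select_loop_counts(bool has_bhi_ctrl)
{
	outer_loop_count = has_bhi_ctrl ? 12 : 5;
	inner_loop_count = has_bhi_ctrl ? 7 : 5;
}

/*
 * Register-argument style: the body only sees "outer" and "inner", so a
 * reader has to find the caller to learn where the values come from.
 */
static unsigned int bhb_loop_model_args(unsigned int outer, unsigned int inner)
{
	unsigned int iterations = 0;

	for (unsigned int i = 0; i < outer; i++)
		for (unsigned int j = 0; j < inner; j++)
			iterations++;	/* stands in for one inner-loop pass */

	return iterations;
}

/* Global-variable style: the body names its inputs directly. */
static unsigned int bhb_loop_model_globals(void)
{
	unsigned int iterations = 0;

	for (unsigned int i = 0; i < outer_loop_count; i++)
		for (unsigned int j = 0; j < inner_loop_count; j++)
			iterations++;	/* stands in for one inner-loop pass */

	return iterations;
}
```

In the second variant the inputs are visible at the point of use, which is the readability argument above; the cost is two memory loads per invocation instead of immediates patched in by ALTERNATIVE.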
<snip>