Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
From: Andrew Cooper
Date: Fri Apr 04 2025 - 10:16:35 EST
On 04/04/2025 3:14 pm, Uros Bizjak wrote:
>
>
> On 2. 04. 25 19:24, Andrew Cooper wrote:
>> Commit 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
>> workaround, add barriers") adds barriers, justified with:
>>
>> ... and add memory barriers around it since the documentation is
>> explicit
>> that CLFLUSH is only ordered with respect to MFENCE.
>>
>> This also triggered the same adjustment in commit
>> f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched
>> IPIs") during development, although it failed to get the
>> static_cpu_has_bug()
>> treatment.
>>
>> X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel
>> CPUs,
>> and the SDM currently states:
>>
>> Executions of the CLFLUSH instruction are ordered with respect to
>> each
>> other and with respect to writes, locked read-modify-write
>> instructions,
>> and fence instructions[1].
>>
>> With footnote 1 reading:
>>
>> Earlier versions of this manual specified that executions of the
>> CLFLUSH
>> instruction were ordered only by the MFENCE instruction. All
>> processors
>> implementing the CLFLUSH instruction also order it relative to the
>> other
>> operations enumerated above.
>>
>> i.e. The SDM was incorrect at the time, and barriers should not have
>> been
>> inserted. Double checking the original AAI65 errata (not available from
>> intel.com any more) shows no mention of barriers either.
>>
>> Note: If this were a general codepath, the MFENCEs would be needed,
>> because
>> AMD CPUs of the same vintage do sport otherwise-unordered
>> CLFLUSHs.
>>
>> Furthermore, use a plain alternative, rather than
>> static_cpu_has_bug() and/or
>> no optimisation. The workaround is a single instruction.
>>
>> Use an explicit %rax pointer rather than a general memory operand,
>> because
>> MONITOR takes the pointer implicitly in the same way.
>>
>> Link:
>> https://web.archive.org/web/20090219054841/http://download.intel.com/design/xeon/specupdt/32033601.pdf
>> Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
>> workaround, add barriers")
>> Fixes: f8e617f45829 ("sched/idle/x86: Optimize unnecessary
>> mwait_idle() resched IPIs")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> CC: Ingo Molnar <mingo@xxxxxxxxxx>
>> CC: Borislav Petkov <bp@xxxxxxxxx>
>> CC: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> CC: x86@xxxxxxxxxx
>> CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> CC: linux-kernel@xxxxxxxxxxxxxxx
>>
>> v2:
>> * Fix the same pattern in mwait_idle() too
>> * Expand on why we're not using a general memory operand.
>> ---
>> arch/x86/include/asm/mwait.h | 11 +++++------
>> arch/x86/kernel/process.c | 10 ++++------
>> 2 files changed, 9 insertions(+), 12 deletions(-)
>
> There is another instance of the same sequence in
> arch/x86/kernel/smpboot.c:
>
> /*
> * The CLFLUSH is a workaround for erratum AAI65 for
> * the Xeon 7400 series. It's not clear it is actually
> * needed, but it should be harmless in either case.
> * The WBINVD is insufficient due to the spurious-wakeup
> * case where we return around the loop.
> */
> mb();
> clflush(md);
> mb();
> __monitor(md, 0, 0);
> mb();
> __mwait(eax_hint, 0);
>
> Should this also be converted to the new sequence?
Yes, it ought to.
I'm OoO for a while though. If you fancy doing a patch, please go
ahead. Or a maintainer could fold a 3rd hunk into this patch?
~Andrew