Re: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits

From: Pawan Gupta
Date: Mon Feb 14 2022 - 17:41:30 EST


On 14.02.2022 18:38, Borislav Petkov wrote:
On Wed, Feb 09, 2022 at 01:04:36PM -0800, Pawan Gupta wrote:
tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and
CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use
MSR_IA32_TSX_CTRL when supported.

Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")
Reported-by: kernel test robot <lkp@xxxxxxxxx>
Tested-by: Neelima Krishnan <neelima.krishnan@xxxxxxxxx>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>

<--- I'm assuming this needs to be

Cc: <stable@xxxxxxxxxxxxxxx>

?

Yes, this needs to be backported to a few kernels that have the commit
293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts").
Once this is reviewed, I will send a separate email to stable@ with the
list of stable kernels.

@@ -106,13 +110,11 @@ void __init tsx_init(void)
int ret;

/*
- * Hardware will always abort a TSX transaction if both CPUID bits
- * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
- * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
- * here.
+ * Hardware will always abort a TSX transaction when CPUID
+ * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
+ * CPUID.RTM and CPUID.HLE bits. Clear them here.
*/
- if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
- boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+ if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {

So you test here X86_FEATURE_RTM_ALWAYS_ABORT and tsx_clear_cpuid()
tests it again. Simplify I guess.

X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for
MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of
tsx_clear_cpuid() this condition is met, and test for
X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers
must also have this check, otherwise the MSR write will fault.
tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
tsx_clear_cpuid();
setup_clear_cpu_cap(X86_FEATURE_RTM);

Also, here you clear X86_FEATURE_RTM and X86_FEATURE_HLE but the other
caller of tsx_clear_cpuid() - init_intel() - doesn't clear those bits.
Why?

Calling setup_clear_cpu_cap() by boot CPU is sufficient to clear these
bits for secondary CPUs. Moreover, init_intel() is also called during
cpu hotplug, clearing cached CPUID bits will not be safe after boot.

IOW, can we concentrate the whole tsx_clear_cpuid() logic inside it and
have callers only call it unconditionally. Then the decision whether
to disable TSX and clear bits will happen all solely in that function
instead of being spread around the boot code and being inconsistent.

There are certain cases where this will leave the system in an
inconsistent state, for example smt toggle after a late microcode update
that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we
unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of
the CPUs will report TSX feature and other half will not.

Thanks,
Pawan