Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation

From: René Rebe
Date: Tue Oct 10 2023 - 17:19:25 EST


Hi Borislav,


> On 10. Oct 2023, at 10:39, Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Fri, Oct 06, 2023 at 11:32:44AM +0200, Borislav Petkov wrote:
>> I'm still working on it and I'll have something soon.
>
> Ok, try this below and see whether it fixes your reproducer.

After the first day of testing, the patch appears to have prevented
the spurious #UD exception from recurring.

Tested-by: René Rebe <rene@xxxxxxxxxxxx>
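
In case it helps others confirm that the fix is active on a running
system, here is a minimal sketch, assuming the msr driver is loaded
(/dev/cpu/N/msr) and the caller has root privileges. The MSR number and
bit come from the patch quoted below; the helper names
bp_cfg_fix_enabled() and read_msr() are hypothetical, made up for
illustration, and are not part of the patch or of any kernel API.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64	/* MSR numbers exceed 2^31, so off_t must be 64-bit */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Values taken from the quoted patch. */
#define MSR_ZEN4_BP_CFG				0xc001102eU
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT	5

/* Hypothetical helper: return 1 if the fix bit is set in a raw MSR value. */
static int bp_cfg_fix_enabled(uint64_t val)
{
	return (int)((val >> MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT) & 1);
}

/*
 * Hypothetical helper: read one MSR on one CPU through the msr driver.
 * The driver expects an 8-byte read at a file offset equal to the MSR
 * number. Returns 0 on success, -1 on any failure.
 */
static int read_msr(int cpu, uint32_t msr, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, val, sizeof(*val), (off_t)msr);
	close(fd);

	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

A caller would run read_msr(cpu, MSR_ZEN4_BP_CFG, &val) for each online
CPU and pass the result to bp_cfg_fix_enabled(). Since the patch sets
the bit from init_amd(), I would expect it to read as 1 on every CPU of
an affected, non-virtualized machine, but I have not verified that on
all models.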

> Thx.
>
> ---
> From: "Borislav Petkov (AMD)" <bp@xxxxxxxxx>
> Date: Sat, 7 Oct 2023 12:57:02 +0200
> Subject: [PATCH] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
>
> Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
> cause an #UD exception. The performance impact of the fix is negligible.
>
> Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxx>
> ---
> arch/x86/include/asm/msr-index.h | 9 +++++++--
> arch/x86/kernel/cpu/amd.c | 8 ++++++++
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 1d111350197f..b37abb55e948 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -637,12 +637,17 @@
> /* AMD Last Branch Record MSRs */
> #define MSR_AMD64_LBR_SELECT 0xc000010e
>
> -/* Fam 17h MSRs */
> -#define MSR_F17H_IRPERF 0xc00000e9
> +/* Zen4 */
> +#define MSR_ZEN4_BP_CFG 0xc001102e
> +#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
>
> +/* Zen 2 */
> #define MSR_ZEN2_SPECTRAL_CHICKEN 0xc00110e3
> #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT BIT_ULL(1)
>
> +/* Fam 17h MSRs */
> +#define MSR_F17H_IRPERF 0xc00000e9
> +
> /* Fam 16h MSRs */
> #define MSR_F16H_L2I_PERF_CTL 0xc0010230
> #define MSR_F16H_L2I_PERF_CTR 0xc0010231
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 03ef962a6992..ece2b5b7b0fe 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -80,6 +80,10 @@ static const int amd_div0[] =
> AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
> AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
>
> +static const int amd_erratum_1485[] =
> + AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf),
> + AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf));
> +
> static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
> {
> int osvw_id = *erratum++;
> @@ -1149,6 +1153,10 @@ static void init_amd(struct cpuinfo_x86 *c)
> pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
> setup_force_cpu_bug(X86_BUG_DIV0);
> }
> +
> + if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
> + cpu_has_amd_erratum(c, amd_erratum_1485))
> + msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
> }
>
> #ifdef CONFIG_X86_32
> --
> 2.42.0.rc0.25.ga82fb66fed25
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
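
For readers wondering which parts the two AMD_MODEL_RANGE() entries in
the patch cover, the check can be sketched in plain userspace C roughly
as follows. This is a simplified illustration under my own assumptions,
not the kernel's cpu_has_amd_erratum() implementation: the struct and
function names are hypothetical, and it ignores both the OSVW lookup
and the X86_FEATURE_HYPERVISOR test that the patch performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of one family/model/stepping range. */
struct model_range {
	uint8_t family;
	uint8_t model_start, stepping_start;
	uint8_t model_end, stepping_end;
};

/* The two ranges listed for erratum #1485 in the quoted patch. */
static const struct model_range erratum_1485_ranges[] = {
	{ 0x19, 0x10, 0x0, 0x1f, 0xf },
	{ 0x19, 0x60, 0x0, 0xaf, 0xf },
};

/*
 * Hypothetical helper: return 1 if family/model/stepping falls inside
 * one of the ranges above. Model and stepping are combined into one
 * value so that a single comparison orders (model, stepping) pairs.
 */
static int erratum_1485_applies(uint8_t family, uint8_t model, uint8_t stepping)
{
	uint32_t ms = ((uint32_t)model << 4) | (stepping & 0xf);
	size_t i;

	for (i = 0; i < sizeof(erratum_1485_ranges) / sizeof(erratum_1485_ranges[0]); i++) {
		const struct model_range *r = &erratum_1485_ranges[i];
		uint32_t start = ((uint32_t)r->model_start << 4) | r->stepping_start;
		uint32_t end = ((uint32_t)r->model_end << 4) | r->stepping_end;

		if (family == r->family && ms >= start && ms <= end)
			return 1;
	}

	return 0;
}
```

With these ranges, a family 0x19 CPU reporting model 0x61 matches,
while family 0x19 model 0x21 and anything in family 0x17 do not.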

--
ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com