Re: [PATCH v2 07/16] x86/mce/amd: Simplify DFR handler setup

From: Borislav Petkov
Date: Wed Apr 24 2024 - 15:07:56 EST


On Thu, Apr 04, 2024 at 10:13:50AM -0500, Yazen Ghannam wrote:
> AMD systems with the SUCCOR feature can send an APIC LVT interrupt for
> deferred errors. The LVT offset is 0x2 by convention, i.e. this is the
> default as listed in hardware documentation.
>
> However, the MCA registers may list a different LVT offset for this
> interrupt. The kernel should honor the value from the hardware.

There's this "may" thing again.

Is this enablement for some future hw too or do you really trust the
value in MSR_CU_DEF_ERR is programmed correctly in all cases?

> Simplify the enable flow by using the hardware-provided value. Any
> conflicts will be caught by setup_APIC_eilvt(). Conflicts on production
> systems can be handled as quirks, if needed.

Well, which systems support succor?

I'd like to test this on them before we face all the quirkery. :)

That area has been plagued by hw snafus if you look at
setup_APIC_eilvt() and talk to uncle Robert. :-P

> @@ -595,17 +584,15 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
> if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
> return;
>
> + /*
> + * Trust the value from hardware.
> + * If there's a conflict, then setup_APIC_eilvt() will throw an error.
> + */
> def_new = (low & MASK_DEF_LVTOFF) >> 4;
> - if (!(low & MASK_DEF_LVTOFF)) {
> - pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
> - def_new = DEF_LVT_OFF;
> - low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
> - }
> + if (setup_APIC_eilvt(def_new, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
> + return;
>
> - def_offset = setup_APIC_deferred_error(def_offset, def_new);
> - if ((def_offset == def_new) &&
> - (deferred_error_int_vector != amd_deferred_error_interrupt))
> - deferred_error_int_vector = amd_deferred_error_interrupt;

There was a reason for that - deferred_error_int_vector is a global var
and you're calling enable_deferred_error_interrupt() on each CPU.

> + deferred_error_int_vector = amd_deferred_error_interrupt;
>
> if (!mce_flags.smca)
> low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette