Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
From: Borislav Petkov
Date: Fri Mar 22 2019 - 13:34:38 EST
On Thu, Mar 21, 2019 at 08:25:18PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
> errors under certain conditions. The errors are benign and can safely be
> ignored. However, the high error rate may cause the MCA threshold
> counter to overflow causing a high rate of thresholding interrupts. In
> addition, users may see the errors reported through the AMD MCE decoder
> module, even with the interrupt disabled, due to MCA polling.
> This error is reported through the Instruction Fetch bank.
> Clear the "Counter Present" bit in the Instruction Fetch bank's
> MCA_MISC0 register. This will prevent enabling MCA thresholding on this
> bank which will prevent the high interrupt rate due to this error.
> Define a function to filter these errors from the MCE event pool.
> Install this function during AMD vendor init. The MCA banks are enabled
> after vendor init, so the filter function will be installed before the
> spurious errors will be reported.
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.14.x
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> * Filter out the error earlier in MCE code rather than later in EDAC.
> arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
> 1 file changed, 46 insertions(+), 11 deletions(-)
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e64de5149e50..2db85f65b41e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
> return offset;
> +bool filter_mce_rv(struct mce *m)
> + enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
> + u8 xec = (m->status >> 16) & 0x3F;
> + /*
> + * Spurious errors of this type may be reported.
> + * See Family 17h Models 10h-2Fh Erratum #1114.
> + */
> + if (bank_type == SMCA_IF && xec == 10)
> + return true;
> + return false;
> +static void filter_mce_check(struct cpuinfo_x86 *c)
> + if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
> + filter_mce = filter_mce_rv;
Why all the noodling here with a check function which assigns a
filter_mce_rv (btw, that "rv" means nothing outside of AMD) and a
Why not a simple filter_mce() in mce/core.c which calls amd_filter_mce()
based on x86_vendor and amd_filter_mce() is defined in mce/amd.c?
Good mailing practices for 400: avoid top-posting and trim the reply.