RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

From: Ghannam, Yazen
Date: Thu May 16 2019 - 16:23:58 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-owner@xxxxxxxxxxxxxxx> On Behalf Of Borislav Petkov
> Sent: Thursday, May 16, 2019 12:21 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: Luck, Tony <tony.luck@xxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Thu, May 16, 2019 at 05:09:11PM +0000, Ghannam, Yazen wrote:
> > So that the sysfs files show the control values that are set in the
> > hardware. It seemed like this would be more helpful than showing all
> > 0xF's.
>
> Yeah, but it has been like that since forever and it hasn't bugged
> anybody. Probably because anybody doesn't even look at those files. As
> Tony says:
>
> "RAS is a lonely subsystem ... even EDAC gets more love."
>
> :-)))
>
> And adding yet another vendor check for this seemed just not worth it.
>
> > Should I send out another version of this set?
>
> I simply zapped 5/6. I still think your 6/6 makes sense though.
>
> ---
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Date: Tue, 30 Apr 2019 20:32:21 +0000
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
>
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
>
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
>
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
>
> [ bp: Massage a bit. ]
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "linux-edac@xxxxxxxxxxxxxxx" <linux-edac@xxxxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> Cc: "x86@xxxxxxxxxx" <x86@xxxxxxxxxx>
> Link: https://lkml.kernel.org/r/20190430203206.104163-7-Yazen.Ghannam@xxxxxxx
> ---
> arch/x86/kernel/cpu/mce/core.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..c049689f3d73 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> - if (!b->init)
> - continue;
> - wrmsrl(msr_ops.ctl(i), b->ctl);
> - wrmsrl(msr_ops.status(i), 0);
> + if (b->init) {
> + wrmsrl(msr_ops.ctl(i), b->ctl);
> + wrmsrl(msr_ops.status(i), 0);
> + }
> +
> + /* Bank is initialized if bits are set in hardware. */
> + b->init = !!b->ctl;

We don't actually know if there are bits set in hardware until we read it back. So I don't think this is adding anything new.

Thanks,
Yazen