RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way

From: Ghannam, Yazen
Date: Wed Apr 10 2019 - 12:58:20 EST

> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: Wednesday, April 10, 2019 11:41 AM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> On Wed, Apr 10, 2019 at 04:36:30PM +0000, Ghannam, Yazen wrote:
> > We have this case on AMD Family 17h with Bank 4. The hardware enforces
> > this bank to be Read-as-Zero/Writes-Ignored.
> >
> > This behavior is enforced whether the bank is in the middle or at the
> > end.
> Does num_banks contain the disabled bank? If so, then it will work.

Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.
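
For reference, the Count field is the low 8 bits of MCG_CAP, so the number of banks to iterate over can be sketched as below. This is only a userspace illustration; the macro and function names are made up here and are not the kernel's.

```c
#include <stdint.h>

/* Illustrative name: mask for MCG_CAP[Count], bits 7:0. */
#define MCG_CAP_COUNT_MASK 0xffULL

/*
 * Return the number of MCA banks reported in an MCG_CAP value.
 * The count includes banks that are RAZ/WI in hardware, so bank
 * indices stay contiguous even with an unused bank in the middle.
 */
static unsigned int mcg_cap_bank_count(uint64_t mcg_cap)
{
	return (unsigned int)(mcg_cap & MCG_CAP_COUNT_MASK);
}
```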

> > I'm thinking to redo the sysfs interface for banks in another patch
> > set. I could include a new file to indicate enabled/disabled, or maybe
> > just update the documentation to describe this case.
> No, the write to the bank controls should fail on a disabled bank.

Okay, so you're saying the sysfs access should fail if a bank is disabled. Is that correct?

Does "disabled" mean one or both of the following?

  Unused: RAZ/WI in hardware
  Uninitialized: not initialized by the kernel due to quirks, etc.

For an unused bank, writing MCA_CTL is harmless, but there's no reason to do the write and then go through mce_restart() for it.

For an uninitialized bank, should we prevent users from overriding the kernel's settings?
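
To make the two cases concrete, here is a rough userspace sketch of one possible reading of the suggestion: the store path rejects a write when the bank is unused or uninitialized. The struct, the function name, and the choice of -ENODEV are all assumptions for illustration, not the kernel's code. Whether uninitialized banks should be rejected at all is still the open question.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bank state; field names are illustrative only. */
struct bank_state {
	uint64_t ctl;	/* MCA_CTL value requested for this bank */
	bool unused;	/* bank is RAZ/WI in hardware */
	bool init;	/* bank was initialized by the kernel */
};

/*
 * Hypothetical model of a sysfs store for one bank's control value.
 * Returns 0 on success, -EINVAL for an out-of-range bank, and -ENODEV
 * (an assumed error code) for a bank treated as disabled.
 */
static int bank_ctl_store(struct bank_state *banks, size_t num_banks,
			  size_t bank, uint64_t new_ctl)
{
	if (!banks || bank >= num_banks)
		return -EINVAL;

	/* Assumption: "disabled" covers both unused and uninitialized. */
	if (banks[bank].unused || !banks[bank].init)
		return -ENODEV;

	/* Only an enabled bank reaches the update (and any restart). */
	banks[bank].ctl = new_ctl;
	return 0;
}
```

With a check like this, a write to an unused bank fails early and never reaches the restart path, and a bank left uninitialized because of a quirk keeps the kernel's setting.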