Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

From: Borislav Petkov
Date: Mon May 27 2019 - 19:32:55 EST


On Thu, May 23, 2019 at 08:00:33PM +0000, Ghannam, Yazen wrote:
> I did a bit more testing and I noticed that writing "0" disables a bank with no way to reenable it.
>
> For example:
> 1) Read bank10.
> a) Succeeds; returns "fffffffffffffff".
> 2) Write "0" to bank10.
> a) Succeeds; hardware register is set to "0".
> b) Hardware register is checked, and b->init=0.
> 3) Read bank10.
> a) Fails, because b->init=0.
> 4) Write non-zero value to bank10 to reenable it.
> a) Fails, because b->init=0.
> 5) Reboot needed to reset bank.
>
> Is that okay?

Nope, that doesn't sound correct to me.

I guess the cleanest way to handle this properly would be to have a
function called something like __mcheck_cpu_init_banks() which gets
called in mcheck_cpu_init() after the quirks have run and then does the
final poking of the banks and sets b->init properly.

__mcheck_cpu_init_clear_banks() should then be renamed to
__mcheck_cpu_clear_banks() to denote that it only clears the banks and
would only do:

	if (!b->init)
		continue;

	wrmsrl(msr_ops.ctl(i), b->ctl);
	wrmsrl(msr_ops.status(i), 0);

And then sprinkle some commenting to not forget the scheme again.

Yeah, this sounds clean to me but you guys might have a better idea...
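
To make the split concrete, here is a rough userspace sketch of the scheme,
not the actual kernel code. The fake_* arrays and helpers, hw_mask, NUM_BANKS,
set_bank() and demo_reenable() are all made up for illustration, and probing a
bank by writing all ones and reading the value back is only an assumption about
what the "final poking" could look like. The point is just that b->init gets
decided once at init time and the clear path never touches it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS 4	/* hypothetical bank count for this simulation */

struct mce_bank {
	uint64_t ctl;	/* control bits to program into the bank */
	bool init;	/* bank is usable; decided once at init time */
};

static struct mce_bank mce_banks[NUM_BANKS];

/* Stand-ins for the MCA_CTL/MCA_STATUS MSRs (assumption, not real MSR access). */
static uint64_t fake_ctl[NUM_BANKS];
static uint64_t fake_status[NUM_BANKS];

/* Bits each simulated bank implements; bank 2 implements none. */
static const uint64_t hw_mask[NUM_BANKS] = { ~0ULL, 0xffULL, 0, ~0ULL };

static void fake_wrmsrl_ctl(int i, uint64_t val)
{
	fake_ctl[i] = val & hw_mask[i];	/* unimplemented bits read back as 0 */
}

static uint64_t fake_rdmsrl_ctl(int i)
{
	return fake_ctl[i];
}

/* Runs once, after the quirks: probe each bank and set b->init. */
static void mcheck_cpu_init_banks(void)
{
	for (int i = 0; i < NUM_BANKS; i++) {
		struct mce_bank *b = &mce_banks[i];

		fake_wrmsrl_ctl(i, ~0ULL);
		b->ctl = fake_rdmsrl_ctl(i);
		b->init = b->ctl != 0;
	}
}

/* Only programs and clears the banks; never changes b->init. */
static void mcheck_cpu_clear_banks(void)
{
	for (int i = 0; i < NUM_BANKS; i++) {
		struct mce_bank *b = &mce_banks[i];

		if (!b->init)
			continue;

		fake_wrmsrl_ctl(i, b->ctl);
		fake_status[i] = 0;
	}
}

/* Hypothetical stand-in for the sysfs store path. */
static int set_bank(int i, uint64_t val)
{
	struct mce_bank *b = &mce_banks[i];

	if (!b->init)
		return -1;

	b->ctl = val;
	mcheck_cpu_clear_banks();
	return 0;
}

/* Walks through Yazen's sequence: write 0, then write non-zero again. */
static void demo_reenable(void)
{
	mcheck_cpu_init_banks();

	assert(set_bank(0, 0) == 0);	/* disabling the bank succeeds */
	assert(mce_banks[0].init);	/* ...and b->init stays set */
	assert(set_bank(0, 0xff) == 0);	/* so reenabling works too */
	assert(set_bank(2, 1) == -1);	/* bank 2 was never usable */
}
```

With that split, writing "0" only changes b->ctl and the register, so the
later non-zero write in step 4 goes through without needing a reboot.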

Thx.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.