Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
From: Ashok Raj
Date: Wed Aug 17 2022 - 07:58:10 EST
On Wed, Aug 17, 2022 at 10:09:00AM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 09:58:03AM +0200, Ingo Molnar wrote:
> > Also, Boris tells me that writing 0x0 to MSR_IA32_MCG_STATUS
> > apparently shuts the platform down - which is not ideal...
>
> Right, if you get an MCE raised while MCIP=0, the machine shuts down.
>
> And frankly, I can't think of a good solution to this whole issue:
>
> - with current hw, if you get an MCE and MCIP=0 -> shutdown
You have this reversed: if you get an MCE and MCIP=1 -> shutdown.
I'm still very reluctant; this is actually overkill. I added what is
possible based on Boris's recommendation.
When MCEs happen during the update they are always fatal errors. But
at least you can log them, even if some other weird error were to be
observed because they stomped over the patch area that the primary is
currently working on.
What we do here by setting MCIP=1 is promote the error to a more severe
shutdown. Ideally I would rather let the fallout happen, since it's
observable, whereas what we are promoting to is a blind shutdown.
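
To spell out the two cases, here is a rough model of what the hardware
does when an MCE is delivered. This is only a sketch, not kernel code:
deliver_mce() and enum mce_outcome are made-up names for illustration, and
the only real detail is that MCIP is bit 2 of IA32_MCG_STATUS.

```c
#include <assert.h>
#include <stdint.h>

/* MCIP ("machine check in progress") is bit 2 of IA32_MCG_STATUS. */
#define MCG_STATUS_MCIP (1ULL << 2)

static_assert(MCG_STATUS_MCIP == 0x4, "MCIP is bit 2");

/* Hypothetical names, used only for this illustration. */
enum mce_outcome {
	MCE_HANDLER_RUNS,	/* #MC handler is entered and can log the error */
	MCE_SHUTDOWN,		/* CPU shuts down, nothing gets logged */
};

/*
 * Simplified model of MCE delivery:
 *  - MCIP already set: a second machine check cannot be taken -> shutdown.
 *  - MCIP clear: hardware sets MCIP and enters the handler.
 */
static enum mce_outcome deliver_mce(uint64_t *mcg_status)
{
	if (*mcg_status & MCG_STATUS_MCIP)
		return MCE_SHUTDOWN;

	*mcg_status |= MCG_STATUS_MCIP;
	return MCE_HANDLER_RUNS;
}
```

With MCIP pre-set by software before the update, any MCE takes the first
branch, which is the blind shutdown described above. With MCIP left clear,
the error is still fatal, but the handler at least gets a chance to log it.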
>
> - in the future, even if you change the hardware to block MCEs from
> being detected while the microcode update runs, what happens if a CPU
> encounters a hw error during that update?
I don't think there will ever be blocking of MCEs :-)
If an error happens, it leads to shutdown.
>
> You raise it immediately after? What if there are multiple MCEs? Not
> unheard of on a big machine...
Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.
The exception is the Local MCE, which is recoverable, but only to user space.
If you get an error in the atomic we are polling, it's a fatal error since
SW can't recover and we shut down.
>
> Worse, what happens if there's a bitflip in the memory where the
> to-be-updated microcode patch is?
>
> You report the error afterwards?
>
> Just thinking about this makes me real nervous.
Overthinking :-).. If there is consensus, and if Boris feels comfortable
enough, I would drop this patch.