Re: [PATCH v3 4/4] x86/mce: Add Zhaoxin LMCE support

From: Luck, Tony
Date: Tue Sep 17 2019 - 12:37:09 EST


On Tue, Sep 17, 2019 at 06:54:05AM +0000, Tony W Wang-oc wrote:
> But I have a question about the code below:
> if (mcgstatus & MCG_STATUS_RIPV) {
> 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 	return true;
> }
> This seems to require every #MC exception to set MCG_STATUS_RIPV = 1
> in order to skip synchronization, which is what "return true;" does here.
>
> As the Intel SDM shows, "Recoverable-not-continuable SRAR Type" errors may
> set MCG_STATUS_RIPV = 0, PCC = 0. When such #MC errors are broadcast
> to an offline CPU, they may cause a kernel panic with a synchronization
> timeout (the offline CPU can't skip synchronization in this case).
>
> Could "return true;" be moved outside the if block?
> if (mcgstatus & MCG_STATUS_RIPV) {
> 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> }
> return true;

If the RIPV bit is not set in mcgstatus, then where will the CPU return
to if you simply return from the #MC handler? RIPV=1 means that the
CPU pushed a valid return instruction pointer onto the stack.

Take the not-continuable case you mention above: the CPU will
likely do something undefined if you try to continue a
not-continuable instruction.
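
As a rough illustration of that point, here is a user-space sketch (not the
kernel code under review) of the check. RIPV is bit 0 of IA32_MCG_STATUS;
the helper name can_return_early() is made up for this example, and the
sketch only models the decision, it does not touch any MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of IA32_MCG_STATUS, as documented in the Intel SDM. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP on the stack is valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP on the stack is valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/*
 * Hypothetical helper: returning straight from the #MC handler is only
 * safe when the CPU saved a valid return address, i.e. when RIPV is set.
 * With RIPV clear there is nowhere well-defined to resume execution.
 */
bool can_return_early(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_RIPV) != 0;
}
```

With RIPV=0 (the recoverable-not-continuable SRAR case) the helper returns
false, so an unconditional "return true;" would resume at an address the
CPU never promised was valid.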

-Tony