RE: [PATCH 1/4] x86/mce: do not overwrite no_way_out if mce_end() fails

From: Paoloni, Gabriele
Date: Fri Nov 20 2020 - 12:31:40 EST


Hi Boris,

> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: Friday, November 20, 2020 6:08 PM
> To: Paoloni, Gabriele <gabriele.paoloni@xxxxxxxxx>
> Cc: Luck, Tony <tony.luck@xxxxxxxxx>; tglx@xxxxxxxxxxxxx;
> mingo@xxxxxxxxxx; x86@xxxxxxxxxx; hpa@xxxxxxxxx; linux-
> edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> safety@xxxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/4] x86/mce: do not overwrite no_way_out if
> mce_end() fails
>
> On Wed, Nov 18, 2020 at 03:15:49PM +0000, Gabriele Paoloni wrote:
> > Currently if mce_end() fails no_way_out is set equal to worst.
> > worst is the worst severirty that was found in the MCA banks
> ^^^^^^^^^
>
> Please introduce a spellchecker into your patch creation workflow.
>
> > associated to the current CPU; however at this point no_way_out
> ^
> with
>
>
> > could be already set by mca_start() by looking at all severities
>
> I think you mean "could have been already set" here
>
> > of all CPUs that entered the MCE handler.
> > if mce_end() fails we first check if no_way_out is already set and
>
> Please use passive voice in your commit message: no "we" or "I", etc.
>
> Also, pls start new sentences with a capital letter and end them with a
> fullstop.

Sorry about the grammar errors above; I'll pay more attention in the future.

>
> > if so we stick to it, otherwise we use the local worst value
>
> So basically you're trying to say here that no_way_out might have been
> already set and other CPUs could overwrite it and that should not
> happen.
>
> Is that what you mean?

I mean that, on this CPU thread, mce_start() has already cached global_nwo
in no_way_out by this point, so no_way_out may already reflect fatal
severities accumulated from other CPUs.

If mce_end() then fails, only the local 'worst' severity is considered and
it overwrites the value that was already cached.
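
To make the difference concrete, here is a small user-space sketch of the
two behaviours. It is not the kernel code: the enum values and the two
helper names are made up for illustration, and only the assignment logic
mirrors the hunk under discussion.

```c
#include <assert.h>

/* Illustrative severities only; the kernel's enum has more levels. */
enum severity { SEV_NONE = 0, SEV_RECOVERABLE = 1, SEV_PANIC = 2 };

/*
 * Current behaviour when mce_end() fails: the local 'worst' severity
 * unconditionally replaces whatever no_way_out held before.
 */
static inline int nwo_after_failed_end_old(int no_way_out, enum severity worst)
{
	(void)no_way_out;	/* the previously cached state is discarded */
	return worst >= SEV_PANIC;
}

/*
 * Proposed behaviour: a no_way_out that is already set is kept, and the
 * local 'worst' severity is only used as a fallback.
 */
static inline int nwo_after_failed_end_new(int no_way_out, enum severity worst)
{
	if (!no_way_out)
		no_way_out = worst >= SEV_PANIC;
	return no_way_out;
}
```

With no_way_out already set and a local worst below the panic level, the
first helper returns 0 (the fatal state is lost) while the second keeps 1.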

>
> > Signed-off-by: Gabriele Paoloni <gabriele.paoloni@xxxxxxxxx>
> > Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> > ---
> > arch/x86/kernel/cpu/mce/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c
> b/arch/x86/kernel/cpu/mce/core.c
> > index 4102b866e7c0..b990892c6766 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1385,7 +1385,7 @@ noinstr void do_machine_check(struct pt_regs
> *regs)
> > */
> > if (!lmce) {
> > if (mce_end(order) < 0)
> > - no_way_out = worst >= MCE_PANIC_SEVERITY;
> > + no_way_out = no_way_out ? no_way_out : worst >=
> MCE_PANIC_SEVERITY;
>
> I had to stare at this a bit to figure out what you're doing. So how
> about simplifying this:
>
> if (!no_way_out)
> 	no_way_out = worst >= MCE_PANIC_SEVERITY;
>
> ?

Yes, that works as well and improves readability.

If that is OK, I will fix the grammar and rewrite this code in v2.

Many thanks,
Gab

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette