Re: [PATCH] x86/MCE: Get microcode revision from cpu_data instead of boot_cpu_data

From: Sironi, Filippo
Date: Thu Dec 07 2023 - 04:34:57 EST


> > Boris, I just took a quick look and I might be missing something. If cores
> > fail to load the microcode or timeout, we taint the kernel, print an error
> > message, and then bubble up an error to userspace via:
> >
> > load_late_stop_cpus
> > load_late_locked
> > reload_store
> >
> > Right?
>
> Yap.
>
> > We would take servers that fail out of production;
>
> And I'd like to hear about such issues. We added this failure checking
> only recently because something might go wrong and we want to warn. But
> it all updates fine here so kinda hard to test.

In a very large fleet, let's say we see a handful of DPM (defects per million)
when counting whole processors; counted per core, the defect rate is much, much
lower.
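
To make that arithmetic concrete, here is a rough sketch with made-up numbers.
The 5 DPM, the 64 cores per processor, and the one-bad-core-per-defective-part
assumption are all hypothetical, not actual fleet figures:

```python
def per_core_dpm(processor_dpm, cores_per_processor, bad_cores_per_defective_processor=1):
    """Convert a per-processor defect rate (DPM) into a per-core defect rate.

    Assumes (hypothetically) that each defective processor has a fixed number
    of bad cores and that all processors have the same core count.
    """
    return processor_dpm * bad_cores_per_defective_processor / cores_per_processor


# Hypothetical inputs: 5 defective processors per million, 64 cores each.
print(per_core_dpm(5, 64))
```

With those inputs the per-core rate is 5/64 of a DPM, i.e. well below one
defective core per million cores.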

What we've seen in these cases is that early loading - through the BIOS, I
actually never tried via the hypervisor - is successful while late loading
consistently fails. When it fails, we've seen two cases: 1/ the core still
reports the old microcode version or 2/ the core reports a bogus microcode
version (0xfffffffe is quite common, at least on Intel).
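
For anyone who wants to look for the same two cases, a minimal userspace sketch
follows. It assumes the per-CPU "processor" and "microcode" fields that x86
/proc/cpuinfo prints; the function names and the "stale"/"bogus" labels are
made up for illustration and are not what we actually run:

```python
def parse_microcode_revisions(cpuinfo_text):
    """Map logical CPU number -> microcode revision from /proc/cpuinfo text."""
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = int(value, 16)  # field is printed as 0x...
    return revisions


def classify(revision, old_revision, new_revision):
    """Case 1/ is 'stale' (still the old revision); case 2/ is 'bogus'."""
    if revision == new_revision:
        return "updated"
    if revision == old_revision:
        return "stale"
    return "bogus"  # e.g. 0xfffffffe


def failed_cpus(cpuinfo_text, old_revision, new_revision):
    """Return {cpu: (state, revision)} for every CPU not on new_revision."""
    failed = {}
    for cpu, rev in parse_microcode_revisions(cpuinfo_text).items():
        state = classify(rev, old_revision, new_revision)
        if state != "updated":
            failed[cpu] = (state, hex(rev))
    return failed
```

Usage would be something like failed_cpus(open("/proc/cpuinfo").read(), old, new)
after a late load, with old and new being the revisions expected before and
after the update.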

> My expectation is that if microcode fails loading on a subset of
> machines, the machine would more or less freeze. Depending, ofc, on what
> the microcode is updating...

It's bi-modal. We've seen servers that keep running until we take them out of
production, as well as servers that fail with an MCE of some sort, likely
leading to a CATERR/IERR.

> > however, for others it might be interesting to have the correct
> > information. The patch - with a reworked commit message - might still
> > be useful to a few.
>
> https://lore.kernel.org/r/20231118193248.1296798-3-yazen.ghannam@xxxxxxx
>
> :)

:looking:
