Re: [PATCH v4] x86/microcode/intel: Blacklist the specific BDW-EP for late loading

From: Ingo Molnar
Date: Fri Dec 29 2017 - 07:48:45 EST



* Jia Zhang <qianyue.zj@xxxxxxxxxxxxxxx> wrote:

>
>
> å 2017/12/28 äå8:24, Ingo Molnar åé:
> >
> > * Jia Zhang <qianyue.zj@xxxxxxxxxxxxxxx> wrote:
> >
> >> Instead of blacklisting all types of Broadwell processor when running
> >> a late loading, only BDW-EP (signature 0x406f1, aka family 6, model 79,
> >> stepping 1) with the microcode version less than 0x0b000021 needs to
> >> be blacklisted.
> >>
> >> The erratum is documented in the the public documentation #334165 (See
> >> the item BDF90 for details).
> >>
> >> Signed-off-by: Jia Zhang <qianyue.zj@xxxxxxxxxxxxxxx>
> >> ---
> >> arch/x86/kernel/cpu/microcode/intel.c | 12 ++++++++++--
> >> 1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> >> index 8ccdca6..79cad85 100644
> >> --- a/arch/x86/kernel/cpu/microcode/intel.c
> >> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> >> @@ -910,8 +910,16 @@ static bool is_blacklisted(unsigned int cpu)
> >> {
> >> struct cpuinfo_x86 *c = &cpu_data(cpu);
> >>
> >> - if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X) {
> >> - pr_err_once("late loading on model 79 is disabled.\n");
> >> + /*
> >> + * The Broadwell-EP processor with the microcode version less
> >> + * then 0x0b000021 may result in system hang when running a late
> >> + * loading. This behavior is documented in item BDF90, #334165
> >> + * (Intel Xeon Processor E7-8800/4800 v4 Product Family).
> >> + */
> >> + if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X &&
> >> + c->x86_mask == 0x01 && c->microcode < 0x0b000021) {
> >> + pr_err_once("late loading on cpu (sig 0x406f1) is disabled "
> >> + "due to erratum causing system hang.\n");
> >
> > Please never break user-readable messages mid-sentence!
> >
> > This should be something like:
> >
> > pr_err_once("Late loading of the CPU microcode (sig 0x406f1) is disabled due to Intel erratum BDF90 causing system hangs.\n");
> >
> > (note the spelling and readability improvements as well)
> >
> > Btw., what does 'sig 0x406f1' refer to?
>
> It is so-called processor signature which can be used to identify a
> model of x86 processor uniquely. It's the return value of cpuid
> instruction with leaf 1(eax == 1).

Ah, indeed, the (somewhat weird) encoding described in arch/x86/lib/cpu.c, which
is essentially family+model+stepping encoded into a single integer, right?

That whole area needs a good cleanup to be less confusing (we refer to the CPU
stepping as x86_stepping(), but the field is called ->x86_mask?), but in the
meanwhile, let's please make it more obvious in user facing message what's
happening.

Instead of using the microcode signature of the CPU model, please write out what's
going on:

pr_err_once("Not loading old microcode version: erratum BDF90 on Intel Broadwell-EP stepping 1 CPUs may cause system hangs.\n");

... and please also tell the user what to do about it:

pr_err_once("Please update your microcode files.\n");

!!

Agreed?

Thanks,

Ingo