Re: [PATCH v4] x86/microcode/intel: Blacklist the specific BDW-EP for late loading

From: Jia Zhang
Date: Fri Dec 29 2017 - 08:18:10 EST




å 2017/12/29 äå8:48, Ingo Molnar åé:
>
> * Jia Zhang <qianyue.zj@xxxxxxxxxxxxxxx> wrote:
>
>>
>>
>> å 2017/12/28 äå8:24, Ingo Molnar åé:
>>>
>>> * Jia Zhang <qianyue.zj@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> Instead of blacklisting all types of Broadwell processor when running
>>>> a late loading, only BDW-EP (signature 0x406f1, aka family 6, model 79,
>>>> stepping 1) with the microcode version less than 0x0b000021 needs to
>>>> be blacklisted.
>>>>
>>>> The erratum is documented in the the public documentation #334165 (See
>>>> the item BDF90 for details).
>>>>
>>>> Signed-off-by: Jia Zhang <qianyue.zj@xxxxxxxxxxxxxxx>
>>>> ---
>>>> arch/x86/kernel/cpu/microcode/intel.c | 12 ++++++++++--
>>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
>>>> index 8ccdca6..79cad85 100644
>>>> --- a/arch/x86/kernel/cpu/microcode/intel.c
>>>> +++ b/arch/x86/kernel/cpu/microcode/intel.c
>>>> @@ -910,8 +910,16 @@ static bool is_blacklisted(unsigned int cpu)
>>>> {
>>>> struct cpuinfo_x86 *c = &cpu_data(cpu);
>>>>
>>>> - if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X) {
>>>> - pr_err_once("late loading on model 79 is disabled.\n");
>>>> + /*
>>>> + * The Broadwell-EP processor with the microcode version less
>>>> + * then 0x0b000021 may result in system hang when running a late
>>>> + * loading. This behavior is documented in item BDF90, #334165
>>>> + * (Intel Xeon Processor E7-8800/4800 v4 Product Family).
>>>> + */
>>>> + if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X &&
>>>> + c->x86_mask == 0x01 && c->microcode < 0x0b000021) {
>>>> + pr_err_once("late loading on cpu (sig 0x406f1) is disabled "
>>>> + "due to erratum causing system hang.\n");
>>>
>>> Please never break user-readable messages mid-sentence!
>>>
>>> This should be something like:
>>>
>>> pr_err_once("Late loading of the CPU microcode (sig 0x406f1) is disabled due to Intel erratum BDF90 causing system hangs.\n");
>>>
>>> (note the spelling and readability improvements as well)
>>>
>>> Btw., what does 'sig 0x406f1' refer to?
>>
>> It is so-called processor signature which can be used to identify a
>> model of x86 processor uniquely. It's the return value of cpuid
>> instruction with leaf 1(eax == 1).
>
> Ah, indeed, the (somewhat weird) encoding described in arch/x86/lib/cpu.c, which
> is essentially family+model+stepping encoded into a single integer, right?

Totally correct.

>
> That whole area needs a good cleanup to be less confusing (we refer to the CPU
> stepping as x86_stepping(), but the field is called ->x86_mask?), but in the

Yes. This is a confusing name. I will send another patch to clean up it.

> meanwhile, let's please make it more obvious in user facing message what's
> happening.
>
> Instead of using the microcode signature of the CPU model, please write out what's
> going on:
>
> pr_err_once("Not loading old microcode version: erratum BDF90 on Intel Broadwell-EP stepping 1 CPUs may cause system hangs.\n");
>
> ... and please also tell the user what to do about it:
>
> pr_err_once("Please update your microcode files.\n");

Let me give a full background and we will have a best description for
this erratum clearly.

If current processor signature matches the problematic Broadwell-EP
model (0x406f1) *AND* current version of microcode is less than
0x0b000021, launching a microcode update in Linux runtime (or so-called
late loading) must be prohibited in order to prevent from system hang
due to the erratum. Namely, the end user has to make a BIOS update to
uprev the microcode. The code of microcode update loader in BIOS can
safely issus an microcode update without the concern about this erratum.
This is the so-called manner of early loading.

Thanks,
Jia

>
> !!
>
> Agreed?
>
> Thanks,
>
> Ingo
>