Re: AMD EPYC microcode update bug?

From: Tom Lendacky
Date: Tue Jan 09 2018 - 17:47:45 EST


On 1/9/2018 4:28 PM, Gabriel C wrote:
> Hello,
>
> I'm testing an EPYC system right now with 2 EPYC 7281 16-Core Processors.
>
> I'm on 4.15.0-rc7 and tested an update to microcode_amd_fam17h.bin.
>
> The first run used the early microcode option with dracut[1], so the
> microcode was loaded from an initrd. The driver reported 63 updated CPUs,
> while CPU0 still had the old microcode.

I'm guessing that memory encryption is enabled, correct? I've submitted a
patch series to perform early initrd decryption for just this problem. I'm
incorporating some minor feedback and getting ready to submit the next
version.

In the meantime, if you specify mem_encrypt=off on the kernel command line
it should show CPU0 updated properly. (With mem_encrypt=on and SMT enabled,
I believe CPU0 really does get updated when its sibling thread is updated;
do an rdmsr of 0x0000008b to verify.)
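
For anyone without the rdmsr tool at hand, the check above can be sketched
in Python. This is only a rough sketch under some assumptions: the msr
kernel module is loaded, /dev/cpu/N/msr exists, and the script runs as
root. The helper names are made up for illustration, and treating the low
32 bits of the MSR as the patch level is my reading of how the value is
reported, not something stated above.

```python
import glob
import os
import struct

MSR_PATCH_LEVEL = 0x0000008B  # the MSR number mentioned above


def decode_msr(raw: bytes) -> int:
    """Decode the 8 bytes read from /dev/cpu/N/msr (little-endian u64)."""
    (value,) = struct.unpack("<Q", raw)
    return value


def read_patch_level(cpu: int) -> int:
    """Read MSR 0x8b on one CPU; needs root and the msr module (assumed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR number.
        raw = os.pread(fd, 8, MSR_PATCH_LEVEL)
    finally:
        os.close(fd)
    # Assumption: the patch level is the low 32 bits of the register.
    return decode_msr(raw) & 0xFFFFFFFF


def all_patch_levels() -> dict:
    """Map CPU number -> patch level for every CPU exposing an msr node."""
    levels = {}
    for path in glob.glob("/dev/cpu/[0-9]*/msr"):
        cpu = int(path.split("/")[3])
        levels[cpu] = read_patch_level(cpu)
    return levels


def format_level(cpu: int, level: int) -> str:
    """Format one entry in the same style as the dmesg lines."""
    return f"CPU{cpu}: patch_level={level:#010x}"
```

Calling all_patch_levels() as root and printing each entry with
format_level() should show whether CPU0 matches its sibling thread.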

Thanks,
Tom

>
>
> snip
>
> crazy@ant:~/fw$ dmesg | grep microcode
> [ 2.615876] microcode: microcode updated early to new patch_level=0x08001213
> [ 2.615906] microcode: CPU0: patch_level=0x08001207
> [ 2.615920] microcode: CPU1: patch_level=0x08001213
>
> ...
>
> crazy@ant:~/fw$ cat /proc/cpuinfo | head -n 30
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 23
> model : 1
> model name : AMD EPYC 7281 16-Core Processor
> stepping : 2
> microcode : 0x8001207
>
> ....
>
> After reloading the microcode with
>
> echo 1 > /sys/devices/system/cpu/microcode/reload
>
>
> CPU0 got new microcode too.
>
> Now I tested the same but without initrd early microcode loading
> and CONFIG_EXTRA_FIRMWARE set like this:
>
> CONFIG_EXTRA_FIRMWARE="amd-ucode/microcode_amd.bin
> amd-ucode/microcode_amd_fam15h.bin amd-ucode/microcode_amd_fam16h.bin
> amd-ucode/microcode_amd_fam17h.bin"
>
>
> This time all CPUs got updated fine without needing to reload the microcode.
>
> Is that some sort of timing problem?
>
>
> Also, I notice that on an Intel system 'early updating' really means
> early: it is the first thing I see in dmesg, while on the AMD system it
> seems to fire much later. Why is that?
>
>
> Regards,
>
> Gabriel C
>
> 1. Fix for Fam17 microcode:
> https://github.com/dracutdevs/dracut/commit/19453dc8744e6a59725c43b61b2e3db01cb4c57c#diff-bf0c6db1d4aaaa22a88b2649ddbfcd2a
>
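
To spot a lagging CPU like the one in the /proc/cpuinfo output quoted
above without reading all 64 entries by eye, something along these lines
might help. It is a minimal sketch: the function names are hypothetical,
and it assumes the highest revision seen on any CPU is the one every CPU
should be running.

```python
def microcode_by_cpu(cpuinfo_text: str) -> dict:
    """Map processor number -> microcode revision from /proc/cpuinfo text."""
    levels = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            levels[cpu] = int(value, 16)  # e.g. "0x8001207"
    return levels


def stale_cpus(levels: dict) -> list:
    """Return CPUs whose revision differs from the highest one seen.

    Assumption: the highest revision is the intended one.
    """
    if not levels:
        return []
    newest = max(levels.values())
    return sorted(cpu for cpu, lvl in levels.items() if lvl != newest)


def check_system() -> list:
    """Run the check against the live /proc/cpuinfo."""
    with open("/proc/cpuinfo") as f:
        return stale_cpus(microcode_by_cpu(f.read()))
```

On the system described above, check_system() would be expected to list
only CPU 0 after the initrd-based load, and nothing after the reload.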