Re: [PATCH] x86/AMD: Apply Erratum 688 fix when BIOS doesn't

From: Ingo Molnar
Date: Sun Oct 22 2017 - 07:04:48 EST



* Borislav Petkov <bp@xxxxxxxxx> wrote:

> From: Borislav Petkov <bp@xxxxxxx>
>
> Some F14h machines have an erratum which, "under a highly specific
> and detailed set of internal timing conditions" can lead to skipping
> instructions and rIP corruption. Add the fix for those machines when
> their BIOS doesn't apply it or there simply isn't a BIOS update for them.
>
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> Tested-by: <mirh@xxxxxxxxxxxxx>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285
> Cc: Sherry Hurwitz <sherry.hurwitz@xxxxxxx>
> Cc: Yazen Ghannam <Yazen.Ghannam@xxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
> arch/x86/kernel/amd_nb.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 458da8509b75..7ad1dfc8f40e 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -27,6 +27,8 @@ static const struct pci_device_id amd_root_ids[] = {
> {}
> };
>
> +#define PCI_DEVICE_ID_AMD_CNB17H_F4 0x1704
> +
> const struct pci_device_id amd_nb_misc_ids[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MISC) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MISC) },
> @@ -37,6 +39,7 @@ const struct pci_device_id amd_nb_misc_ids[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) },
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
> {}
> };
> EXPORT_SYMBOL_GPL(amd_nb_misc_ids);
> @@ -48,6 +51,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F4) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F4) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F4) },
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
> {}
> };
>
> @@ -402,11 +406,46 @@ void amd_flush_garts(void)
> }
> EXPORT_SYMBOL_GPL(amd_flush_garts);
>
> +static void __fix_erratum_688(void *info)
> +{
> +#define MSR_AMD64_IC_CFG 0xC0011021
> +
> + msr_set_bit(MSR_AMD64_IC_CFG, 3);
> + msr_set_bit(MSR_AMD64_IC_CFG, 14);
> +}
> +
> +/* Apply erratum 688 fix so machines without a BIOS fix work. */
> +static __init void fix_erratum_688(void)
> +{
> + struct pci_dev *F4;
> + u32 val;
> +
> + if (boot_cpu_data.x86 != 0x14)
> + return;
> +
> + if (!amd_northbridges.num)
> + return;
> +
> + F4 = node_to_amd_nb(0)->link;
> + if (!F4)
> + return;
> +
> + if (pci_read_config_dword(F4, 0x164, &val))
> + return;
> +
> + if (val & BIT(2))
> + return;
> +
> + on_each_cpu(__fix_erratum_688, NULL, 0);

Any objections to me adding a printk message that we applied a fix?

pr_info("x86/cpu/AMD: CPU erratum 688 worked around\n");

or so?

That would also put some pressure on customers to prod manufacturers, who in turn
would prod BIOS makers, to fix the erratum in a BIOS update.

Plus, in the unlikely event that the fix did not take effect due to some other
erratum, or the erratum was mis-documented, we'd eventually discover that as well.
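
To make the suggestion concrete, here is a rough userspace model of the decision
the quoted patch makes, with the message printed only when the workaround is
actually applied. This is a sketch, not kernel code: model_fix_erratum_688() and
its parameters are made-up names for illustration, the F4x164 value and the MSR
contents are passed in as plain integers, and the real patch uses
pci_read_config_dword(), msr_set_bit() and on_each_cpu() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BIT(n) (1ULL << (n))

/*
 * Hypothetical userspace model of fix_erratum_688() from the quoted patch.
 * 'family' stands in for boot_cpu_data.x86, 'f4x164' for the dword read from
 * PCI config offset 0x164 of the F4 device, and '*ic_cfg' for the contents of
 * MSR 0xC0011021. Returns true only when the workaround bits were set.
 */
bool model_fix_erratum_688(unsigned int family, uint32_t f4x164,
			   uint64_t *ic_cfg)
{
	assert(ic_cfg != NULL);

	/* The patch only touches family 0x14 machines. */
	if (family != 0x14)
		return false;

	/* The patch returns early when bit 2 of the register is set. */
	if (f4x164 & BIT(2))
		return false;

	/* Same two bits the patch sets via msr_set_bit(). */
	*ic_cfg |= BIT(3) | BIT(14);

	/* The proposed message, emitted only on the path that applied the fix. */
	printf("x86/cpu/AMD: CPU erratum 688 worked around\n");
	return true;
}
```

In the kernel the message would presumably go right after the on_each_cpu()
call, so that it shows up once per boot and only on machines where none of the
early returns were taken.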

Thanks,

Ingo