Re: [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline

From: Borislav Petkov
Date: Fri Sep 04 2015 - 07:50:25 EST


On Thu, Sep 03, 2015 at 02:17:04PM -0400, Ashok Raj wrote:
> During CPU offline, or during suspend/resume operations, its not safe to
> clear MCi_CTL. These MSR's are either thread scoped (meaning private to
> thread), or core scoped (private to threads in that core only), or socket
> scope i.e visible and controllable from all threads in the socket.
>
> When we turn off during CPU_OFFLINE, just offlining a single CPU will
> stop signaling for all the socket wide resources, such as LLC, iMC for e.g.
>
> It is true for Intel CPU's. But there seems some history that other processors
> may require to turn these off during every CPU offline.
>
> Intel Secure Guard eXtentions will be disabled when these controls are cleared
> from a security perspective. This patch enables SGX to work across
> suspend/resume.

What does that mean? What does SGX have to do with MCI_CTL registers?
Explain that in the commit message so that !Intel people can understand.

> - Consolidated some code to use sharing
> - Minor changes to some prototypes to fit usage.
> - Left handling same for non-Intel CPU models to avoid any unknown regressions.

For the whole patch text do:

s/cpu/CPU/

s/CPU's/CPUs/ and s/MSR's/MSRs/ if you mean plural. Also spellcheck all text.

>
> Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
> Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> Tested-by: Serge Ayoun <serge.ayoun@xxxxxxxxx>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 38 ++++++++++++++++++++++++++++----------
> 1 file changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d350858..5498a79 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2100,7 +2100,7 @@ int __init mcheck_init(void)
> * Disable machine checks on suspend and shutdown. We can't really handle
> * them later.
> */
> -static int mce_disable_error_reporting(void)
> +static void mce_disable_error_reporting(void)
> {
> int i;
>
> @@ -2110,17 +2110,40 @@ static int mce_disable_error_reporting(void)
> if (b->init)
> wrmsrl(MSR_IA32_MCx_CTL(i), 0);
> }
> - return 0;
> + return;
> +}
> +
> +static void _vendor_disable_error_reporting(void)

Why the "_" prepended here?

> +{
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + switch (c->x86_vendor) {
> + case X86_VENDOR_INTEL:
> + /*
> + * Don't clear on Intel CPU's. Some of these MSR's are
> + * socket wide. Disabling them for just a single cpu offline
> + * is bad, since it will inhibit reporting for all shared
> + * resources.. such as LLC, iMC for e.g.
> + */
> + break;
> + default:
> + /*
> + * Disble MCE reporting for all other CPU Vendor.
> + * Don't want to break functionality on those
> + */
> + mce_disable_error_reporting();
> + }

I think the switch-case makes this unnecessarily bloated as code. Just
do:

if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
return;

mce_disable_error_reporting();

...

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/