Re: [patch 02/10] x86/mce: Disable tracing and kprobes on do_machine_check()

From: Andy Lutomirski
Date: Wed Feb 26 2020 - 00:29:03 EST


On 2/25/20 5:13 PM, Frederic Weisbecker wrote:
> On Tue, Feb 25, 2020 at 10:36:38PM +0100, Thomas Gleixner wrote:
>> From: Andy Lutomirski <luto@xxxxxxxxxx>
>>
>> do_machine_check() can be raised in almost any context including the most
>> fragile ones. Prevent kprobes and tracing.
>>
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> ---
>> arch/x86/include/asm/traps.h | 3 ---
>> arch/x86/kernel/cpu/mce/core.c | 12 ++++++++++--
>> 2 files changed, 10 insertions(+), 5 deletions(-)
>>
>> --- a/arch/x86/include/asm/traps.h
>> +++ b/arch/x86/include/asm/traps.h
>> @@ -88,9 +88,6 @@ dotraplinkage void do_page_fault(struct
>> dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code);
>> dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code);
>> dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code);
>> -#ifdef CONFIG_X86_MCE
>> -dotraplinkage void do_machine_check(struct pt_regs *regs, long error_code);
>> -#endif
>> dotraplinkage void do_simd_coprocessor_error(struct pt_regs *regs, long error_code);
>> #ifdef CONFIG_X86_32
>> dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code);
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1213,8 +1213,14 @@ static void __mc_scan_banks(struct mce *
>> * On Intel systems this is entered on all CPUs in parallel through
>> * MCE broadcast. However some CPUs might be broken beyond repair,
>> * so be always careful when synchronizing with others.
>> + *
>> + * Tracing and kprobes are disabled: if we interrupted a kernel context
>> + * with IF=1, we need to minimize stack usage. There are also recursion
>> + * issues: if the machine check was due to a failure of the memory
>> + * backing the user stack, tracing that reads the user stack will cause
>> + * potentially infinite recursion.
>> */
>> -void do_machine_check(struct pt_regs *regs, long error_code)
>> +void notrace do_machine_check(struct pt_regs *regs, long error_code)
>> {
>> DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
>> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>> @@ -1360,6 +1366,7 @@ void do_machine_check(struct pt_regs *re
>> ist_exit(regs);
>> }
>> EXPORT_SYMBOL_GPL(do_machine_check);
>> +NOKPROBE_SYMBOL(do_machine_check);
>
> That won't protect all the functions called by do_machine_check(), right?
> There are lots of them.
>

It at least means we can survive to run actual C code in
do_machine_check(), which lets us try to mitigate this issue further.
PeterZ has patches for that, and maybe this series fixes it later on.
(I'm reading in order!)
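
To make the limitation above concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy blacklist keyed by symbol name; the real kprobes blacklist is built from NOKPROBE_SYMBOL() entries and matched by address range. The names toy_blacklist and toy_probe_refused are hypothetical, and __mc_scan_banks is used only as an example of a function in the same file that carries no annotation of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model only: the kernel records NOKPROBE_SYMBOL() entries as address
 * ranges. Names are used here just to keep the sketch self-contained.
 */
static const char *const toy_blacklist[] = {
	"do_machine_check",	/* the only symbol this patch annotates */
};

/* Hypothetical helper: would a probe on this symbol be refused? */
static bool toy_probe_refused(const char *sym)
{
	size_t i;

	assert(sym != NULL);

	for (i = 0; i < sizeof(toy_blacklist) / sizeof(toy_blacklist[0]); i++) {
		if (strcmp(toy_blacklist[i], sym) == 0)
			return true;
	}

	/* Not listed: nothing is inherited from a blacklisted caller. */
	return false;
}
```

In this model, toy_probe_refused("do_machine_check") is true while toy_probe_refused("__mc_scan_banks") is false: the annotation covers one symbol and nothing it calls, so each callee would need its own annotation or some broader mechanism.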