Re: [patch 02/10] x86/mce: Disable tracing and kprobes on do_machine_check()

From: Frederic Weisbecker
Date: Tue Feb 25 2020 - 20:13:57 EST


On Tue, Feb 25, 2020 at 10:36:38PM +0100, Thomas Gleixner wrote:
> From: Andy Lutomirski <luto@xxxxxxxxxx>
>
> do_machine_check() can be raised in almost any context including the most
> fragile ones. Prevent kprobes and tracing.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> arch/x86/include/asm/traps.h | 3 ---
> arch/x86/kernel/cpu/mce/core.c | 12 ++++++++++--
> 2 files changed, 10 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -88,9 +88,6 @@ dotraplinkage void do_page_fault(struct
> dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code);
> dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code);
> dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code);
> -#ifdef CONFIG_X86_MCE
> -dotraplinkage void do_machine_check(struct pt_regs *regs, long error_code);
> -#endif
> dotraplinkage void do_simd_coprocessor_error(struct pt_regs *regs, long error_code);
> #ifdef CONFIG_X86_32
> dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code);
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1213,8 +1213,14 @@ static void __mc_scan_banks(struct mce *
> * On Intel systems this is entered on all CPUs in parallel through
> * MCE broadcast. However some CPUs might be broken beyond repair,
> * so be always careful when synchronizing with others.
> + *
> + * Tracing and kprobes are disabled: if we interrupted a kernel context
> + * with IF=1, we need to minimize stack usage. There are also recursion
> + * issues: if the machine check was due to a failure of the memory
> + * backing the user stack, tracing that reads the user stack will cause
> + * potentially infinite recursion.
> */
> -void do_machine_check(struct pt_regs *regs, long error_code)
> +void notrace do_machine_check(struct pt_regs *regs, long error_code)
> {
> DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
> DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> @@ -1360,6 +1366,7 @@ void do_machine_check(struct pt_regs *re
> ist_exit(regs);
> }
> EXPORT_SYMBOL_GPL(do_machine_check);
> +NOKPROBE_SYMBOL(do_machine_check);

That won't protect all the functions called by do_machine_check(), right?
There are lots of them.
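
To make the concern concrete, here is a minimal userspace sketch, not kernel code. It assumes that notrace is a per-function compiler attribute and that NOKPROBE_SYMBOL() records only the address of the one symbol it names; the blacklist is modelled as a plain array. All *_sketch names and is_blacklisted() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's notrace: it marks one function only. */
#define notrace __attribute__((no_instrument_function))

typedef void (*fn_t)(void);

/* Hypothetical callees: they carry no annotation of their own. */
static void scan_banks_sketch(void) { }
static void log_error_sketch(void) { }

/* Only the entry point is annotated, mirroring the patch above. */
static notrace void machine_check_sketch(void)
{
	scan_banks_sketch();
	log_error_sketch();
}

/* Toy model of a kprobe blacklist: one entry per annotated symbol. */
static const fn_t blacklist[] = { machine_check_sketch };

/* Returns true only if fn itself was listed; callees are not implied. */
bool is_blacklisted(fn_t fn)
{
	for (size_t i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++)
		if (blacklist[i] == fn)
			return true;
	return false;
}
```

In this model, is_blacklisted() is true for machine_check_sketch but false for both callees, so each callee would need its own annotation (or some broader mechanism) to be covered.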