Re: [PATCH] x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
From: Andy Lutomirski
Date: Mon Jan 05 2015 - 20:02:17 EST
On Mon, Jan 5, 2015 at 4:44 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> We now switch to the kernel stack when a machine check interrupts
> during user mode. This means that we can perform recovery actions
> in the tail of do_machine_check().
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>
> ---
> On top of Andy's x86/paranoid branch
> Andy: Should I really move that:
> pr_err("Uncorrected hardware memory error ...
> inside the ist_begin_non_atomic() section?
>
I think I like it as is.
[...]
> @@ -1220,6 +1177,26 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>  out:
>  	sync_core();
> +
> +	if (recover_paddr == ~0ull)
> +		goto done;
> +
> +	pr_err("Uncorrected hardware memory error in user-access at %llx",
> +		 recover_paddr);
printk is safe from IRQ context, so this should be okay unless we've
totally screwed up. And, if we totally screwed up, seeing this before
the BUGs in ist_begin_non_atomic would be nice.
> +	/*
> +	 * We must call memory_failure() here even if the current process is
> +	 * doomed. We still need to mark the page as poisoned and alert any
> +	 * other users of the page.
> +	 */
> +	ist_begin_non_atomic(regs);
> +	local_irq_enable();
> +	if (memory_failure(recover_paddr >> PAGE_SHIFT, MCE_VECTOR, flags) < 0) {
> +		pr_err("Memory error not recovered");
> +		force_sig(SIGBUS, current);
> +	}
> +	local_irq_disable();
> +	ist_end_non_atomic();
> +done:
>  	ist_exit(regs, prev_state);
>  }
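For anyone following the ordering in that tail, here is a rough userspace
sketch of the control flow, not kernel code. Everything prefixed mock_ and
the struct mock_state fields are hypothetical stand-ins, PAGE_SHIFT is
assumed to be 12, and the assert() only loosely models the BUGs in
ist_begin_non_atomic. It is meant to show the order of operations (sentinel
check, log, enter the non-atomic section, enable IRQs, memory_failure,
SIGBUS on failure, unwind), nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NO_RECOVERY (~0ull)  /* sentinel from the patch: nothing to recover */
#define PAGE_SHIFT  12       /* assumption: 4 KiB pages */

/* Hypothetical mock of the context the real handler runs in. */
struct mock_state {
	bool from_user_mode;             /* did the MCE interrupt user mode? */
	bool irqs_enabled;
	bool non_atomic;
	int memory_failure_ret;          /* value the stubbed memory_failure returns */
	bool sigbus_sent;
	unsigned long long poisoned_pfn; /* last pfn handed to memory_failure */
	int log_lines;
};

/* Stand-in for ist_begin_non_atomic(): only legal when we came from user mode. */
static void mock_ist_begin_non_atomic(struct mock_state *s)
{
	assert(s->from_user_mode);
	s->non_atomic = true;
}

/* Stand-in for memory_failure(): records the pfn, returns a canned result. */
static int mock_memory_failure(struct mock_state *s, unsigned long long pfn)
{
	s->poisoned_pfn = pfn;
	return s->memory_failure_ret;
}

/* Mirrors the order of operations in the tail added to do_machine_check(). */
static void recovery_tail(struct mock_state *s, unsigned long long recover_paddr)
{
	if (recover_paddr == NO_RECOVERY)
		return;                  /* the "goto done" path */

	/* Logged before entering the non-atomic section. */
	printf("Uncorrected hardware memory error in user-access at %llx\n",
	       recover_paddr);
	s->log_lines++;

	mock_ist_begin_non_atomic(s);
	s->irqs_enabled = true;          /* local_irq_enable() */

	/* Always called, even if the current task is doomed. */
	if (mock_memory_failure(s, recover_paddr >> PAGE_SHIFT) < 0) {
		printf("Memory error not recovered\n");
		s->log_lines++;
		s->sigbus_sent = true;   /* force_sig(SIGBUS, current) */
	}

	s->irqs_enabled = false;         /* local_irq_disable() */
	s->non_atomic = false;           /* ist_end_non_atomic() */
}
```

Calling recovery_tail() with the sentinel does nothing at all. With a real
address, the first message is printed before the mock non-atomic section is
entered, which is the ordering I said I like above.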
For the context-related bits:
Reviewed-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Should I stick this in my -next branch so it can stew?
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC