Re: [patch V4 part 1 29/36] x86/mce: Send #MC singal from task work

From: Andy Lutomirski
Date: Thu May 07 2020 - 14:02:26 EST


On Tue, May 5, 2020 at 7:13 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>
> Convert #MC over to using task_work_add(); it will run the same code
> slightly later, on the return to user path of the same exception.

I think this patch is correct, but it's only one small, not obviously
wrong step away from being broken:

> if ((m.cs & 3) == 3) {
> /* If this triggers there is no way to recover. Die hard. */
> BUG_ON(!on_thread_stack() || !user_mode(regs));
> - local_irq_enable();
> - preempt_enable();
>
> - if (kill_it || do_memory_failure(&m))
> - force_sig(SIGBUS);
> - preempt_disable();
> - local_irq_disable();
> + current->mce_addr = m.addr;
> + current->mce_status = m.mcgstatus;
> + current->mce_kill_me.func = kill_me_maybe;
> + if (kill_it)
> + current->mce_kill_me.func = kill_me_now;
> + task_work_add(current, &current->mce_kill_me, true);

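This is fine if the source was CPL3, but it's not going to work if CPL
was 0: task_work only runs on the return-to-user path, so a #MC taken
in kernel mode would resume the interrupted kernel code without
running it. We don't *currently* do this from CPL0, but people keep
wanting to. So perhaps there should be a comment like: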

/*
 * The #MC originated at CPL3, so we know that we will go execute the
 * task_work before returning to the offending user code.
 */

IOW, if we want to recover from CPL0 #MC, we will need a different mechanism.
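
To make the failure mode concrete, here is a rough userspace model of
the ordering problem. It is only a sketch, not kernel code: every
model_* name is made up for illustration, and it assumes that queued
work runs solely on the return-to-user path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_task;

struct model_work {
	void (*func)(struct model_work *w);
	struct model_work *next;
	struct model_task *owner;
};

struct model_task {
	struct model_work *pending;	/* work queued for return to user mode */
	int sigbus_count;		/* signals delivered so far */
	struct model_work mce_kill_me;
};

static void model_kill_me(struct model_work *w)
{
	w->owner->sigbus_count++;	/* stands in for force_sig(SIGBUS) */
}

static void model_task_work_add(struct model_task *t, struct model_work *w)
{
	assert(w->func != NULL);	/* callback must be set before queueing */
	w->owner = t;
	w->next = t->pending;
	t->pending = w;
}

/* In this model, queued work runs only here, on the way back to CPL3. */
static void model_return_to_user(struct model_task *t)
{
	while (t->pending) {
		struct model_work *w = t->pending;

		t->pending = w->next;
		w->func(w);
	}
}

/*
 * Model a #MC taken at the given CPL and return how many signals were
 * delivered before the interrupted code resumed.
 */
static int model_mce_signals_before_resume(int cpl)
{
	struct model_task t = { 0 };

	t.mce_kill_me.func = model_kill_me;
	model_task_work_add(&t, &t.mce_kill_me);

	if (cpl == 3)
		model_return_to_user(&t);
	/* cpl == 0: the exception returns to kernel code; work stays queued. */

	return t.sigbus_count;
}
```

In this model, model_mce_signals_before_resume(3) delivers the signal
before user code runs again, while model_mce_signals_before_resume(0)
resumes the interrupted code with the work still pending.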

I also confess a certain amount of sadness that my beautiful
haha-not-really-atomic-here mechanism isn't being used anymore. :(

--Andy