Re: [PATCH 2/2] x86, mce: rework use of TIF_MCE_NOTIFY

From: Hidetoshi Seto
Date: Tue Jun 14 2011 - 23:18:41 EST


(2011/06/15 11:10), Tony Luck wrote:
> On Tue, Jun 14, 2011 at 6:29 PM, Hidetoshi Seto
> <seto.hidetoshi@xxxxxxxxxxxxxx> wrote:
>> Or ... is it possible to push siginfo w/ addr and pop here?
>
> I chatted to Peter Anvin about this over lunch ... his suggestion was that, since
> we know (for now) that the recovery case is always from user mode, we can
> let all the non-involved cpus return from do_machine_check() ... but catch the
> cpu with the problem and do a sideways stack jump from the machine check
> stack to the normal trap stack. At this point we'll be executing in a context
> that is effectively the same as a page fault - so we have plenty of safe options
> on functions we can call, locks we can take etc.
>
> So perhaps we can change "void do_machine_check()" to "unsigned long
> do_machine_check()" and have the bystander cpus "return 0;" and the
> cpu that hit the error "return m.addr;" ... and then do the necessary magic
> in entry_64.S to leap from stack to stack in one mighty leap (and then
> onto a "handle_action_required(regs, addr)" function).
>
> -Tony

Sounds good.

I guess we will need something more for high-level recovery in kernel mode,
but it is better to set that difficulty aside for now and take things
one at a time.
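
To make the return-value convention quoted above concrete, here is a rough
userspace sketch in C. It is only a model of the idea, not kernel code: the
real stack switch would live in entry_64.S, and every name ending in _sketch,
plus the hit_error flag and the opaque regs pointer, is a hypothetical
stand-in. Only do_machine_check(), m.addr and handle_action_required(regs,
addr) come from the proposal itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine check record; only "addr" is
 * taken from the proposal ("return m.addr;"). */
struct mce_sketch {
    unsigned long addr;
};

/* Records the last address handed to the handler, for illustration only. */
static unsigned long last_handled_addr;

/* Models "unsigned long do_machine_check()": bystander cpus return 0,
 * the cpu that hit the error returns the faulting address.
 * The hit_error flag is an assumption standing in for the real decision. */
static unsigned long do_machine_check_sketch(bool hit_error,
                                             const struct mce_sketch *m)
{
    assert(m != NULL);
    if (!hit_error)
        return 0;          /* bystander: nothing more to do */
    return m->addr;        /* affected cpu: report the address */
}

/* Models "handle_action_required(regs, addr)". In the proposal this would
 * run on the normal trap stack, in a page-fault-like context. */
static void handle_action_required_sketch(const void *regs, unsigned long addr)
{
    (void)regs;            /* opaque placeholder for the register frame */
    last_handled_addr = addr;
}

/* Models the decision the entry code would make after do_machine_check()
 * returns. The actual stack-to-stack jump is not modelled here.
 * Returns true when the recovery handler was invoked. */
static bool mce_entry_sketch(bool hit_error, const struct mce_sketch *m,
                             const void *regs)
{
    unsigned long addr = do_machine_check_sketch(hit_error, m);

    if (addr == 0)
        return false;      /* plain return from the machine check */
    handle_action_required_sketch(regs, addr);
    return true;
}
```

One consequence of this convention, as sketched, is that 0 doubles as the
"no action" value, so an error reported at address 0 could not be told apart
from a bystander return.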


Thanks,
H.Seto
