Re: [PATCH 02/31] x86: MCE: Improve mce_get_rip v3

From: Andi Kleen
Date: Wed May 27 2009 - 03:17:08 EST


On Wed, May 27, 2009 at 01:29:16PM +0900, Hidetoshi Seto wrote:
> Andi Kleen wrote:
> > From: Huang Ying <ying.huang@xxxxxxxxx>
> >
> > Assume RIP is valid when either EIPV or RIPV are set.
>
> Bad description.
> If RIP means "restart IP" that is valid only if RIPV is set,
> this sentence doesn't make sense completely.

No, it doesn't mean restart IP; it just means the normal instruction
pointer, like everywhere else.
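
To make the rule concrete, here is a minimal userspace sketch of it, not
the kernel function itself. struct mce_sketch, struct regs_sketch and
sketch_get_ip() are made-up names for illustration only; the sketch
assumes nothing beyond the architectural positions of RIPV (bit 0) and
EIPV (bit 1) in MCG_STATUS.

```c
#include <stdint.h>
#include <stddef.h>

#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)	/* error IP valid */

/* Hypothetical stand-ins for the kernel's struct mce and struct pt_regs. */
struct mce_sketch {
	uint64_t mcgstatus;
	uint64_t ip;
	uint64_t cs;
};

struct regs_sketch {
	uint64_t ip;
	uint64_t cs;
};

/*
 * Take IP/CS from the exception frame when either RIPV or EIPV is set;
 * otherwise record "no usable IP" as zero.
 */
static void sketch_get_ip(struct mce_sketch *m, const struct regs_sketch *regs)
{
	if (regs && (m->mcgstatus & (MCG_STATUS_RIPV | MCG_STATUS_EIPV))) {
		m->ip = regs->ip;
		m->cs = regs->cs;
	} else {
		m->ip = 0;
		m->cs = 0;
	}
}
```

With only EIPV set the saved IP/CS are still recorded; with neither bit
set both fields end up zero.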

>
> > This influences
> > whether the machine check exception handler decides to return or panic.
>
> I suppose you are pointing logics in:

Yes.

>
> 	mce_get_rip(&m, regs);
> 		:
> 	panicm = m;
> 		:
> 	/*
> 	 * If the EIPV bit is set, it means the saved IP is the
> 	 * instruction which caused the MCE.
> 	 */
> 	if (m.mcgstatus & MCG_STATUS_EIPV)
> 		user_space = panicm.ip && (panicm.cs & 3);
>
> 	/*
> 	 * If we know that the error was in user space, send a
> 	 * SIGBUS. Otherwise, panic if tolerance is low.
> 	 *
> 	 * force_sig() takes an awful lot of locks and has a slight
> 	 * risk of deadlocking.
> 	 */
> 	if (user_space) {
> 		force_sig(SIGBUS, current);
> 	} else if (panic_on_oops || tolerant < 2) {
> 		mce_panic("Uncorrected machine check",
> 			  &panicm, mcestart);
> 	}
>
> So EIPV without RIPV will be no ip and will result in panic,
> while expected result is SIGBUS.

First, this is only for the !MCA recovery case. In the MCA recovery
case we have more information and can decide better.

In this case, no EIPV means that the kernel isn't sure where the
error occurred, so it cannot safely decide whether it was user space
or kernel space, and in the tolerant == 2 case has to panic
just in case a kernel kill would cause deadlock.
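
As a rough illustration, the decision in the excerpt quoted above can be
modelled in plain userspace C. This is only a sketch that mirrors the
quoted code: sketch_decide() and enum sketch_action are hypothetical
names, and the "continue" result simply stands for the case where
neither branch of the quoted if/else fires.

```c
#include <stdint.h>
#include <stdbool.h>

#define MCG_STATUS_EIPV (1ULL << 1)	/* error IP valid */

/* Hypothetical outcome labels; the kernel has no such enum. */
enum sketch_action { SKETCH_SIGBUS, SKETCH_PANIC, SKETCH_CONTINUE };

static enum sketch_action sketch_decide(uint64_t mcgstatus, uint64_t ip,
					uint64_t cs, int tolerant,
					bool panic_on_oops)
{
	bool user_space = false;

	/* Only trust the saved IP/CS when EIPV says it points at the fault. */
	if (mcgstatus & MCG_STATUS_EIPV)
		user_space = ip && (cs & 3);

	if (user_space)
		return SKETCH_SIGBUS;		/* error attributed to user space */
	if (panic_on_oops || tolerant < 2)
		return SKETCH_PANIC;		/* origin unknown or in the kernel */
	return SKETCH_CONTINUE;			/* neither branch taken */
}
```

Note that the same user-space IP/CS pair gives SIGBUS with EIPV set but
falls through to the panic check without it, which is the behaviour
under discussion here.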

With MCA recovery this whole thing is replaced by a new, improved
mechanism using the high-level handler.

> >
> > Also in addition do not force the RIP to be valid with the exact
> > register MSRs.
>
> I think the forced one is EIP:
> > - m->mcgstatus |= MCG_STATUS_EIPV;

True. Changed.

>
> And please note that it keep use CS on stack even if MSR is available.
>
> I made an alternative patch for this, with no functional change.
> Please consider replacing.

No, sorry, I got burned too much last time you touched the description
of this simple patch. I think my description is simple and to the point,
and this patch doesn't really deserve anything more.

-Andi
--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.