Re: [git pull] machine check recovery fix

From: Linus Torvalds
Date: Mon May 21 2012 - 19:44:41 EST


On Mon, May 21, 2012 at 4:32 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>
> No, it matters which instruction caused the error, because it's the
> one which saw data corruption. If that was not in kernel you
> can safely just return because the kernel is completely fine
> and the instruction can be restarted. It's just like an interrupt.

Wrong.

The MCE interface doesn't even *give* that information, so you're just
full of it.

There's no way to know whether the EIP that you read from the MSR
happened in real mode, in kernel mode, or whatever. You need to check
eflags and the code segment, neither of which - as far as I know, and
certainly not as far as the current code knows - even exists.

So you *have* to check the return information, not the "MCE" information.

Checking the MCE data is stupid and wrong. Stop doing it, and stop
making idiotic excuses for it.

The only thing that actually has the relevant information is the
return stack. Seriously. As such, only the RIPV bit can *possibly* be
the one that you can use. Any time you use "mce->ip" for anything at
all that isn't just reporting, you are just doing moronic things.
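A minimal C sketch of a recovery decision that uses only the RIPV bit of IA32_MCG_STATUS plus the saved frame, as argued above. The bit positions follow the Intel SDM machine-check chapter; the struct frame, the action names, and the policy itself (panic whenever kernel code was interrupted) are hypothetical assumptions for illustration, not the kernel's actual mce_severity() logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits, per the Intel SDM machine-check chapter. */
#define MCG_STATUS_RIPV (1ULL << 0) /* execution can restart at the saved IP */
#define MCG_STATUS_EIPV (1ULL << 1) /* saved IP is directly tied to the error */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

#define EFLAGS_VM (1ULL << 17)      /* virtual-8086 mode flag */

/* Hypothetical saved exception frame (mirrors the idea of pt_regs). */
struct frame {
    uint64_t ip;
    uint64_t cs;
    uint64_t flags;
};

enum mce_action { MCE_PANIC, MCE_KILL_TASK, MCE_RETURN };

/* User context is judged from the return frame, not from the MCE record. */
static bool frame_is_user(const struct frame *regs)
{
    return (regs->flags & EFLAGS_VM) || (regs->cs & 3) == 3;
}

/* Hypothetical, deliberately conservative policy. The only inputs are
 * MCG_STATUS (for RIPV) and the saved frame; the IP stored in the MCE
 * record is for reporting and is intentionally not a parameter. */
static enum mce_action choose_action(uint64_t mcg_status,
                                     const struct frame *regs)
{
    bool user = frame_is_user(regs);

    if (!(mcg_status & MCG_STATUS_RIPV))
        /* Cannot resume at the saved IP: the interrupted context is lost. */
        return user ? MCE_KILL_TASK : MCE_PANIC;

    /* Restartable. Resuming user code is fine for the kernel; resuming
     * kernel code is not assumed safe in this sketch. */
    return user ? MCE_RETURN : MCE_PANIC;
}
```

Because the record's IP never reaches choose_action(), it cannot influence the decision; it can still be logged alongside the result.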

Stop doing stupid things.

Linus