Re: [Patch V0] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.

From: Raj, Ashok
Date: Fri Dec 04 2015 - 18:08:25 EST


On Fri, Dec 04, 2015 at 02:34:52PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 4, 2015 at 9:53 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> > ist_enter() is black magic to me. Andy? Would you be worried about executing
> > ist_{enter,exit}() on a cpu that was once online, but is currently marked offline
> > by Linux?
>
> Offline CPUs are black magic to me. But as long as the CPU works the
> way that the normal specs say it should, then ist_enter is fair game.
> In any event, if context tracking blows up on an offline CPU, I'd
> argue that's a context tracking bug and needs to be fixed.
>
> But maybe offlined CPUs are supposed to have all interrupts off
> (including MCE?) and the argument goes the other way? Dunno.

MCEs are broadcast by the hardware and cannot be blocked. Offline
is only a Linux-specific state. Now if the offline was the result of an
ACPI event (eject) that triggered the CPU removal (offline in Linux, as it
would on a platform that supports true hotplug), then the platform would
remove this CPU from the broadcast list.

If the kernel were to set CR4.MCE=0, that would cause a system shutdown
when an MCE is broadcast and hits this CPU.
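
As a rough illustration of the two points above, here is a toy user-space
model, not kernel code. struct model_cpu, mce_reaches() and mce_deliver()
are hypothetical names made up for this sketch; the only architectural
fact it relies on is that CR4.MCE is bit 6 of CR4.

```c
#include <stdbool.h>

/* CR4.MCE is bit 6 of CR4 (the kernel calls this X86_CR4_MCE). */
#define CR4_MCE (1UL << 6)

enum mce_outcome { MCE_HANDLED, MCE_SHUTDOWN };

/* Hypothetical model of one logical CPU; not a kernel structure. */
struct model_cpu {
	bool linux_online;      /* Linux's view only; hardware ignores it */
	bool in_broadcast_list; /* platform's view; cleared on a true eject */
	unsigned long cr4;
};

/* The broadcast reaches every CPU the platform still lists.
 * Note that linux_online is deliberately not consulted. */
static bool mce_reaches(const struct model_cpu *c)
{
	return c->in_broadcast_list;
}

/* With CR4.MCE set the #MC handler runs; with it clear the
 * machine check cannot be delivered and the system shuts down. */
static enum mce_outcome mce_deliver(const struct model_cpu *c)
{
	return (c->cr4 & CR4_MCE) ? MCE_HANDLED : MCE_SHUTDOWN;
}
```

In this model a CPU with linux_online == false but in_broadcast_list ==
true still takes the MCE, so it needs CR4.MCE set and a working handler
path. Only the eject case drops it from the broadcast.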

Cheers,
Ashok