Re: [Patch V0] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.

From: Borislav Petkov
Date: Fri Dec 04 2015 - 12:36:46 EST


On Fri, Dec 04, 2015 at 05:23:18PM +0000, Luck, Tony wrote:
> > Franky, I'm not sure at all and very very wary of adding *any* code
> > which runs on an offlined CPU. Because *no one* does that and it hasn't
> > been tested at all. So who knows what happens.
> >
> > What we should be doing is execute the *minimal* amount of code possible
> > and get out. No counting, no per-cpu variables. No nothing.
>
> The minimal code requires we use:
>
> smp_processor_id() [to get our cpu number]
> cpu_is_offline() [to find out the cpu is offline]
>
> The first of those looks more dangerous in that it accesses a per-cpu variable.
>
> I don't think we need to be totally paranoid here. We know that the offline cpus
> were once online and went through normal kernel initialization code (if they didn't,
> then we can't possibly be executing this code ... their CR4.MCE bit would be zero so their
> response to a machine check would have been to reset the system).

I don't mean that - I mean the stuff we do before we call
cpu_is_offline() like ist_enter, this_cpu_inc(mce_exception_count),
etc. Then we do a whole another bunch of stuff at the "out:" label like
printk and whatnot which shouldn't run on an offlined CPU.

I.e., the check whether a CPU is offline should be the first thing we do
in do_machine_check and get the hell out if so.

> Agreed. It would be more pleasant if we had some way to *really* offline a cpu,
> including telling the rest of the system not to send it any more broadcast events
> like MCE, SMI. But the h/w guys like to give the s/w guys job security by making
> these corner cases that we have to work around in s/w :-)

Mind you, this is unintentional from the hw guys. But ha(!), I know
*exactly* what you mean.

:-)

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/