Re: kexec reboot fails with extra wbinvd introduced for AMD SME

From: Linus Torvalds
Date: Wed Jan 17 2018 - 21:56:09 EST


On Wed, Jan 17, 2018 at 6:47 PM, Dave Young <dyoung@xxxxxxxxxx> wrote:
> Did several quick tests, probably need more tests, but till now the
> results are:
>
> void stop_this_cpu(void *dummy)
> {
> =====> add wbinvd here: kexec works
> local_irq_disable();
> =====> add wbinvd here: kexec works
> /*
> * Remove this CPU:
> */
> set_cpu_online(smp_processor_id(), false);
> =====> add wbinvd here: kexec does not work

Funky.

> So it seems that it will not work after the cpu is offlined..

Well, that set_cpu_online() call really just clears a bit in our
'__cpu_online_mask' CPU mask. It doesn't really do anything to the
*hardware*.
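
As a rough userspace illustration of that point (a sketch only, not the
kernel's implementation; the toy_* names are made up here, and the real
code operates on a struct cpumask rather than a single word), marking a
CPU offline amounts to clearing one bit in memory:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for __cpu_online_mask: one bit per CPU.
 * Starts with CPUs 0-3 marked online. */
static unsigned long toy_online_mask = 0xfUL;

/* Toy analogue of set_cpu_online(): pure bookkeeping in memory.
 * Nothing here touches caches, interrupts or any other hardware state. */
static void toy_set_cpu_online(unsigned int cpu, bool online)
{
	assert(cpu < sizeof(toy_online_mask) * CHAR_BIT);

	if (online)
		toy_online_mask |= 1UL << cpu;
	else
		toy_online_mask &= ~(1UL << cpu);
}

/* Toy analogue of cpu_online(): test the bit for this CPU. */
static bool toy_cpu_online(unsigned int cpu)
{
	assert(cpu < sizeof(toy_online_mask) * CHAR_BIT);

	return (toy_online_mask >> cpu) & 1UL;
}
```

Calling toy_set_cpu_online(2, false) only changes what toy_cpu_online(2)
reports afterwards. Any difference in behaviour would have to come from
other code that consults the mask, not from the bit flip itself.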

But I do wonder if the wbinvd causes an SMI or something on your
system. I _think_ wbinvd causes some external pin to be wiggled just
to tell possible external cache hardware to flush too, and on a system
level that could be tied to some random thing.

And then if we get an SMI/NMI when we've marked the system offline,
maybe we do something odd.

Very odd. But maybe this makes somebody go "Duh, that's because of xyz.."

Linus