Re: kexec reboot fails with extra wbinvd introduced for AMD SME

From: Tom Lendacky
Date: Wed Jan 17 2018 - 17:53:49 EST


On 1/17/2018 2:01 PM, Tom Lendacky wrote:
> On 1/17/2018 1:42 PM, Linus Torvalds wrote:
>> On Tue, Jan 16, 2018 at 11:22 PM, Dave Young <dyoung@xxxxxxxxxx> wrote:
>>>
>>> For the kexec reboot hang, if I remove the wbinvd in stop_this_cpu()
>>> then kexec works fine. like this:
>>
>> Honestly, I think we should apply that patch regardless.
>>
>> Using 'wbinvd' should not be some "just because of random reasons".
>> There are CPUs with errata on wbinvd, and the thing in general is
>> slow and nasty.
>>
>> Doing the wbinvd in a loop sounds even stranger.
>>
>> If we're only doing it because of some SME issue, why isn't it
>> dependent on SME? And why is it inside that loop at all?
>
> My original patches did check for X86_FEATURE_SME and only do the
> wbinvd if SME was supported (although still in the loop). The general
> consensus was to just do the wbinvd no matter what and so it is as it is
> today.
>
> It can probably be outside of the loop. The issue I was seeing was
> memory corruption from the stack when using halt() with paravirt ops
> enabled. So a native_halt() should be used.
>
>>
>> Anyway, does it work for you if you just do the wbinvd() once, outside
>> the loop? Admittedly the loop shouldn't actually loop (hlt with
>> interrupts disabled), but who the hell knows.. Some of the errata
>> around SME have been about machine check exceptions or something.
>
> I think that should work as long as it's a native_wbinvd() call and it
> can also be conditional on boot_cpu_has(X86_FEATURE_SME).
>
> I'll do some testing.

Looks like everything is good with the suggested changes. Patch to follow
shortly.
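
Roughly, the shape of the change discussed above is a single cache flush
before the halt loop, done only when SME is supported. The following is a
user-space sketch only, not the actual patch: the sim_* helpers are
hypothetical stand-ins for boot_cpu_has(X86_FEATURE_SME), native_wbinvd()
and native_halt(), and the loop is bounded so the simulation terminates
(the real loop would never return).

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel helpers named in the thread. */
static bool sme_supported;   /* models boot_cpu_has(X86_FEATURE_SME) */
static int wbinvd_calls;     /* counts simulated native_wbinvd() calls */
static int halt_calls;       /* counts simulated native_halt() calls */

static bool sim_cpu_has_sme(void)   { return sme_supported; }
static void sim_native_wbinvd(void) { wbinvd_calls++; }
static void sim_native_halt(void)   { halt_calls++; }

/*
 * Sketch of the reworked flow: flush the caches once, before the halt
 * loop, and only when SME is present. max_wakeups is an assumption made
 * for this simulation; it models the halt returning unexpectedly a few
 * times and keeps the loop finite.
 */
static void sim_stop_this_cpu(int max_wakeups)
{
    if (sim_cpu_has_sme())
        sim_native_wbinvd();

    for (int i = 0; i < max_wakeups; i++)
        sim_native_halt();
}
```

With this shape, the flush count stays at one (or zero without SME) no
matter how many times the halt happens to return.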

Thanks,
Tom

>
> Thanks,
> Tom
>
>>
>> See commit a68e5c94f7d3 ("x86, hotplug: Move WBINVD back outside the
>> play_dead loop") for another example where wbinvd was inside a loop
>> and apparently caused some odd issues.
>>
>> Linus
>>