Re: [PATCH 1/2] x86/entry/64: Do not clear %rbx under Xen

From: Andy Lutomirski
Date: Sun Jul 22 2018 - 13:56:57 EST


On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx> wrote:
> On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
>>
>> On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
>>>
>>> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>>>>> exceptions/interrupts, to reduce speculation attack surface")
>>>>>> unintendedly
>>>>>> broke Xen PV virtual machines by clearing the %rbx register at the end
>>>>>> of
>>>>>> xen_failsafe_callback before the latter jumps to error_exit.
>>>>>> error_exit expects the %rbx register to be a flag indicating whether
>>>>>> there should be a return to kernel mode.
>>>>>>
>>>>>> This commit makes sure that the %rbx register is not cleared by
>>>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is
>>>>>> instantiated
>>>>>> by xen_failsafe_callback, to avoid the issue.
>>>>>
>>>>>
>>>>>
>>>>> Seems like a genuine problem, but:
>>>>>
>>>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>>>>> index c7449f377a77..96e8ff34129e 100644
>>>>>> --- a/arch/x86/entry/entry_64.S
>>>>>> +++ b/arch/x86/entry/entry_64.S
>>>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>>>>> addq $0x30, %rsp
>>>>>> UNWIND_HINT_IRET_REGS
>>>>>> pushq $-1 /* orig_ax = -1 => not a system call */
>>>>>> - PUSH_AND_CLEAR_REGS
>>>>>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>>>>>> ENCODE_FRAME_POINTER
>>>>>> jmp error_exit
>>>>>
>>>>>
>>>>>
>>>>> The old code first set RBX to zero then, if frame pointers are on,
>>>>> sets it to some special non-zero value, then crosses its fingers and
>>>>> hopes for the best. Your patched code just skips the zeroing part, so
>>>>> RBX either contains the ENCODE_FRAME_POINTER result or is
>>>>> uninitialized.
>>>>>
>>>>> How about actually initializing rbx to something sensible like, say, 1?
>>>>
>>>>
>>>> Hello Andy,
>>>>
>>>> Thank you for the review! Apparently, I have not done my homework fully.
>>>> I will test your suggestion and report back, most likely in a few hours.
>>>>
>>>> I have been testing with the next/linux-next tree's master branch
>>>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
>>>> frame pointer (i.e., RBP) register, as opposed to the RBX register,
>>>> which the patch aims to avoid changing before jumping to error_exit.
>>>> It is possible that I am missing something though -- I am not sure about
>>>> the connection between the RBP and RBX registers.
>>>
>>>
>>> Sorry, brain fart on my part.
>>
>>
>> No problem! :-)
>>
>>>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below.
>>>> Would
>>>> it
>>>> be valid to state that the original code had the same issue that you
>>>> referred
>>>> to (i.e., leaving the RBX register uninitialized)?
>>>
>>>
>>> Presumably.
>>>
>>> I would propose a rather different fix:
>>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>>>
>>> Any chance you could test that and see if it fixes your problem?
>>
>>
>> Of course; I will report back with the result in a few hours.
>
>
> Hello Andy,
>
> I confirm that the commit at [1] resolves the issue in question as well.
>
> To test, I first reverted my commit, applied your commit and verified that
> the bug cannot be reproduced. Afterwards, I reverted your commit and
> verified that the bug is reproducible.
>
> I am not sure about the best way to document the bug I encountered in your
> commit message, but in case you plan to have your commit merged, please
> feel free to add a "Reported-and-tested-by: M. Vefa Bicakci
> <m.v.b@xxxxxxxxxx>"
> tag to the commit message, possibly with a link to this e-mail thread.
>
> Finally, as I had mentioned in my commit message, this bug exists in all
> kernel versions 4.14 and greater, so it would be nice if you could
> carbon-copy
> "stable@xxxxxxxxxxxxxxx" in the commit message as well.
>
> Thank you,

I'm curious why the sigreturn_64 selftest didn't catch this bug. Do
you happen to know why? The xen_failsafe_callback mechanism is a bit
mysterious to me.

>
> Vefa
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>