Re: [Xen-devel] [PATCH 1/2] x86/entry/64: Do not clear %rbx under Xen

From: Andrew Cooper
Date: Sun Jul 22 2018 - 14:33:29 EST


On 22/07/18 18:56, Andy Lutomirski wrote:
> On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx> wrote:
>> On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
>>> On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
>>>> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
>>>> wrote:
>>>>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>>>>>> exceptions/interrupts, to reduce speculation attack surface")
>>>>>>> unintendedly
>>>>>>> broke Xen PV virtual machines by clearing the %rbx register at the end
>>>>>>> of
>>>>>>> xen_failsafe_callback before the latter jumps to error_exit.
>>>>>>> error_exit expects the %rbx register to be a flag indicating whether
>>>>>>> there should be a return to kernel mode.
>>>>>>>
>>>>>>> This commit makes sure that the %rbx register is not cleared by
>>>>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is
>>>>>>> instantiated
>>>>>>> by xen_failsafe_callback, to avoid the issue.
>>>>>>
>>>>>>
>>>>>> Seems like a genuine problem, but:
>>>>>>
>>>>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>>>>>> index c7449f377a77..96e8ff34129e 100644
>>>>>>> --- a/arch/x86/entry/entry_64.S
>>>>>>> +++ b/arch/x86/entry/entry_64.S
>>>>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>>>>>> addq $0x30, %rsp
>>>>>>> UNWIND_HINT_IRET_REGS
>>>>>>> pushq $-1 /* orig_ax = -1 => not a system call */
>>>>>>> - PUSH_AND_CLEAR_REGS
>>>>>>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>>>>>>> ENCODE_FRAME_POINTER
>>>>>>> jmp error_exit
>>>>>>
>>>>>>
>>>>>> The old code first set RBX to zero then, if frame pointers are on,
>>>>>> sets it to some special non-zero value, then crosses its fingers and
>>>>>> hopes for the best. Your patched code just skips the zeroing part, so
>>>>>> RBX either contains the ENCODE_FRAME_POINTER result or is
>>>>>> uninitialized.
>>>>>>
>>>>>> How about actually initializing rbx to something sensible like, say, 1?
>>>>>
>>>>> Hello Andy,
>>>>>
>>>>> Thank you for the review! Apparently, I have not done my homework fully.
>>>>> I will test your suggestion and report back, most likely in a few hours.
>>>>>
>>>>> I have been testing with the next/linux-next tree's master branch
>>>>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
>>>>> frame pointer (i.e., RBP) register, as opposed to the RBX register,
>>>>> which the patch aims to avoid changing before jumping to error_exit.
>>>>> It is possible that I am missing something though -- I am not sure about
>>>>> the connection between the RBP and RBX registers.
>>>>
>>>> Sorry, brain fart on my part.
>>>
>>> No problem! :-)
>>>
>>>>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below.
>>>>> Would
>>>>> it
>>>>> be valid to state that the original code had the same issue that you
>>>>> referred
>>>>> to (i.e., leaving the RBX register uninitialized)?
>>>>
>>>> Presumably.
>>>>
>>>> I would propose a rather different fix:
>>>>
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>>>>
>>>> Any chance you could test that and see if it fixes your problem?
>>>
>>> Of course; I will report back with the result in a few hours.
>>
>> Hello Andy,
>>
>> I confirm that the commit at [1] resolves the issue in question as well.
>>
>> To test, I first reverted my commit, applied your commit and verified that
>> the bug cannot be reproduced. Afterwards, I reverted your commit and
>> verified that the bug is reproducible.
>>
>> I am not sure about the best way to document the bug I encountered in your
>> commit message, but in case you plan to have your commit merged, please
>> feel free to add a "Reported-and-tested-by: M. Vefa Bicakci
>> <m.v.b@xxxxxxxxxx>"
>> tag to the commit message, possibly with a link to this e-mail thread.
>>
>> Finally, as I had mentioned in my commit message, this bug exists in all
>> kernel versions 4.14 and greater, so it would be nice if you could
>> carbon-copy
>> "stable@xxxxxxxxxxxxxxx" in the commit message as well.
>>
>> Thank you,
> I'm curious why the sigreturn_64 selftest didn't catch this bug. Do
> you happen to know why? The xen_failsafe_callback mechanism is a bit
> mysterious to me.

It is invoked whenever Xen suffers a fault when trying to load a guest
segment register.

In practice, it is used far more rarely with 64bit builds of Xen than it
used to be with 32bit builds. This is because the only time we reload
guest data segments is in the vcpu context switch path.

More recent versions of Xen also don't have a fallback in the iret path,
due to what think was actually some overzealous cleanup. As a result,
#GP gets delivered pointing at the target of the iret instruction.

Looking at the sigreturn tests, I'd expect the test to complain a lot,
not least because Xen doesn't have espfix64.

~Andrew