Re: [PATCH] KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.
From: Paolo Bonzini
Date: Wed Jun 29 2016 - 16:48:41 EST
On 29/06/2016 19:25, Quentin Casasnovas wrote:
> On Fri, Jun 24, 2016 at 03:10:03PM +0200, Paolo Bonzini wrote:
>> On 24/06/2016 15:04, Quentin Casasnovas wrote:
>>> On Thu, Jun 23, 2016 at 06:03:01PM +0200, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 18/06/2016 11:01, Quentin Casasnovas wrote:
>>>>> Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
>>>>> Developer Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
>>>>> Control Structure", I found that we're enforcing that the destination
>>>>> operand is NOT located in a read-only data segment or any code segment when
>>>>> the L1 is in long mode - BUT that check should only happen when it is in
>>>>> protected mode.
>>>>>
>>>>> Shuffling the code a bit to make our emulation follow the specification
>>>>> allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
>>>>> without problems.
>>>>
>>>> That's great, and I'm applying the patch, but it's also pretty weird. :)
>>>> Do you have a pointer to Xen source code that does a VMREAD into a
>>>> read-only data segment or a code segment?
>>>
>>> It is indeed pretty weird. Looking at the Xen stack trace, it looks like
>>> the vmread is writing to an on-stack buffer, and surely it must be writable
>>> so I wonder if Xen might not be using an executable stack for some reason?
>>> That would be a bit scary so I'm surely missing something.
>>>
>>> Is there an easy way to see, from my KVM host, the segment permissions
>>> set up by the guest?
>>
>> Remove your patch, call dump_vmcs() where the #GP is injected, and
>> you'll find the VMCS (including segment permissions, but not the
>> instruction info field---you probably should add it) in dmesg.
>
> Thanks for the heads up :)
>
> I've had a bit more time to spend on this this morning and attached is the
> VMCS dump. I've looked at the vmcs_instruction_info and it appears the
> segment referenced is SS (which is in sync with the backtrace where the
> instruction causing the vmexit is "vmread %rbp, %rbp"), and it has awkward
> attributes:
>
> SS: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
>
> The lower 14 bits (including the 4-bit type field) are all zero, so the KVM
> VMX emulation was injecting the #GP(0) because we were about to write to a
> read-only segment. At least the stack isn't executable from what I can tell!

Yes, that was my reading of the VMCS dump too. The weird attributes
come from the (non)handling of selectors in 64-bit mode.

Paolo
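
For anyone following along, here is a minimal sketch of how the attr value
in the dump above breaks down. It assumes the access-rights layout that the
SDM gives for the VMCS guest segment registers (type in bits 3:0, S in bit 4,
DPL in bits 6:5, P in bit 7, AVL/L/D-B/G in bits 12-15, "unusable" in bit
16). The struct and function names are made up for illustration and are not
taken from the KVM sources.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a KVM structure. */
struct vmx_seg_ar {
	unsigned type;      /* bits 3:0  - segment type               */
	unsigned s;         /* bit 4     - 0 = system, 1 = code/data  */
	unsigned dpl;       /* bits 6:5  - descriptor privilege level */
	unsigned present;   /* bit 7     - segment present            */
	unsigned avl;       /* bit 12    - available for software     */
	unsigned l;         /* bit 13    - 64-bit code (CS only)      */
	unsigned db;        /* bit 14    - default operation size     */
	unsigned g;         /* bit 15    - granularity                */
	unsigned unusable;  /* bit 16    - segment unusable           */
};

/* Split a raw VMCS access-rights value into its individual fields. */
static struct vmx_seg_ar decode_ar(uint32_t ar)
{
	struct vmx_seg_ar d;

	d.type     = ar & 0xf;
	d.s        = (ar >> 4) & 1;
	d.dpl      = (ar >> 5) & 3;
	d.present  = (ar >> 7) & 1;
	d.avl      = (ar >> 12) & 1;
	d.l        = (ar >> 13) & 1;
	d.db       = (ar >> 14) & 1;
	d.g        = (ar >> 15) & 1;
	d.unusable = (ar >> 16) & 1;
	return d;
}
```

With decode_ar(0x1c000) only unusable, g and db come out set, and type is 0,
which the protected-mode check reads as a read-only data segment.
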
> Attached is the full VMCS dump where I've added a printk() to show the
> 'type' (all zeroes) and vmcs_instruction_info in case my above analysis is
> complete nonsense.
>
> Quentin
>
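
For reference, the rule from the patch description at the top of the thread
can be sketched roughly as follows. This is a standalone illustration and
not the actual KVM code: the enum and function names are hypothetical, and
the only thing it models is that the segment-type check applies in protected
mode and is skipped when L1 is in long mode.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vCPU mode; KVM tracks this differently. */
enum l1_mode { L1_PROTECTED_MODE, L1_LONG_MODE };

/*
 * Return true if a memory operand in a segment of the given type should
 * raise #GP(0) purely because of the segment type.
 *
 * Type bits: bit 3 set = code segment; bit 1 = writable (data segment)
 * or readable (code segment).
 */
static bool seg_type_faults(enum l1_mode mode, unsigned type, bool is_write)
{
	/* In long mode the segment type is not checked at all. */
	if (mode == L1_LONG_MODE)
		return false;

	if (is_write)
		/* Read-only data segment, or any code segment. */
		return (type & 0xa) == 0 || (type & 0x8) != 0;

	/* Reads only fault on an execute-only code segment. */
	return (type & 0xa) == 0x8;
}
```

With type 0, as in the SS dump above, a write faults in protected mode but
not in long mode, which matches the behaviour change described in the patch.
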