Re: [PATCH] KVM: nVMX: do not pin the VMCS12

From: Wanpeng Li
Date: Thu Jul 27 2017 - 21:28:59 EST


2017-07-28 1:20 GMT+08:00 David Matlack <dmatlack@xxxxxxxxxx>:
> On Thu, Jul 27, 2017 at 6:54 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>> Since the current implementation of VMCS12 does a memcpy in and out
>> of guest memory, we do not need current_vmcs12 and current_vmcs12_page
>> anymore. current_vmptr is enough to read and write the VMCS12.
>
> This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
> VMCS12 page by using kvm_write_guest. nested_release_page() only marks
> the struct page dirty.
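
To make the dirty-tracking point concrete -- a hedged sketch against the
4.13-era helpers, not a quote of either path. kvm_vcpu_write_guest_page()
resolves the gfn to a memslot and sets the matching bit in
memslot->dirty_bitmap, whereas nested_release_page() (a wrapper around
kvm_release_page_dirty(), if I recall the helper right) only dirties the
struct page:

    /* New path: the memslot dirty bitmap is updated, so userspace
     * (e.g. live migration) sees the VMCS12 page as dirty. */
    kvm_vcpu_write_guest_page(&vmx->vcpu,
                              vmx->nested.current_vmptr >> PAGE_SHIFT,
                              vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);

    /* Old path: only the struct page itself is marked dirty; the
     * memslot dirty bitmap is never touched. */
    kunmap(vmx->nested.current_vmcs12_page);
    nested_release_page(vmx->nested.current_vmcs12_page);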
>
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx.c | 23 ++++++-----------------
>> 1 file changed, 6 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b37161808352..142f16ebdca2 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -416,9 +416,6 @@ struct nested_vmx {
>>
>> /* The guest-physical address of the current VMCS L1 keeps for L2 */
>> gpa_t current_vmptr;
>> - /* The host-usable pointer to the above */
>> - struct page *current_vmcs12_page;
>> - struct vmcs12 *current_vmcs12;
>> /*
>> * Cache of the guest's VMCS, existing outside of guest memory.
>> * Loaded from guest memory during VMPTRLD. Flushed to guest
>> @@ -7183,10 +7180,6 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>> if (vmx->nested.current_vmptr == -1ull)
>> return;
>>
>> - /* current_vmptr and current_vmcs12 are always set/reset together */
>> - if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
>> - return;
>> -
>> if (enable_shadow_vmcs) {
>> /* copy to memory all shadowed fields in case
>> they were modified */
>> @@ -7199,13 +7192,11 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>> vmx->nested.posted_intr_nv = -1;
>>
>> /* Flush VMCS12 to guest memory */
>> - memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
>> - VMCS12_SIZE);
>> + kvm_vcpu_write_guest_page(&vmx->vcpu,
>> + vmx->nested.current_vmptr >> PAGE_SHIFT,
>> + vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
>
> Have you hit any "suspicious RCU usage" error messages during VM
> teardown with this patch? We did when we replaced memcpy with
> kvm_write_guest a while back. IIRC it was due to kvm->srcu not being
> held in one of the teardown paths. kvm_write_guest() expects it to be
> held in order to access memslots.

Yeah, I observed this splat when testing Paolo's patch today.

[87214.855344] =============================
[87214.855346] WARNING: suspicious RCU usage
[87214.855348] 4.13.0-rc2+ #2 Tainted: G OE
[87214.855350] -----------------------------
[87214.855352] ./include/linux/kvm_host.h:573 suspicious rcu_dereference_check() usage!
[87214.855353]
other info that might help us debug this:

[87214.855355]
rcu_scheduler_active = 2, debug_locks = 1
[87214.855357] 1 lock held by qemu-system-x86/17059:
[87214.855359] #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc051bb12>] vcpu_load+0x22/0x80 [kvm]
[87214.855396]
stack backtrace:
[87214.855399] CPU: 3 PID: 17059 Comm: qemu-system-x86 Tainted: G OE 4.13.0-rc2+ #2
[87214.855401] Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
[87214.855403] Call Trace:
[87214.855408] dump_stack+0x99/0xce
[87214.855413] lockdep_rcu_suspicious+0xc5/0x100
[87214.855423] kvm_vcpu_gfn_to_memslot+0x166/0x180 [kvm]
[87214.855432] kvm_vcpu_write_guest_page+0x24/0x50 [kvm]
[87214.855438] free_nested.part.76+0x76/0x270 [kvm_intel]
[87214.855443] vmx_free_vcpu+0x7a/0xc0 [kvm_intel]
[87214.855454] kvm_arch_destroy_vm+0x104/0x1d0 [kvm]
[87214.855463] kvm_put_kvm+0x17a/0x2b0 [kvm]
[87214.855473] kvm_vm_release+0x21/0x30 [kvm]
[87214.855477] __fput+0xfb/0x240
[87214.855482] ____fput+0xe/0x10
[87214.855485] task_work_run+0x7e/0xb0
[87214.855490] do_exit+0x323/0xcf0
[87214.855494] ? get_signal+0x318/0x930
[87214.855498] ? _raw_spin_unlock_irq+0x2c/0x60
[87214.855503] do_group_exit+0x50/0xd0
[87214.855507] get_signal+0x24f/0x930
[87214.855514] do_signal+0x37/0x750
[87214.855518] ? __might_fault+0x3e/0x90
[87214.855523] ? __might_fault+0x85/0x90
[87214.855527] ? exit_to_usermode_loop+0x2b/0x100
[87214.855531] ? __this_cpu_preempt_check+0x13/0x20
[87214.855535] exit_to_usermode_loop+0xab/0x100
[87214.855539] syscall_return_slowpath+0x153/0x160
[87214.855542] entry_SYSCALL_64_fastpath+0xc0/0xc2
[87214.855545] RIP: 0033:0x7ff40d24a26d
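
If I read the check in include/linux/kvm_host.h right, the memslot lookup
in kvm_vcpu_gfn_to_memslot() wants kvm->srcu (or slots_lock) held, and the
vmx_free_vcpu() -> free_nested() path in the backtrace above holds neither.
A minimal sketch of one way to quiet it, bracketing the flush in an SRCU
read-side section -- an illustration only, not necessarily the right fix
(David's suggestion below is to skip the flush on teardown instead):

    int idx;

    idx = srcu_read_lock(&vmx->vcpu.kvm->srcu);
    kvm_vcpu_write_guest_page(&vmx->vcpu,
                              vmx->nested.current_vmptr >> PAGE_SHIFT,
                              vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
    srcu_read_unlock(&vmx->vcpu.kvm->srcu, idx);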


Regards,
Wanpeng Li

> We fixed this by skipping the VMCS12 flush during VMXOFF. I'll send
> that patch along with a few other nVMX dirty tracking related patches
> I've been meaning to get upstreamed.
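
For what it's worth, a hedged sketch of that shape -- letting callers of
nested_release_vmcs12() skip the write-back when L1 is tearing down; the
flush parameter and exact plumbing here are mine, not necessarily what
your patch does:

    /* Hypothetical: only flush the VMCS12 cache back to guest memory
     * when L1 can still observe it, i.e. not on VMXOFF/teardown. */
    static void nested_release_vmcs12(struct vcpu_vmx *vmx, bool flush)
    {
            if (vmx->nested.current_vmptr == -1ull)
                    return;
            /* (shadow VMCS copy-back elided) */
            if (flush)
                    kvm_vcpu_write_guest_page(&vmx->vcpu,
                                vmx->nested.current_vmptr >> PAGE_SHIFT,
                                vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
            vmx->nested.current_vmptr = -1ull;
    }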
>
>>
>> - kunmap(vmx->nested.current_vmcs12_page);
>> - nested_release_page(vmx->nested.current_vmcs12_page);
>> vmx->nested.current_vmptr = -1ull;
>> - vmx->nested.current_vmcs12 = NULL;
>> }
>>
>> /*
>> @@ -7623,14 +7614,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>> }
>>
>> nested_release_vmcs12(vmx);
>> - vmx->nested.current_vmcs12 = new_vmcs12;
>> - vmx->nested.current_vmcs12_page = page;
>> /*
>> * Load VMCS12 from guest memory since it is not already
>> * cached.
>> */
>> - memcpy(vmx->nested.cached_vmcs12,
>> - vmx->nested.current_vmcs12, VMCS12_SIZE);
>> + memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
>> + kunmap(page);
>
> + nested_release_page_clean(page);
>
>> +
>> set_current_vmptr(vmx, vmptr);
>> }
>>
>> @@ -9354,7 +9344,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>>
>> vmx->nested.posted_intr_nv = -1;
>> vmx->nested.current_vmptr = -1ull;
>> - vmx->nested.current_vmcs12 = NULL;
>>
>> vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
>>
>> --
>> 1.8.3.1
>>