Re: [PATCH 2/4] KVM: nVMX: track dirty state of non-shadowed VMCS fields

From: Wanpeng Li
Date: Sun Dec 31 2017 - 17:49:27 EST


2017-12-31 16:08 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
> On 25/12/2017 04:03, Wanpeng Li wrote:
>> 2017-12-21 20:43 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>>> VMCS12 fields that are not handled through shadow VMCS are rarely
>>> written, and thus they are also almost constant in the vmcs02. We can
>>> thus optimize prepare_vmcs02 by skipping all the work for non-shadowed
>>> fields in the common case.
>>>
>>> This patch introduces the (pretty simple) tracking infrastructure; the
>>> next patches will move work to prepare_vmcs02_full and save a few hundred
>>> clock cycles per VMRESUME on a Haswell Xeon E5 system:
>>>
>>> before after
>>> cpuid 14159 13869
>>> vmcall 15290 14951
>>> inl_from_kernel 17703 17447
>>> outl_to_kernel 16011 14692
>>> self_ipi_sti_nop 16763 15825
>>> self_ipi_tpr_sti_nop 17341 15935
>>> wr_tsc_adjust_msr 14510 14264
>>> rd_tsc_adjust_msr 15018 14311
>>> mmio-wildcard-eventfd:pci-mem 16381 14947
>>> mmio-datamatch-eventfd:pci-mem 18620 17858
>>> portio-wildcard-eventfd:pci-io 15121 14769
>>> portio-datamatch-eventfd:pci-io 15761 14831
>>>
>>> (average savings 748, stdev 460).
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>> ---
>>> arch/x86/kvm/vmx.c | 29 ++++++++++++++++++++++++++++-
>>> 1 file changed, 28 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 2ee842990976..8b6013b529b3 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -441,6 +441,7 @@ struct nested_vmx {
>>> * data hold by vmcs12
>>> */
>>> bool sync_shadow_vmcs;
>>> + bool dirty_vmcs12;
>>>
>>> bool change_vmcs01_virtual_x2apic_mode;
>>> /* L2 must run next, and mustn't decide to exit to L1. */
>>> @@ -7879,8 +7880,10 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>>> {
>>> unsigned long field;
>>> gva_t gva;
>>> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>> u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>>> +
>>> /* The value to write might be 32 or 64 bits, depending on L1's long
>>> * mode, and eventually we need to write that into a field of several
>>> * possible lengths. The code below first zero-extends the value to 64
>>> @@ -7923,6 +7926,20 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>>> return kvm_skip_emulated_instruction(vcpu);
>>> }
>>>
>>> + switch (field) {
>>> +#define SHADOW_FIELD_RW(x) case x:
>>> +#include "vmx_shadow_fields.h"
>>
>> What's will happen here if enable_shadow_vmcs == false?
>
> If enable_shadow_vmcs == true, these fields never even go through
> handle_vmwrite, so they have to be written always by prepare_vmcs02.
> Other fields go through handle_vmwrite and set dirty_vmcs12.
>
> If enable_shadow_vmcs == false, or if the fields do not exist in the
> hardware VMCS (e.g. PML index on Haswell systems), writes go through
> handle_vmwrite, but they still don't need to go through
> prepare_vmcs02_full. So the switch statement recognizes these flags, so
> that they don't set dirty_vmcs12.

You are right.

Regards,
Wanpeng Li