Re: [RESEND PATCH ] KVM: VMX: Enable/disable PML when dirty logging gets enabled/disabled

From: Makarand Sonare
Date: Fri Feb 12 2021 - 14:15:55 EST


>> Currently, if enable_pml=1 PML remains enabled for the entire lifetime
>> of the VM irrespective of whether dirty logging is enabled or disabled.
>> When dirty logging is disabled, all the pages of the VM are manually
>> marked dirty, so that PML is effectively non-operational. Clearing
>
> s/clearing/setting
>
> Clearing is also expensive, but that can't be optimized away with this
> change.

Thanks for catching the typo; it should be "setting".

>
>> the dirty bits is an expensive operation which can cause severe MMU
>> lock contention in a performance sensitive path when dirty logging
>> is disabled after a failed or canceled live migration. Also, this
>> would break if some other code path clears the dirty bits in which
>> case, PML will actually start logging dirty pages even when dirty
>> logging is disabled incurring unnecessary vmexits when the PML buffer
>> becomes full. In order to avoid this extra overhead, we should
>> enable or disable PML in VMCS when dirty logging gets enabled
>> or disabled instead of keeping it always enabled.
>
>
> ...
>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 777177ea9a35e..eb6639f0ee7eb 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4276,7 +4276,7 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
>> */
>> exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
>>
>> - if (!enable_pml)
>> + if (!enable_pml || !vcpu->kvm->arch.pml_enabled)
>> exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>
> The checks are unnecessary if PML is dynamically toggled, i.e. this snippet can
> unconditionally clear PML. When setting SECONDARY_EXEC (below snippet), PML
> will be preserved in the current controls, which is what we want.

Assuming a new VCPU can be added after PML is already enabled, should we
clear PML in the VMCS for the new VCPU? If yes, what will be the trigger
for setting PML for the new VCPU?
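
A minimal user-space sketch of the merge described in the review comment above, for reference while discussing this. The bit values are illustrative stand-ins, and merge_exec_controls() and merge_demo() are hypothetical names, not kernel functions. It only models the idea that bits listed in the mask are taken from the current controls while every other bit comes from the freshly computed value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the SECONDARY_EXEC_* bits (assumed values). */
#define EXEC_DESC        (1u << 2)
#define EXEC_ENABLE_PML  (1u << 17)
#define EXEC_XSAVES      (1u << 20)

/*
 * Hypothetical helper: bits in `mask` are dynamically toggled elsewhere,
 * so they are copied from the current controls. All remaining bits are
 * taken from the newly computed controls.
 */
static uint32_t merge_exec_controls(uint32_t new_ctl, uint32_t cur_ctl,
				    uint32_t mask)
{
	return (new_ctl & ~mask) | (cur_ctl & mask);
}

/* Small self-check showing both directions of the preservation. */
void merge_demo(void)
{
	uint32_t mask = EXEC_DESC | EXEC_ENABLE_PML;

	/* The compute step cleared PML, but it is on in the current controls. */
	assert(merge_exec_controls(EXEC_XSAVES, EXEC_ENABLE_PML, mask) ==
	       (EXEC_XSAVES | EXEC_ENABLE_PML));

	/* PML is off in the current controls, so it stays off. */
	assert(merge_exec_controls(EXEC_XSAVES | EXEC_ENABLE_PML, 0, mask) ==
	       EXEC_XSAVES);
}
```

In this model a VCPU whose current controls start with PML clear keeps it clear until something toggles the bit explicitly, which is the case the question above is about.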

>
>> if (cpu_has_vmx_xsaves()) {
>> @@ -7133,7 +7133,8 @@ static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx)
>> SECONDARY_EXEC_SHADOW_VMCS |
>> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>> - SECONDARY_EXEC_DESC;
>> + SECONDARY_EXEC_DESC |
>> + SECONDARY_EXEC_ENABLE_PML;
>>
>> u32 new_ctl = vmx->secondary_exec_control;
>> u32 cur_ctl = secondary_exec_controls_get(vmx);
>> @@ -7509,6 +7510,19 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>> static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>> struct kvm_memory_slot *slot)
>> {
>> + /*
>> + * Check all slots and enable PML if dirty logging
>> + * is being enabled for the 1st slot
>> + *
>> + */
>> + if (enable_pml &&
>> + kvm->dirty_logging_enable_count == 1 &&
>> + !kvm->arch.pml_enabled) {
>> + kvm->arch.pml_enabled = true;
>> + kvm_make_all_cpus_request(kvm,
>> + KVM_REQ_UPDATE_VCPU_DIRTY_LOGGING_STATE);
>> + }
>
> This is flawed. .slot_enable_log_dirty() and .slot_disable_log_dirty() are only
> called when LOG_DIRTY_PAGE is toggled in an existing memslot _and_ only the
> flags of the memslot are being changed. This fails to enable PML if the first
> memslot with LOG_DIRTY_PAGE is created or moved, and fails to disable PML if the
> last memslot with LOG_DIRTY_PAGE is deleted.

Thanks for pointing this out. What do you suggest for handling these
scenarios?

>
>> +
>> if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
>> kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
>> kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
>> @@ -7517,9 +7531,39 @@ static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>> static void vmx_slot_disable_log_dirty(struct kvm *kvm,
>> struct kvm_memory_slot *slot)
>> {
>> + /*
>> + * Check all slots and disable PML if dirty logging
>> + * is being disabled for the last slot
>> + *
>> + */
>> + if (enable_pml &&
>> + kvm->dirty_logging_enable_count == 0 &&
>> + kvm->arch.pml_enabled) {
>> + kvm->arch.pml_enabled = false;
>> + kvm_make_all_cpus_request(kvm,
>> + KVM_REQ_UPDATE_VCPU_DIRTY_LOGGING_STATE);
>> + }
>> +
>> kvm_mmu_slot_set_dirty(kvm, slot);
>> }
>
> ...
>
>> #define kvm_err(fmt, ...) \
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index ee4ac2618ec59..c6e5b026bbfe8 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -307,6 +307,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
>> {
>> return kvm_make_all_cpus_request_except(kvm, req, NULL);
>> }
>> +EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
>>
>> #ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> void kvm_flush_remote_tlbs(struct kvm *kvm)
>> @@ -1366,15 +1367,24 @@ int __kvm_set_memory_region(struct kvm *kvm,
>> }
>>
>> /* Allocate/free page dirty bitmap as needed */
>> - if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
>> + if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>> new.dirty_bitmap = NULL;
>> - else if (!new.dirty_bitmap && !kvm->dirty_ring_size) {
>> +
>> + if (old.flags & KVM_MEM_LOG_DIRTY_PAGES) {
>> + WARN_ON(kvm->dirty_logging_enable_count == 0);
>> + --kvm->dirty_logging_enable_count;
>
> The count will be corrupted if kvm_set_memslot() fails.
>
> The easiest/cleanest way to fix both this and the refcounting bug is to handle
> the count in kvm_mmu_slot_apply_flags(). That will also allow making the dirty
> log count x86-only, and it can then be renamed to cpu_dirty_log_count to align
> with the
>
> We can always move/rename the count variable if additional motivation for
> tracking dirty logging comes along.

Thanks for pointing this out. Will this solution also take care of the
scenario where a memslot is created or deleted with LOG_DIRTY_PAGE?
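
To make the question concrete, here is a rough user-space model of the counting scheme as I understand the suggestion. Everything in it is an assumption for illustration: struct vm_model, apply_flags(), and the LOG_DIRTY bit are hypothetical stand-ins, creation is modelled as old_flags == 0 and deletion as new_flags == 0, and the hook is assumed to run only after the memslot change has been committed. Whether the real kvm_mmu_slot_apply_flags() is reached on all of those paths is exactly what I would like to confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_DIRTY (1u << 0) /* stand-in for KVM_MEM_LOG_DIRTY_PAGES */

/* Hypothetical per-VM state, not the real struct kvm layout. */
struct vm_model {
	unsigned int cpu_dirty_log_count; /* memslots with dirty logging on */
	bool pml_enabled;
	unsigned int toggle_requests;     /* times VCPUs would be asked to update */
};

/*
 * Hypothetical hook, assumed to be called only after a memslot change has
 * been committed, so a failed update never touches the count.
 * Create: old_flags == 0. Delete: new_flags == 0.
 */
static void apply_flags(struct vm_model *vm, uint32_t old_flags,
			uint32_t new_flags)
{
	bool was_logging = (old_flags & LOG_DIRTY) != 0;
	bool now_logging = (new_flags & LOG_DIRTY) != 0;

	if (was_logging == now_logging)
		return; /* dirty logging state of this slot did not change */

	if (now_logging) {
		/* 0 -> 1 transition: first logging slot turns PML on. */
		if (vm->cpu_dirty_log_count++ == 0) {
			vm->pml_enabled = true;
			vm->toggle_requests++;
		}
	} else {
		assert(vm->cpu_dirty_log_count > 0);
		/* 1 -> 0 transition: last logging slot turns PML off. */
		if (--vm->cpu_dirty_log_count == 0) {
			vm->pml_enabled = false;
			vm->toggle_requests++;
		}
	}
}
```

With this shape, creating a slot with logging set, flipping the flag on an existing slot, and deleting a logging slot all go through the same transition check, and VCPUs are only asked to update on the 0 -> 1 and 1 -> 0 edges.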

>
>
>> + }
>> +
>> + } else if (!new.dirty_bitmap && !kvm->dirty_ring_size) {
>> r = kvm_alloc_dirty_bitmap(&new);
>> if (r)
>> return r;
>>
>> if (kvm_dirty_log_manual_protect_and_init_set(kvm))
>> bitmap_set(new.dirty_bitmap, 0, new.npages);
>> +
>> + ++kvm->dirty_logging_enable_count;
>> + WARN_ON(kvm->dirty_logging_enable_count == 0);
>> }
>>
>> r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change);
>> --
>> 2.30.0.478.g8a0d178c01-goog
>