Re: [PATCH v2] KVM: LAPIC: Recalculate apic map in batch

From: Paolo Bonzini
Date: Tue Feb 25 2020 - 09:20:50 EST


On 25/02/20 10:47, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> In the vCPU reset and set APIC_BASE MSR path, the apic map will be recalculated
> several times, each time it will consume 10+ us observed by ftrace in my
> non-overcommit environment since the expensive memory allocate/mutex/rcu etc
> operations. This patch optimizes it by recalculating apic map in batch, I hope
> this can benefit the serverless scenario which can frequently create/destroy
> VMs.
>
> Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> ---
> v1 -> v2:
> * add apic_map_dirty to kvm_lapic
> * error condition in kvm_apic_set_state, do recalculate unconditionally
>
> arch/x86/kvm/lapic.c | 29 +++++++++++++++++++----------
> arch/x86/kvm/lapic.h | 2 ++
> arch/x86/kvm/x86.c | 2 ++
> 3 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afcd30d..3476dbc 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -164,7 +164,7 @@ static void kvm_apic_map_free(struct rcu_head *rcu)
> kvfree(map);
> }
>
> -static void recalculate_apic_map(struct kvm *kvm)
> +void kvm_recalculate_apic_map(struct kvm *kvm)
> {

It's better to add an "if" here rather than in every caller. It should
be like:

	if (!apic->apic_map_dirty) {
		/*
		 * Read apic->apic_map_dirty before
		 * kvm->arch.apic_map.
		 */
		smp_rmb();
		return;
	}

	mutex_lock(&kvm->arch.apic_map_lock);
	if (!apic->apic_map_dirty) {
		/* Someone else has updated the map. */
		mutex_unlock(&kvm->arch.apic_map_lock);
		return;
	}
	...
out:
	old = rcu_dereference_protected(kvm->arch.apic_map,
			lockdep_is_held(&kvm->arch.apic_map_lock));
	rcu_assign_pointer(kvm->arch.apic_map, new);
	/*
	 * Write kvm->arch.apic_map before
	 * clearing apic->apic_map_dirty.
	 */
	smp_wmb();
	apic->apic_map_dirty = false;
	mutex_unlock(&kvm->arch.apic_map_lock);
	...

But actually it seems to me that, given we're going through all this
pain, it's better to put the "dirty" flag in kvm->arch, next to the
mutex and the map itself. This should also reduce the number of calls
to kvm_recalculate_apic_map that actually recompute the map: a lot of
them will just wait on the mutex, see the flag already cleared, and exit.
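
To illustrate the per-VM flag, here is a minimal userspace sketch, not
kernel code. It assumes a pthread mutex in place of apic_map_lock, C11
atomics in place of smp_rmb()/smp_wmb() and RCU, and an immediate free()
where the kernel would wait for a grace period. struct vm_arch and the
vm_* functions are hypothetical names. The sketch also assumes that the
flag is set under the same mutex, so that a rebuild in progress cannot
clear a flag that was set for a newer change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kvm->arch; not kernel code. */
struct apic_map {
	unsigned int max_apic_id;
};

struct vm_arch {
	pthread_mutex_t apic_map_lock;
	_Atomic(struct apic_map *) apic_map;
	atomic_bool apic_map_dirty;	/* one flag per VM, beside lock and map */
	unsigned int max_apic_id;	/* input to the map, guarded by the lock */
	unsigned int rebuilds;		/* guarded by the lock, for illustration */
};

void vm_arch_init(struct vm_arch *a)
{
	pthread_mutex_init(&a->apic_map_lock, NULL);
	atomic_init(&a->apic_map, NULL);
	atomic_init(&a->apic_map_dirty, false);
	a->max_apic_id = 0;
	a->rebuilds = 0;
}

/* Change an input and mark the map stale; no rebuild happens here. */
void vm_set_max_apic_id(struct vm_arch *a, unsigned int id)
{
	pthread_mutex_lock(&a->apic_map_lock);
	a->max_apic_id = id;
	atomic_store_explicit(&a->apic_map_dirty, true, memory_order_release);
	pthread_mutex_unlock(&a->apic_map_lock);
}

void vm_recalculate_apic_map(struct vm_arch *a)
{
	struct apic_map *new, *old;

	/* Fast path: acquire pairs with the release store that clears the flag. */
	if (!atomic_load_explicit(&a->apic_map_dirty, memory_order_acquire))
		return;

	pthread_mutex_lock(&a->apic_map_lock);
	if (!atomic_load_explicit(&a->apic_map_dirty, memory_order_relaxed)) {
		/* Someone else rebuilt the map while we waited on the mutex. */
		pthread_mutex_unlock(&a->apic_map_lock);
		return;
	}

	new = malloc(sizeof(*new));
	assert(new != NULL);	/* sketch only: treat allocation failure as fatal */
	new->max_apic_id = a->max_apic_id;

	/* Publish the new map before clearing the dirty flag. */
	old = atomic_exchange_explicit(&a->apic_map, new, memory_order_release);
	atomic_store_explicit(&a->apic_map_dirty, false, memory_order_release);
	a->rebuilds++;
	pthread_mutex_unlock(&a->apic_map_lock);

	free(old);	/* the kernel would defer this past an RCU grace period */
}
```

With this layout, several vm_set_max_apic_id() calls followed by several
vm_recalculate_apic_map() calls rebuild the map only once; the later
calls return on the fast path or after the second check under the mutex.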

Paolo