Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

From: Radim KrÄmÃÅ
Date: Tue Apr 11 2017 - 16:45:46 EST


2017-04-11 13:25+0800, Paolo Bonzini:
> On 07/04/2017 05:02, James Hogan wrote:
>> - you'll no longer get IPIs if its in EXITING_GUEST_MODE (i.e. if you
>> get two of these in quick succession only the first will wait for the
>> IPI, which might work as long as they're already serialised but it
>> still feels wrong).
>
> But this is okay---avoiding multiple IPIs is the exact purpose of
> EXITING_GUEST_MODE.

I think this applied to the missed synchronization, in which case the
point is valid as the latter caller would assume that it can proceed to
reuse the memory even though the guest was still using it.

> There are evidently multiple uses of kvm_make_all_cpus_request, and we
> should avoid smp_call_function_many(..., true) if possible. So perhaps:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..20e3bd60bdda 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -169,7 +169,7 @@ static void ack_flush(void *_completed)
> {
> }
>
> -bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> +bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req, bool wait)
> {
> int i, cpu, me;
> cpumask_var_t cpus;
> @@ -182,18 +182,19 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> kvm_for_each_vcpu(i, vcpu, kvm) {
> kvm_make_request(req, vcpu);
> cpu = vcpu->cpu;
> + if (cpus == NULL || cpu == -1 || cpu == me)
> + continue;
>
> /* Set ->requests bit before we read ->mode. */
> smp_mb__after_atomic();
> -
> - if (cpus != NULL && cpu != -1 && cpu != me &&
> - kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> + if (kvm_arch_vcpu_should_kick(vcpu) ||
> + (wait && vcpu->mode != OUTSIDE_GUEST_MODE))
> cpumask_set_cpu(cpu, cpus);
> }
> if (unlikely(cpus == NULL))
> - smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1);
> + smp_call_function_many(cpu_online_mask, ack_flush, NULL, wait);
> else if (!cpumask_empty(cpus))
> - smp_call_function_many(cpus, ack_flush, NULL, 1);
> + smp_call_function_many(cpus, ack_flush, NULL, wait);
> else
> called = false;
> put_cpu();
> @@ -221,7 +222,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> * kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
> * barrier here.
> */
> - if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> + if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH, true))
> ++kvm->stat.remote_tlb_flush;
> cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> }
> @@ -230,7 +231,7 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
>
> void kvm_reload_remote_mmus(struct kvm *kvm)
> {
> - kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> + /* FIXME, is wait=true really needed? */

Probably not. There are two uses,

in kvm_mmu_prepare_zap_page():
The only change that happens between kvm_reload_remote_mmus() and
kvm_flush_remote_tlbs() in kvm_mmu_commit_zap_page() is setting of
sp->role.invalid -- synchronizing it doesn't prevent any race with
READING_SHADOW_PAGE_TABLES mode and the unconditional TLB flush is the
important one. I think that kvm_reload_remote_mmus doesn't even need
to kick in this case.

in kvm_mmu_invalidate_zap_all_pages():
Same situation: the guest cannot do an entry without increasing the
generation number, but can enter READING_SHADOW_PAGE_TABLES mode
between reload and flush.
I think that we don't need to call

but my knowledge of this area is obviously lacking ...

> + kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, true);
> }
>
> int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>
>
> Other users do not need wait=false.

You mean "wait=true"?

(Would be safer to assume they depend on the VM exit wait until proved
otherwise ...)

> Or another idea is to embed wait in the request number, as suggested in the
> ARM thread, so that for example:

Right, I don't think that a TLB flush makes sense without
synchronization and adding context sensitivity

> - bits 0-4 = bit number in vcpu->requests
>
> - bit 8 = wait when making request

Sounds good. The single-target kvm_make_request() + kvm_vcpu_kick()
should use this as well.

> - bit 9 = kick after making request

Maybe add bit mask to denote in which modes the kick/wait is necessary?

bit 9 : IN_GUEST_MODE
bit 10 : EXITING_GUEST_MODE
bit 11 : READING_SHADOW_PAGE_TABLES

TLB_FLUSH would set bits 8-11. IIUC, ARM has use for requests that need
to make sure that the guest is not in guest mode before proceeding and
those would set bit 8-10.

The common requests, "notice me as soon as possible", would set bit 9.
The bits 9-11 could also be used only when bit 8 is set, to make the
transition easier. (9 and 10 could be squished then as well.)