Re: [PATCH 2/2] KVM: x86: Simplify APICv update request logic

From: Sean Christopherson
Date: Mon Oct 11 2021 - 13:55:49 EST


On Sun, Oct 10, 2021, Maxim Levitsky wrote:
> On Fri, 2021-10-08 at 18:01 -0700, Sean Christopherson wrote:
> > Drop confusing and flawed code that intentionally sets that per-VM APICv
> > inhibit mask after sending KVM_REQ_APICV_UPDATE to all vCPUs. The code
> > is confusing because it's not obvious that there's no race between a CPU
> > seeing the request and consuming the new mask. The code works only
> > because the request handling path takes the same lock, i.e. responding
> > vCPUs will be blocked until the full update completes.
>
> Actually this code is here on purpose:
>
> While it is true that the main reader of apicv_inhibit_reasons (KVM_REQ_APICV_UPDATE handler)
> does take the kvm->arch.apicv_update_lock lock, so it will see the correct value
> regardless of this patch, the reason why this code first raises the KVM_REQ_APICV_UPDATE
> and only then updates the arch.apicv_inhibit_reasons is that I put a warning into svm_vcpu_run
> which checks that per cpu AVIC inhibit state matches the global AVIC inhibit state.
>
> That warning proved to be very useful to ensure that AVIC inhibit works correctly.
>
> If this patch is applied, the warning can no longer work reliably unless
> it takes the apicv_update_lock which will have a performance hit.
>
> The reason is that if we just update apicv_inhibit_reasons, we can race
> with vCPU which is about to re-enter the guest mode and trigger this warning.

Ah, and it relies on kvm_make_all_cpus_request() to wait for vCPUs to ack the
IRQ before updating apicv_inhibit_reasons, and then relies on kvm_vcpu_update_apicv()
to stall on acquiring apicv_update_lock() so that the vCPU can't redo svm_vcpu_run()
without seeing the new inhibit state.

I'll drop this patch and send one to add comments, there are a lot of subtle/hidden
dependencies here. Setting the inhibit _after_ the request in particular needs a
comment as it goes directly against the behavior of pretty much every other request
flow.

Thanks!