Re: [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
From: Sean Christopherson
Date: Thu Oct 13 2022 - 16:53:38 EST
On Thu, Oct 13, 2022, Like Xu wrote:
> Firstly, thanks for your comments that spewed out around vpmu.
>
> On 23/9/2022 8:13 am, Sean Christopherson wrote:
> > Force vCPUs to reprogram all counters on a PMU filter change to provide
> > a sane ABI for userspace. Use the existing KVM_REQ_PMU to do the
> > programming, and take advantage of the fact that the reprogram_pmi bitmap
> > fits in a u64 to set all bits in a single atomic update. Note, setting
> > the bitmap and making the request needs to be done _after_ the SRCU
> > synchronization to ensure that vCPUs will reprogram using the new filter.
> >
> > KVM's current "lazy" approach is confusing and non-deterministic. It's
>
> The resolute lazy approach was introduced in patch 03, right after this change.
This is referring to the lazy recognition of the filter, not the deferred
reprogramming of the counters. Regardless of whether reprogramming is handled
via request or in-line, KVM is still lazily recognizing the new filter as vCPUs
won't pick up the new filter until the _guest_ triggers a refresh.
> > @@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
> > mutex_lock(&kvm->lock);
> > filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
> > mutex_is_locked(&kvm->lock));
> > - mutex_unlock(&kvm->lock);
> > -
> > synchronize_srcu_expedited(&kvm->srcu);
>
> The relative order of these two operations has been reversed
> mutex_unlock() and synchronize_srcu_expedited()
> , extending the execution window of the critical area of "kvm->lock)".
> The motivation is also not explicitly stated in the commit message.
I'll add a blurb, after I re-convince myself that the sync+request needs to be
done under kvm->lock.
> > + BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
> > + sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm)
> > + atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
>
> How about:
> bitmap_copy(pmu->reprogram_pmi, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> to avoid further cycles on calls of
> "static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit)" ?
bitmap_copy() was my first choice too, but unfortunately it doesn't guarantee
atomicity and could lead to data corruption if the target vCPU is concurrently
modifying the bitmap.
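
To illustrate the point, here is a rough userspace C11 sketch, not KVM code.
It assumes the whole bitmap fits in one 64-bit word, which is what the
BUILD_BUG_ON() in the patch checks. The names demo_pmu, demo_mark_all() and
demo_test_and_clear() are hypothetical stand-ins for this example only: the
writer sets every bit with a single atomic store, and the other side clears
individual bits with an atomic read-modify-write, so neither can observe or
produce a torn value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analog of X86_PMC_IDX_MAX; the trick only works up to 64 bits. */
#define DEMO_PMC_IDX_MAX 64

/* Userspace analog of the BUILD_BUG_ON() in the patch. */
static_assert(DEMO_PMC_IDX_MAX <= 64, "bitmap must fit in one 64-bit word");

/* Hypothetical stand-in for the per-vCPU reprogram bitmap. */
struct demo_pmu {
	_Atomic uint64_t reprogram_pmi;
};

/* Filter-change side: mark every counter for reprogramming in one atomic store. */
static void demo_mark_all(struct demo_pmu *pmu)
{
	atomic_store(&pmu->reprogram_pmi, UINT64_MAX);
}

/*
 * vCPU side: atomically clear one bit and report whether it was set.
 * Because this is a single read-modify-write, it cannot interleave with
 * demo_mark_all() in a way that loses or corrupts bits.
 */
static int demo_test_and_clear(struct demo_pmu *pmu, unsigned int bit)
{
	uint64_t mask = UINT64_C(1) << bit;

	return (atomic_fetch_and(&pmu->reprogram_pmi, ~mask) & mask) != 0;
}
```

A plain word-by-word copy into the same storage offers no such guarantee,
since a concurrent bit update on the target side could race with the copy.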