On Thu, Oct 13, 2022, Like Xu wrote:
Firstly, thanks for all of your comments on the vPMU code.
On 23/9/2022 8:13 am, Sean Christopherson wrote:
Force vCPUs to reprogram all counters on a PMU filter change to provide
a sane ABI for userspace. Use the existing KVM_REQ_PMU to do the
programming, and take advantage of the fact that the reprogram_pmi bitmap
fits in a u64 to set all bits in a single atomic update. Note, setting
the bitmap and making the request needs to be done _after_ the SRCU
synchronization to ensure that vCPUs will reprogram using the new filter.
KVM's current "lazy" approach is confusing and non-deterministic. It's
The resolute lazy approach was introduced in patch 03, right after this change.
This is referring to the lazy recognition of the filter, not the deferred
reprogramming of the counters. Regardless of whether reprogramming is handled
via request or in-line, KVM is still lazily recognizing the new filter as vCPUs
won't pick up the new filter until the _guest_ triggers a refresh.
@@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
mutex_lock(&kvm->lock);
filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
mutex_is_locked(&kvm->lock));
- mutex_unlock(&kvm->lock);
-
synchronize_srcu_expedited(&kvm->srcu);
The relative order of these two operations, mutex_unlock() and
synchronize_srcu_expedited(), has been reversed, extending the execution
window of the critical section of "kvm->lock".
The motivation is also not explicitly stated in the commit message.
I'll add a blurb, after I re-convince myself that the sync+request needs to be
done under kvm->lock.
+ BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
+ sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
+
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
How about:
bitmap_copy(pmu->reprogram_pmi, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
to avoid further cycles on calls of
"static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit)" ?
bitmap_copy() was my first choice too, but unfortunately it doesn't guarantee
atomicity and could lead to data corruption if the target vCPU is concurrently
modifying the bitmap.
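
To illustrate the atomicity point, here is a minimal userspace sketch using C11
atomics rather than the kernel's atomic64_t and bitops. The names
struct pmu_sketch, PMC_IDX_MAX, mark_all_for_reprogram(), mark_one() and
test_and_clear() are hypothetical stand-ins, not KVM code. The idea is only
that when the whole bitmap fits in one 64-bit word, "set every bit" can be a
single atomic store that cannot interleave with a concurrent atomic
read-modify-write of individual bits, whereas a plain copy makes no such
guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for X86_PMC_IDX_MAX; assumed to be at most 64. */
#define PMC_IDX_MAX 64

/* Hypothetical userspace stand-in for the relevant part of struct kvm_pmu. */
struct pmu_sketch {
	_Atomic uint64_t reprogram_pmi;	/* one bit per counter index */
};

/* Compile-time check in the spirit of the BUILD_BUG_ON() in the patch. */
static_assert(PMC_IDX_MAX <= 64, "bitmap must fit in a single 64-bit word");

/* Filter-change side: request reprogramming of all counters in one store. */
static void mark_all_for_reprogram(struct pmu_sketch *pmu)
{
	atomic_store(&pmu->reprogram_pmi, UINT64_MAX);
}

/* vCPU side: atomically request reprogramming of a single counter. */
static void mark_one(struct pmu_sketch *pmu, unsigned int idx)
{
	atomic_fetch_or(&pmu->reprogram_pmi, UINT64_C(1) << idx);
}

/* vCPU side: atomically clear one bit; returns 1 if the bit was set. */
static int test_and_clear(struct pmu_sketch *pmu, unsigned int idx)
{
	uint64_t mask = UINT64_C(1) << idx;

	return (atomic_fetch_and(&pmu->reprogram_pmi, ~mask) & mask) != 0;
}
```

Because every access goes through an atomic operation on the same word, a
concurrent test_and_clear() on the vCPU side observes the word either entirely
before or entirely after the "set all" store, never a partially written value.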