Re: [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed

From: Sean Christopherson
Date: Fri Oct 14 2022 - 12:26:57 EST


On Fri, Oct 14, 2022, Like Xu wrote:
> For subject title, the "reprogram" bit is _only_ used to keep track of
> pmc->perf_event,
> not whether the counter is disabled.
>
> On 23/9/2022 8:13 am, Sean Christopherson wrote:
> > When reprogramming a counter, clear the counter's "reprogram pending" bit
> > if the counter is disabled (by the guest) or is disallowed (by the
> > userspace filter). In both cases, there's no need to re-attempt
> > programming on the next coincident KVM_REQ_PMU as enabling the counter by
> > either method will trigger reprogramming.
>
> Perhaps we could move the check_pmu_event_filter() towards the top of the
> call stack.

Top of what call stack exactly? reprogram_counter() has multiple callers, and
the filter check is already near the top of reprogram_counter().

> > @@ -245,7 +245,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
> > perf_event_enable(pmc->perf_event);
> > pmc->is_paused = false;
> > - clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
>
> This change is very suspicious.

In the current code, pmc_resume_counter() clears the bit iff it returns true.
With this patch, reprogram_counter() is guarnteed to clear the bit if
pmc_resume_counter() returns true.

if (pmc->current_config == new_config && pmc_resume_counter(pmc))
goto reprogram_complete;

pmc_release_perf_event(pmc);

pmc->current_config = new_config;

/*
* If reprogramming fails, e.g. due to contention, leave the counter's
* regprogram bit set, i.e. opportunistically try again on the next PMU
* refresh. Don't make a new request as doing so can stall the guest
* if reprogramming repeatedly fails.
*/
if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
(eventsel & pmu->raw_event_mask),
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
eventsel & ARCH_PERFMON_EVENTSEL_INT))
return;

reprogram_complete:
clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
pmc->prev_counter = 0;

> > @@ -324,16 +323,27 @@ void reprogram_counter(struct kvm_pmc *pmc)
> > }
> > if (pmc->current_config == new_config && pmc_resume_counter(pmc))
> > - return;
> > + goto reprogram_complete;
> > pmc_release_perf_event(pmc);
> > pmc->current_config = new_config;
> > - pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
> > - (eventsel & pmu->raw_event_mask),
> > - !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> > - !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> > - eventsel & ARCH_PERFMON_EVENTSEL_INT);
> > +
> > + /*
> > + * If reprogramming fails, e.g. due to contention, leave the counter's
> > + * regprogram bit set, i.e. opportunistically try again on the next PMU
>
> This is what we need, in the upstream case we need to keep trying regprogram
> to try to occupy the hardware.

Maybe in an ideal world, but in reality KVM can't guarantee that programming will
ever succeed. Making a new KVM_REQ_PMU will prevent entering the guest, i.e. will
effectively hang the vCPU. Breaking the vPMU isn't great, but hanging the guest
is worse.

> > + * refresh. Don't make a new request as doing so can stall the guest
> > + * if reprogramming repeatedly fails.
>
> This does not happen, the guest still enters w/p perf_event backend support
> and the vPMU is broken until the next vm-exit.
>
> There is no need to endlessly call kvm_pmu_handle_event() when reprogram fails.

Yes, that's what the above comment is calling out, or at least trying to call out.