Re: [PATCH] KVM: arm64: Properly restore PMU state during live-migration

From: Marc Zyngier
Date: Tue Jun 08 2021 - 04:18:52 EST


On Mon, 07 Jun 2021 19:34:08 +0100,
"Jain, Jinank" <jinankj@xxxxxxxxx> wrote:
>
> Hi Marc.
>
> On Mon, 2021-06-07 at 17:35 +0100, Marc Zyngier wrote:
> > On Mon, 07 Jun 2021 17:05:01 +0100,
> > "Jain, Jinank" <jinankj@xxxxxxxxx> wrote:
> > > On Thu, 2021-06-03 at 17:03 +0100, Marc Zyngier wrote:
> > > > Hi Jinank,
> > > >
> > > > On Thu, 03 Jun 2021 12:05:54 +0100,
> > > > Jinank Jain <jinankj@xxxxxxxxx> wrote:
> > > > > > Currently if a guest is live-migrated while it is actively using
> > > > > > perf counters, then after live-migration it will notice that all
> > > > > > counters suddenly start reporting 0s. This is because we are not
> > > > > > re-creating the relevant perf events inside the kernel.
> > > > >
> > > > > > Usually on live-migration guest state is restored using the
> > > > > > KVM_SET_ONE_REG ioctl interface, which simply restores the values
> > > > > > of the PMU registers but does not re-program the perf events that
> > > > > > would let the guest seamlessly use these counters after
> > > > > > live-migration as it did before.
> > > > >
> > > > > > Instead there are two completely different code paths: one for the
> > > > > > guest accessing PMU registers and one for the VMM restoring
> > > > > > counters on live-migration.
> > > > >
> > > > > In case of KVM_SET_ONE_REG:
> > > > >
> > > > > kvm_arm_set_reg()
> > > > > ...... kvm_arm_sys_reg_set_reg()
> > > > > ........... reg_from_user()
> > > > >
> > > > > but in case when guest tries to access these counters:
> > > > >
> > > > > handle_exit()
> > > > > ..... kvm_handle_sys_reg()
> > > > > ..........perform_access()
> > > > > ...............access_pmu_evcntr()
> > > > > ...................kvm_pmu_set_counter_value()
> > > > > .......................kvm_pmu_create_perf_event()
> > > > >
> > > > > > The drawback of using the KVM_SET_ONE_REG interface is that the
> > > > > > host pmu events which were registered for the source instance and
> > > > > > not present for the destination instance.
> > > >
> > > > I can't parse this sentence. Do you mean "are not present"?
> > > >
> > > > > > Thus passively restoring PMCR_EL0 using the KVM_SET_ONE_REG
> > > > > > interface would not create the necessary host pmu events which are
> > > > > > crucial for a seamless guest experience across live migration.
> > > > >
> > > > > > In order to fix the situation, on first vcpu load we should
> > > > > > restore PMCR_EL0 in exactly the same way as if the guest were
> > > > > > accessing these counters. We will then also recreate the relevant
> > > > > > host pmu events.
> > > > >
> > > > > Signed-off-by: Jinank Jain <jinankj@xxxxxxxxx>
> > > > > Cc: Alexander Graf (AWS) <graf@xxxxxxxxx>
> > > > > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > > > > Cc: James Morse <james.morse@xxxxxxx>
> > > > > Cc: Alexandru Elisei <alexandru.elisei@xxxxxxx>
> > > > > Cc: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> > > > > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > > > ---
> > > > > arch/arm64/include/asm/kvm_host.h | 1 +
> > > > > arch/arm64/kvm/arm.c | 1 +
> > > > > arch/arm64/kvm/pmu-emul.c | 10 ++++++++--
> > > > > arch/arm64/kvm/pmu.c | 15 +++++++++++++++
> > > > > include/kvm/arm_pmu.h | 3 +++
> > > > > 5 files changed, 28 insertions(+), 2 deletions(-)
> > > > >
> > > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > > > index 7cd7d5c8c4bc..2376ad3c2fc2 100644
> > > > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > > > @@ -745,6 +745,7 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> > > > > > void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
> > > > > void kvm_clr_pmu_events(u32 clr);
> > > > >
> > > > > +void kvm_vcpu_pmu_restore(struct kvm_vcpu *vcpu);
> > > > > void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
> > > > > void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
> > > > > #else
> > > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > > > index e720148232a0..c66f6d16ec06 100644
> > > > > --- a/arch/arm64/kvm/arm.c
> > > > > +++ b/arch/arm64/kvm/arm.c
> > > > > @@ -408,6 +408,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
> > > > > *vcpu,
> > > > > int cpu)
> > > > > if (has_vhe())
> > > > > kvm_vcpu_load_sysregs_vhe(vcpu);
> > > > > kvm_arch_vcpu_load_fp(vcpu);
> > > > > + kvm_vcpu_pmu_restore(vcpu);
> > > >
> > > > > If this only needs to be run once per vcpu, why not trigger it from
> > > > > kvm_arm_pmu_v3_enable(), which is also called once per vcpu?
> > > > >
> > > > > This can be done on the back of a request, saving most of the
> > > > > overhead and not requiring any extra field. Essentially, something
> > > > > like the (untested) patch below.
> > > >
> > > > > kvm_vcpu_pmu_restore_guest(vcpu);
> > > > > if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
> > > > > kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
> > > > > > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > > > > > index fd167d4f4215..12a40f4b5f0d 100644
> > > > > > --- a/arch/arm64/kvm/pmu-emul.c
> > > > > > +++ b/arch/arm64/kvm/pmu-emul.c
> > > > > > @@ -574,10 +574,16 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
> > > > > kvm_pmu_disable_counter_mask(vcpu, mask);
> > > > > }
> > > > >
> > > > > - if (val & ARMV8_PMU_PMCR_C)
> > > > > + /*
> > > > > + * Cycle counter needs to reset in case of first vcpu
> > > > > load.
> > > > > + */
> > > > > + if (val & ARMV8_PMU_PMCR_C ||
> > > > > !kvm_arm_pmu_v3_restored(vcpu))
> > > >
> > > > > Why? There is no architectural guarantee that a counter resets to 0
> > > > > without writing PMCR_EL0.C. And if you want the guest to continue
> > > > > counting where it left off, resetting the counter is at best
> > > > > counter-productive.
> > >
> > > > Without this we would not be resetting the PMU, which is required
> > > > for creating host perf events. With the patch that you suggested we
> > > > are restoring PMCR_EL0 properly but still missing recreation of the
> > > > host perf events.
> >
> > How? The request that gets set on the first vcpu run will call
> > kvm_pmu_handle_pmcr() -> kvm_pmu_enable_counter_mask() ->
> > kvm_pmu_create_perf_event(). What are we missing?
> >
>
> I found out what I was missing. I was working with an older kernel
> which was missing this upstream patch:
>
> https://lore.kernel.org/lkml/20200124142535.29386-3-eric.auger@xxxxxxxxxx/

:-(

Please test whatever you send with an upstream kernel. Actually,
please *develop* on an upstream kernel. This will avoid this kind of
discussion where we talk past each other, and make it plain that your
production kernel is lacking all sorts of fixes.

Now, can you please state whether or not this patch fixes it for you
*on an upstream kernel*? I have no interest in results from a
production kernel.
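
For reference, the flow discussed in this thread (a request raised once
per vcpu, serviced before entering the guest, replaying the saved
PMCR_EL0 through the same handler the trap path uses) can be modelled
in a few lines of userspace C. This is only a rough sketch and not
kernel code: every toy_* name, the TOY_* constants and the struct layout
are hypothetical stand-ins chosen for illustration, and the real
functions named above do considerably more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_NR_COUNTERS     4
#define TOY_PMCR_E          (1u << 0)  /* models the PMCR_EL0.E enable bit */
#define TOY_REQ_RELOAD_PMU  (1u << 0)  /* hypothetical per-vcpu request bit */

struct toy_vcpu {
	uint32_t pmcr;                        /* saved PMCR_EL0 value */
	uint32_t cnten;                       /* saved counter-enable mask */
	uint32_t requests;                    /* pending request bits */
	bool     host_event[TOY_NR_COUNTERS]; /* stands in for a host perf event */
};

/* Stand-in for creating the host perf event backing one counter. */
static void toy_create_perf_event(struct toy_vcpu *v, int idx)
{
	v->host_event[idx] = true;
}

/* Create a backing event for every counter selected in the mask. */
static void toy_enable_counter_mask(struct toy_vcpu *v, uint32_t mask)
{
	for (int i = 0; i < TOY_NR_COUNTERS; i++)
		if (mask & (1u << i))
			toy_create_perf_event(v, i);
}

/* Shared PMCR handler: reached from a guest trap or from the request. */
static void toy_handle_pmcr(struct toy_vcpu *v, uint32_t val)
{
	v->pmcr = val;
	if (val & TOY_PMCR_E)
		toy_enable_counter_mask(v, v->cnten);
}

/* Restore path used by the VMM: copies register values and nothing else. */
static void toy_set_one_reg(struct toy_vcpu *v, uint32_t pmcr, uint32_t cnten)
{
	v->pmcr = pmcr;
	v->cnten = cnten;
}

/* One-time PMU enable: raise the request instead of adding a new field. */
static void toy_pmu_enable(struct toy_vcpu *v)
{
	v->requests |= TOY_REQ_RELOAD_PMU;
}

/* Run before entering the guest: replay the saved PMCR if requested. */
static void toy_check_requests(struct toy_vcpu *v)
{
	if (v->requests & TOY_REQ_RELOAD_PMU) {
		v->requests &= ~TOY_REQ_RELOAD_PMU;
		toy_handle_pmcr(v, v->pmcr);
	}
}
```

In this model, calling toy_set_one_reg() alone leaves every host_event[]
entry false, which mirrors the symptom reported in the original patch.
Calling toy_pmu_enable() followed by toy_check_requests() then creates
events for exactly the counters selected in the restored enable mask,
without resetting any counter value.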

M.

--
Without deviation from the norm, progress is not possible.