Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline

From: Gautham R Shenoy
Date: Wed Sep 18 2019 - 08:51:54 EST


On Wed, Sep 18, 2019 at 03:14:15PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <ego@xxxxxxxxxxxxxxxxxx> writes:
> > From: "Gautham R. Shenoy" <ego@xxxxxxxxxxxxxxxxxx>
> >
> > Currently on Pseries Linux Guests, the offlined CPU can be put to one
> > of the following two states:
> > - Long term processor cede (also called extended cede)
> > - Returned to the Hypervisor via RTAS "stop-self" call.
> >
> > This is controlled by the kernel boot parameter "cede_offline=on/off".
> >
> > By default the offlined CPUs enter extended cede.
>
> Since commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") (Nov 2009)
>
> Which you wrote :)

Mea Culpa! I forgot to include the "Fixes commit 3aa565f53c39" into
Patch 1 of the series.

>
> Why was that wrong?

It was wrong from the definition of what PHYP considers as
"not-active" CPU. From the point of view of that hypervisor, a CPU is
not-active iff it is in RTAS "stop-self". Thus if a CPU is offline via
extended cede, and not using any cycles, it is still considered to be
active, by PHYP. This causes PURR accounting is broken.

>
> > The PHYP hypervisor considers CPUs in extended cede to be "active"
> > since the CPUs are still under the control fo the Linux Guests. Hence, when we change the
> > SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs
> > will continue to count the values for offlined CPUs in extended cede
> > as if they are online.
> >
> > One of the expectations with PURR is that the for an interval of time,
> > the sum of the PURR increments across the online CPUs of a core should
> > equal the number of timebase ticks for that interval.
> >
> > This is currently not the case.
>
> But why does that matter? It's just some accounting stuff, does it
> actually break something meaningful?

As Naveen mentioned, it breaks lparstat which the customers are using
for capacity planning. Unfortunately we discovered this 10 years after
the feature was written.

>
> Also what does this do to the latency of CPU online/offline.

It will have a slightly higher latency compared to extended cede,
since it involves an additional rtas-call for both the start and
stopping of CPU. Will measure the exact difference and post it in the
next version.

> And what does this do on KVM?

KVM doesn't seem to depend on the state of the offline VCPU as it has
an explicit way of signalling whether a CPU is online or not, via
KVM_REG_PPC_ONLINE. In commit 7aa15842c15f ("KVM: PPC: Book3S HV: Set
RWMR on POWER8 so PURR/SPURR count correctly") we use this KVM reg to
update the count of online vCPUs in a core, and use this count to set
the RWMR correctly before dispatching the core.

So, this patchset doesn't affect KVM.

>
>
> > In the following data (Generated using
> > https://github.com/gautshen/misc/blob/master/purr_tb.py):
> >
> >
> > delta tb = tb ticks elapsed in 1 second.
> > delta purr = sum of PURR increments on online CPUs of that core in 1
> > second
> >
> > SMT=off
> > ===========================================
> > Core delta tb(apprx) delta purr
> > ===========================================
> > core00 [ 0] 512000000 69883784
> > core01 [ 8] 512000000 88782536
> > core02 [ 16] 512000000 94296824
> > core03 [ 24] 512000000 80951968
>
> Showing the expected value in another column would make this much
> clearer.

Thanks. Will update the testcase to call out the expected value.
>
> cheers
>


--
Thanks and Regards
gautham.