[PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline

From: Gautham R. Shenoy
Date: Thu Sep 12 2019 - 06:36:16 EST


From: "Gautham R. Shenoy" <ego@xxxxxxxxxxxxxxxxxx>

Currently, on pseries Linux guests, an offlined CPU can be put into one
of the following two states:
- Long term processor cede (also called extended cede)
- Returned to the Hypervisor via RTAS "stop-self" call.

This is controlled by the kernel boot parameter "cede_offline=on/off".
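As a rough illustration only (this is not how the kernel parses the
option), a userspace script could check which value was passed on the
command line. The helper name parse_cede_offline is hypothetical, and
the sketch assumes the last occurrence of the parameter wins:

```python
def parse_cede_offline(cmdline):
    """Return the value given to cede_offline= in a kernel command line.

    Hypothetical helper: returns the string after '=' (e.g. 'on' or
    'off'), or None when the parameter is absent. If the parameter is
    repeated, the last occurrence is returned (an assumption).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("cede_offline="):
            value = token.split("=", 1)[1]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel booted with.
    with open("/proc/cmdline") as f:
        print(parse_cede_offline(f.read()))
```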

By default, the offlined CPUs enter extended cede. The PHYP hypervisor
considers CPUs in extended cede to be "active" since the CPUs are
still under the control of the Linux guest. Hence, when we change the
SMT mode by offlining the secondary CPUs, the PURR and the RWMR SPRs
continue to count for the offlined CPUs in extended cede as if they
were online.

One of the expectations with PURR is that, for an interval of time,
the sum of the PURR increments across the online CPUs of a core should
equal the number of timebase ticks for that interval.

This is currently not the case.
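The expectation can be expressed as a simple check. The following is a
minimal sketch, not part of the patchset: the function names and the 1%
tolerance are assumptions chosen for illustration, and the sample
numbers are the core00 SMT=off figures reported in this cover letter.

```python
def purr_coverage(delta_purr, delta_tb):
    """Fraction of the timebase ticks accounted for by the summed PURR
    increments of a core's online CPUs over the same interval."""
    return delta_purr / delta_tb


def purr_consistent(delta_purr, delta_tb, tolerance=0.01):
    """True when the summed PURR increments match the timebase ticks to
    within the given relative tolerance (1% is an arbitrary choice)."""
    return abs(delta_purr - delta_tb) <= tolerance * delta_tb


if __name__ == "__main__":
    delta_tb = 512000000
    # core00 with SMT=off: without the patches, then with the patches.
    for label, delta_purr in (("without", 69883784), ("with", 512527568)):
        print(label, purr_consistent(delta_purr, delta_tb),
              round(purr_coverage(delta_purr, delta_tb), 3))
```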

The following data (generated using
https://github.com/gautshen/misc/blob/master/purr_tb.py) illustrates
this:


delta tb   = timebase ticks elapsed in 1 second.
delta purr = sum of PURR increments on the online CPUs of that core in
             1 second.

SMT=off
==============================================
Core            delta tb (approx)   delta purr
==============================================
core00 [ 0]     512000000           69883784
core01 [ 8]     512000000           88782536
core02 [ 16]    512000000           94296824
core03 [ 24]    512000000           80951968

SMT=2
================================================
Core              delta tb (approx)   delta purr
================================================
core00 [ 0,1]     512000000           136147792
core01 [ 8,9]     512000000           128636784
core02 [ 16,17]   512000000           135426488
core03 [ 24,25]   512000000           153027520

SMT=4
======================================================
Core                    delta tb (approx)   delta purr
======================================================
core00 [ 0,1,2,3]       512000000           258331616
core01 [ 8,9,10,11]     512000000           274220072
core02 [ 16,17,18,19]   512000000           260013736
core03 [ 24,25,26,27]   512000000           260079672

SMT=on
==================================================================
Core                                delta tb (approx)   delta purr
==================================================================
core00 [ 0,1,2,3,4,5,6,7]           512000000           512941248
core01 [ 8,9,10,11,12,13,14,15]     512000000           512936544
core02 [ 16,17,18,19,20,21,22,23]   512000000           512931544
core03 [ 24,25,26,27,28,29,30,31]   512000000           512923800

This patchset addresses the issue by changing the default value of
"cede_offline_enabled" to false, so that by default the offlined CPUs
are returned to the hypervisor via the RTAS "stop-self" call.

The patchset also defines a new sysfs attribute
"/sys/devices/system/cpu/cede_offline_enabled" on pseries Linux guests
to allow userspace programs to change, at runtime, the state into which
offlined CPUs are put. This is intended for userspace programs that
fold CPUs for the purpose of saving energy when the utilization is
low. Setting this attribute ensures that subsequent CPU offline
operations put the offlined CPUs into extended cede. However, this
causes inconsistencies in the PURR accounting. Clearing the attribute
makes the offlined CPUs make the RTAS "stop-self" call, thereby
returning the CPU to the hypervisor.
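For illustration, a CPU-folding program might toggle the attribute
along the following lines. This is only a sketch under stated
assumptions: the attribute exists only with this patchset applied, it
is assumed to live under the standard /sys/devices/system/cpu
directory, the "1"/"0" values for setting and clearing are assumed
rather than taken from the patches, and the helper names are
hypothetical. Writing to sysfs normally requires root.

```python
from pathlib import Path

# Assumed location of the attribute added by this patchset.
CEDE_OFFLINE_ATTR = Path("/sys/devices/system/cpu/cede_offline_enabled")


def set_cede_offline(enabled, attr=CEDE_OFFLINE_ATTR):
    """Set (True) or clear (False) the attribute.

    Assumption: the attribute accepts "1" to set and "0" to clear.
    Only CPUs offlined after the write are affected.
    """
    attr.write_text("1" if enabled else "0")


def get_cede_offline(attr=CEDE_OFFLINE_ATTR):
    """Return the current attribute contents with whitespace stripped."""
    return attr.read_text().strip()
```

A folding daemon would call set_cede_offline(True) before offlining
CPUs it intends to bring back quickly, accepting the PURR accounting
inconsistency described above, and set_cede_offline(False) otherwise.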

With the patches applied:

SMT=off
==============================================
Core            delta tb (approx)   delta purr
==============================================
core00 [ 0]     512000000           512527568
core01 [ 8]     512000000           512556128
core02 [ 16]    512000000           512590016
core03 [ 24]    512000000           512589440

SMT=2
================================================
Core              delta tb (approx)   delta purr
================================================
core00 [ 0,1]     512000000           512635328
core01 [ 8,9]     512000000           512610416
core02 [ 16,17]   512000000           512639360
core03 [ 24,25]   512000000           512638720

SMT=4
======================================================
Core                    delta tb (approx)   delta purr
======================================================
core00 [ 0,1,2,3]       512000000           512757328
core01 [ 8,9,10,11]     512000000           512727920
core02 [ 16,17,18,19]   512000000           512754712
core03 [ 24,25,26,27]   512000000           512739040

SMT=on
==================================================================
Core                                delta tb (approx)   delta purr
==================================================================
core00 [ 0,1,2,3,4,5,6,7]           512000000           512920936
core01 [ 8,9,10,11,12,13,14,15]     512000000           512878728
core02 [ 16,17,18,19,20,21,22,23]   512000000           512921192
core03 [ 24,25,26,27,28,29,30,31]   512000000           512924816

Gautham R. Shenoy (2):
  pseries/hotplug-cpu: Change default behaviour of cede_offline to "off"
  pseries/hotplug-cpu: Add sysfs attribute for cede_offline

 Documentation/ABI/testing/sysfs-devices-system-cpu | 14 ++++
 Documentation/core-api/cpu_hotplug.rst             |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c       | 80 ++++++++++++++++++++--
 3 files changed, 88 insertions(+), 8 deletions(-)

--
1.9.4