Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

From: Pratik Sampat
Date: Tue Jun 08 2021 - 12:42:54 EST


I've implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.

The POC has been hosted here:
https://github.com/pratiksampat/powerpc-utils/tree/H_GET_ENERGY_SCALE_INFO
and based on comments and suggestions I can further improve the
parsing logic from this initial implementation.

Sample output from the powerpc-utils tool is as follows:

# ppc64_cpu --frequency
Power and Performance Mode: XXXX
Idle Power Saver Status : XXXX
Processor Folding Status : XXXX --> Printed if Idle power save status is supported

Platform reported frequencies --> Frequencies reported from the platform's H_CALL, i.e. the PAPR interface
min        :    NNNN GHz
max        :    NNNN GHz
static    :    NNNN GHz

Tool Computed frequencies
min        :    NNNN GHz (cpu XX)
max        :    NNNN GHz (cpu XX)
avg        :    NNNN GHz


On 04/06/21 10:05 pm, Pratik R. Sampat wrote:
Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

The H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg; however, the H_GET_EM_PARMS H_CALL
will be deprecated from P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
uint64 flags, // Per the flag request
uint64 firstAttributeId,// The attribute id
uint64 bufferAddress, // The logical address of the output buffer
uint64 bufferSize // The size in bytes of the output buffer
);

This H_CALL can either query all the attributes at once with
firstAttributeId = 0, flags = 0, or query only one attribute
at a time with firstAttributeId = id.

The output buffer consists of the following:
1. number of attributes - 8 bytes
2. array offset to the data location - 8 bytes
3. version info - 1 byte
4. A data array of size num attributes, which contains the following:
   a. attribute ID - 8 bytes
   b. attribute value in number - 8 bytes
   c. attribute name in string - 64 bytes
   d. attribute value in string - 64 bytes
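
As a rough illustration of the layout above, the following standalone C
sketch locates one attribute inside a raw copy of the output buffer. It
assumes the fields are big-endian and tightly packed (as the __be64 and
__packed declarations in the patch suggest) and that the array offset is
measured from the start of the buffer. The esi_* names and size macros
are hypothetical helpers for this example only and are not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header: num attributes (8) + array offset (8) + version (1), packed. */
#define ESI_HDR_SIZE   17u
/* One element: id (8) + numeric value (8) + name (64) + value string (64). */
#define ESI_ATTR_SIZE  144u

/* Read a big-endian 64-bit value from a possibly unaligned pointer. */
static uint64_t esi_read_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return a pointer to attribute number idx inside buf, or NULL if the
 * header or the requested element does not fit within len bytes.
 */
static const unsigned char *esi_attr_at(const unsigned char *buf, size_t len,
					uint64_t idx)
{
	uint64_t num_attr, array_offset;

	assert(buf != NULL);
	if (len < ESI_HDR_SIZE)
		return NULL;

	num_attr = esi_read_be64(buf);
	array_offset = esi_read_be64(buf + 8);

	/* The array must start after the header and inside the buffer. */
	if (idx >= num_attr || array_offset < ESI_HDR_SIZE || array_offset > len)
		return NULL;
	/* Element idx must lie completely inside the remaining bytes. */
	if ((len - array_offset) / ESI_ATTR_SIZE <= idx)
		return NULL;

	return buf + array_offset + idx * ESI_ATTR_SIZE;
}
```

Within the returned element, the attribute ID sits at byte 0, the
numeric value at byte 8, the name at byte 16 and the value string at
byte 80, following the a-d order listed above.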

The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in /sys/firmware/papr to export
this information to userspace in an extensible pass-through format.
The H_CALL returns the name, numeric value and string value. As string
values are in human-readable format, the string value, if it exists,
is given precedence over the numeric value.
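
The precedence rule can be sketched in plain C as below. This is only an
illustration of the rule, not the kernel code from the patch. The
function name esi_format_value and the ESI_STR_LEN macro are made up for
this example, and the bounded scan for the terminating NUL is an extra
precaution that assumes a 64-byte string field may arrive without a
terminator.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ESI_STR_LEN 64	/* size of the string fields in the buffer */

/*
 * Format an attribute value into out: the string value wins when it is
 * non-empty, otherwise the numeric value is printed. Returns the number
 * of characters snprintf() would have written.
 */
static int esi_format_value(char *out, size_t out_len,
			    const char value_desc[ESI_STR_LEN],
			    uint64_t value)
{
	const char *nul;
	size_t n;

	assert(out != NULL && value_desc != NULL);

	/* Bounded length: do not read past the 64-byte field. */
	nul = memchr(value_desc, '\0', ESI_STR_LEN);
	n = nul ? (size_t)(nul - value_desc) : ESI_STR_LEN;

	if (n != 0)
		return snprintf(out, out_len, "%.*s\n", (int)n, value_desc);
	return snprintf(out, out_len, "%" PRIu64 "\n", value);
}
```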

The format of exposing the sysfs information is as follows:
/sys/firmware/papr/
|-- attr_0_name
|-- attr_0_val
|-- attr_1_name
|-- attr_1_val
...
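
A userspace consumer of this layout could look roughly like the sketch
below. It assumes the attr_N_name/attr_N_val files are numbered
contiguously from 0, as the naming above implies, and that each file
holds a single line. papr_read_attr and papr_dump are hypothetical
helper names for this example and are not taken from the powerpc-utils
POC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read <dir>/attr_<idx>_<suffix> into out, stripping the trailing newline.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
static int papr_read_attr(const char *dir, int idx, const char *suffix,
			  char *out, size_t out_len)
{
	char path[256];
	FILE *f;
	int n;

	assert(out != NULL && out_len > 0);

	n = snprintf(path, sizeof(path), "%s/attr_%d_%s", dir, idx, suffix);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)out_len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	out[strcspn(out, "\n")] = '\0';
	return 0;
}

/* Print "name: value" for attr_0, attr_1, ... until one is missing. */
static int papr_dump(const char *dir, FILE *stream)
{
	char name[128], val[128];
	int idx = 0;

	while (papr_read_attr(dir, idx, "name", name, sizeof(name)) == 0 &&
	       papr_read_attr(dir, idx, "val", val, sizeof(val)) == 0) {
		fprintf(stream, "%s: %s\n", name, val);
		idx++;
	}
	return idx;	/* number of attributes printed */
}
```

Calling papr_dump("/sys/firmware/papr", stdout) would then print one
line per attribute without the tool having to interpret the values.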

The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to-be-deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL, as populated in this new sysfs
interface, and report this information directly without the need for
interpretation.

Signed-off-by: Pratik R. Sampat <psampat@xxxxxxxxxxxxx>
---
Documentation/ABI/testing/sysfs-firmware-papr | 24 +++
arch/powerpc/include/asm/hvcall.h | 21 +-
arch/powerpc/kvm/trace_hv.h | 1 +
arch/powerpc/platforms/pseries/Makefile | 3 +-
.../pseries/papr_platform_attributes.c | 203 ++++++++++++++++++
5 files changed, 250 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr
create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-papr b/Documentation/ABI/testing/sysfs-firmware-papr
new file mode 100644
index 000000000000..1c040b44ac3b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr
@@ -0,0 +1,24 @@
+What: /sys/firmware/papr
+Date: June 2021
+Contact: Linux for PowerPC mailing list <linuxppc-dev@xxxxxxxxxx>
+Description: Directory hosting a set of platform attributes on Linux
+ running as a PAPR guest.
+
+ Each file in the directory contains a platform
+ attribute pertaining to performance/energy-savings
+ mode and processor frequency.
+
+What: /sys/firmware/papr/attr_X_name
+ /sys/firmware/papr/attr_X_val
+Date: June 2021
+Contact: Linux for PowerPC mailing list <linuxppc-dev@xxxxxxxxxx>
+Description: PAPR attribute files for PowerVM servers
+
+ These files provide PAPR information. The
+ following sysfs attributes are present:
+
+ - attr_X_name: File contains the name of
+ attribute X
+
+ - attr_X_val: Numeric/string value of
+ attribute X
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..19a2a8c77a49 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -316,7 +316,8 @@
#define H_SCM_PERFORMANCE_STATS 0x418
#define H_RPT_INVALIDATE 0x448
#define H_SCM_FLUSH 0x44C
-#define MAX_HCALL_OPCODE H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO 0x450
+#define MAX_HCALL_OPCODE H_GET_ENERGY_SCALE_INFO
/* Scope args for H_SCM_UNBIND_ALL */
#define H_UNBIND_SCOPE_ALL (0x1)
@@ -631,6 +632,24 @@ struct hv_gpci_request_buffer {
uint8_t bytes[HGPCI_MAX_DATA_BYTES];
} __packed;
+#define MAX_EM_ATTRS 10
+#define MAX_EM_DATA_BYTES \
+ (sizeof(struct energy_scale_attributes) * MAX_EM_ATTRS)
+struct energy_scale_attributes {
+ __be64 attr_id;
+ __be64 attr_value;
+ unsigned char attr_desc[64];
+ unsigned char attr_value_desc[64];
+} __packed;
+
+struct hv_energy_scale_buffer {
+ __be64 num_attr;
+ __be64 array_offset;
+ __u8 data_header_version;
+ unsigned char data[MAX_EM_DATA_BYTES];
+} __packed;
+
+
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 830a126e095d..38cd0ed0a617 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -115,6 +115,7 @@
{H_VASI_STATE, "H_VASI_STATE"}, \
{H_ENABLE_CRQ, "H_ENABLE_CRQ"}, \
{H_GET_EM_PARMS, "H_GET_EM_PARMS"}, \
+ {H_GET_ENERGY_SCALE_INFO, "H_GET_ENERGY_SCALE_INFO"}, \
{H_SET_MPP, "H_SET_MPP"}, \
{H_GET_MPP, "H_GET_MPP"}, \
{H_HOME_NODE_ASSOCIATIVITY, "H_HOME_NODE_ASSOCIATIVITY"}, \
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index c8a2b0b05ac0..d14fca89ac25 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,8 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
of_helpers.o \
setup.o iommu.o event_sources.o ras.o \
firmware.o power.o dlpar.o mobility.o rng.o \
- pci.o pci_dlpar.o eeh_pseries.o msi.o
+ pci.o pci_dlpar.o eeh_pseries.o msi.o \
+ papr_platform_attributes.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SCANLOG) += scanlog.o
obj-$(CONFIG_KEXEC_CORE) += kexec.o
diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
new file mode 100644
index 000000000000..8818877ff47e
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * PowerPC64 LPAR PAPR Information Driver
+ *
+ * This driver creates a sysfs directory at /sys/firmware/papr/ whose files
+ * hold keyword - value pairs specifying the system's energy configuration.
+ *
+ * Copyright 2021 IBM Corp.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/hugetlb.h>
+#include <asm/lppaca.h>
+#include <asm/hvcall.h>
+#include <asm/firmware.h>
+#include <asm/time.h>
+#include <asm/prom.h>
+#include <asm/vdso_datapage.h>
+#include <asm/vio.h>
+#include <asm/mmu.h>
+#include <asm/machdep.h>
+#include <asm/drmem.h>
+
+#include "pseries.h"
+
+#define MAX_KOBJ_ATTRS 2
+
+struct papr_attr {
+ u64 id;
+ struct kobj_attribute attr;
+} *pgattrs;
+
+struct kobject *papr_kobj;
+struct hv_energy_scale_buffer *em_buf;
+struct energy_scale_attributes *ea;
+
+static ssize_t papr_show_name(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+ int idx, ret = 0;
+
+ /*
+ * We do not expect the name to change, hence use the old value
+ * and save a HCALL
+ */
+ for (idx = 0; idx < be64_to_cpu(em_buf->num_attr); idx++) {
+ if (pattr->id == be64_to_cpu(ea[idx].attr_id)) {
+ ret = sprintf(buf, "%s\n", ea[idx].attr_desc);
+ if (ret < 0)
+ ret = -EIO;
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static ssize_t papr_show_val(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+ struct hv_energy_scale_buffer *t_buf;
+ struct energy_scale_attributes *t_ea;
+ int data_offset, ret = 0;
+
+ t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
+ if (t_buf == NULL)
+ return -ENOMEM;
+
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
+ pattr->id, virt_to_phys(t_buf),
+ sizeof(*t_buf));
+
+ if (ret != H_SUCCESS) {
+ pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO\n");
+ goto out;
+ }
+
+ data_offset = be64_to_cpu(t_buf->array_offset) -
+ (sizeof(t_buf->num_attr) +
+ sizeof(t_buf->array_offset) +
+ sizeof(t_buf->data_header_version));
+
+ t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
+
+ /* Prioritize string values over numerical */
+ if (strlen(t_ea->attr_value_desc) != 0)
+ ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
+ else
+ ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
+ if (ret < 0)
+ ret = -EIO;
+out:
+ kfree(t_buf);
+ return ret;
+}
+
+static struct papr_ops_info {
+ const char *attr_name;
+ ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf);
+} ops_info[MAX_KOBJ_ATTRS] = {
+ { "name", papr_show_name },
+ { "val", papr_show_val },
+};
+
+static int __init papr_init(void)
+{
+ uint64_t num_attr;
+ int ret, idx, i, data_offset;
+
+ em_buf = kmalloc(sizeof(*em_buf), GFP_KERNEL);
+ if (em_buf == NULL)
+ return -ENOMEM;
+ /*
+ * hcall(
+ * uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
+ * uint64 flags, // Per the flag request
+ * uint64 firstAttributeId, // The attribute id
+ * uint64 bufferAddress, // The logical address of the output buffer
+ * uint64 bufferSize); // The size in bytes of the output buffer
+ */
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0, 0,
+ virt_to_phys(em_buf), sizeof(*em_buf));
+
+ if (!firmware_has_feature(FW_FEATURE_LPAR) || ret != H_SUCCESS ||
+ em_buf->data_header_version != 0x1) {
+ pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO\n");
+ goto out;
+ }
+
+ num_attr = be64_to_cpu(em_buf->num_attr);
+
+ /*
+ * Typecast the energy buffer to the attribute structure at the offset
+ * specified in the buffer
+ */
+ data_offset = be64_to_cpu(em_buf->array_offset) -
+ (sizeof(em_buf->num_attr) +
+ sizeof(em_buf->array_offset) +
+ sizeof(em_buf->data_header_version));
+
+ ea = (struct energy_scale_attributes *) &em_buf->data[data_offset];
+
+ papr_kobj = kobject_create_and_add("papr", firmware_kobj);
+ if (!papr_kobj) {
+ pr_warn("kobject_create_and_add papr failed\n");
+ goto out_kobj;
+ }
+
+ for (idx = 0; idx < num_attr; idx++) {
+ pgattrs = kcalloc(MAX_KOBJ_ATTRS,
+ sizeof(*pgattrs),
+ GFP_KERNEL);
+ if (!pgattrs)
+ goto out_kobj;
+
+ /*
+ * Create the sysfs attribute hierarchy for each PAPR
+ * property found
+ */
+ for (i = 0; i < MAX_KOBJ_ATTRS; i++) {
+ char buf[20];
+
+ pgattrs[i].id = be64_to_cpu(ea[idx].attr_id);
+ sysfs_attr_init(&pgattrs[i].attr.attr);
+ sprintf(buf, "%s_%d_%s", "attr", idx,
+ ops_info[i].attr_name);
+ pgattrs[i].attr.attr.name = buf;
+ pgattrs[i].attr.attr.mode = 0444;
+ pgattrs[i].attr.show = ops_info[i].show;
+
+ if (sysfs_create_file(papr_kobj, &pgattrs[i].attr.attr)) {
+ pr_warn("Failed to create papr file %s\n",
+ pgattrs[i].attr.attr.name);
+ goto out_pgattrs;
+ }
+ }
+ }
+
+ return 0;
+
+out_pgattrs:
+ /* pgattrs is a single allocation; free it exactly once */
+ kfree(pgattrs);
+out_kobj:
+ kobject_put(papr_kobj);
+out:
+ kfree(em_buf);
+
+ return -ENOMEM;
+}
+
+machine_device_initcall(pseries, papr_init);