Re: [PATCH v3 1/2] perf x86: Infrastructure for exposing an Uncore unit to PMON mapping

From: Sudarikov, Roman
Date: Tue Jan 14 2020 - 08:24:43 EST


On 13.01.2020 17:34, Greg KH wrote:
On Mon, Jan 13, 2020 at 04:54:43PM +0300, roman.sudarikov@xxxxxxxxxxxxxxx wrote:
From: Roman Sudarikov <roman.sudarikov@xxxxxxxxxxxxxxx>

Intel Xeon Scalable processor family (code name Skylake-SP) makes
significant changes in the integrated I/O (IIO) architecture. The new
solution introduces IIO stacks which are responsible for managing traffic
between the PCIe domain and the Mesh domain. Each IIO stack has its own
PMON block and can handle either DMI port, x16 PCIe root port, MCP-Link
or various built-in accelerators. IIO PMON blocks allow concurrent
monitoring of I/O flows up to 4 x4 bifurcation within each IIO stack.

Software is supposed to program required perf counters within each IIO
stack and gather performance data. The tricky thing here is that IIO PMON
reports data per IIO stack but users have no idea what IIO stacks are -
they only know devices which are connected to the platform.

Understanding IIO stack concept to find which IIO stack that particular
IO device is connected to, or to identify an IIO PMON block to program
for monitoring specific IIO stack assumes a lot of implicit knowledge
about given Intel server platform architecture.

Usage example:
/sys/devices/uncore_<type>_<pmu_idx>/platform_mapping

Each Uncore unit type, by its nature, can be mapped to its own context,
for example:
1. CHA - each uncore_cha_<pmu_idx> is assigned to manage a distinct slice
of LLC capacity;
2. UPI - each uncore_upi_<pmu_idx> is assigned to manage one link of Intel
UPI Subsystem;
3. IIO - each uncore_iio_<pmu_idx> is assigned to manage one stack of the
IIO module;
4. IMC - each uncore_imc_<pmu_idx> is assigned to manage one channel of
Memory Controller.

Implementation details:
Two callbacks added to struct intel_uncore_type to discover and map Uncore
units to PMONs:
int (*get_topology)(void)
int (*set_mapping)(struct intel_uncore_pmu *pmu)

Details of IIO Uncore unit mapping to IIO PMON:
Each IIO stack is either DMI port, x16 PCIe root port, MCP-Link or various
built-in accelerators. For Uncore IIO Unit type, the platform_mapping file
holds bus numbers of devices, which can be monitored by that IIO PMON block
on each die.

Co-developed-by: Alexander Antonov <alexander.antonov@xxxxxxxxx>
Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
Signed-off-by: Alexander Antonov <alexander.antonov@xxxxxxxxx>
Signed-off-by: Roman Sudarikov <roman.sudarikov@xxxxxxxxxxxxxxx>
---
arch/x86/events/intel/uncore.c | 37 +++++++++++++++++++++++++++++++++-
arch/x86/events/intel/uncore.h | 9 ++++++++-
2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 86467f85c383..2c53ad44b51f 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -905,6 +905,32 @@ static void uncore_types_exit(struct intel_uncore_type **types)
uncore_type_exit(*types);
}
+static struct attribute *empty_attrs[] = {
+ NULL,
+};
+
+static const struct attribute_group empty_group = {
+ .attrs = empty_attrs,
+};
What is this for? Why is it needed? It doesn't do anything?

+
+static ssize_t platform_mapping_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct intel_uncore_pmu *pmu = dev_get_drvdata(dev);
+
+ return snprintf(buf, PAGE_SIZE - 1, "%s\n", pmu->mapping);
+}
+static DEVICE_ATTR_RO(platform_mapping);
You are creating new sysfs attributes without any Documentation/ABI
updates, which is not ok. Please fix this up for your next round of
patches.

+static struct attribute *mapping_attrs[] = {
+ &dev_attr_platform_mapping.attr,
+ NULL,
+};
+
+static const struct attribute_group uncore_mapping_group = {
+ .attrs = mapping_attrs,
+};
ATTRIBUTE_GROUPS()?

Messing around with single attribute_group lists is usually a sign that
something is really wrong as the driver core should handle arrays of
attribute group lists instead.


+
static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
{
struct intel_uncore_pmu *pmus;
@@ -950,10 +976,19 @@ static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
attr_group->attrs[j] = &type->event_descs[j].attr.attr;
type->events_group = &attr_group->group;
- }
+ } else
+ type->events_group = &empty_group;
Why???
Hi Greg,

Technically, what I'm trying to do is to add an attribute which depends on
the uncore pmu type and BIOS support. New attribute is added to the end of
the attribute groups array. It appears that the events attribute group is
optional for most of the uncore pmus for x86/intel, i.e. events_group = NULL.

NULL element in the middle of the attribute groups array "hides" all others
attribute groups which follows that element.

To work around it, embedded NULL elements should be either removed from
the attribute groups array [1] or replaced with empty attribute; see implementation above.

If both approaches are incorrect then please advice what would be correct
solution for that case.

[1] https://lore.kernel.org/lkml/20191210091451.6054-3-roman.sudarikov@xxxxxxxxxxxxxxx/

Thanks,
Roman
Didn't we fix up the x86 attributes to work properly and not mess around
with trying to merge groups and the like? Please don't perpetuate that
more...

thanks,

greg k-h