On Wed, Jul 10, 2024 at 05:09:34PM +0100, Robin Murphy wrote:
The Arm NI-700 Network-on-Chip Interconnect has a relatively
straightforward design with a hierarchy of voltage, power, and clock
domains, where each clock domain then contains a number of interface
units and a PMU which can monitor events thereon. As such, it begets a
relatively straightforward driver to interface those PMUs with perf.
Even more so than with arm-cmn, users will require detailed knowledge of
the wider system topology in order to meaningfully analyse anything,
since the interconnect itself cannot know what lies beyond the boundary
of each inscrutably-numbered interface. Given that, for now they are
also expected to refer to the NI-700 documentation for the relevant
event IDs to provide as well. An identifier is implemented so we can
come back and add jevents if anyone really wants to.
IIUC the relevant documentation would be the NI-700 TRM:
https://developer.arm.com/documentation/101566/0203/?lang=en
... right?
[...]
+====================================
+Arm Network-on-Chip Interconnect PMU
+====================================
+
+NI-700 and friends implement a distinct PMU for each clock domain within the
+interconnect. Correspondingly, the driver exposes multiple PMU devices named
+arm_ni_<x>_cd_<y>, where <x> is an (arbitrary) instance identifier and <y> is
+the clock domain ID within that particular instance. If multiple NI instances
+exist within a system, the PMU devices can be correlated with the underlying
+hardware instance via sysfs parentage.
I suspect that name suffixing *might* confuse userspace in some cases,
since IIUC the perf tool tries to aggregate some_pmu_<number> instances,
and here it would presumably aggregate all the clock domains as an
arm_ni_<x>_cd PMU.
We should check how that behaves and speak to the userspace folk about
this; taking a step back, we probably need a better way of determining
when things can or should be aggregated.
I assume we don't have HW to test on, but we should be able to fake up
the sysfs hierarchy and see how "perf list" treats that.
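To make the concern concrete, the heuristic I have in mind is roughly the
following (a hypothetical sketch, not the actual tools/perf code):

| #include <ctype.h>
| #include <stddef.h>
| #include <string.h>
|
| /* Hypothetical sketch, NOT the real tools/perf logic: roughly the kind of
|  * trailing "_<digits>" stripping used to group look-alike PMU instances.
|  */
| static size_t pmu_name_len_no_suffix_sketch(const char *name)
| {
| 	size_t len = strlen(name);
| 	size_t full = len;
|
| 	/* Walk back over a trailing "_<decimal>" suffix, if any */
| 	while (len > 0 && isdigit((unsigned char)name[len - 1]))
| 		len--;
| 	if (len > 0 && len < full && name[len - 1] == '_')
| 		len--;
| 	return len;
| }

With that kind of stripping, "arm_ni_0_cd_0" and "arm_ni_0_cd_1" both shorten
to "arm_ni_0_cd", which is where I'd expect the tool to start presenting and
summing them as a single PMU.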
[...]
+static int arm_ni_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader;
+	struct arm_ni_val val = { 0 };
+
+	arm_ni_validate_event(event, &val);
+
+	leader = event->group_leader;
+	if (leader != event && !arm_ni_validate_event(leader, &val))
+		return - EINVAL;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!arm_ni_validate_event(sibling, &val))
+			return - EINVAL;
+	}
+	return 0;
+}
As a trivial nit, something has gone wrong with spacing and you have:
return - EINVAL;
... rather than:
return -EINVAL;
As a more substantial thing, this will trigger splats when lockdep is
enabled and an event is opened where event == event->group_leader,
because for_each_sibling_event(..., event) checks event->ctx->mutex is
held, and event->ctx isn't initialised until pmu::event_init() returns.
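(For reference, for_each_sibling_event() is currently something like the
below; I'm paraphrasing include/linux/perf_event.h, so the exact form may
differ between kernel versions:

| #define for_each_sibling_event(sibling, event)			\
| 	lockdep_assert_event_ctx(event);				\
| 	if ((event)->group_leader == (event))				\
| 		list_for_each_entry((sibling), &(event)->sibling_list, sibling_list)

... and lockdep_assert_event_ctx() looks at (event)->ctx->mutex, which is
exactly what a driver can't rely upon from within pmu::event_init().)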
That's a latent bug in many PMU drivers at the moment, and is on my TODO
list to fix elsewhere. For now, can you make the above:
| static int arm_ni_validate_group(struct perf_event *event)
| {
| 	struct perf_event *sibling, *leader = event->group_leader;
| 	struct arm_ni_val val = { 0 };
|
| 	if (event == leader)
| 		return 0;
|
| 	arm_ni_validate_event(event, &val);
|
| 	if (!arm_ni_validate_event(leader, &val))
| 		return -EINVAL;
|
| 	for_each_sibling_event(sibling, leader) {
| 		if (!arm_ni_validate_event(sibling, &val))
| 			return -EINVAL;
| 	}
|
| 	return 0;
| }
... where the early exit for the (event == leader) case will avoid the
bad call.
[...]
+static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
+{
+	struct arm_ni_cd *cd = ni->cds + node->id;
+	const char *name;
+	int err;
+
+	cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));
Can dev_to_node(ni->dev) return NUMA_NO_NODE, and if so, do we need to
error out here? ...
+static int arm_ni_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
+{
+	struct arm_ni_cd *cd;
+	int node;
+
+	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
+	node = dev_to_node(cd_to_ni(cd)->dev);
+	if (node != NUMA_NO_NODE && cpu_to_node(cd->cpu) != node && cpu_to_node(cpu) == node)
+		arm_ni_pmu_migrate(cd, cpu);
+	return 0;
+}
... since we handle a similar case (NUMA_NO_NODE) here ...
+
+static int arm_ni_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
+{
+	struct arm_ni_cd *cd;
+	unsigned int target;
+	int node;
+
+	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
+	if (cpu != cd->cpu)
+		return 0;
+
+	node = dev_to_node(cd_to_ni(cd)->dev);
+	target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu);
+	if (target >= nr_cpu_ids)
+		target = cpumask_any_but(cpu_online_mask, cpu);
+
+	if (target < nr_cpu_ids)
+		arm_ni_pmu_migrate(cd, target);
+	return 0;
+}
... though not here, and overall that seems inconsistent.
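Concretely, one way to make the offline path match the online path's handling
would be something like this (untested sketch, just reusing the helpers
already in the patch):

| static int arm_ni_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
| {
| 	struct arm_ni_cd *cd;
| 	unsigned int target = nr_cpu_ids;
| 	int node;
|
| 	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
| 	if (cpu != cd->cpu)
| 		return 0;
|
| 	/* Only restrict the search to the local node if there actually is one */
| 	node = dev_to_node(cd_to_ni(cd)->dev);
| 	if (node != NUMA_NO_NODE)
| 		target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu);
| 	if (target >= nr_cpu_ids)
| 		target = cpumask_any_but(cpu_online_mask, cpu);
|
| 	if (target < nr_cpu_ids)
| 		arm_ni_pmu_migrate(cd, target);
| 	return 0;
| }

... or, if dev_to_node() genuinely can't return NUMA_NO_NODE here, the check
in arm_ni_pmu_online_cpu() could be dropped instead; either way the three
paths should agree.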