Re: [PATCH v5 6/6] perf: ARM DynamIQ Shared Unit PMU support

From: Suzuki K Poulose
Date: Fri Aug 18 2017 - 06:43:44 EST


On 17/08/17 16:57, Mark Rutland wrote:
On Thu, Aug 17, 2017 at 03:52:24PM +0100, Suzuki K Poulose wrote:
On 16/08/17 15:10, Mark Rutland wrote:
On Tue, Aug 08, 2017 at 12:37:26PM +0100, Suzuki K Poulose wrote:

+static struct attribute *dsu_pmu_event_attrs[] = {
+ DSU_EVENT_ATTR(cycles, 0x11),
+ DSU_EVENT_ATTR(bus_acecss, 0x19),
+ DSU_EVENT_ATTR(memory_error, 0x1a),
+ DSU_EVENT_ATTR(bus_cycles, 0x1d),
+ DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
+ DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
+ DSU_EVENT_ATTR(l3d_cache, 0x2b),
+ DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),

MAX_COMMON_EVENTS seems to be 0x40, so are we just assuming the below
are implemented?

If so, why bother exposing them at all? We can't know that they're going
to work...

Thats right. The only defending argument is that the event codes are specific
to the DynamIQ, unlike the COMMON_EVENTS which matches the ARMv8 PMU event codes.
So, someone would need to carefully find the event code for a particular event.
Having these entries would make it easier for the user to do the profiling.

Also, future revisions of the DSU could potentially expose more events. So there
wouldn't be any way to tell the user (provided there aren't any changes to the
programming model and we decide to reuse the same compatible string) what we *could*
potentially support. In short, this is not a problem at the moment and we could
do something about it as and when required.

I'd rather that we only describes those that we can probe from the
PMCEID* registers, and for the rest, left the user to consult a manual.

I can well imagine future variants of this IP supporing different
events, and I'd prefer to avoid poriflerating tables for those.

[...]

Fair enough. I will trim the list.


+static struct attribute *dsu_pmu_cpumask_attrs[] = {
+ DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK),
+ DSU_CPUMASK_ATTR(supported_cpus, DSU_SUPPORTED_CPU_MASK),
+ NULL,
+};

Normally we only expose one mask.

Why do we need the supported cpu mask? What is the intended use-case?

Thats just to let the user know the CPUs bound to this PMU instance.

I guess that can be useful, though the cpumasks we expose today are
confusing as-is, and this is another point of confusion.

We could drop this for now, and add it when requested, or we should try
to make the naming clearer somehow -- "supported" can be read in a
number of ways.

How about dsu_cpus or connected_cpus ?


Further, it would be worth documenting this PMU under
Documentation/perf/.

[...]


OK


+static int dsu_pmu_add(struct perf_event *event, int flags)
+{
+ struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+ struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+ struct hw_perf_event *hwc = &event->hw;
+ int idx;
+
+ if (!cpumask_test_cpu(smp_processor_id(), &dsu_pmu->supported_cpus))
+ return -ENOENT;

This shouldn't ever happen, and we can check against the active cpumask,
with a WARN_ON_ONCE(). We have to do this for CPU PMUs since they
support events which can migrate with tasks, but that's not the case
here.

[...]

But, we have to make sure we are reading the events from a CPU within this
DSU in case we have multiple DSU units.

Regardless of how many instances there are, the core should *never*
add() a CPU-bound event (which DSU events are) on another CPU. To do so
would be a major bug.

So if this is just a paranoid check, we should WARN_ON_ONCE().
Otherwise, it's unnecessary.

OK


+/**
+ * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
+ */
+static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
+{
+ int i = 0, n, cpu;
+ struct device_node *cpu_node;
+
+ n = of_count_phandle_with_args(dev, "cpus", NULL);
+ if (n <= 0)
+ return -ENODEV;
+ for (; i < n; i++) {
+ cpu_node = of_parse_phandle(dev, "cpus", i);
+ if (!cpu_node)
+ break;
+ cpu = of_cpu_node_to_id(cpu_node);
+ of_node_put(cpu_node);
+ if (cpu < 0)
+ break;

I believe this can happen if the kernel's nr_cpus were capped below the
number of CPUs in the system. So we need to iterate all entries of the
cpus list regardless.


Good point. Initial version of the driver used to ignore the failures, but
was changed later. I will roll it back.

Thanks. If you could add a comment as to why, that'll hopefully avoid
anyone trying to "fix" the logic later.

[...]


Sure

+ cpmcr = __dsu_pmu_read_pmcr();
+ dsu_pmu->num_counters = ((cpmcr >> CLUSTERPMCR_N_SHIFT) &
+ CLUSTERPMCR_N_MASK);
+ if (!dsu_pmu->num_counters)
+ return;

Is that valid? What range of values makes sense?

[...]


We should at least have one counter implemented (excluding the cycle counter).
And yes, we should check if the num_counters <= 31.

Ok.

+ /*
+ * Find one CPU in the DSU to handle the IRQs.
+ * It is highly unlikely that we would fail
+ * to find one, given the probing has succeeded.
+ */
+ cpu = dsu_pmu_get_online_cpu(dsu_pmu);
+ if (cpu >= nr_cpu_ids)
+ return -ENODEV;
+ cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
+ rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);

Is setting the affinity hint strong enough?

Can the affinity be changed behind the back of this driver?

Did you mean to use "force"d affinity settings ? If so, modules
don't have the luxury of doing that.

Perhaps. We absolutely need to ensure that the driver makes the IRQ
affine to the active CPU, and no other SW will change the affinitiy of
the IRQ.

Otherwise, the IRQ handler is dangerous, violating locking requirements,
potentially corrupting memory, etc.

Hence this one. I think that also brings up the problem where we could
be reading the counters from a different CPU than we requested. So, I
think it would be good to keep the CPU check, wherever we could access
the PMU.

While I'm happy to have that as a paranoid sanity check, we cannot rely
upon that for correctness. We must ensure that we amange the interupt
affinity correctly.

If that means we need the forced affinity helpers, we must ensure that
we have access to those.

As per our offline discussion, I will go ahead with set_affinity_hint and
IRQ_NO_BALANCING flag, so that the IRQ affinity is not disturbed by the
userspace.

Suzuki