RE: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver

From: Shameerali Kolothum Thodi
Date: Wed Oct 03 2018 - 04:46:21 EST


Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@xxxxxxx]
> Sent: 02 October 2018 15:11
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> lorenzo.pieralisi@xxxxxxx; robin.murphy@xxxxxxx
> Cc: mark.rutland@xxxxxxx; vkilari@xxxxxxxxxxxxxx;
> neil.m.leeder@xxxxxxxxx; pabba@xxxxxxxxxxxxxx; John Garry
> <john.garry@xxxxxxxxxx>; will.deacon@xxxxxxx; rruigrok@xxxxxxxxxxxxxx;
> Linuxarm <linuxarm@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-
> acpi@xxxxxxxxxxxxxxx; Guohanjun (Hanjun Guo) <guohanjun@xxxxxxxxxx>;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver
>
> Hi Shameer,
>
> I have a few comments below, mostly naive since I don't know anything
> about perf drivers.

Thanks for taking a look at this.

> On 21/09/2018 16:08, Shameer Kolothum wrote:
> > From: Neil Leeder <nleeder@xxxxxxxxxxxxxx>
> >
> > Adds a new driver to support the SMMUv3 PMU and add it into the
> > perf events framework.
> >
> > Each SMMU node may have multiple PMUs associated with it, each of
> > which may support different events.
> >
> > SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page>
> where
> > <phys_addr_page> is the physical page address of the SMMU PMCG.
> > For example, the PMCG at 0xff88840000 is named smmuv3_pmcg_ff88840
> >
> > Filtering by stream id is done by specifying filtering parameters
> > with the event. options are:
> > filter_enable - 0 = no filtering, 1 = filtering enabled
> > filter_span - 0 = exact match, 1 = pattern match
> > filter_stream_id - pattern to filter against
> > Further filtering information is available in the SMMU documentation.
> >
> > Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> > filter_span=1,filter_stream_id=0x42/ -a pwd
> > Applies filter pattern 0x42 to transaction events.
> >
> > SMMU events are not attributable to a CPU, so task mode and sampling
> > are not supported.
> >
> > Signed-off-by: Neil Leeder <nleeder@xxxxxxxxxxxxxx>
> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx>
> > ---
> > drivers/perf/Kconfig | 9 +
> > drivers/perf/Makefile | 1 +
> > drivers/perf/arm_smmuv3_pmu.c | 736
> ++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 746 insertions(+)
> > create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> >
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 08ebaf7..34969dd 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -52,6 +52,15 @@ config ARM_PMU_ACPI
> > depends on ARM_PMU && ACPI
> > def_bool y
> >
> > +config ARM_SMMU_V3_PMU
> > + bool "ARM SMMUv3 Performance Monitors {Extension}"
>
> Why the curly braces? I didn't find that notation in other Kconfig files

Hmm..That's probably because I just copied a suggestion from previous
review. I will double check and correct it.

> > + depends on ARM64 && ACPI && ARM_SMMU_V3
> > + help
> > + Provides support for the SMMU version 3 performance monitor unit
> (PMU)
> > + on ARM-based systems.
> > + Adds the SMMU PMU into the perf events subsystem for
> > + monitoring SMMU performance events.
> > +
> > config ARM_DSU_PMU
> > tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> > depends on ARM64
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b3902bd..f10a932 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> [...]
> > +/*
> > + * This driver adds support for perf events to use the Performance
> > + * Monitor Counter Groups (PMCG) associated with an SMMUv3 node
> > + * to monitor that node.
> > + *
> > + * SMMUv3 PMCG devices are named as
> smmuv3_pmcg_<phys_addr_page> where
> > + * <phys_addr_page> is the physical page address of the SMMU PMCG.
> > + * For example, the PMCG at 0xff88840000 is named
> smmuv3_pmcg_ff88840
> > +
> > + * Filtering by stream id is done by specifying filtering parameters
> > + * with the event. options are:
> > + * filter_enable - 0 = no filtering, 1 = filtering enabled
> > + * filter_span - 0 = exact match, 1 = pattern match
> > + * filter_stream_id - pattern to filter against
> > + * Further filtering information is available in the SMMU documentation.
> > + *
> > + * Example: perf stat -e
> smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> > + * filter_span=1,filter_stream_id=0x42/ -a pwd
>
> I'm curious, why is pwd used as example? Wouldn't something like netperf
> be a more realistic workload?

Agree. Thatâs a more relevant workload example.

> > + * Applies filter pattern 0x42 to transaction events.
>
> Adding that this pattern matches SIDs 0x42 and 0x43 might be helpful,
> since span filtering is a bit awkward

Ok.

> [...]
> > +#define SMMU_PMU_EVENT_ATTR_EXTRACTOR(_name, _config, _size,
> _shift) \
> > + static inline u32 get_##_name(struct perf_event *event) \
> > + { \
> > + return (event->attr._config >> (_shift)) & \
> > + GENMASK_ULL((_size) - 1, 0); \
>
> FIELD_GET could make this slightly nicer

Sure.

> > + }
> > +
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(event, config, 16, 0);
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(filter_stream_id, config1, 32, 0);
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(filter_span, config1, 1, 32);
> > +SMMU_PMU_EVENT_ATTR_EXTRACTOR(filter_enable, config1, 1, 34);
>
> filter_enable is at bit 33, not 34

Thanks.

> [...]
> > +static void smmu_pmu_set_period(struct smmu_pmu *smmu_pmu,
> > + struct hw_perf_event *hwc)
> > +{
> > + u32 idx = hwc->idx;
> > + u64 new;
> > +
> > + /*
> > + * We limit the max period to half the max counter value of the
> counter
> > + * size, so that even in the case of extreme interrupt latency the
> > + * counter will (hopefully) not wrap past its initial value.
> > + */
> > + new = smmu_pmu->counter_mask >> 1;
>
> Cool trick :)
>
> > +
> > + local64_set(&hwc->prev_count, new);
> > + smmu_pmu_counter_set_value(smmu_pmu, idx, new);
> > +}
> > +
> > +static unsigned int smmu_pmu_get_event_idx(struct smmu_pmu
> *smmu_pmu)
> > +{
> > + unsigned int idx;
> > + unsigned int num_ctrs = smmu_pmu->num_counters;
> > +
> > + idx = find_first_zero_bit(smmu_pmu->used_counters, num_ctrs);
> > + if (idx == num_ctrs)
> > + /* The counters are all in use. */
> > + return -EAGAIN;
>
> Then this function's return type probably shouldn't be unsigned

Oops!

> > +
> > + set_bit(idx, smmu_pmu->used_counters);
> > +
> > + return idx;
> > +}
> > +
> > +/*
> > + * Implementation of abstract pmu functionality required by
> > + * the core perf events code.
> > + */
> > +
> > +static int smmu_pmu_event_init(struct perf_event *event)
> > +{
> > + struct hw_perf_event *hwc = &event->hw;
> > + struct smmu_pmu *smmu_pmu = to_smmu_pmu(event->pmu);
> > + struct device *dev = smmu_pmu->dev;
> > + struct perf_event *sibling;
> > + u32 event_id;
> > +
> > + if (event->attr.type != event->pmu->type)
> > + return -ENOENT;
> > +
> > + if (hwc->sample_period) {
> > + dev_dbg_ratelimited(dev, "Sampling not supported\n");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + if (event->cpu < 0) {
> > + dev_dbg_ratelimited(dev, "Per-task mode not supported\n");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + /* We cannot filter accurately so we just don't allow it. */
> > + if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > + event->attr.exclude_hv || event->attr.exclude_idle) {
> > + dev_dbg_ratelimited(dev, "Can't exclude execution levels\n");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + /* Verify specified event is supported on this PMU */
> > + event_id = get_event(event);
> > + if (((event_id < SMMU_ARCH_MAX_EVENT_ID) &&
> > + (!test_bit(event_id, smmu_pmu->supported_events))) ||
> > + (event_id > SMMU_IMPDEF_MAX_EVENT_ID)) {
>
> >= ?

I was slightly confused by the spec here as it says,

Performance events are indicated by a numeric ID, in the following ranges:
â 0x0000 to 0x007F: Architected events
â 0x0080 to 0xFFFF: IMPLEMENTATION DEFINED events

It looks to me the ids are valid including those limits.

> > + dev_dbg_ratelimited(dev, "Invalid event %d for this PMU\n",
> > + event_id);
> > + return -EINVAL;
> > + }
>
> [...]
> > +static struct attribute *smmu_pmu_events[] = {
> > + SMMU_EVENT_ATTR(cycles, 0),
> > + SMMU_EVENT_ATTR(transaction, 1),
> > + SMMU_EVENT_ATTR(tlb_miss, 2),
> > + SMMU_EVENT_ATTR(config_cache_miss, 3),
> > + SMMU_EVENT_ATTR(trans_table_walk, 4),
>
> This name is a bit misleading - as far as I understand the event doesn't
> count table walks, but memory accesses performed during a walk.

Ok. I will take a look at this.

> > + SMMU_EVENT_ATTR(config_struct_access, 5),
> > + SMMU_EVENT_ATTR(pcie_ats_trans_rq, 6),
> > + SMMU_EVENT_ATTR(pcie_ats_trans_passed, 7),
> > + NULL
> > +};
>
> [...]
> > +static int smmu_pmu_setup_irq(struct smmu_pmu *pmu)
> > +{
> > + unsigned long flags = IRQF_NOBALANCING | IRQF_SHARED |
> IRQF_NO_THREAD;
>
> Why do you need SHARED?

Though I am not aware of any implementation that is sharing this
interrupt, I think it is good to keep it that way as we are anyway
checking for the OVSSET0 status register in the interrupt handler.

> > + int irq, ret = -ENXIO;
> > +
> > + irq = pmu->irq;
> > + if (irq)
> > + ret = devm_request_irq(pmu->dev, irq,
> smmu_pmu_handle_irq,
> > + flags, "smmuv3-pmu", pmu);
> > + return ret;
> > +}
> > +
> > +static void smmu_pmu_reset(struct smmu_pmu *smmu_pmu)
> > +{
> > + smmu_pmu_disable(&smmu_pmu->pmu);
> > +
> > + /* Disable counter and interrupt */
> > + writeq_relaxed(smmu_pmu->counter_present_mask,
> > + smmu_pmu->reg_base + SMMU_PMCG_CNTENCLR0);
> > + writeq_relaxed(smmu_pmu->counter_present_mask,
> > + smmu_pmu->reg_base + SMMU_PMCG_INTENCLR0);
> > + writeq_relaxed(smmu_pmu->counter_present_mask,
> > + smmu_pmu->reloc_base + SMMU_PMCG_OVSCLR0);
> > +
> > + writeq_relaxed(0, smmu_pmu->reg_base + SMMU_PMCG_IRQ_CFG0);
>
> smmu_pmu_setup_msi clears CFG0 a second time, so this line can be
> deleted in patch 3, or moved to smmu_pmu_setup_msi right away.

Ok. I will move this to smmu_pmu_setup_msi.

> > +}
> > +
> > +static int smmu_pmu_probe(struct platform_device *pdev)
> > +{
> > + struct smmu_pmu *smmu_pmu;
> > + struct resource *res_0, *res_1;
> > + u32 cfgr, reg_size;
> > + u64 ceid_64[2];
> > + int irq, err;
> > + char *name;
> > + struct device *dev = &pdev->dev;
> > +
> > + smmu_pmu = devm_kzalloc(dev, sizeof(*smmu_pmu), GFP_KERNEL);
> > + if (!smmu_pmu)
> > + return -ENOMEM;
> > +
> > + smmu_pmu->dev = dev;
> > +
> > + platform_set_drvdata(pdev, smmu_pmu);
> > + smmu_pmu->pmu = (struct pmu) {
> > + .task_ctx_nr = perf_invalid_context,
> > + .pmu_enable = smmu_pmu_enable,
> > + .pmu_disable = smmu_pmu_disable,
> > + .event_init = smmu_pmu_event_init,
> > + .add = smmu_pmu_event_add,
> > + .del = smmu_pmu_event_del,
> > + .start = smmu_pmu_event_start,
> > + .stop = smmu_pmu_event_stop,
> > + .read = smmu_pmu_event_read,
> > + .attr_groups = smmu_pmu_attr_grps,
> > + };
> > +
> > + res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > + smmu_pmu->reg_base = devm_ioremap_resource(dev, res_0);
> > + if (IS_ERR(smmu_pmu->reg_base))
> > + return PTR_ERR(smmu_pmu->reg_base);
> > +
> > + cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
> > +
> > + /* Determine if page 1 is present */
> > + if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
> > + res_1 = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> > + smmu_pmu->reloc_base = devm_ioremap_resource(dev,
> res_1);
> > + if (IS_ERR(smmu_pmu->reloc_base))
> > + return PTR_ERR(smmu_pmu->reloc_base);
> > + } else {
> > + smmu_pmu->reloc_base = smmu_pmu->reg_base;
> > + }
> > +
> > + irq = platform_get_irq(pdev, 0);
> > + if (irq > 0)
> > + smmu_pmu->irq = irq;
> > +
> > + ceid_64[0] = readq_relaxed(smmu_pmu->reg_base +
> SMMU_PMCG_CEID0);
> > + ceid_64[1] = readq_relaxed(smmu_pmu->reg_base +
> SMMU_PMCG_CEID1);
> > + bitmap_from_arr32(smmu_pmu->supported_events, (u32 *)ceid_64,
> > + SMMU_ARCH_MAX_EVENT_ID);
> > +
> > + smmu_pmu->num_counters =
> FIELD_GET(SMMU_PMCG_CFGR_NCTR_MASK, cfgr) + 1;
> > + smmu_pmu->counter_present_mask = GENMASK(smmu_pmu-
> >num_counters - 1, 0);
> > +
> > + reg_size = FIELD_GET(SMMU_PMCG_CFGR_SIZE_MASK, cfgr);
> > + smmu_pmu->counter_mask = GENMASK_ULL(reg_size, 0);
> > +
> > + smmu_pmu_reset(smmu_pmu);
> > +
> > + err = smmu_pmu_setup_irq(smmu_pmu);
> > + if (err) {
> > + dev_err(dev, "Setup irq failed, PMU @%pa\n", &res_0->start);
>
> You can probably remove "PMU @%pa" from error and info messages, since
> the device name already uniquely identifies it:
> "[ 6.168200] arm-smmu-v3-pmu 2b442000.smmu-pmcg: Registered SMMU
> PMU
> @ 0x000000002b442000 using 4 counters"

Interesting. What I have is,

[ 25.669636] arm-smmu-v3-pmu arm-smmu-v3-pmu.6.auto: Registered SMMU
PMU @ 0x0000000148001000 using 8 counters

Are you using the same patches and is booting using ACPI? IIRC, in the iort
code it uses the name "arm-smmu-v3-pmu" and AUTO id to register/add the platform
dev. So not sure, how it is printing the address in your case.

Please check and let me know.

Thanks,
Shameer

> Thanks,
> Jean