Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.

From: Will Deacon
Date: Fri Jun 28 2019 - 05:08:54 EST


Hi again, Ganapat,

Thanks for the quick reply.

On Fri, Jun 28, 2019 at 11:09:33AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Jun 27, 2019 at 3:27 PM Will Deacon <will@xxxxxxxxxx> wrote:
> > On Fri, Jun 14, 2019 at 05:42:46PM +0000, Ganapatrao Kulkarni wrote:
> > > CCPI2 is a low-latency high-bandwidth serial interface for connecting
> > > ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@xxxxxxxxxxx>
> > > ---
> > > drivers/perf/thunderx2_pmu.c | 179 ++++++++++++++++++++++++++++++-----
> > > 1 file changed, 157 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> > > index 43d76c85da56..3791ac9b897d 100644
> > > --- a/drivers/perf/thunderx2_pmu.c
> > > +++ b/drivers/perf/thunderx2_pmu.c
> > > @@ -16,7 +16,9 @@
> > > * they need to be sampled before overflow(i.e, at every 2 seconds).
> > > */
> > >
> > > -#define TX2_PMU_MAX_COUNTERS 4
> > > +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4
> >
> > I find it odd that you rename this...
>
> i am not sure, how to avoid this since dmc/l3c have 4 counters and ccpi2 has 8.
> i will try to make this better in v2.
> >
> > > +#define TX2_PMU_CCPI2_MAX_COUNTERS 8
> > > +
> > > #define TX2_PMU_DMC_CHANNELS 8
> > > #define TX2_PMU_L3_TILES 16
> > >
> > > @@ -28,11 +30,22 @@
> > > */
> > > #define DMC_EVENT_CFG(idx, val) ((val) << (((idx) * 8) + 1))
> > >
> > > +#define GET_EVENTID_CCPI2(ev) ((ev->hw.config) & 0x1ff)
> > > +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
> > > +#define GET_COUNTERID_CCPI2(ev) ((ev->hw.idx) & 0x7)
> > > +#define CCPI2_COUNTER_OFFSET 8
> >
> >
> > ... but leave GET_EVENTID alone, even though it only applies to DMC/L3C
> > events. Saying that, if you have the event you can figure out its type,
> > so could you avoid the need for additional macros entirely and just use
> > the correct masks based on the corresponding PMU type? That might also
> > avoid some of the conditional control flow you're introducing elsewhere.
>
> sure, i will add mask as argument to macro.
> >
> > > #define L3C_COUNTER_CTL 0xA8
> > > #define L3C_COUNTER_DATA 0xAC
> > > #define DMC_COUNTER_CTL 0x234
> > > #define DMC_COUNTER_DATA 0x240
> > >
> > > +#define CCPI2_PERF_CTL 0x108
> > > +#define CCPI2_COUNTER_CTL 0x10C
> > > +#define CCPI2_COUNTER_SEL 0x12c
> > > +#define CCPI2_COUNTER_DATA_L 0x130
> > > +#define CCPI2_COUNTER_DATA_H 0x134
> > > +
> > > /* L3C event IDs */
> > > #define L3_EVENT_READ_REQ 0xD
> > > #define L3_EVENT_WRITEBACK_REQ 0xE
> > > @@ -51,9 +64,16 @@
> > > #define DMC_EVENT_READ_TXNS 0xF
> > > #define DMC_EVENT_MAX 0x10
> > >
> > > +#define CCPI2_EVENT_REQ_PKT_SENT 0x3D
> > > +#define CCPI2_EVENT_SNOOP_PKT_SENT 0x65
> > > +#define CCPI2_EVENT_DATA_PKT_SENT 0x105
> > > +#define CCPI2_EVENT_GIC_PKT_SENT 0x12D
> > > +#define CCPI2_EVENT_MAX 0x200
> > > +
> > > enum tx2_uncore_type {
> > > PMU_TYPE_L3C,
> > > PMU_TYPE_DMC,
> > > + PMU_TYPE_CCPI2,
> > > PMU_TYPE_INVALID,
> > > };
> > >
> > > @@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
> > > u32 max_events;
> > > u64 hrtimer_interval;
> > > void __iomem *base;
> > > - DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
> > > - struct perf_event *events[TX2_PMU_MAX_COUNTERS];
> > > + DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
> > > + struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];
> >
> > Hmm, that looks very odd. Why are they now different sizes?
>
> events[ ] is used to hold reference to active events to use in timer
> callback, which is not applicable to ccpi2, hence 4.
> active_counters is set to max of both. i.e, 8. i will try to make it
> better readable in v2.

Thanks. I suspect renaming the field would help a lot, or perhaps reworking
your data structures so that you have a union of ccpi2 and dmc/l2c
structures where necessary.

> > > struct device *dev;
> > > struct hrtimer hrtimer;
> > > const struct attribute_group **attr_groups;
> > > @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
> > > return container_of(pmu, struct tx2_uncore_pmu, pmu);
> > > }
> > >
> > > -PMU_FORMAT_ATTR(event, "config:0-4");
> > > +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format) \
> > > +static ssize_t \
> > > +__tx2_pmu_##_var##_show(struct device *dev, \
> > > + struct device_attribute *attr, \
> > > + char *page) \
> > > +{ \
> > > + BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \
> > > + return sprintf(page, _format "\n"); \
> > > +} \
> > > + \
> > > +static struct device_attribute format_attr_##_var = \
> > > + __ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
> > > +
> > > +TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
> > > +TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");
> >
> > This doesn't look right. Can perf deal with overlapping fields? It seems
> > wrong that we'd allow the user to specify both, for example.
>
> I am not sure what is the issue here? both are different PMUs
> root@SBR-26> cat /sys/bus/event_source/devices/uncore_dmc_0/format/event
> config:0-4
> root@SBR-26> cat /sys/bus/event_source/devices/uncore_ccpi2_0/format/event
> config:0-9

Ah, sorry about that. I got _var and _name the wrong way around and thought
you were introducing a file called event_ccpi2! What you have looks fine.

Will