Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

From: Peter Zijlstra
Date: Wed Sep 08 2021 - 05:28:19 EST


On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote:
> Kajol Jain <kjain@xxxxxxxxxxxxx> writes:

> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index f92880a15645..030b3e990ac3 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -1265,7 +1265,9 @@ union perf_mem_data_src {
> > #define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
> > #define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
> > #define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
> > -/* 5-0xa available */
> > +#define PERF_MEM_LVLNUM_OC_L2 0x05 /* On Chip L2 */
> > +#define PERF_MEM_LVLNUM_OC_L3 0x06 /* On Chip L3 */
>
> The obvious use for 5 is for "L5" and so on.
>
> I'm not sure adding new levels is the best idea, because these don't fit
> neatly into the hierarchy, they are off to the side.
>
>
> I wonder if we should use the remote field.
>
> ie. for another core's L2 we set:
>
> mem_lvl = PERF_MEM_LVL_L2
> mem_remote = 1

This mixes APIs (see below), IIUC the correct usage would be something
like: lvl_num=L2 remote=1

> Which would mean "remote L2", but not remote enough to be
> lvl = PERF_MEM_LVL_REM_CCE1.
>
> It would be printed by the existing tools/perf code as "Remote L2", vs
> "Remote cache (1 hop)", which seems OK.
>
>
> ie. we'd be able to express:
>
> Current core's L2: LVL_L2
> Other core's L2: LVL_L2 | REMOTE
> Other chip's L2: LVL_REM_CCE1 | REMOTE
>
> And similarly for L3.
>
> I think that makes sense? Unless people think remote should be reserved
> to mean on another chip, though we already have REM_CCE1 for that.

IIRC the PERF_MEM_LVL_* namespace is somewhat depricated in favour of
the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Of
course, ABIs being what they are, we get to support both :/ But I'm not
sure mixing them is a great idea.

Also, clearly this could use a comment...

The 'new' composite doesnt have a hops field because the hardware that
nessecitated that change doesn't report it, but we could easily add a
field there.

Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
corresponding PERF_MEM_HOPS_{NA, 0..6}

Then I suppose you can encode things like:

L2 - local L2
L2 | REMOTE - remote L2 at an unspecified distance (NA)
L2 | REMOTE | HOPS_0 - remote L2 on the same node
L2 | REMOTE | HOPS_1 - remote L2 on a node 1 removed

Would that work?