Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses
From: Michael Ellerman
Date: Thu Sep 09 2021 - 08:59:34 EST
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote:
>> Kajol Jain <kjain@xxxxxxxxxxxxx> writes:
>
>> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> > index f92880a15645..030b3e990ac3 100644
>> > --- a/include/uapi/linux/perf_event.h
>> > +++ b/include/uapi/linux/perf_event.h
>> > @@ -1265,7 +1265,9 @@ union perf_mem_data_src {
>> > #define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
>> > #define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
>> > #define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
>> > -/* 5-0xa available */
>> > +#define PERF_MEM_LVLNUM_OC_L2 0x05 /* On Chip L2 */
>> > +#define PERF_MEM_LVLNUM_OC_L3 0x06 /* On Chip L3 */
>>
>> The obvious use for 5 is for "L5" and so on.
>>
>> I'm not sure adding new levels is the best idea, because these don't fit
>> neatly into the hierarchy, they are off to the side.
>>
>>
>> I wonder if we should use the remote field.
>>
>> ie. for another core's L2 we set:
>>
>> mem_lvl = PERF_MEM_LVL_L2
>> mem_remote = 1
>
> This mixes APIs (see below), IIUC the correct usage would be something
> like: lvl_num=L2 remote=1
Aha, I was wondering how lvl and lvl_num were supposed to interact.
>> Which would mean "remote L2", but not remote enough to be
>> lvl = PERF_MEM_LVL_REM_CCE1.
>>
>> It would be printed by the existing tools/perf code as "Remote L2", vs
>> "Remote cache (1 hop)", which seems OK.
>>
>>
>> ie. we'd be able to express:
>>
>> Current core's L2: LVL_L2
>> Other core's L2: LVL_L2 | REMOTE
>> Other chip's L2: LVL_REM_CCE1 | REMOTE
>>
>> And similarly for L3.
>>
>> I think that makes sense? Unless people think remote should be reserved
>> to mean on another chip, though we already have REM_CCE1 for that.
>
> IIRC the PERF_MEM_LVL_* namespace is somewhat depricated in favour of
> the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Of
> course, ABIs being what they are, we get to support both :/ But I'm not
> sure mixing them is a great idea.
OK.
> Also, clearly this could use a comment...
>
> The 'new' composite doesnt have a hops field because the hardware that
> nessecitated that change doesn't report it, but we could easily add a
> field there.
>
> Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
> corresponding PERF_MEM_HOPS_{NA, 0..6}
It's really 7 if we use remote && hop = 0 to mean the first hop.
If we're wanting to use some of the hop levels to represent
intra-chip/package hops then we could possibly use them all on a really
big system.
eg. you could imagine something like:
L2 | - local L2
L2 | REMOTE | HOPS_0 - L2 of neighbour core
L2 | REMOTE | HOPS_1 - L2 of near core on same chip (same 1/2 of chip)
L2 | REMOTE | HOPS_2 - L2 of far core on same chip (other 1/2 of chip)
L2 | REMOTE | HOPS_3 - L2 of sibling chip in same package
L2 | REMOTE | HOPS_4 - L2 on separate package 1 hop away
L2 | REMOTE | HOPS_5 - L2 on separate package 2 hops away
L2 | REMOTE | HOPS_6 - L2 on separate package 3 hops away
Whether it's useful to represent all those levels I'm not sure, but it's
probably good if we have the ability.
I guess I'm 50/50 on whether that's enough levels, or whether we want
another bit to allow for future growth.
> Then I suppose you can encode things like:
>
> L2 - local L2
> L2 | REMOTE - remote L2 at an unspecified distance (NA)
> L2 | REMOTE | HOPS_0 - remote L2 on the same node
> L2 | REMOTE | HOPS_1 - remote L2 on a node 1 removed
>
> Would that work?
Yeah that looks good to me.
cheers