Re: [PATCH] perf/x86/intel/lbr: fix branch type encoding
From: Stephane Eranian
Date: Thu Aug 11 2022 - 11:37:03 EST
On Thu, Aug 11, 2022 at 6:28 PM Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>
> On Thu, Aug 11, 2022 at 5:42 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > On 2022-08-11 10:17 a.m., Stephane Eranian wrote:
> > > On Thu, Aug 11, 2022 at 3:23 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> > >>
> > >>
> > >>
> > >> On 2022-08-10 5:06 p.m., Stephane Eranian wrote:
> > > >>> With architected LBR, the processor can record the type of each sampled taken
> > > >>> branch. The type is encoded in a 4-bit field in the LBR_INFO MSR of each entry.
> > >>>
> > > >>> The branch type must then be extracted and saved in the perf_branch_entry in the
> > >>> perf_events sampling buffer. With the current code, the raw Intel encoding of
> > >>> the branch is exported to user tools.
> > >>
> > >> In intel_pmu_lbr_filter(), the raw encoding is converted into the
> > >> X86_BR_* format via arch_lbr_br_type_map[]. Then
> > >> common_branch_type() converts the X86_BR_* format to the generic
> > >> PERF_BR_* type, which is what gets exposed to user tools.
> > >>
> > >> I double-checked the existing arch_lbr_br_type_map[] and branch_map[].
> > >> They should generate the same PERF_BR_* type as your arch_lbr_type_map[].
> > >>
> > >> Is there a test case which I can use to reproduce the problem?
> > >>
> > > I was doing a simple:
> > > $ perf record -b -e cpu/event=0xc4/ ....
> > > $ perf report -D
> > > Looking at the LBR information and the BR type, many entries have no branch type.
> > > What I see is a function where you do: e->type = get_lbr_br_type() and
> > > that is what is then saved in the buffer. Unless I am missing a later patch.
> > >
> >
> > To get the LBR type, the save_type filter option must be applied. See
> > 60f83fa6341d ("perf record: Create a new option save_type in
> > --branch-filter").
> >
> That seems overly complicated. I don't recall having to pass a new option
> to get the LBR latency. It showed up automatically. So why for branch_type?
>
> > The -b option only includes the ANY filter. Maybe we should extend the -b
> > option to ANY|SAVE_TYPE.
> >
> Ok, that explains it then. I think we need to simplify.
>
In fact, I don't see a case where you would not benefit from the branch type.
Furthermore, not having the branch type DOES NOT save any space in the
branch record (given we have a reserved field). So I think I prefer not having
to specify yet another cmdline option to get the branch type. In fact, if you do
not pass the option, then perf report -D reports some bogus branch types, i.e.,
not all entries have empty types.
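
To illustrate the space argument, here is a minimal sketch, not the kernel
definition. Both structs are local stand-ins written for this example; the
first is assumed to mirror the perf_branch_entry bitfield layout of this era
(the uapi header is authoritative), the second is a hypothetical variant
with no type field. The 4 type bits come out of the same 64-bit word as the
reserved bits, so on a typical GCC/Clang x86-64 build both records have the
same size.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed, simplified mirror of the branch record layout (illustration only). */
struct branch_entry_sketch {
	uint64_t from;
	uint64_t to;
	uint64_t mispred:1,
		 predicted:1,
		 in_tx:1,
		 abort:1,
		 cycles:16,
		 type:4,	/* branch type lives in the flags word */
		 reserved:40;
};

/* Hypothetical variant without a type field: the 4 bits fold into reserved. */
struct branch_entry_notype {
	uint64_t from;
	uint64_t to;
	uint64_t mispred:1,
		 predicted:1,
		 in_tx:1,
		 abort:1,
		 cycles:16,
		 reserved:44;
};

/* Size of one record when the type field is present. */
size_t entry_size_with_type(void)
{
	return sizeof(struct branch_entry_sketch);
}

/* Size of one record when the type field is dropped. */
size_t entry_size_without_type(void)
{
	return sizeof(struct branch_entry_notype);
}
```

Comparing the two helper return values shows that omitting the type buys
nothing per entry under these assumptions.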
> > >
> > >> Thanks,
> > >> Kan
> > >>
> > > >>> Yet tools, such as perf, expect the
> > > >>> branch type to be encoded using the perf_events branch type enum
> > >>> (see tools/perf/util/branch.c). As a result of the discrepancy, the output of
> > >>> perf report -D shows bogus branch types.
> > >>>
> > >>> Fix the problem by converting the Intel raw encoding into the perf_events
> > >>> branch type enum values. With that in place and with no changes to the tools,
> > >>> the branch types are now reported properly.
> > >>>
> > >>> Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
> > >>> ---
> > >>> arch/x86/events/intel/lbr.c | 35 ++++++++++++++++++++++++++++++++---
> > >>> 1 file changed, 32 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> > >>> index 4f70fb6c2c1e..ef63d4d46b50 100644
> > >>> --- a/arch/x86/events/intel/lbr.c
> > >>> +++ b/arch/x86/events/intel/lbr.c
> > >>> @@ -894,9 +894,23 @@ static DEFINE_STATIC_KEY_FALSE(x86_lbr_mispred);
> > >>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_cycles);
> > >>> static DEFINE_STATIC_KEY_FALSE(x86_lbr_type);
> > >>>
> > >>> -static __always_inline int get_lbr_br_type(u64 info)
> > >>> +/*
> > >>> + * Array index encodes IA32_LBR_x_INFO Branch Type Encodings
> > >>> + * as per Intel SDM Vol3b Branch Types section
> > >>> + */
> > >>> +static const int arch_lbr_type_map[]={
> > >>> + [0] = PERF_BR_COND,
> > >>> + [1] = PERF_BR_IND,
> > >>> + [2] = PERF_BR_UNCOND,
> > >>> + [3] = PERF_BR_IND_CALL,
> > >>> + [4] = PERF_BR_CALL,
> > >>> + [5] = PERF_BR_RET,
> > >>> +};
> > >>> +#define ARCH_LBR_TYPE_COUNT ARRAY_SIZE(arch_lbr_type_map)
> > >>> +
> > >>> +static __always_inline u16 get_lbr_br_type(u64 info)
> > >>> {
> > >>> - int type = 0;
> > >>> + u16 type = 0;
> > >>>
> > >>> if (static_branch_likely(&x86_lbr_type))
> > >>> type = (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET;
> > >>> @@ -904,6 +918,21 @@ static __always_inline int get_lbr_br_type(u64 info)
> > >>> return type;
> > >>> }
> > >>>
> > >>> +/*
> > >>> + * The kernel cannot expose raw Intel branch type encodings because they are
> > >>> + * not generic. Instead, the function below maps the encoding to the
> > >>> + * perf_events user visible branch types.
> > >>> + */
> > >>> +static __always_inline int get_lbr_br_type_mapping(u64 info)
> > >>> +{
> > > >>> +	if (static_branch_likely(&x86_lbr_type)) {
> > > >>> +		u16 raw_type = get_lbr_br_type(info);
> > > >>> +		if (raw_type < ARCH_LBR_TYPE_COUNT)
> > > >>> +			return arch_lbr_type_map[raw_type];
> > > >>> +	}
> > > >>> +	return PERF_BR_UNKNOWN;
> > >>> +}
> > >>> +
> > >>> static __always_inline bool get_lbr_mispred(u64 info)
> > >>> {
> > >>> bool mispred = 0;
> > >>> @@ -957,7 +986,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
> > >>> e->in_tx = !!(info & LBR_INFO_IN_TX);
> > >>> e->abort = !!(info & LBR_INFO_ABORT);
> > >>> e->cycles = get_lbr_cycles(info);
> > >>> - e->type = get_lbr_br_type(info);
> > >>> + e->type = get_lbr_br_type_mapping(info);
> > >>> }
> > >>>
> > >>> cpuc->lbr_stack.nr = i;
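
For readers who want to try the table lookup from the patch outside the
kernel, the following is a minimal userspace sketch. The SK_BR_* enum and
the sketch_* names are hypothetical stand-ins made up for this example; the
enum is assumed to follow the PERF_BR_* ordering in the perf_events uapi
header, which remains the authoritative source. The function takes the
already-extracted 4-bit raw value, so the LBR_INFO mask and shift are left
out on purpose.

```c
#include <stddef.h>

/* Local stand-ins for the generic branch types (assumed ordering). */
enum sketch_br_type {
	SK_BR_UNKNOWN = 0,
	SK_BR_COND,
	SK_BR_UNCOND,
	SK_BR_IND,
	SK_BR_CALL,
	SK_BR_IND_CALL,
	SK_BR_RET,
};

/* Index is the raw hardware encoding, value is the generic type,
 * following the table in the patch above. */
static const int sketch_type_map[] = {
	[0] = SK_BR_COND,
	[1] = SK_BR_IND,
	[2] = SK_BR_UNCOND,
	[3] = SK_BR_IND_CALL,
	[4] = SK_BR_CALL,
	[5] = SK_BR_RET,
};

#define SKETCH_TYPE_COUNT \
	(sizeof(sketch_type_map) / sizeof(sketch_type_map[0]))

/* Map a raw encoding to a generic type; anything outside the table
 * falls back to SK_BR_UNKNOWN instead of indexing out of bounds. */
int sketch_map_br_type(unsigned int raw)
{
	if (raw < SKETCH_TYPE_COUNT)
		return sketch_type_map[raw];
	return SK_BR_UNKNOWN;
}
```

The bounds check matters because the raw field is 4 bits wide and can hold
values up to 15, while the table only covers 0 through 5.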