Re: [PATCH v19 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)

From: Rob Herring
Date: Mon Feb 03 2025 - 12:59:27 EST


On Mon, Feb 3, 2025 at 10:53 AM James Clark <james.clark@xxxxxxxxxx> wrote:
>
>
>
> On 03/02/2025 12:43 am, Rob Herring (Arm) wrote:
> > From: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> >
> > The ARMv9.2 architecture introduces the optional Branch Record Buffer
> > Extension (BRBE), which records information about branches as they are
> > executed into set of branch record registers. BRBE is similar to x86's
> > Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
> > (BHRB).
> >
> > BRBE supports filtering by exception level and can filter just the
> > source or target address if excluded to avoid leaking privileged
> > addresses. The h/w filter would be sufficient except when there are
> > multiple events with disjoint filtering requirements. In this case, BRBE
> > is configured with a union of all the events' desired branches, and then
> > the recorded branches are filtered based on each event's filter. For
> > example, with one event capturing kernel events and another event
> > capturing user events, BRBE will be configured to capture both kernel
> > and user branches. When handling event overflow, the branch records have
> > to be filtered by software to only include kernel or user branch
> > addresses for that event. In contrast, x86 simply configures LBR using
> > the last installed event which seems broken.
> >
> > The event and branch exception level filtering are separately
> > controlled. On x86, it is possible to request filtering which is
> > disjoint (e.g. kernel only event with user only branches). It is also
> > possible on x86 to configure branch filter such that no branches are
> > ever recorded (e.g. -j save_type). For BRBE, events with mismatched
> > exception level filtering or a configuration that will result in no
> > samples are rejected. This can be relaxed in the future if such a need
> > arises.
> >
> > The handling of KVM guests is similar to the above. On x86, branch
> > recording is always disabled when a guest is running. However,
> > requesting branch recording in guests is allowed. The guest events are
> > recorded, but the resulting branches are all from the host. For BRBE,
> > branch recording is similarly disabled when guest is running. In
> > addition, events with branch recording and "exclude_host" set are
> > rejected. Requiring "exclude_guest" to be set did not work. The default
> > for the perf tool does set "exclude_guest" if no exception level
> > options are specified. However, specifying kernel or user defaults to
> > including both host and guest. In this case, only host branches are
> > recorded.
> >
> > BRBE can support some additional exception, FIQ, and debug branch
> > types, but they are not supported currently. There's no control in the
> > perf ABI to enable/disable these branch types, so they could only be
> > enabled for the 'any' filter which might be undesired or unexpected.
> > The other architectures don't have any support similar events (at least
> > with perf). These can be added in the future if there is demand by
> > adding additional specific filter types.
> >
> > BRBE records are invalidated whenever events are reconfigured, a new
> > task is scheduled in, or after recording is paused (and the records
> > have been recorded for the event). The architecture allows branch
> > records to be invalidated by the PE under implementation defined
> > conditions. It is expected that these conditions are rare.
> >
> > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> > Co-developed-by: Mark Rutland <mark.rutland@xxxxxxx>
> > Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> > Co-developed-by: Rob Herring (Arm) <robh@xxxxxxxxxx>
> > Signed-off-by: Rob Herring (Arm) <robh@xxxxxxxxxx>
> > ---
> > v19:
> > - Drop saving of branch records when task scheduled out. (Mark)
> > - Got rid of added armpmu ops. All BRBE support contained within pmuv3
> > code.
> > - Dropped armpmu.num_branch_records as reg_brbidr has same info.
> > - Make sched_task() callback actually get called. Enabling requires a
> > call to perf_sched_cb_inc().
> > - Fix freeze on overflow for VHE
> > - The cycle counter doesn't freeze BRBE on overflow, so avoid assigning
> > it when BRBE is enabled.
> > - Drop all the Arm specific exception branches. Not a clear need for
> > them.
> > - Simplify enable/disable to avoid RMW and document ISBs needed
> > - Fix handling of branch 'cycles' reading. CC field is
> > mantissa/exponent, not an integer.
> > - Save BRBFCR and BRBCR settings in event->hw.branch_reg.config and
> > event->hw.extra_reg.config to avoid recalculating the register value
> > each time the event is installed.
> > - Rework s/w filtering to better match h/w filtering
> > - Reject events with disjoint event filter and branch filter
> > - Reject events if exclude_host is set
> >
> > v18: https://lore.kernel.org/all/20240613061731.3109448-6-anshuman.khandual@xxxxxxx/
> > ---
> > drivers/perf/Kconfig | 11 +
> > drivers/perf/Makefile | 1 +
> > drivers/perf/arm_brbe.c | 794 +++++++++++++++++++++++++++++++++++++++++++
> > drivers/perf/arm_brbe.h | 47 +++
> > drivers/perf/arm_pmu.c | 15 +-
> > drivers/perf/arm_pmuv3.c | 87 ++++-
> > include/linux/perf/arm_pmu.h | 8 +
> > 7 files changed, 958 insertions(+), 5 deletions(-)
> >
> [...]
> > +bool brbe_branch_attr_valid(struct perf_event *event)
> > +{
> > + u64 branch_type = event->attr.branch_sample_type;
> > +
> > + /*
> > + * Ensure both perf branch filter allowed and exclude
> > + * masks are always in sync with the generic perf ABI.
> > + */
> > + BUILD_BUG_ON(BRBE_PERF_BRANCH_FILTERS != (PERF_SAMPLE_BRANCH_MAX - 1));
> > +
> > + if (branch_type & BRBE_EXCLUDE_BRANCH_FILTERS) {
> > + pr_debug_once("requested branch filter not supported 0x%llx\n", branch_type);
> > + return false;
> > + }
> > +
> > + /* Ensure at least 1 branch type is enabled */
> > + if (!(branch_type & BRBE_ALLOWED_BRANCH_TYPES)) {
> > + pr_debug_once("no branch type enabled 0x%llx\n", branch_type);
> > + return false;
> > + }
> > +
> > + /*
> > + * No branches are recorded in guests nor nVHE hypervisors, so
> > + * excluding the host or both kernel and user is invalid.
> > + *
> > + * Ideally we'd just require exclude_guest and exclude_hv, but setting
> > + * event filters with perf for kernel or user don't set exclude_guest.
> > + * So effectively, exclude_guest and exclude_hv are ignored.
> > + */
> > + if (event->attr.exclude_host || (event->attr.exclude_user && event->attr.exclude_kernel))
> > + return false;
>
> Is there a reason to do the pr_debugs for the two cases above, but not
> for the remaining ones? Seems like it should be all or nothing.

Shrug. Anshuman wrote the pr_debugs. I wrote this part. Honestly, I
don't know why you'd want them only once if they are gated off by
debug. I guess since other cases of rejecting events outside this
function have pr_debug() we should do the same here.

> > +
> > + /*
> > + * Require that the event filter and branch filter permissions match.
> > + *
> > + * The event and branch permissions can only mismatch if the user set
> > + * at least one of the privilege branch filters in PERF_SAMPLE_BRANCH_PLM_ALL.
> > + * Otherwise, the core will set the branch sample permissions in
> > + * perf_copy_attr().
> > + */
> > + if ((event->attr.exclude_user != !(branch_type & PERF_SAMPLE_BRANCH_USER)) ||
> > + (event->attr.exclude_kernel != !(branch_type & PERF_SAMPLE_BRANCH_KERNEL)) ||
>
> I don't think this one is right. By default perf_copy_attr() copies the
> exclude_ settings into the branch settings so this works, but if the
> user sets any _less_ permissive branch setting this fails. For example:
>
> # perf record -j any,u -- true
> Error:
> cycles:PH: PMU Hardware or event type doesn't support branch stack
> sampling.
>
> Here we want the default sampling permissions (exclude_kernel == 0,
> exclude_user == 0), but only user branch records, which doesn't match.
> It should be allowed because it doesn't include anything that we're not
> allowed to see.

I know it is allowed(on x86), but why would we want that? If you do
something even more restricted:

perf record -e cycles:k -j any,u -- true

That's allowed on x86 and gives you samples with user addresses. But
all the events happened in the kernel. How does that make any sense?

I suppose in your example, we could avoid attaching branch stack on
samples from the kernel. However, given how my example works, I'm
pretty sure that's not what x86 does.

There's also combinations that are allowed, but record no samples.
Though I think that was with guest events. I've gone with reject
non-sense combinations as much as possible. We can easily remove those
restrictions later if needed. Changing the behavior later (for the
same configuration) wouldn't be good.


> This also makes the Perf branch test skip because it uses
> any,save_type,u to see if BRBE exists.

Yes, I plan to update that if we keep this behavior.

> > + (!is_kernel_in_hyp_mode() &&
> > + (event->attr.exclude_hv != !(branch_type & PERF_SAMPLE_BRANCH_HV))))
> > + return false;
> > +
> > + event->hw.branch_reg.config = branch_type_to_brbfcr(event->attr.branch_sample_type);
> > + event->hw.extra_reg.config = branch_type_to_brbcr(event->attr.branch_sample_type);
> > +
> > + return true;
> > +}
> > +
> [...]
> > +static const int brbe_type_to_perf_type_map[BRBINFx_EL1_TYPE_DEBUG_EXIT + 1][2] = {
> > + [BRBINFx_EL1_TYPE_DIRECT_UNCOND] = { PERF_BR_UNCOND, 0 },
>
> Does the second field go into 'new_type'? They all seem to be zero so
> I'm not sure why new_type isn't ignored instead of having it mapped.

Well, left over from when all the Arm specific types were supported.
So yeah, that can be simplified.

> > + [BRBINFx_EL1_TYPE_INDIRECT] = { PERF_BR_IND, 0 },
> > + [BRBINFx_EL1_TYPE_DIRECT_LINK] = { PERF_BR_CALL, 0 },
> > + [BRBINFx_EL1_TYPE_INDIRECT_LINK] = { PERF_BR_IND_CALL, 0 },
> > + [BRBINFx_EL1_TYPE_RET] = { PERF_BR_RET, 0 },
> > + [BRBINFx_EL1_TYPE_DIRECT_COND] = { PERF_BR_COND, 0 },
> > + [BRBINFx_EL1_TYPE_CALL] = { PERF_BR_CALL, 0 },
> > + [BRBINFx_EL1_TYPE_ERET] = { PERF_BR_ERET, 0 },
> > + [BRBINFx_EL1_TYPE_IRQ] = { PERF_BR_IRQ, 0 },
>
> How do ones that don't map to anything appear in Perf? For example
> BRBINFx_EL1_TYPE_TRAP is missing, and the test that was attached to the
> previous versions fails because it doesn't see the trap that jumps to
> the kernel, but it does still see the ERET back to userspace:
>
> [unknown]/trap_bench+0x20/-/-/-/0/ERET/-
>
> In older versions we'd also have BRBINFx_EL1_TYPE_TRAP mapping to
> PERF_BR_SYSCALL so you could see it go into the kernel before the return:
>
> trap_bench+0x1C/[unknown]/-/-/-/0/SYSCALL/-
> [unknown]/trap_bench+0x20/-/-/-/0/ERET/-

My read of that was we should see a CALL in this case. Whether SVC
generates a TRAP or CALL depends on HFGITR_EL2.SVC_EL0 (table D18-2).
I assumed "SVC due to HFGITR_EL2.SVC_EL0" means when SVC_EL0 is set
(and set has additional conditions). We have SVC_EL0 cleared, so that
should be a CALL. Maybe the FVP has this wrong?

> > +};
> > +
> > +static void brbe_set_perf_entry_type(struct perf_branch_entry *entry, u64 brbinf)
> > +{
> > + int brbe_type = brbinf_get_type(brbinf);
> > +
> > + if (brbe_type <= BRBINFx_EL1_TYPE_DEBUG_EXIT) {
> > + const int *br_type = brbe_type_to_perf_type_map[brbe_type];
> > +
> > + entry->type = br_type[0];
> > + entry->new_type = br_type[1];
> > + }
> > +}
> > +
>
> [...]
>
> > + if (branch_sample & PERF_SAMPLE_BRANCH_ANY_RETURN) {
> > + set_bit(PERF_BR_RET, event_type_mask);
> > +
> > + if (!event->attr.exclude_kernel)
> > + set_bit(PERF_BR_ERET, event_type_mask);
>
> You could argue that ERET should be included even if exclude_kernel is
> set, otherwise you miss the point that you returned to in userspace and
> leave a gap in the program flow. See the trap and eret example above.
>
> It looks like we still have the zeroing of the kernel address in this
> version if we only have userspace priviledge, so it should be fine to
> show the ERET and the target address.

Yes, agreed.

Rob