On Mon, Feb 3, 2025 at 10:53 AM James Clark <james.clark@xxxxxxxxxx> wrote:
On 03/02/2025 12:43 am, Rob Herring (Arm) wrote:
From: Anshuman Khandual <anshuman.khandual@xxxxxxx>[...]
The ARMv9.2 architecture introduces the optional Branch Record Buffer
Extension (BRBE), which records information about branches as they are
executed into set of branch record registers. BRBE is similar to x86's
Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
(BHRB).
BRBE supports filtering by exception level and can filter just the
source or target address if excluded to avoid leaking privileged
addresses. The h/w filter would be sufficient except when there are
multiple events with disjoint filtering requirements. In this case, BRBE
is configured with a union of all the events' desired branches, and then
the recorded branches are filtered based on each event's filter. For
example, with one event capturing kernel events and another event
capturing user events, BRBE will be configured to capture both kernel
and user branches. When handling event overflow, the branch records have
to be filtered by software to only include kernel or user branch
addresses for that event. In contrast, x86 simply configures LBR using
the last installed event which seems broken.
The event and branch exception level filtering are separately
controlled. On x86, it is possible to request filtering which is
disjoint (e.g. kernel only event with user only branches). It is also
possible on x86 to configure branch filter such that no branches are
ever recorded (e.g. -j save_type). For BRBE, events with mismatched
exception level filtering or a configuration that will result in no
samples are rejected. This can be relaxed in the future if such a need
arises.
The handling of KVM guests is similar to the above. On x86, branch
recording is always disabled when a guest is running. However,
requesting branch recording in guests is allowed. The guest events are
recorded, but the resulting branches are all from the host. For BRBE,
branch recording is similarly disabled when guest is running. In
addition, events with branch recording and "exclude_host" set are
rejected. Requiring "exclude_guest" to be set did not work. The default
for the perf tool does set "exclude_guest" if no exception level
options are specified. However, specifying kernel or user defaults to
including both host and guest. In this case, only host branches are
recorded.
BRBE can support some additional exception, FIQ, and debug branch
types, but they are not supported currently. There's no control in the
perf ABI to enable/disable these branch types, so they could only be
enabled for the 'any' filter which might be undesired or unexpected.
The other architectures don't have any support similar events (at least
with perf). These can be added in the future if there is demand by
adding additional specific filter types.
BRBE records are invalidated whenever events are reconfigured, a new
task is scheduled in, or after recording is paused (and the records
have been recorded for the event). The architecture allows branch
records to be invalidated by the PE under implementation defined
conditions. It is expected that these conditions are rare.
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
Co-developed-by: Mark Rutland <mark.rutland@xxxxxxx>
Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
Co-developed-by: Rob Herring (Arm) <robh@xxxxxxxxxx>
Signed-off-by: Rob Herring (Arm) <robh@xxxxxxxxxx>
---
v19:
- Drop saving of branch records when task scheduled out. (Mark)
- Got rid of added armpmu ops. All BRBE support contained within pmuv3
code.
- Dropped armpmu.num_branch_records as reg_brbidr has same info.
- Make sched_task() callback actually get called. Enabling requires a
call to perf_sched_cb_inc().
- Fix freeze on overflow for VHE
- The cycle counter doesn't freeze BRBE on overflow, so avoid assigning
it when BRBE is enabled.
- Drop all the Arm specific exception branches. Not a clear need for
them.
- Simplify enable/disable to avoid RMW and document ISBs needed
- Fix handling of branch 'cycles' reading. CC field is
mantissa/exponent, not an integer.
- Save BRBFCR and BRBCR settings in event->hw.branch_reg.config and
event->hw.extra_reg.config to avoid recalculating the register value
each time the event is installed.
- Rework s/w filtering to better match h/w filtering
- Reject events with disjoint event filter and branch filter
- Reject events if exclude_host is set
v18: https://lore.kernel.org/all/20240613061731.3109448-6-anshuman.khandual@xxxxxxx/
---
drivers/perf/Kconfig | 11 +
drivers/perf/Makefile | 1 +
drivers/perf/arm_brbe.c | 794 +++++++++++++++++++++++++++++++++++++++++++
drivers/perf/arm_brbe.h | 47 +++
drivers/perf/arm_pmu.c | 15 +-
drivers/perf/arm_pmuv3.c | 87 ++++-
include/linux/perf/arm_pmu.h | 8 +
7 files changed, 958 insertions(+), 5 deletions(-)
+bool brbe_branch_attr_valid(struct perf_event *event)
+{
+ u64 branch_type = event->attr.branch_sample_type;
+
+ /*
+ * Ensure both perf branch filter allowed and exclude
+ * masks are always in sync with the generic perf ABI.
+ */
+ BUILD_BUG_ON(BRBE_PERF_BRANCH_FILTERS != (PERF_SAMPLE_BRANCH_MAX - 1));
+
+ if (branch_type & BRBE_EXCLUDE_BRANCH_FILTERS) {
+ pr_debug_once("requested branch filter not supported 0x%llx\n", branch_type);
+ return false;
+ }
+
+ /* Ensure at least 1 branch type is enabled */
+ if (!(branch_type & BRBE_ALLOWED_BRANCH_TYPES)) {
+ pr_debug_once("no branch type enabled 0x%llx\n", branch_type);
+ return false;
+ }
+
+ /*
+ * No branches are recorded in guests nor nVHE hypervisors, so
+ * excluding the host or both kernel and user is invalid.
+ *
+ * Ideally we'd just require exclude_guest and exclude_hv, but setting
+ * event filters with perf for kernel or user don't set exclude_guest.
+ * So effectively, exclude_guest and exclude_hv are ignored.
+ */
+ if (event->attr.exclude_host || (event->attr.exclude_user && event->attr.exclude_kernel))
+ return false;
Is there a reason to do the pr_debugs for the two cases above, but not
for the remaining ones? Seems like it should be all or nothing.
Shrug. Anshuman wrote the pr_debugs. I wrote this part. Honestly, I
don't know why you'd want them only once if they are gated off by
debug. I guess since other cases of rejecting events outside this
function have pr_debug() we should do the same here.
+
+ /*
+ * Require that the event filter and branch filter permissions match.
+ *
+ * The event and branch permissions can only mismatch if the user set
+ * at least one of the privilege branch filters in PERF_SAMPLE_BRANCH_PLM_ALL.
+ * Otherwise, the core will set the branch sample permissions in
+ * perf_copy_attr().
+ */
+ if ((event->attr.exclude_user != !(branch_type & PERF_SAMPLE_BRANCH_USER)) ||
+ (event->attr.exclude_kernel != !(branch_type & PERF_SAMPLE_BRANCH_KERNEL)) ||
I don't think this one is right. By default perf_copy_attr() copies the
exclude_ settings into the branch settings so this works, but if the
user sets any _less_ permissive branch setting this fails. For example:
# perf record -j any,u -- true
Error:
cycles:PH: PMU Hardware or event type doesn't support branch stack
sampling.
Here we want the default sampling permissions (exclude_kernel == 0,
exclude_user == 0), but only user branch records, which doesn't match.
It should be allowed because it doesn't include anything that we're not
allowed to see.
I know it is allowed(on x86), but why would we want that? If you do
something even more restricted:
perf record -e cycles:k -j any,u -- true
That's allowed on x86 and gives you samples with user addresses. But
all the events happened in the kernel. How does that make any sense?
I suppose in your example, we could avoid attaching branch stack on
samples from the kernel. However, given how my example works, I'm
pretty sure that's not what x86 does.
There's also combinations that are allowed, but record no samples.
Though I think that was with guest events. I've gone with reject
non-sense combinations as much as possible. We can easily remove those
restrictions later if needed. Changing the behavior later (for the
same configuration) wouldn't be good.
This also makes the Perf branch test skip because it uses
any,save_type,u to see if BRBE exists.
Yes, I plan to update that if we keep this behavior.
+ (!is_kernel_in_hyp_mode() &&[...]
+ (event->attr.exclude_hv != !(branch_type & PERF_SAMPLE_BRANCH_HV))))
+ return false;
+
+ event->hw.branch_reg.config = branch_type_to_brbfcr(event->attr.branch_sample_type);
+ event->hw.extra_reg.config = branch_type_to_brbcr(event->attr.branch_sample_type);
+
+ return true;
+}
+
+static const int brbe_type_to_perf_type_map[BRBINFx_EL1_TYPE_DEBUG_EXIT + 1][2] = {
+ [BRBINFx_EL1_TYPE_DIRECT_UNCOND] = { PERF_BR_UNCOND, 0 },
Does the second field go into 'new_type'? They all seem to be zero so
I'm not sure why new_type isn't ignored instead of having it mapped.
Well, left over from when all the Arm specific types were supported.
So yeah, that can be simplified.
+ [BRBINFx_EL1_TYPE_INDIRECT] = { PERF_BR_IND, 0 },
+ [BRBINFx_EL1_TYPE_DIRECT_LINK] = { PERF_BR_CALL, 0 },
+ [BRBINFx_EL1_TYPE_INDIRECT_LINK] = { PERF_BR_IND_CALL, 0 },
+ [BRBINFx_EL1_TYPE_RET] = { PERF_BR_RET, 0 },
+ [BRBINFx_EL1_TYPE_DIRECT_COND] = { PERF_BR_COND, 0 },
+ [BRBINFx_EL1_TYPE_CALL] = { PERF_BR_CALL, 0 },
+ [BRBINFx_EL1_TYPE_ERET] = { PERF_BR_ERET, 0 },
+ [BRBINFx_EL1_TYPE_IRQ] = { PERF_BR_IRQ, 0 },
How do ones that don't map to anything appear in Perf? For example
BRBINFx_EL1_TYPE_TRAP is missing, and the test that was attached to the
previous versions fails because it doesn't see the trap that jumps to
the kernel, but it does still see the ERET back to userspace:
[unknown]/trap_bench+0x20/-/-/-/0/ERET/-
In older versions we'd also have BRBINFx_EL1_TYPE_TRAP mapping to
PERF_BR_SYSCALL so you could see it go into the kernel before the return:
trap_bench+0x1C/[unknown]/-/-/-/0/SYSCALL/-
[unknown]/trap_bench+0x20/-/-/-/0/ERET/-
My read of that was we should see a CALL in this case. Whether SVC
generates a TRAP or CALL depends on HFGITR_EL2.SVC_EL0 (table D18-2).
I assumed "SVC due to HFGITR_EL2.SVC_EL0" means when SVC_EL0 is set
(and set has additional conditions). We have SVC_EL0 cleared, so that
should be a CALL. Maybe the FVP has this wrong?