[PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support
From: Sandipan Das
Date: Thu Aug 11 2022 - 08:30:56 EST
Last Branch Record (LBR) is a feature available on modern processors for
recording branch information. It helps determine the flow of control by
logging branch information to registers in real time and helps with the
detection of hot code paths.
Add support for using AMD Last Branch Record Extension Version 2 (LbrExtV2)
features on Zen 4 processors. New CPU features are introduced for LbrExtV2
detection. New MSR definitions are added for configuring hardware branch
filtering and for enabling the LBR Freeze on PMI feature.
The LBR Freeze on PMI feature is essential for ensuring that branch records
remain consistent with the point of PMU overflow in order to provide a
precise correlation between the two.
Hardware branch filtering allows users to record only specific types of
branches and can be mapped to most of the existing filters supported by the
perf tool. Additional software filtering ensures that some special branches
(syscall entry and exit) for which direct hardware filters do not exist are
also recorded. This expands the scope of filters like "any_call".
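
For reference, a filter such as "any_call" reaches the kernel through the
branch_sample_type field of struct perf_event_attr. The following is only a
minimal sketch of how a user-space tool might fill that structure using the
existing perf UAPI constants; the helper name init_call_branch_attr() and the
sample period are illustrative assumptions and are not part of this series.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper (not from this series): describe a user-space
 * cycles event that also captures call-type branch records.
 */
void init_call_branch_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary value for illustration */

	/* Ask for a branch stack with every sample. */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;

	/* Privilege filter (user only) plus branch type filter (calls). */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY_CALL;

	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
	attr->disabled = 1;
}
```

The filled structure would then be handed to perf_event_open(2). How the
kernel splits the request between hardware and software filtering is decided
by the driver and is not visible at this level.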
Additionally, the perf UAPI is now extended to provide branch speculation
information, if available. LbrExtV2 provides this information through the
"valid" and "spec" bits in the Branch To registers. The tools-side changes
for this will be submitted as a separate series.
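
As a rough illustration of how the two bits could be decoded, the sketch
below classifies a raw Branch To value into a speculation outcome. The bit
positions (62 for "spec", 63 for "valid"), the outcome names and the meaning
given to each combination are assumptions made for this example only; the
patches and the hardware documentation are the authoritative reference.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in a Branch To register (illustrative only). */
#define LBR_TO_SPEC_BIT		62
#define LBR_TO_VALID_BIT	63

/* Outcome names local to this sketch, indexed by (valid << 1) | spec. */
enum spec_outcome {
	SPEC_NA = 0,		/* valid=0, spec=0: no usable record */
	SPEC_WRONG_PATH,	/* valid=0, spec=1: speculative, wrong path */
	NON_SPEC_CORRECT_PATH,	/* valid=1, spec=0: retired, not speculative */
	SPEC_CORRECT_PATH,	/* valid=1, spec=1: speculative, correct path */
};

/* Classify one raw Branch To value under the assumptions above. */
enum spec_outcome classify_branch_to(uint64_t to)
{
	static const enum spec_outcome map[4] = {
		SPEC_NA,
		SPEC_WRONG_PATH,
		NON_SPEC_CORRECT_PATH,
		SPEC_CORRECT_PATH,
	};
	unsigned int valid = (unsigned int)((to >> LBR_TO_VALID_BIT) & 1);
	unsigned int spec = (unsigned int)((to >> LBR_TO_SPEC_BIT) & 1);
	unsigned int idx = (valid << 1) | spec;

	assert(idx < 4);
	return map[idx];
}
```

A table lookup keeps the decode branch-free, which matters when it runs for
every entry of the branch stack on each PMI.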
Users of the perf tool can now record branches as shown below. The 'div'
workload used here is from https://lwn.net/Articles/680985/.
E.g.
$ perf record -b -e cycles:u ./div
Before:
Error:
cycles:u: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
After:
[ perf record: Woken up 49 times to write data ]
[ perf record: Captured and wrote 12.197 MB perf.data (29601 samples) ]
$ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 473K of event 'cycles:u'
# Event count (approx.): 473521
#
# Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
# ........  .......  ....................  ......................  ......................  ..................
#
    29.69%  div      div                   [.] main                [.] main                -
    23.84%  div      div                   [.] compute_flag        [.] main                -
    23.41%  div      div                   [.] compute_flag        [.] compute_flag        -
    23.04%  div      div                   [.] main                [.] compute_flag        -
[...]
No additional failures are seen upon running the following:
* perf built-in test suite
* perf_event_tests suite
Sandipan Das (13):
  perf/x86/amd/brs: Move feature-specific functions
  perf/x86/amd/core: Refactor branch attributes
  perf/x86/amd/core: Add generic branch record interfaces
  x86/cpufeatures: Add LbrExtV2 feature bit
  perf/x86/amd/lbr: Detect LbrExtV2 support
  perf/x86/amd/lbr: Add LbrExtV2 branch record support
  perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support
  perf/x86: Move branch classifier
  perf/x86/amd/lbr: Add LbrExtV2 software branch filter support
  perf/x86: Make branch classifier fusion-aware
  perf/x86/amd/lbr: Use fusion-aware branch classifier
  perf/core: Add speculation info to branch entries
  perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support
 arch/x86/events/Makefile           |   2 +-
 arch/x86/events/amd/Makefile       |   2 +-
 arch/x86/events/amd/brs.c          |  69 ++++-
 arch/x86/events/amd/core.c         | 200 +++++++------
 arch/x86/events/amd/lbr.c          | 435 +++++++++++++++++++++++++++++
 arch/x86/events/intel/lbr.c        | 273 ------------------
 arch/x86/events/perf_event.h       |  81 +++++-
 arch/x86/events/utils.c            | 247 ++++++++++++++++
 arch/x86/include/asm/cpufeatures.h |   2 +-
 arch/x86/include/asm/msr-index.h   |   5 +
 arch/x86/include/asm/perf_event.h  |   3 +-
 arch/x86/kernel/cpu/scattered.c    |   1 +
 include/linux/perf_event.h         |   1 +
 include/uapi/linux/perf_event.h    |  15 +-
 14 files changed, 952 insertions(+), 384 deletions(-)
 create mode 100644 arch/x86/events/amd/lbr.c
 create mode 100644 arch/x86/events/utils.c
--
2.34.1