Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode

From: James Clark
Date: Wed Dec 18 2024 - 05:08:05 EST

On 18/12/2024 12:54 am, Ian Rogers wrote:
On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@xxxxxxxxxx> wrote:

Document the flag, hint what it's used for and give an example with
other useful options to get minimal output.

Signed-off-by: James Clark <james.clark@xxxxxxxxxx>
tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index de2b0b479249..588eead438bc 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
store_filter=1 - collect stores only (PMSFCR.ST)
ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
+ discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)

+++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
than only the execution latency.
@@ -220,6 +221,16 @@ Common errors

Increase sampling interval (see above)

+Discard mode
+SPE PMU events can be used without the overhead of collecting sample data if
+discard mode is supported (optional from Armv8.6). First run a system wide SPE
+session (or on the core of interest) using options to minimize output. Then run
+perf stat:
+ perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
+ perf stat -e SAMPLE_FEED_LD

Perhaps clarify this should be an ARM SPE event? It seems strange to
have one perf command affect a later one, the purpose of things like
event multiplexing is to hide the hardware limits. I'd prefer if the
last bit was like:
Then run perf stat with an SPE event on the same PMU:

perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
perf stat -e arm_spe/SAMPLE_FEED_LD/


Hi Ian,

Confusingly, this isn't an SPE event; it is a normal PMU event. One perf command affects the other because these events only count when SPE is enabled. Enabling SPE takes effect at a per-core level, which is why the example keeps things simple by enabling SPE system wide.

SPE is an exclusive PMU, like Coresight and some others, so it can't be affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU event could be multiplexed, but as long as SPE stays enabled it will count the right thing regardless.
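
To make the dependency between the two commands concrete, here is a rough sketch of how they could be wrapped together. It only reuses the two command lines from the patch. The function name, the PERF variable, the one-second settle delay and the workload passed as arguments are all assumptions for illustration, not part of the patch.

```shell
#!/bin/sh
# Sketch only: keep SPE enabled in discard mode while counting a normal PMU event.
# PERF, count_with_spe_discard and the workload arguments are hypothetical names.
PERF=${PERF:-perf}

count_with_spe_discard() {
    # Start a system-wide SPE session in discard mode, with minimal output.
    "$PERF" record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
    spe_pid=$!

    # Crude wait so the background session is running first (assumed delay).
    sleep 1

    # Count the normal PMU event while SPE stays enabled; "$@" is the workload.
    status=0
    "$PERF" stat -e SAMPLE_FEED_LD -- "$@" || status=$?

    # Stop the background SPE session and reap it.
    kill "$spe_pid" 2>/dev/null || true
    wait "$spe_pid" 2>/dev/null || true
    return "$status"
}
```

For example, `count_with_spe_discard ./my_workload` would count SAMPLE_FEED_LD for a placeholder binary. If the background session exits early, the counter would be expected to stop counting, since the event depends on SPE being enabled.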
