On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@xxxxxxxxxx> wrote:
Document the flag, hint what it's used for and give an example with
other useful options to get minimal output.
Signed-off-by: James Clark <james.clark@xxxxxxxxxx>
---
tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index de2b0b479249..588eead438bc 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
store_filter=1 - collect stores only (PMSFCR.ST)
ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
+ discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
than only the execution latency.
@@ -220,6 +221,16 @@ Common errors
Increase sampling interval (see above)
+Discard mode
+~~~~~~~~~~~~
+
+SPE PMU events can be used without the overhead of collecting sample data if
+discard mode is supported (optional from Armv8.6). First run a system-wide SPE
+session (or on the core of interest) using options to minimize output. Then run
+perf stat:
+
+ perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
+ perf stat -e SAMPLE_FEED_LD
Perhaps clarify that this should be an ARM SPE event? It seems strange to
have one perf command affect a later one; the purpose of things like
event multiplexing is to hide the hardware limits. I'd prefer the
last bit to read like:
```
Then run perf stat with an SPE event on the same PMU:
perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
perf stat -e arm_spe/SAMPLE_FEED_LD/
```
Thanks,
Ian