[RFC PATCH v2 00/14] perf stat: Decouple and modularize metrics/events output printing API
From: Ian Rogers
Date: Mon May 25 2026 - 19:19:20 EST
This RFC patch series refactors the event and metric output printing
engine inside 'perf stat' so that it is decoupled from data
collection and split into per-format modules.
======================
Background and Motivation
======================
Historically, 'perf stat' output printing was tightly coupled with
data collection, aggregation math, and shadow metrics calculation.
Formatting logic (Standard Console, CSV, and JSON) was scattered
across util/stat-display.c, featuring massive, complex switch-cases,
temporal adjacency assumptions, and duplicated layout logic. Adding
new metrics, uncore PMUs, or topology-aware CPU aggregation modes
frequently resulted in accidental layout regressions, broken CSV
field counts caught by the linters, or parsing crashes.
This patch series decouples the data traversal and shadow-metric
calculations from the visual layout rendering, introducing a
modular, callback-driven print architecture built around a typed
callback interface.
======================
Decoupled Printing Strategy
======================
1. Format-Agnostic Traversal Driver (util/stat-print.c)
The core display logic is abstracted into a generic traversal
driver, perf_stat__print_cb(). This driver manages the complex
CPU/thread/topology aggregation loops, resolves hybrid wildcard
merges, filters default skipped uncore metrics, and calculates
raw shadow metrics. Once the data points are prepared, the driver
streams them cleanly to formatting callbacks.
- Safety: `calculate_and_print_metric` checks for a NULL
`print_metric` callback and returns early, so a format may
leave `print_metric` unpopulated.
2. Type-Safe Callbacks Interface (struct perf_stat_print_callbacks)
Output formats communicate with the driver using a clean
streaming interface:
- print_start(): Initializes format-private DOM states.
- print_event(): Buffers or prints raw counter event details.
- print_metric(): Buffers or prints calculated shadow metrics.
- print_end(): Finalizes rendering and cleans up structures.
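As a rough illustration of how a driver might stream data points
through such an interface, consider the sketch below. It is not the
code from this series: `demo_print_callbacks`, `demo_sample`,
`demo_print_cb()` and `demo_count_metric()` are hypothetical names,
and the callback signatures are assumptions chosen for brevity. It
only shows the optional-callback pattern, including the early skip
when `print_metric` is left NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct perf_stat_print_callbacks. */
struct demo_print_callbacks {
	void (*print_start)(void *ctx);
	void (*print_event)(void *ctx, const char *name, double count);
	void (*print_metric)(void *ctx, const char *name, double value);
	void (*print_end)(void *ctx);
};

/* One prepared data point: an event and, optionally, one metric. */
struct demo_sample {
	const char *event;
	double count;
	const char *metric;	/* NULL if the event has no metric */
	double metric_val;
};

/*
 * Format-agnostic traversal: every callback is optional. Returns the
 * number of print_metric() invocations.
 */
static int demo_print_cb(const struct demo_print_callbacks *cb, void *ctx,
			 const struct demo_sample *samples, size_t nr)
{
	int nr_metrics = 0;
	size_t i;

	assert(cb != NULL);

	if (cb->print_start)
		cb->print_start(ctx);

	for (i = 0; i < nr; i++) {
		if (cb->print_event)
			cb->print_event(ctx, samples[i].event, samples[i].count);

		/* Skip metrics for formats that leave print_metric unpopulated. */
		if (!cb->print_metric || !samples[i].metric)
			continue;

		cb->print_metric(ctx, samples[i].metric, samples[i].metric_val);
		nr_metrics++;
	}

	if (cb->print_end)
		cb->print_end(ctx);

	return nr_metrics;
}

/* Example print_metric callback: counts metrics in an int behind ctx. */
static void demo_count_metric(void *ctx, const char *name, double value)
{
	(void)name;
	(void)value;
	(*(int *)ctx)++;
}
```

A format that only cares about metrics could fill in `.print_metric`
alone, while a format with every member NULL is traversed without
any callback being invoked.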
3. Format-Specific Rendering Engines:
- Standard Console (util/stat-print-std.c):
Buffers events and metrics into standard-private DOM lists.
It resolves default metric-group skipped headers, prepends
formatted interval timestamps, aligns rows dynamically using
aggr_header_lens, and prints them cleanly in print_end().
- Refinement: Handles the global `aggr_idx == -1` index by
initializing the tracked index to a `-2` sentinel, so that -1
is not mistaken for the initial state, and bounds-checks
negative indices before any array lookup. It also resets the
active event pointer when a zero counter is skipped locally, so
the temporal coupling check does not report a false positive.
- CSV Printing (util/stat-print-csv.c):
Buffers events and metrics into format-private queues,
formatting rows separated by config->csv_sep. Corrects
metrics continuation padding to print exactly 4 separators,
so that continuation rows keep the expected column count.
- Refinement: Decoupled CSV headers now output static
structural labels (e.g. "cpu,", "die,") instead of live
hardware IDs, and prevent redundant header rows in interval
mode by persisting state tracking.
- Streaming JSON Printing (util/stat-print-json.c):
Implements a streaming, zero-allocation print engine that
bypasses the dynamic queues and metrics buffering used by the
other formats. JSON objects and interval keys are formatted and
streamed directly onto the output file descriptor, which avoids
heap allocation overhead.
- Refinement: The fast path in `json_metric_only_print_metric`
allocates nothing: strings are streamed directly, without
`asprintf` or `strdup` calls.
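To make the streaming idea and the `-2` sentinel more concrete, here
is a minimal sketch under stated assumptions. The names
`demo_json_state`, `demo_json_init()`, `demo_json_metric()` and
`demo_json_end()`, as well as the emitted key layout, are
hypothetical and do not reproduce the JSON schema of 'perf stat'.
The sketch writes each fragment straight to a `FILE *` with
`fprintf()`, keeps no queue, and uses `-2` as the "nothing printed
yet" marker because `-1` is a valid (global) aggregation index.

```c
#include <assert.h>
#include <stdio.h>

/* -1 already means "global aggregation", so "unset" must differ. */
#define DEMO_AGGR_IDX_UNSET	(-2)

/* Hypothetical streaming state: no queues, no heap buffers. */
struct demo_json_state {
	FILE *out;
	int last_aggr_idx;
	int nr_objects;
};

static void demo_json_init(struct demo_json_state *js, FILE *out)
{
	assert(js != NULL && out != NULL);
	js->out = out;
	js->last_aggr_idx = DEMO_AGGR_IDX_UNSET;
	js->nr_objects = 0;
}

/*
 * Stream one metric. A new JSON object is opened whenever aggr_idx
 * changes. Returns the number of bytes written (output errors are
 * ignored in this sketch, and names are not JSON-escaped).
 */
static int demo_json_metric(struct demo_json_state *js, int aggr_idx,
			    const char *name, double value)
{
	int n = 0;

	if (aggr_idx != js->last_aggr_idx) {
		/* Close the previous object, if there was one. */
		if (js->last_aggr_idx != DEMO_AGGR_IDX_UNSET)
			n += fprintf(js->out, "}\n");
		n += fprintf(js->out, "{\"aggr_idx\" : %d", aggr_idx);
		js->last_aggr_idx = aggr_idx;
		js->nr_objects++;
	}
	n += fprintf(js->out, ", \"%s\" : \"%.2f\"", name, value);
	return n;
}

/* Close the last open object, if any. Returns bytes written. */
static int demo_json_end(struct demo_json_state *js)
{
	if (js->last_aggr_idx == DEMO_AGGR_IDX_UNSET)
		return 0;
	js->last_aggr_idx = DEMO_AGGR_IDX_UNSET;
	return fprintf(js->out, "}\n");
}
```

Had `last_aggr_idx` started at `-1`, the first metric of a global
(`aggr_idx == -1`) run would look like a continuation of an object
that was never opened.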
4. Centralized Aggregation Prefix Formatting
Duplicated CPU/thread aggregation prefix rendering is eliminated
by exposing arrays globally and introducing shared generic
helpers in stat-print.c:
- perf_stat__get_aggr_key(): Resolves the JSON key name.
- perf_stat__get_aggr_id_char(): Resolves the unified prefix.
Because every format goes through the same helpers, the
aggregation prefix stays consistent across all formats.
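The benefit of a single shared helper can be sketched as follows.
This is an illustration only: the enum, the key strings and the
functions `demo_get_aggr_key()`, `demo_csv_label()` and
`demo_json_key()` are hypothetical and are not the signatures or
tables used by the helpers named above. The point is that the CSV
label and the JSON key are both derived from one table, so they
cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical aggregation modes; not perf's enum aggr_mode. */
enum demo_aggr_mode {
	DEMO_AGGR_GLOBAL,
	DEMO_AGGR_SOCKET,
	DEMO_AGGR_DIE,
	DEMO_AGGR_CORE,
	DEMO_AGGR_CPU,
	DEMO_AGGR_MAX,
};

/* Single shared table: every format derives its label from here. */
static const char *const demo_aggr_keys[DEMO_AGGR_MAX] = {
	[DEMO_AGGR_SOCKET] = "socket",
	[DEMO_AGGR_DIE]    = "die",
	[DEMO_AGGR_CORE]   = "core",
	[DEMO_AGGR_CPU]    = "cpu",
};

/* Returns the key name, or NULL for global or out-of-range modes. */
static const char *demo_get_aggr_key(int mode)
{
	if (mode < 0 || mode >= DEMO_AGGR_MAX)
		return NULL;
	return demo_aggr_keys[mode];
}

/* CSV header label: the shared key followed by the separator. */
static int demo_csv_label(char *buf, size_t size, int mode, const char *sep)
{
	const char *key = demo_get_aggr_key(mode);

	assert(buf != NULL || size == 0);
	if (!key) {
		/* No prefix column for global aggregation. */
		if (size)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, size, "%s%s", key, sep);
}

/* JSON key for the same mode, built from the same table. */
static int demo_json_key(char *buf, size_t size, int mode)
{
	const char *key = demo_get_aggr_key(mode);

	assert(buf != NULL || size == 0);
	if (!key) {
		if (size)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, size, "\"%s\" : ", key);
}
```

Both formatters return the `snprintf()` length, so a caller can
detect truncation by comparing the result against the buffer size.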
5. Temporal Coupling Sanity Checks
A strict temporal coupling constraint (that the traversal driver
always invokes print_metric() callbacks synchronously and
consecutively for the same PMU/event node immediately after its
print_event() callback) is formally protected by adding a
runtime evsel matching check inside both STD and CSV engines:
if (evsel != ps->current_event->evsel) abort_print();
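The check above, together with the pointer reset on skipped zero
counters, might look roughly like the sketch below. All types and
functions here (`demo_evsel`, `demo_queued_event`,
`demo_print_state`, `demo_on_print_event()`,
`demo_on_print_metric()`) are hypothetical stand-ins, and returning
`false` is an assumption made for the sketch in place of whatever
`abort_print()` does in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine-private structures. */
struct demo_evsel {
	const char *name;
};

struct demo_queued_event {
	const struct demo_evsel *evsel;
	int nr_metrics;
};

struct demo_print_state {
	struct demo_queued_event *current_event;
};

/*
 * print_event(): remember the event that following metrics belong to.
 * A skipped zero counter resets the pointer so that a stale event is
 * never matched against later metrics.
 */
static void demo_on_print_event(struct demo_print_state *ps,
				struct demo_queued_event *ev, bool skipped)
{
	assert(ps != NULL);
	ps->current_event = skipped ? NULL : ev;
}

/*
 * print_metric(): accept the metric only if it arrives for the event
 * recorded by the most recent print_event(). Returns false when the
 * temporal coupling assumption does not hold.
 */
static bool demo_on_print_metric(struct demo_print_state *ps,
				 const struct demo_evsel *evsel)
{
	assert(ps != NULL);
	if (!ps->current_event || ps->current_event->evsel != evsel)
		return false;

	ps->current_event->nr_metrics++;
	return true;
}
```

In this sketch a metric that follows a skipped event is simply
rejected, since there is no buffered row to attach it to.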
======================
Verification and Testing
======================
All automated shell linters (stat+std_output.sh, stat+csv_output.sh,
stat+json_output.sh) have been extended to run their entire
aggregation suites a second time under the new printer flag
(--new), and all of them pass. The PMU metrics value Python
validation script and stat_metrics_values.sh have also been
extended with --new flag testing, to check that the calculated
metric values remain correct.
- Test Quality: JSON linter checks define a dynamic `api_label`
so that output logs from the legacy and `--new` passes are easy
to tell apart.
======================
Changes since v1:
======================
- calculate_and_print_metric: added safe print_metric NULL callback check.
- should_skip_zero_counter: added safe aggr_idx bounds check to avoid
out-of-bounds mapping array access when aggr_idx is negative.
- std_print_event: reset ps->current_event pointer on skipped zero counters
to avoid temporal coupling mismatch violations.
- std_metric_only_print_end: only print metric headers once in
interval mode, and print dynamic spacing padding to perfectly
align columns.
- csv_metric_only_print_end: only print CSV headers once in
interval mode, print static aggregation labels instead of live
hardware IDs, and fix column misalignment under AGGR_GLOBAL by
initializing current_aggr to -2 sentinel.
- json_metric_only_print_metric: made the fast path zero-allocation
by streaming combined keys directly without dynamic heap string
allocations, and resolved AGGR_GLOBAL indices by initializing
last_aggr_idx to -2.
- stat+json_output.sh: define a dynamic api_label so that output logs
from the legacy and --new passes are easy to tell apart.
- merged duplicate skip_test block structures inside linter shell scripts.
- documented -2 sentinel choices as C comments inside standard, CSV,
and JSON print engines.
We would highly appreciate reviews, comments, and feedback on this
decoupled output printing strategy.
Assisted-by: Antigravity:gemini-3.5-flash
***
Ian Rogers (14):
  perf stat: Introduce core generic print traversal engine and header
    stubs
  perf stat: Implement standard console (STD) formatting callbacks
  perf stat: Extend STD output linter to test basic New API checks
  perf stat: Extend STD output linter to test core aggregation checks
  perf stat: Extend STD output linter to test advanced PMU checks
  perf stat: Extend STD output linter to test metric-only checks
  perf stat: Implement CSV formatting callbacks
  perf stat: Extend CSV output linter to test core aggregation checks
  perf stat: Extend CSV output linter to test advanced PMU and
    metric-only checks
  perf stat: Implement streaming JSON formatting callbacks
  perf stat: Extend JSON output linter to test core aggregation checks
  perf stat: Extend JSON output linter to test advanced PMU and
    metric-only checks
  perf stat: Add --new support to PMU metrics Python validator
  perf stat: Extend PMU metrics value linter to validate --new outputs
 tools/perf/builtin-stat.c                     | 261 +++---
 .../tests/shell/lib/perf_metric_validation.py |  12 +-
 tools/perf/tests/shell/stat+csv_output.sh     |  19 +
 tools/perf/tests/shell/stat+json_output.sh    |  74 +-
 tools/perf/tests/shell/stat+std_output.sh     |  18 +
 tools/perf/tests/shell/stat_metrics_values.sh |  13 +-
 tools/perf/util/Build                         |   4 +
 tools/perf/util/stat-display.c                |  28 +-
 tools/perf/util/stat-print-csv.c              | 534 ++++++++++++
 tools/perf/util/stat-print-json.c             | 330 ++++++++
 tools/perf/util/stat-print-std.c              | 773 ++++++++++++++++++
 tools/perf/util/stat-print.c                  | 490 +++++++++++
 tools/perf/util/stat-print.h                  | 133 +++
 tools/perf/util/stat.h                        |   2 +
 14 files changed, 2519 insertions(+), 172 deletions(-)
 create mode 100644 tools/perf/util/stat-print-csv.c
 create mode 100644 tools/perf/util/stat-print-json.c
 create mode 100644 tools/perf/util/stat-print-std.c
 create mode 100644 tools/perf/util/stat-print.c
 create mode 100644 tools/perf/util/stat-print.h
--
2.54.0.794.g4f17f83d09-goog