Re: [RFC PATCH v2 00/14] perf stat: Decouple and modularize metrics/events output printing API

From: Chun-Tse Shao

Date: Fri Jun 05 2026 - 14:05:45 EST


Hi Ian,

On Mon, May 25, 2026 at 4:19 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> This RFC patch series introduces a complete architectural refactoring
> to decouple and modularize the event and metric output printing
> engine inside 'perf stat'.

I really like this change. Historically, fixing printing format issues
in perf stat has been painful due to the tight coupling of printing
logic with aggregation and math in util/stat-display.c. Decoupling
this logic makes the codebase much easier to maintain and simplifies
future changes to the print format.

Acked-by: Chun-Tse Shao <ctshao@xxxxxxxxxx>

>
>
> ======================
> Background and Motivation
> ======================
> Historically, 'perf stat' output printing was tightly coupled with
> data collection, aggregation math, and shadow metric calculation.
> Formatting logic for the standard console, CSV, and JSON outputs was
> scattered across util/stat-display.c in large, complex switch
> statements that relied on implicit assumptions about the order in
> which values are printed and duplicated layout logic. Adding new
> metrics, uncore PMUs, or topology-aware CPU aggregation modes
> frequently caused accidental layout regressions, wrong field counts
> flagged by the CSV linters, or crashes in output parsers.
>
> This patch series decouples data traversal and shadow metric
> calculation from visual layout rendering, introducing a modular,
> type-safe, callback-driven print architecture.
>
> ======================
> Decoupled Printing Strategy
> ======================
> 1. Format-Agnostic Traversal Driver (util/stat-print.c)
> The core display logic is abstracted into a generic traversal
> driver, perf_stat__print_cb(). This driver runs the CPU, thread,
> and topology aggregation loops, resolves hybrid wildcard merges,
> filters out uncore metrics that are skipped by default, and
> calculates the raw shadow metrics. It then streams each prepared
> data point to the format's callbacks.
> - Safety: the core `calculate_and_print_metric` traversal returns
> early when a format leaves the `print_metric` callback NULL.
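
A minimal sketch of the driver/format split and the NULL `print_metric` early exit described above. It assumes a heavily simplified, hypothetical data model: `struct counter`, `struct print_cbs`, `print_counters()` and the `text_*` callbacks are illustrative names, and the signature used for `calculate_and_print_metric` is not the one in the patch. The real driver walks evsels and aggregation indices and computes the shadow metrics itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified data model (not perf's real structures). */
struct counter {
	const char *name;
	double value;
	const char *metric_unit; /* NULL if no metric applies */
	double metric;           /* precomputed here for brevity */
};

struct print_cbs {
	void (*print_event)(void *ctx, const struct counter *c);
	void (*print_metric)(void *ctx, const struct counter *c,
			     double val, const char *unit);
};

/* Returns 1 if a metric callback was issued, 0 otherwise. */
static int calculate_and_print_metric(const struct print_cbs *cbs,
				      void *ctx, const struct counter *c)
{
	/* Early exit: a format may leave print_metric NULL. */
	if (!cbs->print_metric || !c->metric_unit)
		return 0;
	cbs->print_metric(ctx, c, c->metric, c->metric_unit);
	return 1;
}

/* Format-agnostic driver: knows nothing about STD/CSV/JSON layout. */
static int print_counters(const struct print_cbs *cbs, void *ctx,
			  const struct counter *counters, size_t n)
{
	int nr_metrics = 0;

	for (size_t i = 0; i < n; i++) {
		if (cbs->print_event)
			cbs->print_event(ctx, &counters[i]);
		nr_metrics += calculate_and_print_metric(cbs, ctx, &counters[i]);
	}
	return nr_metrics;
}

/* A trivial text format; ctx is the output FILE *. */
static void text_print_event(void *ctx, const struct counter *c)
{
	fprintf(ctx, "%12.0f %s\n", c->value, c->name);
}

static void text_print_metric(void *ctx, const struct counter *c,
			      double val, const char *unit)
{
	(void)c;
	fprintf(ctx, "%12s # %.2f %s\n", "", val, unit);
}
```

A format that only cares about raw counts can pass `{ text_print_event, NULL }`, and the driver skips metric calculation for it entirely.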
>
> 2. Type-Safe Callback Interface (struct perf_stat_print_callbacks)
> Output formats communicate with the driver through a small
> streaming interface:
> - print_start(): Initializes format-private DOM state.
> - print_event(): Buffers or prints raw counter event details.
> - print_metric(): Buffers or prints calculated shadow metrics.
> - print_end(): Finalizes rendering and cleans up structures.
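
A sketch of how a buffering format might implement the four callbacks above. It assumes hypothetical signatures: `struct print_callbacks` only loosely mirrors `struct perf_stat_print_callbacks`, and the fixed-size row buffer stands in for the real format-private DOM lists. State is created in `print_start()`, filled by `print_event()`/`print_metric()`, and rendered and freed in `print_end()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the callback table; signatures are assumed. */
struct print_callbacks {
	void *(*print_start)(FILE *out);
	void (*print_event)(void *priv, const char *event, double count);
	void (*print_metric)(void *priv, const char *metric, double val);
	void (*print_end)(void *priv);
};

#define MAX_ROWS 16

struct buffered_row {
	const char *event;
	double count;
	const char *metric;   /* NULL until print_metric() is called */
	double metric_val;
};

/* Format-private state, created in print_start, freed in print_end. */
struct buffered_state {
	FILE *out;
	size_t nr_rows;
	struct buffered_row rows[MAX_ROWS];
};

static void *buf_print_start(FILE *out)
{
	struct buffered_state *st = calloc(1, sizeof(*st));

	if (st)
		st->out = out;
	return st;
}

static void buf_print_event(void *priv, const char *event, double count)
{
	struct buffered_state *st = priv;

	if (st->nr_rows == MAX_ROWS)
		return; /* sketch only: overflow rows are dropped */
	st->rows[st->nr_rows++] = (struct buffered_row){
		.event = event, .count = count };
}

static void buf_print_metric(void *priv, const char *metric, double val)
{
	struct buffered_state *st = priv;

	if (st->nr_rows == 0)
		return;
	/* Attach the metric to the most recently buffered event. */
	st->rows[st->nr_rows - 1].metric = metric;
	st->rows[st->nr_rows - 1].metric_val = val;
}

static void buf_print_end(void *priv)
{
	struct buffered_state *st = priv;

	for (size_t i = 0; i < st->nr_rows; i++) {
		const struct buffered_row *r = &st->rows[i];

		fprintf(st->out, "%12.0f  %-16s", r->count, r->event);
		if (r->metric)
			fprintf(st->out, "  # %6.2f %s", r->metric_val, r->metric);
		fputc('\n', st->out);
	}
	free(st);
}

static const struct print_callbacks buffered_format = {
	.print_start = buf_print_start,
	.print_event = buf_print_event,
	.print_metric = buf_print_metric,
	.print_end = buf_print_end,
};
```

Buffering until `print_end()` is what lets a format such as the standard console compute column widths before printing anything. A pure streaming format can instead write directly from `print_event()` and keep no state.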
>
> 3. Format-Specific Rendering Engines:
> - Standard Console (util/stat-print-std.c):
> Buffers events and metrics into format-private DOM lists.
> It resolves headers for default metric groups that are skipped,
> prepends formatted interval timestamps, aligns rows dynamically
> using aggr_header_lens, and prints everything in print_end().
> - Refinement: Handles the global `aggr_idx == -1` index by
> initializing the tracked index to a `-2` sentinel, and
> bounds-checks every lookup to prevent out-of-bounds array reads.
> It also resets the active event pointer when a zero counter is
> skipped locally, avoiding false-positive temporal coupling
> violations.
> - CSV Printing (util/stat-print-csv.c):
> Buffers events and metrics into format-private queues and
> formats rows separated by config->csv_sep. Metric continuation
> lines are now padded with exactly 4 separators, so every row has
> a valid column count.
> - Refinement: CSV headers now print static structural labels
> (e.g. "cpu,", "die,") instead of live hardware IDs, and a
> persistent state flag prevents redundant header rows in interval
> mode.
> - Streaming JSON Printing (util/stat-print-json.c):
> Implements a fully streaming, zero-allocation print engine that
> bypasses dynamic queues and metric buffering entirely. JSON
> objects and interval keys are formatted and written directly to
> the output stream, avoiding heap allocation overhead.
> - Refinement: `json_metric_only_print_metric` now streams combined
> keys directly instead of building them with `asprintf` or
> `strdup`, so this path performs no heap allocation.
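
A sketch of the two CSV fixes above, written as independent helpers. It assumes that a continuation line leaves its 4 leading fields empty so the metric value and unit land in the same columns as on the primary row. That field layout is a guess, and all names here (`struct csv_state`, `csv_print_header()`, `csv_print_metric_continuation()`, `csv_count_fields()`) are hypothetical. The header flag lives in state that persists across intervals, and the label is a static word such as "cpu" rather than a live ID.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: 4 empty leading fields on continuation lines. */
#define CSV_CONTINUATION_SEPS 4

struct csv_state {
	FILE *out;
	const char *sep;      /* stands in for config->csv_sep */
	bool header_printed;  /* persists across intervals */
};

/* Print the metric-only header once per run, using a static structural
 * label such as "cpu" or "die" rather than a live ID such as "CPU3". */
static void csv_print_header(struct csv_state *st, const char *aggr_label,
			     const char *const *metrics, int nr)
{
	if (st->header_printed)
		return;
	fputs(aggr_label, st->out);
	for (int i = 0; i < nr; i++)
		fprintf(st->out, "%s%s", st->sep, metrics[i]);
	fputc('\n', st->out);
	st->header_printed = true;
}

/* Pad a metric continuation line with a fixed number of separators. */
static void csv_print_metric_continuation(struct csv_state *st,
					  double val, const char *unit)
{
	for (int i = 0; i < CSV_CONTINUATION_SEPS; i++)
		fputs(st->sep, st->out);
	fprintf(st->out, "%.2f%s%s\n", val, st->sep, unit);
}

/* Count fields the way a simple linter might: separators + 1. */
static int csv_count_fields(const char *line, const char *sep)
{
	size_t seplen = strlen(sep);
	int fields = 1;

	if (seplen == 0)
		return fields;
	for (const char *p = strstr(line, sep); p; p = strstr(p + seplen, sep))
		fields++;
	return fields;
}
```

Calling `csv_print_header()` again at the next interval is a no-op, which is the behaviour the refinement describes for interval mode.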
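
A sketch of zero-allocation JSON streaming as described for `json_metric_only_print_metric`. An allocating version would `asprintf()` the combined key, print it, then `free()` it. Here the key parts go straight into `fprintf()`, so nothing is allocated or freed. It assumes the key is built from two parts (for example a unit and a metric name) and that neither needs JSON escaping. `struct json_state` and the `json_*` helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct json_state {
	FILE *out;
	bool need_comma;  /* true once the current object has a field */
};

static void json_begin(struct json_state *st)
{
	fputc('{', st->out);
	st->need_comma = false;
}

static void json_sep(struct json_state *st)
{
	if (st->need_comma)
		fputs(", ", st->out);
	st->need_comma = true;
}

static void json_print_interval(struct json_state *st, double ts)
{
	json_sep(st);
	fprintf(st->out, "\"interval\" : %.3f", ts);
}

/* Stream a combined "<prefix> <name>" key directly: no asprintf(),
 * no strdup(), nothing to free. Assumes no escaping is required. */
static void json_print_metric(struct json_state *st, const char *prefix,
			      const char *name, double val)
{
	json_sep(st);
	fprintf(st->out, "\"%s %s\" : %.2f", prefix, name, val);
}

static void json_end(struct json_state *st)
{
	fputs("}\n", st->out);
}
```

For example, `json_begin()`, `json_print_interval(&st, 1.0)`, `json_print_metric(&st, "%", "tma_retiring", 41.25)` and `json_end()` write `{"interval" : 1.000, "% tma_retiring" : 41.25}` followed by a newline.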
>
> 4. Centralized Aggregation Prefix Formatting
> Duplicated CPU/thread aggregation prefix rendering is removed by
> exposing the relevant arrays globally and adding shared helpers
> in stat-print.c:
> - perf_stat__get_aggr_key(): Resolves the JSON key name.
> - perf_stat__get_aggr_id_char(): Resolves the unified prefix.
> Because all formats use the same helpers, aggregation prefixes are
> structurally and visually consistent across formats.
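
A sketch of the "one table, many formats" idea behind the shared prefix helpers. It assumes a hypothetical subset of aggregation modes, loosely modeled on perf's `enum aggr_mode`. The key strings are illustrative, and `sketch_get_aggr_key()` and `sketch_csv_aggr_label()` are not the real `perf_stat__get_aggr_key()` / `perf_stat__get_aggr_id_char()` signatures.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of aggregation modes. */
enum sketch_aggr_mode {
	SKETCH_AGGR_NONE,   /* per-CPU counts */
	SKETCH_AGGR_GLOBAL,
	SKETCH_AGGR_SOCKET,
	SKETCH_AGGR_DIE,
	SKETCH_AGGR_CORE,
	SKETCH_AGGR_THREAD,
	SKETCH_AGGR_NODE,
	SKETCH_AGGR_MAX,
};

/* Single shared table: every format reads the same strings. */
static const char *const sketch_aggr_keys[SKETCH_AGGR_MAX] = {
	[SKETCH_AGGR_NONE]   = "cpu",
	[SKETCH_AGGR_GLOBAL] = NULL,   /* global output has no prefix */
	[SKETCH_AGGR_SOCKET] = "socket",
	[SKETCH_AGGR_DIE]    = "die",
	[SKETCH_AGGR_CORE]   = "core",
	[SKETCH_AGGR_THREAD] = "thread",
	[SKETCH_AGGR_NODE]   = "node",
};

/* Key used by, e.g., a JSON printer; NULL means "no prefix". */
static const char *sketch_get_aggr_key(enum sketch_aggr_mode mode)
{
	if ((unsigned int)mode >= SKETCH_AGGR_MAX)
		return NULL;
	return sketch_aggr_keys[mode];
}

/* CSV header label from the same table, e.g. "die," for sep ",". */
static int sketch_csv_aggr_label(char *buf, size_t len,
				 enum sketch_aggr_mode mode, const char *sep)
{
	const char *key = sketch_get_aggr_key(mode);

	if (!key) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "%s%s", key, sep);
}
```

Adding a new aggregation mode then means touching one table entry rather than three format-specific switch statements.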
>
> 5. Temporal Coupling Sanity Checks
> A strict temporal coupling constraint (the traversal driver always
> invokes the print_metric() callbacks synchronously and
> consecutively for the same PMU/event node, immediately after its
> print_event() callback) is enforced by a runtime evsel matching
> check in both the STD and CSV engines:
> if (evsel != ps->current_event->evsel) abort_print();
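
A sketch of the runtime check above, combined with the "reset the active event pointer on a skipped zero counter" fix. It assumes hypothetical types (`struct sketch_evsel`, `struct sketch_row`, `struct sketch_std_state`). It also assumes that metrics belonging to a skipped event are simply dropped. Setting an `aborted` flag and returning -1 stands in for `abort_print()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct evsel and the STD engine's state. */
struct sketch_evsel {
	const char *name;
};

struct sketch_row {
	const struct sketch_evsel *evsel;
	int nr_metrics;
};

struct sketch_std_state {
	struct sketch_row *current_event; /* NULL after a skipped counter */
	bool aborted;
};

static void sketch_print_event(struct sketch_std_state *ps,
			       struct sketch_row *row, bool skipped_zero)
{
	/* Clearing the pointer on a skipped zero counter keeps its metrics
	 * from being compared against the previous, unrelated row. */
	ps->current_event = skipped_zero ? NULL : row;
}

/* Returns 0 on success, -1 if the temporal coupling contract is broken. */
static int sketch_print_metric(struct sketch_std_state *ps,
			       const struct sketch_evsel *evsel)
{
	if (!ps->current_event)
		return 0; /* assumption: metrics of a skipped event are dropped */
	if (evsel != ps->current_event->evsel) {
		ps->aborted = true; /* stands in for abort_print() */
		return -1;
	}
	ps->current_event->nr_metrics++;
	return 0;
}
```

Without the reset, the metrics of a skipped "instructions" counter would be checked against the preceding "cycles" row and trip the check even though the driver behaved correctly.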
>
> ======================
> Verification and Testing
> ======================
> All automated shell linters (stat+std_output.sh, stat+csv_output.sh,
> stat+json_output.sh) have been extended to run their entire
> aggregation suites a second time with the new printer flag
> (--new), and all of them pass. The PMU metric value Python
> validation script and stat_metrics_values.sh have also been
> extended with --new testing to check that the calculated metric
> values are correct.
> - Test quality: the JSON linter defines an `api_label` variable
> that changes per pass, so the output logs clearly distinguish the
> legacy and `--new` passes.
>
> ======================
> Changes since v1:
> ======================
> - calculate_and_print_metric: added a NULL check for the print_metric callback.
> - should_skip_zero_counter: added an aggr_idx bounds check to avoid
> out-of-bounds mapping array access when aggr_idx is negative.
> - std_print_event: reset the ps->current_event pointer on skipped zero
> counters to avoid false temporal coupling violations.
> - std_metric_only_print_end: only print metric headers once in
> interval mode, and print dynamic padding to align columns.
> - csv_metric_only_print_end: only print CSV headers once in
> interval mode, print static aggregation labels instead of live
> hardware IDs, and fix column misalignment under AGGR_GLOBAL by
> initializing current_aggr to a -2 sentinel.
> - json_metric_only_print_metric: zero-allocation rendering by
> streaming combined keys directly instead of allocating heap
> strings, and resolve AGGR_GLOBAL indices by initializing
> last_aggr_idx to -2.
> - stat+json_output.sh: define an api_label variable so that the
> output logs clearly distinguish the legacy and --new passes.
> - merged duplicated skip_test blocks inside the linter shell scripts.
> - documented the -2 sentinel choices as C comments in the STD, CSV,
> and JSON print engines.
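
A sketch of why the cover letter uses -2 rather than -1 as the "nothing printed yet" value. It assumes that -1 is a real index meaning "global" under AGGR_GLOBAL. If the tracker started at -1, the first global row would look like a continuation of the previous row and lose its prefix. The constants and helpers here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values: -1 is a real "global" index, -2 means
 * "no aggregation row has been started yet". */
#define SKETCH_AGGR_IDX_GLOBAL   (-1)
#define SKETCH_AGGR_IDX_NONE_YET (-2)

struct sketch_row_tracker {
	int last_aggr_idx;  /* initialize to SKETCH_AGGR_IDX_NONE_YET */
};

/* True if aggr_idx starts a new output row (and so needs a prefix). */
static bool sketch_starts_new_row(struct sketch_row_tracker *t, int aggr_idx)
{
	if (aggr_idx == t->last_aggr_idx)
		return false;
	t->last_aggr_idx = aggr_idx;
	return true;
}

/* Bounds-checked lookup: negative (global) indices never index the array. */
static int sketch_lookup_aggr_id(const int *ids, size_t nr, int aggr_idx,
				 int fallback)
{
	if (aggr_idx < 0 || (size_t)aggr_idx >= nr)
		return fallback;
	return ids[aggr_idx];
}
```

The bounds check mirrors the should_skip_zero_counter fix. Any negative index, including the global -1, falls back instead of reading outside the mapping array.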
>
> We would greatly appreciate reviews, comments, and feedback on this
> decoupled output printing strategy.
>
> Assisted-by: Antigravity:gemini-3.5-flash
>
> ***
>
> Ian Rogers (14):
> perf stat: Introduce core generic print traversal engine and header
> stubs
> perf stat: Implement standard console (STD) formatting callbacks
> perf stat: Extend STD output linter to test basic New API checks
> perf stat: Extend STD output linter to test core aggregation checks
> perf stat: Extend STD output linter to test advanced PMU checks
> perf stat: Extend STD output linter to test metric-only checks
> perf stat: Implement CSV formatting callbacks
> perf stat: Extend CSV output linter to test core aggregation checks
> perf stat: Extend CSV output linter to test advanced PMU and
> metric-only checks
> perf stat: Implement streaming JSON formatting callbacks
> perf stat: Extend JSON output linter to test core aggregation checks
> perf stat: Extend JSON output linter to test advanced PMU and
> metric-only checks
> perf stat: Add --new support to PMU metrics Python validator
> perf stat: Extend PMU metrics value linter to validate --new outputs
>
> tools/perf/builtin-stat.c | 261 +++---
> .../tests/shell/lib/perf_metric_validation.py | 12 +-
> tools/perf/tests/shell/stat+csv_output.sh | 19 +
> tools/perf/tests/shell/stat+json_output.sh | 74 +-
> tools/perf/tests/shell/stat+std_output.sh | 18 +
> tools/perf/tests/shell/stat_metrics_values.sh | 13 +-
> tools/perf/util/Build | 4 +
> tools/perf/util/stat-display.c | 28 +-
> tools/perf/util/stat-print-csv.c | 534 ++++++++++++
> tools/perf/util/stat-print-json.c | 330 ++++++++
> tools/perf/util/stat-print-std.c | 773 ++++++++++++++++++
> tools/perf/util/stat-print.c | 490 +++++++++++
> tools/perf/util/stat-print.h | 133 +++
> tools/perf/util/stat.h | 2 +
> 14 files changed, 2519 insertions(+), 172 deletions(-)
> create mode 100644 tools/perf/util/stat-print-csv.c
> create mode 100644 tools/perf/util/stat-print-json.c
> create mode 100644 tools/perf/util/stat-print-std.c
> create mode 100644 tools/perf/util/stat-print.c
> create mode 100644 tools/perf/util/stat-print.h
>
> --
> 2.54.0.794.g4f17f83d09-goog
>
>