Add top down metrics to perf stat
From: Andi Kleen
Date: Fri Aug 07 2015 - 21:07:02 EST
This patchkit adds support for TopDown to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.
TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out of order effects.
This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one fixed counter).
The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring
that describe the CPU pipeline behavior on a high level.
FrontendBound and BackendBound
BadSpeculation is a higher
The full top down methodology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)
The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.
TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
topdown-total-slots        Available slots in the pipeline
topdown-slots-issued       Slots issued into the pipeline
topdown-slots-retired      Slots successfully retired
topdown-fetch-bubbles      Pipeline gaps in the frontend
topdown-recovery-bubbles   Pipeline gaps during recovery
                           from misspeculation
These inputs then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.
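
For illustration, here is a minimal sketch of the level 1 arithmetic.
The formulas are not spelled out in this mail; the ones below are my
reading of the usual TopDown level 1 definitions, and they are an
assumption, not a copy of the perf code. They do reproduce the
percentages for S0-C0 in the example output further down (22.62%
frontend bound, 63.90% backend bound). The function name is
hypothetical, and the sketch assumes all five inputs are available.

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Return the four level 1 metrics as fractions of total slots.

    Assumed formulas (not quoted from the patchkit):
      FrontendBound  = fetch_bubbles / total_slots
      Retiring       = slots_retired / total_slots
      BadSpeculation = (slots_issued - slots_retired
                        + recovery_bubbles) / total_slots
      BackendBound   = 1 - the three above
    """
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    frontend_bound = fetch_bubbles / total_slots
    retiring = slots_retired / total_slots
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    backend_bound = 1.0 - (frontend_bound + retiring + bad_speculation)
    return {
        "frontend_bound": frontend_bound,
        "backend_bound": backend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
    }


# Counts for S0-C0 taken from the example output in this mail.
metrics = topdown_level1(
    total_slots=19650790,
    slots_issued=2025498.0,
    slots_retired=1743552.0,
    fetch_bubbles=4445680.0,
    recovery_bubbles=622954,
)
for name, value in metrics.items():
    print(f"{name:16s} {100 * value:6.2f}%")
```

BackendBound is derived as the remainder here, so the four values
always sum to 1; a CPU that lacks one of the inputs would need a
different variant of these formulas.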
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.
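
As a rough sketch of what "declares the events" could look like from
user space: the snippet below checks which of the five topdown events
are listed for the CPU PMU. It assumes the events show up as files
under /sys/bus/event_source/devices/cpu/events, the standard sysfs
location for named PMU events; that the topdown events appear there
is an assumption, not something this mail states. The function name
is hypothetical.

```python
import os

# Assumption: named events of the CPU PMU are exported here.
DEFAULT_EVENTS_DIR = "/sys/bus/event_source/devices/cpu/events"

# Event names as listed in this mail.
TOPDOWN_EVENTS = (
    "topdown-total-slots",
    "topdown-slots-issued",
    "topdown-slots-retired",
    "topdown-fetch-bubbles",
    "topdown-recovery-bubbles",
)


def available_topdown_events(events_dir=DEFAULT_EVENTS_DIR):
    """Return the topdown events present in events_dir, in list order."""
    try:
        present = set(os.listdir(events_dir))
    except OSError:
        # No such PMU directory (or not readable): nothing is declared.
        return []
    return [name for name in TOPDOWN_EVENTS if name in present]


if __name__ == "__main__":
    found = available_topdown_events()
    missing = [name for name in TOPDOWN_EVENTS if name not in found]
    print("available:", ", ".join(found) or "none")
    print("missing:  ", ", ".join(missing) or "none")
```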
Example output:
$ ./perf stat --topdown -a ./BC1s
Performance counter stats for 'system wide':
S0-C0  2        19650790     topdown-total-slots                                    (100.00%)
S0-C0  2      4445680.00     topdown-fetch-bubbles     #  22.62% frontend bound     (100.00%)
S0-C0  2      1743552.00     topdown-slots-retired                                  (100.00%)
S0-C0  2          622954     topdown-recovery-bubbles                               (100.00%)
S0-C0  2      2025498.00     topdown-slots-issued      #  63.90% backend bound
S0-C1  2     16685216540     topdown-total-slots                                    (100.00%)
S0-C1  2    962557931.00     topdown-fetch-bubbles                                  (100.00%)
S0-C1  2   4175583320.00     topdown-slots-retired                                  (100.00%)
S0-C1  2      1743329246     topdown-recovery-bubbles  #  22.22% bad speculation    (100.00%)
S0-C1  2   6138901193.50     topdown-slots-issued      #  46.99% backend bound
1.535832673 seconds time elapsed
On Hyper Threaded CPUs TopDown computes metrics per core instead of per logical CPU.
In this case perf stat automatically enables --per-core mode. It also requires
global mode (-a) and no other filters (no cgroup mode).
One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.
On systems without Hyper Threading it can be used per process.
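
A small sketch for checking the permission condition above before
running perf stat --topdown -a. It only encodes what this mail says
(root, or kernel.perf_event_paranoid=-1); the function names are
hypothetical, and the sysctl is read from its usual /proc location.

```python
import os

PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def read_paranoid_level(path=PARANOID_PATH):
    """Return kernel.perf_event_paranoid as an int, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def topdown_system_wide_allowed(euid, paranoid_level):
    """Apply the rule stated in this mail: root, or paranoid level -1."""
    if euid == 0:
        return True
    return paranoid_level is not None and paranoid_level <= -1


if __name__ == "__main__":
    level = read_paranoid_level()
    ok = topdown_system_wide_allowed(os.geteuid(), level)
    print(f"perf_event_paranoid={level}, system-wide topdown allowed: {ok}")
```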
Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-2