Re: [PATCH v1 01/20] perf jevents: Add RAPL metrics for all Intel models

From: Liang, Kan
Date: Thu Feb 29 2024 - 16:11:04 EST




On 2024-02-28 7:17 p.m., Ian Rogers wrote:
> Add a 'cpu_power' metric group that computes the power consumption
> from RAPL events if they are present.
>
> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> ---
> tools/perf/pmu-events/intel_metrics.py | 45 ++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index 4fbb31c9eccd..5827f555005f 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -1,9 +1,10 @@
> #!/usr/bin/env python3
> # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> -from metric import (JsonEncodeMetric, JsonEncodeMetricGroupDescriptions, LoadEvents,
> - MetricGroup)
> +from metric import (d_ratio, has_event, Event, JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
> + LoadEvents, Metric, MetricGroup, Select)
> import argparse
> import json
> +import math
> import os
>
> parser = argparse.ArgumentParser(description="Intel perf json generator")
> @@ -14,7 +15,45 @@ args = parser.parse_args()
> directory = f"{os.path.dirname(os.path.realpath(__file__))}/arch/x86/{args.model}/"
> LoadEvents(directory)
>
> -all_metrics = MetricGroup("",[])
> +interval_sec = Event("duration_time")
> +
> +def Rapl() -> MetricGroup:
> +  """Processor socket power consumption estimate.
> +
> +  Use events from the running average power limit (RAPL) driver.
> +  """
> +  # Watts = joules/second
> +  pkg = Event("power/energy\-pkg/")
> +  cond_pkg = Select(pkg, has_event(pkg), math.nan)
> +  cores = Event("power/energy\-cores/")
> +  cond_cores = Select(cores, has_event(cores), math.nan)
> +  ram = Event("power/energy\-ram/")
> +  cond_ram = Select(ram, has_event(ram), math.nan)
> +  gpu = Event("power/energy\-gpu/")
> +  cond_gpu = Select(gpu, has_event(gpu), math.nan)
> +  psys = Event("power/energy\-psys/")
> +  cond_psys = Select(psys, has_event(psys), math.nan)
> +  scale = 2.3283064365386962890625e-10
> +  metrics = [
> +      Metric("cpu_power_pkg", "",
> +             d_ratio(cond_pkg * scale, interval_sec), "Watts"),
> +      Metric("cpu_power_cores", "",
> +             d_ratio(cond_cores * scale, interval_sec), "Watts"),
> +      Metric("cpu_power_ram", "",
> +             d_ratio(cond_ram * scale, interval_sec), "Watts"),
> +      Metric("cpu_power_gpu", "",
> +             d_ratio(cond_gpu * scale, interval_sec), "Watts"),
> +      Metric("cpu_power_psys", "",
> +             d_ratio(cond_psys * scale, interval_sec), "Watts"),
> +  ]
> +
> +  return MetricGroup("cpu_power", metrics,
> +                     description="Processor socket power consumption estimates")

As far as I know, the RAPL counters monitor energy consumption
across different domains, and the scope is not always a socket. I think
the description may cause confusion.
Maybe we could just call it "RAPL power consumption estimates", or "Running
Average Power Limit (RAPL) power consumption estimates".
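
For reference, the arithmetic behind each quoted metric can be sketched in
plain Python. This is only an illustration, not code from the patch:
rapl_watts() and RAPL_JOULES_PER_COUNT are hypothetical names, and the
sketch assumes the raw count is in units of 2^-32 joules (which is what the
scale constant in the patch equals) and that a zero interval yields 0, as
d_ratio is understood to do. The same conversion applies to every domain
(pkg, cores, ram, gpu, psys), which is why a domain-neutral description
seems to fit better.

```python
import math

# The patch's scale constant 2.3283064365386962890625e-10 is exactly 2**-32.
RAPL_JOULES_PER_COUNT = 2.0 ** -32


def rapl_watts(raw_count, interval_sec):
    """Convert a raw RAPL energy count over an interval into watts.

    Hypothetical helper for illustration only.
    raw_count is None when the event is absent, mirroring the
    Select(event, has_event(event), math.nan) pattern in the patch.
    """
    if raw_count is None:
        return math.nan
    if interval_sec == 0:
        # Assumption: mirror d_ratio, which avoids dividing by zero.
        return 0.0
    joules = raw_count * RAPL_JOULES_PER_COUNT
    return joules / interval_sec  # watts = joules / second
```

For example, a raw count of 10 * 2**32 over a 2 second interval corresponds
to 10 joules in 2 seconds, i.e. 5 watts, regardless of which domain the
counter belongs to.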

Thanks,
Kan
> +
> +
> +all_metrics = MetricGroup("", [
> + Rapl(),
> +])
>
> if args.metricgroups:
> print(JsonEncodeMetricGroupDescriptions(all_metrics))