Re: [PATCH V2 0/5] Support metric group constraint

From: Jiri Olsa
Date: Wed Feb 26 2020 - 10:36:05 EST


On Mon, Feb 24, 2020 at 01:59:19PM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>
> Changes since V1:
> - Remove global static flag violate_nmi_constraint, and add a new
> function metricgroup___watchdog_constraint_hint() for all
> watchdog constraint hints in patch 4.
> The rest of the patches are not changed.

Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>

thanks,
jirka

>
> Some metric groups, e.g. Page_Walks_Utilization, will never count when
> NMI watchdog is enabled.
>
> $echo 1 > /proc/sys/kernel/nmi_watchdog
> $perf stat -M Page_Walks_Utilization
>
> Performance counter stats for 'system wide':
>
> <not counted> itlb_misses.walk_pending (0.00%)
> <not counted> dtlb_load_misses.walk_pending (0.00%)
> <not counted> dtlb_store_misses.walk_pending (0.00%)
> <not counted> ept.walk_pending (0.00%)
> <not counted> cycles (0.00%)
>
> 2.343460588 seconds time elapsed
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> The events in group usually have to be from the same PMU. Try
> reorganizing the group.
>
> A metric group is a weak group, which relies on group validation
> code in the kernel to determine whether to be opened as a group or
> a non-group. However, group validation code may return false-positives,
> especially when NMI watchdog is enabled. (The metric group is allowed
> as a group but will never be scheduled.)
>
> The attempt to fix the group validation code has been rejected.
> https://lore.kernel.org/lkml/20200117091341.GX2827@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> Because we cannot accurately predict whether the group can be scheduled
> as a group, only by checking current status.
>
> This patch set provides another solution to mitigate the issue.
> Add "MetricConstraint" in event list, which provides a hint for perf tool,
> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
>
> After applying the patch,
>
> $echo 1 > /proc/sys/kernel/nmi_watchdog
> $perf stat -M Page_Walks_Utilization
> Splitting metric group Page_Walks_Utilization into standalone metrics.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
>
> Performance counter stats for 'system wide':
>
> 18,253,454 itlb_misses.walk_pending # 0.0
> Page_Walks_Utilization (50.55%)
> 78,051,525 dtlb_load_misses.walk_pending (50.55%)
> 29,213,063 dtlb_store_misses.walk_pending (50.55%)
> 0 ept.walk_pending (50.55%)
> 2,542,132,364 cycles (49.92%)
>
> 1.037095993 seconds time elapsed
>
> Kan Liang (5):
> perf jevents: Support metric constraint
> perf metricgroup: Factor out metricgroup__add_metric_weak_group()
> perf util: Factor out sysctl__nmi_watchdog_enabled()
> perf metricgroup: Support metric constraint
> perf vendor events: Add NO_NMI_WATCHDOG metric constraint
>
> .../arch/x86/cascadelakex/clx-metrics.json | 3 +-
> .../pmu-events/arch/x86/skylake/skl-metrics.json | 3 +-
> .../pmu-events/arch/x86/skylakex/skx-metrics.json | 3 +-
> tools/perf/pmu-events/jevents.c | 19 ++--
> tools/perf/pmu-events/jevents.h | 2 +-
> tools/perf/pmu-events/pmu-events.h | 1 +
> tools/perf/util/metricgroup.c | 109 ++++++++++++++++-----
> tools/perf/util/stat-display.c | 6 +-
> tools/perf/util/util.c | 18 ++++
> tools/perf/util/util.h | 2 +
> 10 files changed, 128 insertions(+), 38 deletions(-)
>
> --
> 2.7.4
>