Re: [PATCH V2 0/5] Support metric group constraint

From: Arnaldo Carvalho de Melo
Date: Tue Mar 10 2020 - 14:04:31 EST


On Wed, Feb 26, 2020 at 04:35:49PM +0100, Jiri Olsa wrote:
> On Mon, Feb 24, 2020 at 01:59:19PM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> > From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> >
> > Changes since V1:
> > - Remove global static flag violate_nmi_constraint, and add a new
> > function metricgroup___watchdog_constraint_hint() for all
> > watchdog constraint hints in patch 4.
> > The rest of the patches are not changed.
>
> Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>

Thanks, tested, applied,

- Arnaldo

> thanks,
> jirka
>
> >
> > Some metric groups, e.g. Page_Walks_Utilization, will never count when
> > NMI watchdog is enabled.
> >
> > $ echo 1 > /proc/sys/kernel/nmi_watchdog
> > $ perf stat -M Page_Walks_Utilization
> >
> > Performance counter stats for 'system wide':
> >
> > <not counted> itlb_misses.walk_pending (0.00%)
> > <not counted> dtlb_load_misses.walk_pending (0.00%)
> > <not counted> dtlb_store_misses.walk_pending (0.00%)
> > <not counted> ept.walk_pending (0.00%)
> > <not counted> cycles (0.00%)
> >
> > 2.343460588 seconds time elapsed
> >
> > Some events weren't counted. Try disabling the NMI watchdog:
> > echo 0 > /proc/sys/kernel/nmi_watchdog
> > perf stat ...
> > echo 1 > /proc/sys/kernel/nmi_watchdog
> > The events in group usually have to be from the same PMU. Try
> > reorganizing the group.
> >
> > A metric group is a weak group, which relies on group validation
> > code in the kernel to determine whether to be opened as a group or
> > a non-group. However, group validation code may return false-positives,
> > especially when NMI watchdog is enabled. (The metric group is allowed
> > as a group but will never be scheduled.)
> >
> > The attempt to fix the group validation code has been rejected.
> > https://lore.kernel.org/lkml/20200117091341.GX2827@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > The reason is that we cannot accurately predict whether the group can be
> > scheduled as a group merely by checking the current status.
> >
> > This patch set provides another solution to mitigate the issue.
> > It adds a "MetricConstraint" field to the event list, which gives the perf
> > tool a hint, e.g. "MetricConstraint": "NO_NMI_WATCHDOG". The perf tool can
> > then change the metric group to a non-group (standalone metrics) if the
> > NMI watchdog is enabled.
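
The decision described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code from the patch
set: nmi_watchdog_enabled_at() and open_as_group() are hypothetical names
made up for this sketch (the series itself factors out
sysctl__nmi_watchdog_enabled() in tools/perf/util/util.c), and treating an
unreadable sysctl file as "watchdog off" is an assumption as well.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: report whether the sysctl file at `path`
 * (e.g. /proc/sys/kernel/nmi_watchdog) starts with '1'.
 * Assumption: if the file cannot be read, treat the watchdog as off.
 */
static bool nmi_watchdog_enabled_at(const char *path)
{
	FILE *fp = fopen(path, "r");
	int c;

	if (!fp)
		return false;
	c = fgetc(fp);
	fclose(fp);
	return c == '1';
}

/*
 * Hypothetical decision: keep the metric as a (weak) group unless it
 * carries the NO_NMI_WATCHDOG constraint and the watchdog is enabled,
 * in which case its events are opened as standalone events instead.
 */
static bool open_as_group(const char *constraint, bool watchdog_on)
{
	if (constraint && strcmp(constraint, "NO_NMI_WATCHDOG") == 0 &&
	    watchdog_on)
		return false;
	return true;
}
```

A caller would combine the two, for example
open_as_group(constraint, nmi_watchdog_enabled_at("/proc/sys/kernel/nmi_watchdog")),
and print the "Splitting metric group ..." hint when the result is false.
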
> >
> > After applying the patch,
> >
> > $ echo 1 > /proc/sys/kernel/nmi_watchdog
> > $ perf stat -M Page_Walks_Utilization
> > Splitting metric group Page_Walks_Utilization into standalone metrics.
> > Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
> > echo 0 > /proc/sys/kernel/nmi_watchdog
> > perf stat ...
> > echo 1 > /proc/sys/kernel/nmi_watchdog
> >
> > Performance counter stats for 'system wide':
> >
> >     18,253,454   itlb_misses.walk_pending        # 0.0 Page_Walks_Utilization (50.55%)
> >     78,051,525   dtlb_load_misses.walk_pending                                (50.55%)
> >     29,213,063   dtlb_store_misses.walk_pending                               (50.55%)
> >              0   ept.walk_pending                                             (50.55%)
> >  2,542,132,364   cycles                                                       (49.92%)
> >
> > 1.037095993 seconds time elapsed
> >
> > Kan Liang (5):
> > perf jevents: Support metric constraint
> > perf metricgroup: Factor out metricgroup__add_metric_weak_group()
> > perf util: Factor out sysctl__nmi_watchdog_enabled()
> > perf metricgroup: Support metric constraint
> > perf vendor events: Add NO_NMI_WATCHDOG metric constraint
> >
> > .../arch/x86/cascadelakex/clx-metrics.json | 3 +-
> > .../pmu-events/arch/x86/skylake/skl-metrics.json | 3 +-
> > .../pmu-events/arch/x86/skylakex/skx-metrics.json | 3 +-
> > tools/perf/pmu-events/jevents.c | 19 ++--
> > tools/perf/pmu-events/jevents.h | 2 +-
> > tools/perf/pmu-events/pmu-events.h | 1 +
> > tools/perf/util/metricgroup.c | 109 ++++++++++++++++-----
> > tools/perf/util/stat-display.c | 6 +-
> > tools/perf/util/util.c | 18 ++++
> > tools/perf/util/util.h | 2 +
> > 10 files changed, 128 insertions(+), 38 deletions(-)
> >
> > --
> > 2.7.4
> >
>
