On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
Some metric groups, e.g. Page_Walks_Utilization, will never count when
the NMI watchdog is enabled.
$ echo 1 > /proc/sys/kernel/nmi_watchdog
$ perf stat -M Page_Walks_Utilization
Performance counter stats for 'system wide':
<not counted> itlb_misses.walk_pending (0.00%)
<not counted> dtlb_load_misses.walk_pending (0.00%)
<not counted> dtlb_store_misses.walk_pending (0.00%)
<not counted> ept.walk_pending (0.00%)
<not counted> cycles (0.00%)
2.343460588 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try
reorganizing the group.
A metric group is a weak group, which relies on the group validation
code in the kernel to decide whether its events are opened as a group or
as a non-group. However, the group validation code may return false
positives, especially when the NMI watchdog is enabled: the metric group
is accepted as a group but will never be scheduled.
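To make the false positive concrete, here is a toy model in C. It is a hypothetical simulation, not kernel code: it assumes that validation only checks the group size against the total number of counters on an otherwise empty PMU, while scheduling at run time also sees a counter already held by the NMI watchdog. The counter count and the function names are assumptions for illustration; real PMUs also have fixed counters and per-event counter restrictions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the number of usable counters is an assumed value. */
#define NUM_COUNTERS 4

/*
 * Hypothetical model of group validation: the group is checked against
 * an otherwise empty PMU, so counters already in use (for example by
 * the NMI watchdog) are not taken into account.
 */
static bool group_validates(int group_size)
{
    return group_size <= NUM_COUNTERS;
}

/*
 * Hypothetical model of scheduling at run time: counters held by other
 * users are unavailable, and a group is all-or-nothing, so it is
 * scheduled only if every member fits into the remaining counters.
 */
static bool group_schedules(int group_size, int counters_in_use)
{
    assert(counters_in_use >= 0 && counters_in_use <= NUM_COUNTERS);
    return group_size <= NUM_COUNTERS - counters_in_use;
}
```

In this model a 4-event group passes `group_validates(4)`, yet `group_schedules(4, 1)` is false once the watchdog holds one counter, which corresponds to the `<not counted>` lines in the output above.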
An attempt to fix the group validation code has been rejected:
https://lore.kernel.org/lkml/20200117091341.GX2827@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
It was rejected because the validation code can only check the current
status, so it cannot accurately predict whether the group will actually
be schedulable as a group.
This patch set provides another solution to mitigate the issue.
It adds a "MetricConstraint" field to the event list, which provides a
hint for the perf tool, e.g. "MetricConstraint": "NO_NMI_WATCHDOG". The
perf tool can then change the metric group to a non-group (standalone
metrics) if the NMI watchdog is enabled.
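A minimal sketch of how the tool side could act on such a hint, assuming the constraint string is taken verbatim from the event list and the watchdog state is read from /proc/sys/kernel/nmi_watchdog, where 0 means disabled. The functions `nmi_watchdog_enabled_from()`, `nmi_watchdog_enabled()` and `metric_use_group()` are hypothetical names, not the actual perf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: parses the contents of the nmi_watchdog sysctl; "0" means disabled. */
static bool nmi_watchdog_enabled_from(const char *contents)
{
    assert(contents != NULL);
    return strtol(contents, NULL, 10) != 0;
}

/* Hypothetical: reads the live sysctl; an unreadable file is treated as disabled. */
static bool nmi_watchdog_enabled(void)
{
    char buf[16] = "0";
    FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "r");

    if (f) {
        if (!fgets(buf, sizeof(buf), f))
            strcpy(buf, "0");
        fclose(f);
    }
    return nmi_watchdog_enabled_from(buf);
}

/*
 * Hypothetical: decides whether a metric's events are opened as a (weak)
 * group. constraint is the "MetricConstraint" value, or NULL if absent.
 */
static bool metric_use_group(const char *constraint, bool watchdog_enabled)
{
    if (constraint && strcmp(constraint, "NO_NMI_WATCHDOG") == 0 &&
        watchdog_enabled)
        return false; /* open the events as standalone metrics instead */
    return true;
}
```

A caller would pass the metric's constraint together with `nmi_watchdog_enabled()`; metrics without a constraint keep the existing weak-group behaviour.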
the problem is the missing counter that's taken by the NMI watchdog, right?
and it's a problem for any metric that won't fit into the available
counters.. shouldn't we rather do this workaround for any metric
that wouldn't fit in the available counters?
not just the NO_NMI_WATCHDOG ones..?
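The more general check suggested here could look roughly like the following sketch: compare the number of events a metric needs with the counters left once the watchdog has taken one. This is a simplified model under stated assumptions: `metric_fits_as_group()` is a hypothetical helper, the counter total is an input rather than something queried from the PMU, and the watchdog is assumed to hold exactly one counter. Fixed counters and per-event counter restrictions are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if a metric with num_events events
 * can be opened as a group on a PMU with num_counters usable counters.
 * Assumes the NMI watchdog, when enabled, permanently holds one counter.
 * Ignores fixed counters and per-event counter restrictions.
 */
static bool metric_fits_as_group(int num_events, int num_counters,
                                 bool watchdog_enabled)
{
    int available = num_counters - (watchdog_enabled ? 1 : 0);

    assert(num_events >= 0 && num_counters >= 0);
    return num_events <= available;
}
```

In this model, a metric that fails the check would fall back to standalone events, the same fallback the NO_NMI_WATCHDOG hint triggers, without a per-metric annotation in the event list.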
thanks,
jirka