Re: [PATCH 0/5] Support metric group constraint

From: Liang, Kan
Date: Thu Feb 20 2020 - 11:03:40 EST

On 2/20/2020 6:39 AM, Jiri Olsa wrote:
On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

Some metric groups, e.g. Page_Walks_Utilization, will never count when
NMI watchdog is enabled.

$echo 1 > /proc/sys/kernel/nmi_watchdog
$perf stat -M Page_Walks_Utilization

Performance counter stats for 'system wide':

<not counted> itlb_misses.walk_pending (0.00%)
<not counted> dtlb_load_misses.walk_pending (0.00%)
<not counted> dtlb_store_misses.walk_pending (0.00%)
<not counted> ept.walk_pending (0.00%)
<not counted> cycles (0.00%)

2.343460588 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try
reorganizing the group.

A metric group is a weak group, which relies on group validation
code in the kernel to determine whether to be opened as a group or
a non-group. However, group validation code may return false-positives,
especially when NMI watchdog is enabled. (The metric group is allowed
as a group but will never be scheduled.)

The attempt to fix the group validation code has been rejected.
Because we cannot accurately predict whether the group can be scheduled
as a group, only by checking current status.

This patch set provides another solution to mitigate the issue.
Add "MetricConstraint" in event list, which provides a hint for perf tool,
e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
metric group to non-group (standalone metrics) if NMI watchdog is enabled.

the problem is in the missing counter, that's taken by NMI watchdog, right?

and it's problem for any metric that won't fit to the available
counters.. shouldn't we rather do this workaround for any metric
that wouldn't fit in available counters?

I think current perf already did this.
All metric groups are weak group. Kernel (validate_group()) tells perf tool whether a metric group fit to available counters.
If yes, the metric group will be scheduled as a group.
If no, perf tool will not using a group and re-try. The code is as below.

if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {

/* Weak group failed. Reset the group. */
if ((errno == EINVAL || errno == EBADF) &&
counter->leader != counter &&
counter->weak_group) {
counter = perf_evlist__reset_weak_group(evsel_list, counter);
goto try_again;

This patch set is to workaroud the false-positives from the kernel.
Kernel only validate the group itself. It assumes that all counters are available. So, when any counter is permanently occupied by others, e.g. watchdog, the validate_group() may return false-positives.

? not just for chosen

For now, I think we only need to workaround the Page_Walks_Utilization metric group. Because it has 5 events, and one of the events is cycles.
The cycles has to be scheduled on fixed counter 2. But it's already occupied by watchdog.
The false-positives of validate_group() will trigger the issue (metric group never be scheduled.).

For other metric groups, even they have cycles, the issue should not be triggered.
For example, if they have 4 or less events, the cycles can be scheduled to GP counter instead.
If they have 6 or more events, the weak group will be reject anyway.
Perf tool will open it as non-group (standalone metrics).

I think we only need to apply the constraint for the Page_Walks_Utilization metric group for now.