Re: [PATCH] perf stat: Fix false NMI watchdog warning in aggregation modes
From: Chun-Tse Shao
Date: Thu Jun 11 2026 - 18:07:32 EST
On Wed, Jun 10, 2026 at 10:35 AM Chun-Tse Shao <ctshao@xxxxxxxxxx> wrote:
>
> In aggregation modes (e.g. --per-socket, --per-die, etc.), a
> counter might not be scheduled or counted on specific aggregate
> groups if it was not assigned to the CPUs belonging to those
> groups. However, the printout() check triggers the
> "print_free_counters_hint" logic unconditionally for any
> supported counter with a missing count. This results in a false
> "Some events weren't counted. Try disabling the NMI watchdog"
> warning.
>
> This warning was originally introduced in commit 02d492e5dcb7
> ("perf stat: Issue a HW watchdog disable hint").
>
> To fix this, add the helper evsel__should_run_on_aggr() to
> verify if the counter was supposed to run on the aggregate CPU
> ID before triggering the hint. Additionally, correctly handle
> per-thread/per-process execution (which uses a dummy CPU map
> with a single -1 entry) by immediately returning true,
> ensuring legitimate warnings are still reported.
>
> Example before/after:
>
> $ perf stat -M lpm_miss_lat --metric-only --per-socket -a -- sleep 1
>
> Before:
> Performance counter stats for 'system wide':
>
> ns lpm_miss_lat_rem ns lpm_miss_lat_loc
> S0 126 202.3 207.9
> S1 126 231.9 259.3
>
> 1.006029831 seconds time elapsed
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
>
> After:
> Performance counter stats for 'system wide':
>
> ns lpm_miss_lat_rem ns lpm_miss_lat_loc
> S0 126 202.3 207.9
> S1 126 231.9 259.3
>
> 1.006029831 seconds time elapsed
>
> Assisted-by: Gemini:gemini-next
> Signed-off-by: Chun-Tse Shao <ctshao@xxxxxxxxxx>
> ---
> tools/perf/util/stat-display.c | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 2b69d238858c..e50557964916 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -792,6 +792,28 @@ static bool evlist__has_hybrid_pmus(struct evlist *evlist)
>  	return false;
>  }
>
> +static bool evsel__should_run_on_aggr(struct perf_stat_config *config,
> +				      struct evsel *counter,
> +				      const struct aggr_cpu_id *id)
> +{
> +	struct perf_cpu cpu;
> +	unsigned int idx;
> +
> +	if (!config->aggr_map || !config->aggr_get_id)
> +		return true;
> +
> +	perf_cpu_map__for_each_cpu(cpu, idx, counter->core.cpus) {
> +		struct aggr_cpu_id own_id;
> +
> +		if (cpu.cpu < 0)
> +			return true;
> +
> +		own_id = config->aggr_get_id(config, cpu);
> +		if (aggr_cpu_id__equal(id, &own_id))
> +			return true;
> +	}
> +	return false;
> +}
>  static void printout(struct perf_stat_config *config, struct outstate *os,
>  		     double uval, u64 run, u64 ena, double noise, int aggr_idx)
>  {
> @@ -822,7 +844,8 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
>
>  		if (counter->supported) {
>  			if (!evlist__has_hybrid_pmus(counter->evlist)) {

I had an offline discussion with Ian, and it turns out the NMI watchdog
only uses core events, so it cannot be the reason an uncore event went
uncounted. We can therefore simply exclude uncore events from
triggering the watchdog hint. Core PMU events do not hit this false
warning in aggregation modes because they are active on all cores, so
every aggregate group has CPUs that count them.
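
A minimal sketch of that rule, not the v2 code itself: the names below
(toy_pmu, toy_counter, toy_config, toy_note_missing_count) are
hypothetical stand-ins for perf's internal types, and the existing
hybrid-PMU check is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf's PMU, evsel and stat config types. */
struct toy_pmu {
	bool is_core;		/* true for a core PMU, false for uncore */
};

struct toy_counter {
	const struct toy_pmu *pmu;
	bool supported;
};

struct toy_config {
	int print_free_counters_hint;
};

/* Assumption: a counter without a PMU is treated as non-core. */
static bool toy_counter_is_core(const struct toy_counter *counter)
{
	return counter->pmu != NULL && counter->pmu->is_core;
}

/*
 * Called when a counter has no count for an aggregate group.
 * Only supported core counters may raise the NMI watchdog hint,
 * because the watchdog competes for core PMU counters only.
 */
static void toy_note_missing_count(struct toy_config *config,
				   const struct toy_counter *counter)
{
	if (counter->supported && toy_counter_is_core(counter))
		config->print_free_counters_hint = 1;
}
```

With this shape, an uncore counter that is missing on one socket leaves
the hint unset, while a supported core counter that was never scheduled
still sets it.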

v2: https://lore.kernel.org/20260611215632.562900-1-ctshao@xxxxxxxxxx/T/#u

Thanks,
CT

> -				config->print_free_counters_hint = 1;
> +				if (evsel__should_run_on_aggr(config, counter, &os->id))
> +					config->print_free_counters_hint = 1;
>  			}
>  		}
>  	}
> --
> 2.54.0.1099.g489fc7bff1-goog
>