Re: [RFC PATCH v2 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active

From: Peter Zijlstra
Date: Mon Mar 18 2019 - 06:40:22 EST


On Fri, Mar 15, 2019 at 08:41:00PM +0000, Lendacky, Thomas wrote:
> This issue is different from a previous issue related to perf NMIs that
> was fixed in commit:
> 63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
>
> The difference here is that the NMI latency can contribute to what appear
> to be spurious NMIs during the handling of PMC counter overflow while the
> counter is active as opposed to when the counter is being disabled.

But could we not somehow merge these two cases? The cause is similar
IIUC. The PMI gets triggered, but then:

- a previous PMI handler handled our overflow, or
- our event gets disabled.

and when our PMI triggers it finds nothing to do.

> +/*
> + * Because of NMI latency, if multiple PMC counters are active or other sources
> + * of NMIs are received, the perf NMI handler can handle one or more overflowed
> + * PMC counters outside of the NMI associated with the PMC overflow. If the NMI
> + * doesn't arrive at the LAPIC in time to become a pending NMI, then the kernel
> + * back-to-back NMI support won't be active. This PMC handler needs to take into
> + * account that this can occur, otherwise this could result in unknown NMI
> + * messages being issued. Examples of this is PMC overflow while in the NMI
> + * handler when multiple PMCs are active or PMC overflow while handling some
> + * other source of an NMI.
> + *
> + * Attempt to mitigate this by using the number of active PMCs to determine
> + * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
> + * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of the
> + * number of active PMCs or 2. The value of 2 is used in case an NMI does not
> + * arrive at the LAPIC in time to be collapsed into an already pending NMI.
> + */
> +static int amd_pmu_handle_irq(struct pt_regs *regs)
> +{
> + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> + int active, handled;
> +
> + active = __bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX);
> + handled = x86_pmu_handle_irq(regs);
> +
> + /*
> + * If no counters are active, reset perf_nmi_counter and return
> + * NMI_DONE
> + */
> + if (!active) {
> + this_cpu_write(perf_nmi_counter, 0);
> + return NMI_DONE;
> + }

This will actively render 63e6be6d98e1 void I think. Because that can
return !0 while !active -- that's rather the whole point of it.

> + /*
> + * If a counter was handled, record the number of possible remaining
> + * NMIs that can occur.
> + */
> + if (handled) {
> + this_cpu_write(perf_nmi_counter,
> + min_t(unsigned int, 2, active));
> +
> + return handled;
> + }
> +
> + if (!this_cpu_read(perf_nmi_counter))
> + return NMI_DONE;
> +
> + this_cpu_dec(perf_nmi_counter);
> +
> + return NMI_HANDLED;
> +}