Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC
From: Peter Zijlstra
Date: Fri Mar 15 2019 - 06:50:14 EST
On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote:
> On AMD processors, the detection of an overflowed counter in the NMI
> handler relies on the current value of the counter. So, for example, to
> check for overflow on a 48 bit counter, bit 47 is checked to see if it
> is 1 (not overflowed) or 0 (overflowed).
>
> There is currently a race condition present when disabling and then
> updating the PMC. Increased NMI latency in newer AMD processors makes this
> race condition more pronounced.
Increased NMI latency also makes the results less useful :/ What amount
of skid are we talking about, and is there anything AMD is going to do
about this?
> If the counter value has overflowed, it is
> possible to update the PMC value before the NMI handler can run.
Arguably the WRMSR should sync against the PMI. That is the beahviour
one would expect.
Isn't that something you can fix in ucode? And could you very please
tell the hardware people this is disguisting?
> The updated PMC value is not an overflowed value, so when the perf NMI
> handler does run, it will not find an overflowed counter. This may
> appear as an unknown NMI resulting in either a panic or a series of
> messages, depending on how the kernel is configured.
>
> To eliminate this race condition, the PMC value must be checked after
> disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
> wait for the NMI handler to reset the value before continuing. Add a new,
> optional, callable function that can be used to test for and resolve this
> condition.
>
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.14.x-
> Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> +static void amd_pmu_wait_on_overflow(int idx, u64 config)
> +{
> + unsigned int i;
> + u64 counter;
> +
> + /*
> + * We shouldn't be calling this from NMI context, but add a safeguard
> + * here to return, since if we're in NMI context we can't wait for an
> + * NMI to reset an overflowed counter value.
> + */
> + if (in_nmi())
> + return;
> +
> + /*
> + * If the interrupt isn't enabled then we won't get the NMI that will
> + * reset the overflow condition, so return.
> + */
> + if (!(config & ARCH_PERFMON_EVENTSEL_INT))
> + return;
> +
> + /*
> + * Wait for the counter to be reset if it has overflowed. This loop
> + * should exit very, very quickly, but just in case, don't wait
> + * forever...
> + */
> + for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
> + rdmsrl(x86_pmu_event_addr(idx), counter);
> + if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
> + break;
> +
> + /* Might be in IRQ context, so can't sleep */
> + udelay(1);
> + }
> +}
Argh.. that's horrible, as I'm sure you fully appreciate :/
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index b684f0294f35..f1d2f70000cd 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
> continue;
> val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
> wrmsrl(x86_pmu_config_addr(idx), val);
> +
> + if (x86_pmu.wait_on_overflow)
> + x86_pmu.wait_on_overflow(idx, val);
> }
> }
One alternative is adding amd_pmu_disable_all() to amd/core.c and using
that. Then you can also change the loop to do the wait after all the
WRMSRs, if that helps with latency.