Re: [PATCH] perf/arm-cmn: Fix DTC reset

From: Geoff Blake
Date: Thu Apr 06 2023 - 17:26:30 EST


Ran this patch on an AWS C6g.metal and unfortunately still see the
spurious IRQs trigger quickly (within 10 tries) when using the following
flow:

perf stat -a -e arm_cmn_0/event=0x5,type=0x5/ -- sleep 600
kexec -e

Adding in the simple shutdown routine, I have run over 100 of the
above cycles and the spurious IRQs haven't triggered. I think we still
need both for now.

-Geoff

On Thu, 6 Apr 2023, Robin Murphy wrote:

> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> It turns out that my naive DTC reset logic fails to work as intended,
> since clearing PMCR.PMU_EN appears to result in writes to PMOVSR_CLR
> being ignored, while some hard-to-characterise combination of conditions
> (differently between DTC0 and secondary DTCs) also appears to result in
> PMOVSR reading as zero even when an overflow remains asserted. Thus
> rather than resetting the PMU to a nice clean state, we can currently
> end up with screaming spurious interrupts from secondary DTCs which we
> can neither see nor clear. This behaviour is of course not documented.
>
> Resetting PMCR to disable the interrupt output but enable the PMU itself
> seems to at least make the PMOVSR_CLR write work as expected on DTC0
> (although it looks like writing to PMCR twice has actually been having
> some hidden side-effect of clearing any pending overflows there).
> Unfortunately this still does not seem to help secondary DTCs, but going
> beyond PMU scope and additionally resetting DTC_CTL does seems to make
> everything work out, and superficially looks sensible. Therefore pile
> that onto the house of empirical cards too, until I can check with the
> hardware team whether there's actually any proper recommended way of
> recovering from an arbitrary PMU state after an oops/kexec/whatever.
>
> Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
> Reported-by: Geoff Blake <blakgeof@xxxxxxxxxx>
> Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
> ---
> This supersedes the previous shutdown/IRQ patches, now that I've
> finally managed to make *some* sense of what's really going on. If
> anyone's interested, this is the contrivance I used for testing:
>
> https://gitlab.arm.com/linux-arm/linux-rm/-/commit/d8f1035c5bc510516d6e4f0b7bf0b875a749daf7
> ---
> drivers/perf/arm-cmn.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index 144cc08d9e04..81fe01171e33 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -1899,7 +1899,10 @@ static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id
> if (dtc->irq < 0)
> return dtc->irq;
>
> - writel_relaxed(0, dtc->base + CMN_DT_PMCR);
> + if (idx == 0)
> + writel_relaxed(0, dtc->base + CMN_DT_DTC_CTL);
> +
> + writel_relaxed(CMN_DT_PMCR_PMU_EN, dtc->base + CMN_DT_PMCR);
> writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR);
> writel_relaxed(CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
>
> --
> 2.39.2.101.g768bb238c484.dirty
>
>