Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

From: Suthikulpanit, Suravee
Date: Sun May 09 2021 - 22:15:11 EST

On 5/5/2021 8:05 PM, Peter Zijlstra wrote:
On Wed, May 05, 2021 at 07:39:14PM +0700, Suthikulpanit, Suravee wrote:

On 5/4/2021 7:13 PM, Peter Zijlstra wrote:
On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:

On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
can be simplified to always start counting with value zero,
and accumulate the counter value when stopping without the need
to keep track and reprogram the counter with the previously read
counter value.
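
For illustration, the start-at-zero and accumulate-on-stop scheme described in point 2 can be sketched as a small user-space simulation. This is not the driver code: the struct and function names below are hypothetical, and the only detail taken from the thread is the 48-bit counter width.

```c
#include <stdint.h>

/* 48-bit hardware counter width, as stated later in this thread. */
#define SIM_CNTR_MASK ((UINT64_C(1) << 48) - 1)

/* Hypothetical stand-in for the hardware counter register. */
struct sim_counter {
	uint64_t hw;
};

/* Hypothetical stand-in for the accumulated count of a perf event. */
struct sim_event {
	uint64_t count;
};

/* Start: always program the counter with zero. */
static void sim_start(struct sim_counter *c)
{
	c->hw = 0;
}

/* Simulate the hardware counting n events (wraps at 48 bits). */
static void sim_tick(struct sim_counter *c, uint64_t n)
{
	c->hw = (c->hw + n) & SIM_CNTR_MASK;
}

/* Stop: read the counter once and add it to the running total. */
static void sim_stop(const struct sim_counter *c, struct sim_event *e)
{
	e->count += c->hw & SIM_CNTR_MASK;
}
```

Because every start begins from zero, no previously read value has to be remembered or reprogrammed. The sketch does nothing about a counter that wraps between start and stop.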

This relies on the hardware counter being a full 64 bits wide; is it?

The HW counter is 48 bits wide. I'm not sure why it needs to be 64-bit.
I might be missing something here. Could you please explain?

How do you deal with the 48bit overflow if you don't use the interrupt?

The IOMMU Perf driver does not currently handle counter overflow, since the overflow
notification mechanism (i.e. the IOMMU creates an EVENT_COUNTER_ZERO event in the IOMMU event log
and generates an IOMMU MSI interrupt to signal the IOMMU driver to process the event) is not
currently supported. When the counter overflows, it becomes zero, and Perf
reports a value of zero for the event.

Alternatively, to detect overflow, we might start counting at value 1, so that
a counter value of zero indicates an overflow, in which case the Perf driver
could generate an error message.
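
A rough sketch of the start-at-1 idea, again as a user-space model and not driver code. All names are hypothetical. The model ASSUMES that an overflowed counter reads as zero, which is how the behavior is described above; real hardware may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CNTR_BITS 48
#define MODEL_CNTR_MASK ((UINT64_C(1) << MODEL_CNTR_BITS) - 1)

/* Hypothetical model of the hardware counter register. */
struct hw_model {
	uint64_t value;
	bool wrapped;
};

/* Program the counter with 1 instead of 0, so a zero read stands out. */
static void model_start(struct hw_model *hw)
{
	hw->value = 1;
	hw->wrapped = false;
}

/*
 * Simulate n events.
 * ASSUMPTION: once the counter overflows it reads as zero.
 */
static void model_count(struct hw_model *hw, uint64_t n)
{
	if (hw->wrapped || n > MODEL_CNTR_MASK - hw->value) {
		hw->wrapped = true;
		hw->value = 0;
		return;
	}
	hw->value += n;
}

/*
 * Stop: return false if an overflow was detected; otherwise store the
 * number of events, with the initial bias of 1 subtracted.
 */
static bool model_stop(const struct hw_model *hw, uint64_t *events)
{
	if (hw->value == 0)
		return false;
	*events = hw->value - 1;
	return true;
}
```

This only reports that an overflow happened. The count for that interval is still lost, so the caller can do no more than flag an error.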

Urgh.. the intel uncore driver programs an hrtimer to periodically fold
deltas. That way the counter will never be short.
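
The delta-folding arithmetic can be sketched as below. This is not the uncore driver's code: the names are hypothetical and the timer is not modeled, so fold_update() stands in for whatever a periodic callback would do with a fresh raw read. It assumes a free-running 48-bit counter that wraps modulo 2^48.

```c
#include <stdint.h>

#define PMC_BITS 48
#define PMC_MASK ((UINT64_C(1) << PMC_BITS) - 1)

/* Hypothetical per-event state: last raw read and 64-bit running total. */
struct fold_state {
	uint64_t prev_raw;
	uint64_t total;
};

/*
 * Fold the change since the previous read into the 64-bit total.
 * Masking the subtraction to the counter width gives the right delta
 * even if the raw counter wrapped once since the previous read.
 */
static void fold_update(struct fold_state *s, uint64_t raw_now)
{
	uint64_t delta = (raw_now - s->prev_raw) & PMC_MASK;

	s->total += delta;
	s->prev_raw = raw_now & PMC_MASK;
}
```

The result is only correct if the fold runs more often than the counter can wrap, which is why the period of the timer matters.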

Thanks for the info. I'll look into ways to detect and handle counter overflow for this.
With the current power-gating, there are several restrictions on when
the HW counter can be accessed, which makes this more difficult to handle.