Re: [PATCH 6/6] arm64: perf: Add support for chaining counters

From: Robin Murphy
Date: Mon May 21 2018 - 10:35:17 EST


On 21/05/18 15:41, Suzuki K Poulose wrote:
On 21/05/18 15:00, Robin Murphy wrote:
On 21/05/18 14:42, Suzuki K Poulose wrote:
On 18/05/18 16:57, Suzuki K Poulose wrote:
Hi Robin,

On 18/05/18 14:49, Robin Murphy wrote:
On 18/05/18 11:22, Suzuki K Poulose wrote:
Add support for chained event counters. PMUv3 allows chaining
a pair of adjacent PMU counters (with the lower counter number
being always "even"). The low counter is programmed to count
the event of interest and the high counter(odd numbered) is
programmed with a special event code (0x1e - Chain). Thus
we need special allocation schemes to make the full use of
available counters. So, we allocate the counters from either
ends. i.e, chained counters are allocated from the lower
end in pairs of two and the normal counters are allocated
from the higher number. Also makes necessary changes to
handle the chained events as a single event with 2 counters.

Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Will Deacon <will.deacon@xxxxxxx>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>


 /*
@@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ &armv8_pmuv3_perf_cache_map,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ARMV8_PMU_EVTYPE_EVENT);
-ÂÂÂ if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
+ÂÂÂ if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
+ÂÂÂÂÂÂÂ /* Prevent chaining for cycle counter */

Why? Sure, we want to avoid executing the chaining logic if we're scheduling a cycles event in the dedicated counter (which is perhaps what the comment above wanted to say), but if one ends up allocated into a regular counter (e.g. if the user asks for multiple cycle counts with different filters), then I don't see any reason to forbid that being chained.

Ah, I didn't think about that case. I was under the assumption that the
cycles are *only* placed on the cycle counter. I will take care of that.
Thanks for the review.

Robin, Mark, Will

One potential problem I see with allowing chaining of the cycle counter
*and* the promotion of cycle event to 64bit by default is when the user
may actually be able to count 1 less event (due to the promotion of
cycle event to 64bit and thus forcing to use chain, if the cycle counter
is unavailable).

Right, I didn't mean to imply we should inject the "chain" attr automatically for all cycles events, just that we shouldn't be rejecting it if the user does explicitly set it (but then just ignore it if using the dedicated counter).

Right, I was not talking about the automatic "chain" for cycles events. The
problem is we don't know if we would get the "cycle" counter for the given
event until it is "added", at which point we have already decided
whether the event is 32bit or 64bit. So, we cannot really delay the decision
until that. Thats where this comes up. Given a cycle event (without an explicit
chain request), do we treat it as a 64bit event or not ? If we do, we could

1) get the Cycle counter, all is fine.

2) If not, fallback to Chaining. The user looses a counter.

Ah, I think I see where our wires might be getting crossed here - taking a second look at patch #4 I see you're inherently associating ARMPMU_EVT_LONG with the ARMV8_PMUV3_PERFCTR_CPU_CYCLES event ID. What I'm thinking of is that we would only set that flag if and when we've allocated the cycle counter or a chained pair of regular counters (i.e. in get_event_idx() or later). In other words, it becomes a property of the counter(s) backing the event, rather than of the hardware event itself, which I think makes logical sense.

So one option is to drop automatic promotion of the cycle counter to
64bit and do it only when it is requested by the user and use either the
Cycle counter (preferred) or fall back to chaining. That way, the user
has the control over the number of events he can count using the given
set of counters.

Naively, there doesn't seem to be any inherent harm in always using 64-bit arithmetic for the dedicated counter, but it would mean that with multiple (non-chained) cycles events, some would be taking an interrupt every few seconds while one would effectively never overflow. I guess the question is whether that matters or not.


The problem is we can't have a mask per counter as we don't know where
the event would be placed until it is added. Or we should delay the
period initialisation/update to post get_event_idx().

...but does indeed mean that we can't initialise period stuff until we know where the event has been placed. I reckon that is a reasonable thing to do, but let's see what Mark and Will think.

(I guess there's also an ugly compromise in which we could re-run the sample_period setup logic as a special case when allocating the cycle counter, but even I don't like the idea of that)

Robin.