Re: [PATCH 8/8] drivers/perf: Add Apple icestorm/firestorm CPU PMU driver

From: Dougall
Date: Sat Nov 13 2021 - 21:43:39 EST


Apple distributes names (and descriptions and affinity masks) for 55
of the events with macOS in the file /usr/share/kpep/a14.plist
(exposed to users in Instruments.app). Many of those 55 events were
added in macOS 12, so it's good to check the latest version. I use
the command "plutil -convert json -o - /usr/share/kpep/a14.plist" to
get these as JSON.
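
For anyone who would rather script this than read the JSON by eye, here is a minimal Python sketch that loads the same file with plistlib and lists the events. The key names it uses ("system", "cpu", "events", "number", "counters_mask") are assumptions about the file layout and are not stated above, so check them against the plutil output before relying on it. The helper names are made up for illustration.

```python
import plistlib


def load_db(path="/usr/share/kpep/a14.plist"):
    """Load the kpep database from disk (only works on macOS)."""
    with open(path, "rb") as f:
        return plistlib.load(f)


def list_events(db):
    """Return (number, name, counters_mask) tuples sorted by event number.

    ASSUMPTION: events live under db["system"]["cpu"]["events"], keyed by
    name, and each entry may carry "number" and "counters_mask" keys.
    """
    events = db.get("system", {}).get("cpu", {}).get("events", {})
    rows = []
    for name, info in events.items():
        number = info.get("number")
        if number is None:
            continue  # skip entries that carry no event number
        rows.append((number, name, info.get("counters_mask")))
    return sorted(rows, key=lambda row: row[0])
```

On a Mac, something like "for number, name, mask in list_events(load_db()): print(hex(number), name, mask)" would print one line per event, if the assumed layout holds.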

There are many more events that I have discovered experimentally,
but this work is unusually hard to verify, so I'd be inclined to
stick with what's documented.

However, I have observed a few oddities that might be of interest.

Event 0x9B (INST_LDST) works on PMCs 5, 6 and 7, but gives
different results for paired AMX instructions on PMC 7 (PMC 7 counts
each instruction, while PMCs 5 and 6 count a pair as one). Apple
addresses this by restricting the affinity mask to PMC 7. The same
behaviour is seen on undocumented event 0x96, which counts integer
stores. (For
context, microarchitecturally non-load-store AMX operations appear
as stores, as they just need to be posted to the AMX coprocessor on
commit. Consecutive non-load-store AMX operations can be paired
(fused), such that they issue as one uop, which is where this
anomaly can be seen.)
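
To make the affinity-mask restriction concrete, here is a small Python sketch of how such a mask can be decoded. It assumes bit n of the mask corresponds to PMC n, which is not stated above. The function names are hypothetical.

```python
def pmcs_from_mask(mask):
    """Return the PMC indices whose bits are set in an affinity mask.

    ASSUMPTION: bit n of the mask stands for PMC n.
    """
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


def event_allowed_on(mask, pmc):
    """True if the affinity mask permits scheduling the event on this PMC."""
    return bool(mask & (1 << pmc))
```

Under that assumption, an event usable on PMCs 5, 6 and 7 would have mask 0xE0, and restricting it to PMC 7 would give 0x80.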

Undocumented events 0xF1 through 0xFF appear to be operation
counters, meaning their result depends on events selected on other
counters. There are three threshold registers (PMTRHLD2, PMTRHLD4,
PMTRHLD6) which can specify a threshold (in number of cycles) for
the operation counter on the PMC with the same number. There is also
a mapping register (PMMAP), which contains a 3-bit field for each
counter from PMC2 to PMC7, each specifying a PMC index which can be
used as an input to the operation. Binary operations only use
PMC2/4/6 and use PMC(n+1) as their other input. These operation
counters may also behave differently depending on the value
currently in the corresponding PMC (specifically 0xF9 and 0xFA,
which implement shortest/longest run of non-zero counts).
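
As a rough illustration of the mapping register described above, here is a Python sketch that extracts the 3-bit field for a given PMC from a PMMAP value. The bit layout is a guess: it assumes the fields are packed contiguously starting at bit 0, with PMC2 first. The exact positions are not given above, so treat base_shift and the ordering as placeholders. The function names are hypothetical.

```python
FIELD_BITS = 3   # each PMMAP field is 3 bits wide (from the description)
FIRST_PMC = 2    # fields exist for PMC2 through PMC7
LAST_PMC = 7


def pmmap_field(pmmap, pmc, base_shift=0):
    """Return the PMC index selected as input for the given PMC.

    ASSUMPTION: fields are contiguous, PMC2 lowest, starting at base_shift.
    """
    if not FIRST_PMC <= pmc <= LAST_PMC:
        raise ValueError("PMMAP only has fields for PMC2..PMC7")
    shift = base_shift + FIELD_BITS * (pmc - FIRST_PMC)
    return (pmmap >> shift) & ((1 << FIELD_BITS) - 1)


def binary_op_other_input(pmc):
    """Return the PMC used as the second input of a binary operation."""
    if pmc not in (2, 4, 6):
        raise ValueError("binary operations only run on PMC2, PMC4 or PMC6")
    return pmc + 1
```

With this guessed layout, a PMMAP value of 0b001101 would select PMC5 as the input for PMC2 and PMC1 as the input for PMC3.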

This is complicated, and it's not exposed to the user by macOS, so I
wouldn't worry about supporting it for now. Despite all this, the
events and features on the P and E cores seem to be the same, so I
don't expect a need to distinguish between them in the future.

(I've been meaning to write all this up properly, but haven't got
around to it, sorry!)

Dougall

On Sun, Nov 14, 2021 at 12:04 AM Alyssa Rosenzweig <alyssa@xxxxxxxxxxxxx> wrote:
>
> Cc'ing Dougall who has worked with the CPU performance counters
> extensively and might be able to shine light on the interpretations.