[PATCH 0/2] perf/x86/amd: Add support for Large Increment per Cycle Events

From: Kim Phillips
Date: Thu Nov 14 2019 - 13:37:33 EST


This patch series adds support for Large Increment per Cycle Events,
which is needed to count events like Retired SSE/AVX FLOPs.
The first patch constrains Large Increment events to the even PMCs,
and the second patch changes the scheduler to accommodate and
program the new Merge event needed on the odd counters.
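
As an illustration only (this is not code from the patches), the
even/odd pairing described above can be sketched in plain C. The
helper names even_counter_mask() and merge_counter_for() are
hypothetical, and the sketch assumes the counters are simply numbered
from 0:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask with one bit set for every
 * even-numbered counter among the first num_counters counters.
 * A Large Increment event would be constrained to these bits.
 */
static uint64_t even_counter_mask(unsigned int num_counters)
{
	uint64_t mask = 0;
	unsigned int i;

	assert(num_counters <= 64);
	for (i = 0; i < num_counters; i += 2)
		mask |= 1ULL << i;

	return mask;
}

/*
 * Hypothetical helper: the odd counter that would carry the Merge
 * event for a Large Increment event placed on counter even_idx.
 */
static unsigned int merge_counter_for(unsigned int even_idx)
{
	assert((even_idx & 1) == 0);
	return even_idx + 1;
}
```

With six counters, for example, the mask selects counters 0, 2 and 4,
and a Large Increment event on counter 2 has its Merge event on
counter 3.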

The RFC was posted here:

https://lkml.org/lkml/2019/8/26/828

Changes since then consist mostly of fixing interoperation with the
watchdog, splitting, rewording, and addressing Peter Zijlstra's
comments:

- Mentioned programming the odd counter before the even counter
in the commit text, as is now also done in the code.

- Do the programming of the counters in the enable/disable paths
instead of the commit_scheduler hook.

- Instead of the loop re-counting all large increment events,
have collect_events() and a new amd_put_event_constraints_f17h
update a new cpuc variable 'n_lg_inc'. Now the scheduler
does a simple subtraction to get the target gpmax value.

- Amend the fastpath's used_mask code to fix a problem where
counter programming was being overwritten when running with
the watchdog enabled.

- Omit the superfluous __set_bit(idx + 1) in __perf_sched_find_counter
and clear the large increment's sched->state.used bit in the
path where a failure to schedule is determined due to the
next counter already being used (thanks Nathan Fontenot).

- Broaden new PMU initialization code to run on families 17h and
above.

- Make the new is_large_inc(struct perf_event) common to all x86
paths, as is_pebs_pt() is. That way, the raw event code checker
amd_is_lg_inc_event_code() can stay in its vendor-specific area,
events/amd/core.c.

- __set_bit, WARN_ON(!gpmax), all addressed.

- WRT changing the naming to PAIR, etc. I dislike the idea because
h/w documentation consistently refers to this now relatively old
feature as "Large Increment per Cycle" events, and to the secondary
event needed as the "Merge event (0xFFF)". When I started this
project the biggest problem was disambiguating between the Large
Increment event (FLOPs, or others), and the Merge event (0xFFF)
itself. Different phases had "Merge" for the Merge event vs.
"Merged" for the Large Increment event(s), or "Mergee", which made
it too easy to mistake one for the other when reading the source
code. So I opted for two distinctly different base
terms/stem-words: Large increment (lg_inc) and Merge, to match the
documentation, which basically has it right. Changing the term to
"pair" would have created the same "pair" vs. "paired" vs. "pairer"
etc. confusion, so I dropped it.

- WRT the comment "How about you make __perf_sched_find_count() set
the right value? That already knows it did this.", I didn't see
how I'd get away from still having to do the constraints flag &
LARGE_INC check in perf_assign_events(), to re-adjust the assignment
in the assign array, or sched.state.counter. This code really
is only needed after the counter assignment is made, in order to
program the h/w correctly.
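
To make two of the points above concrete, here is a minimal
user-space sketch, not the patch code itself. It models the "simple
subtraction" that yields gpmax from n_lg_inc, and the enable ordering
in which the odd counter gets the Merge event before the even counter
gets the Large Increment event. The names gp_max(), fake_wrmsr(),
enable_large_inc() and struct write_log are hypothetical, register
writes are only recorded in an array, and 0xFFF is used as a plain
value without modeling the real register encoding:

```c
#include <assert.h>
#include <stdint.h>

#define MERGE_EVENT_CODE 0xFFFu	/* "Merge event (0xFFF)", toy value */
#define MAX_WRITES 8

/* Hypothetical log standing in for real counter-control writes. */
struct write_log {
	unsigned int idx[MAX_WRITES];
	uint64_t val[MAX_WRITES];
	int n;
};

static void fake_wrmsr(struct write_log *log, unsigned int idx, uint64_t val)
{
	assert(log->n < MAX_WRITES);
	log->idx[log->n] = idx;
	log->val[log->n] = val;
	log->n++;
}

/*
 * Each Large Increment event also occupies the neighbouring odd
 * counter, so the number of counters left for scheduling shrinks by
 * one per such event.
 */
static int gp_max(int num_counters, int n_lg_inc)
{
	assert(n_lg_inc >= 0 && n_lg_inc <= num_counters / 2);
	return num_counters - n_lg_inc;
}

/* Program the odd (Merge) counter first, then the even counter. */
static void enable_large_inc(struct write_log *log, unsigned int even_idx,
			     uint64_t event_config)
{
	assert((even_idx & 1) == 0);
	fake_wrmsr(log, even_idx + 1, MERGE_EVENT_CODE);
	fake_wrmsr(log, even_idx, event_config);
}
```

For instance, with six counters and two Large Increment events,
gp_max() returns 4, and enabling such an event on counter 2 records a
write to counter 3 followed by a write to counter 2.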

Kim Phillips (2):
  perf/x86/amd: Constrain Large Increment per Cycle events
  perf/x86/amd: Add support for Large Increment per Cycle Events

 arch/x86/events/amd/core.c   | 110 +++++++++++++++++++++++++----------
 arch/x86/events/core.c       |  46 ++++++++++++++-
 arch/x86/events/perf_event.h |  21 +++++++
 3 files changed, 145 insertions(+), 32 deletions(-)

--
2.24.0