Re: [PATCH 2/3] perf/x86/amd/uncore: Dynamically allocate uncore counters

From: Peter Zijlstra
Date: Fri Jan 13 2017 - 03:59:19 EST


On Thu, Jan 12, 2017 at 06:48:40PM -0600, Natarajan, Janakarajan wrote:
>
> On 1/12/2017 3:20 AM, Peter Zijlstra wrote:
> >On Wed, Jan 11, 2017 at 10:02:17AM -0600, Janakarajan Natarajan wrote:
> >>This patch updates the AMD uncore driver to support AMD Family17h
> >>processors. In Family17h, there are two extra last level cache counters.
> >>The counters are, therefore, allocated dynamically based on the family.
> >>
> >>The cpu hotplug up callback function is refactored to better manage
> >>failure conditions.
> >>
> >>Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@xxxxxxx>
> >>---
> >> arch/x86/events/amd/uncore.c | 141 +++++++++++++++++++++++++++++++------------
> >> 1 file changed, 104 insertions(+), 37 deletions(-)
> >>
> >>diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
> >>index 24c8537..7ab92f7 100644
> >>--- a/arch/x86/events/amd/uncore.c
> >>+++ b/arch/x86/events/amd/uncore.c
> >>@@ -22,13 +22,16 @@
> >> #define NUM_COUNTERS_NB 4
> >> #define NUM_COUNTERS_L2 4
> >>-#define MAX_COUNTERS NUM_COUNTERS_NB
> >>+#define NUM_COUNTERS_L3 6
> >> #define RDPMC_BASE_NB 6
> >> #define RDPMC_BASE_LLC 10
> >> #define COUNTER_SHIFT 16
> >>+static int num_counters_llc;
> >>+static int num_counters_nb;
> >>+
> >> static HLIST_HEAD(uncore_unused_list);
> >> struct amd_uncore {
> >>@@ -40,7 +43,7 @@ struct amd_uncore {
> >> u32 msr_base;
> >> cpumask_t *active_mask;
> >> struct pmu *pmu;
> >>- struct perf_event *events[MAX_COUNTERS];
> >>+ struct perf_event **events;
> >> struct hlist_node node;
> >> };
> >Why bother with the dynamic allocation crud? Why not simply set
> >MAX_COUNTERS to 6 and be happy?
> My reasoning behind using dynamic allocation was to prevent memory from
> being allocated when not needed on a per cpu basis. If memory isn't a
> consideration, I can send a v2 without the dynamic memory allocation.

Generally a sensible thing to consider, but here we're talking about 16
bytes or so, and we'd be adding quite a bit of logic just to save that
on older hardware.
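
For reference, the "16 bytes or so" figure can be checked with a small
userspace sketch. It assumes 8-byte pointers (x86-64) and uses
hypothetical macro names OLD_MAX_COUNTERS and NEW_MAX_COUNTERS in place
of the driver's MAX_COUNTERS before and after the bump; it is not code
from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct perf_event;  /* opaque here; only pointers to it are stored */

#define OLD_MAX_COUNTERS 4  /* hypothetical name: current array size */
#define NEW_MAX_COUNTERS 6  /* hypothetical name: size after the bump */

/* Extra bytes per struct when events[] grows from 4 to 6 pointer slots. */
static size_t extra_bytes_for_bump(void)
{
	/* Guard against unsigned wraparound in the subtraction below. */
	assert(NEW_MAX_COUNTERS >= OLD_MAX_COUNTERS);

	return sizeof(struct perf_event *[NEW_MAX_COUNTERS]) -
	       sizeof(struct perf_event *[OLD_MAX_COUNTERS]);
}
```

The result is two pointer slots per struct amd_uncore instance, which is
16 bytes where pointers are 8 bytes wide.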

Also, doing that allocation comes at the cost of having to do an extra
pointer dereference every time you use these things.

I'd just keep a statically sized array, bump the max to 6, and not worry
too much.
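
A minimal userspace sketch of that shape, under stated assumptions:
struct uncore_sketch and sketch_add_event() are hypothetical stand-ins,
not the driver's struct amd_uncore or its add callback, and the slot
search is a plain loop with no atomic claim. The array is sized for the
largest case and a per-instance num_counters field bounds how much of it
is used, so events[] stays embedded in the struct and needs no separate
allocation or extra pointer load.

```c
#include <assert.h>
#include <stddef.h>

struct perf_event { int id; };  /* stand-in for the kernel type */

#define MAX_COUNTERS 6          /* largest counter count to support */

struct uncore_sketch {
	int num_counters;                        /* e.g. 4 or 6, set at init */
	struct perf_event *events[MAX_COUNTERS]; /* embedded, fixed size */
};

/* Claim the first free slot below num_counters; return its index or -1. */
static int sketch_add_event(struct uncore_sketch *u, struct perf_event *ev)
{
	assert(u->num_counters <= MAX_COUNTERS);

	for (int i = 0; i < u->num_counters; i++) {
		if (!u->events[i]) {
			u->events[i] = ev;
			return i;
		}
	}
	return -1;  /* all usable counters are taken */
}
```

On hardware with only 4 counters, the last two slots simply go unused.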

(fwiw, the generic x86 cpu pmu does static arrays of 64 entries, even
though there's not a single PMU that actually has that many counters).