Re: [PATCH 1/3 v2] perf/amd/uncore: Prepare L3 thread mask code for Family 19h support
From: Stephane Eranian
Date: Tue Mar 17 2020 - 22:09:52 EST
On Fri, Mar 13, 2020 at 4:10 PM Kim Phillips <kim.phillips@xxxxxxx> wrote:
>
> In order to better accommodate the upcoming Family 19h support,
> given the 80-char line limit, move the existing code into a new
> l3_thread_slice_mask function.
>
> Signed-off-by: Kim Phillips <kim.phillips@xxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Michael Petlan <mpetlan@xxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: x86@xxxxxxxxxx
> ---
> v2: split into two parts, this one being the mechanical
> move based on Boris' review comments:
>
> https://lkml.org/lkml/2020/3/12/581
>
> arch/x86/events/amd/uncore.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
> index a6ea07f2aa84..dcdac001431e 100644
> --- a/arch/x86/events/amd/uncore.c
> +++ b/arch/x86/events/amd/uncore.c
> @@ -180,6 +180,20 @@ static void amd_uncore_del(struct perf_event *event, int flags)
> hwc->idx = -1;
> }
>
> +/*
> + * Convert logical cpu number to L3 PMC Config ThreadMask format
> + */
> +static u64 l3_thread_slice_mask(int cpu)
> +{
> + int thread = 2 * (cpu_data(cpu).cpu_core_id % 4);
> +
> + if (smp_num_siblings > 1)
> + thread += cpu_data(cpu).apicid & 1;
> +
> + return (1ULL << (AMD64_L3_THREAD_SHIFT + thread) &
> + AMD64_L3_THREAD_MASK) | AMD64_L3_SLICE_MASK;
> +}
> +
> static int amd_uncore_event_init(struct perf_event *event)
> {
> struct amd_uncore *uncore;
> @@ -209,15 +223,8 @@ static int amd_uncore_event_init(struct perf_event *event)
> * SliceMask and ThreadMask need to be set for certain L3 events in
> * Family 17h. For other events, the two fields do not affect the count.
> */
> - if (l3_mask && is_llc_event(event)) {
> - int thread = 2 * (cpu_data(event->cpu).cpu_core_id % 4);
> -
> - if (smp_num_siblings > 1)
> - thread += cpu_data(event->cpu).apicid & 1;
> -
> - hwc->config |= (1ULL << (AMD64_L3_THREAD_SHIFT + thread) &
> - AMD64_L3_THREAD_MASK) | AMD64_L3_SLICE_MASK;
> - }
> + if (l3_mask && is_llc_event(event))
> + hwc->config |= l3_thread_slice_mask(event->cpu);
>
By looking at this code, I realized that even on Zen2 this is wrong.
It does not work well.
You are basically saying that the L3 event is tied to the CPU the
event is programmed to.
But this does not work with the cpumask programmed for the amd_l3 PMU. This mask
shows, as it should, one CPU/CCX. So that means that when I do:
$ perf stat -a amd_l3/event=llc_event/
This only collects on the CPUs listed in the cpumask: 0,4,8,12 ....
That means that L3 events generated by the other CPUs on the CCX are
not monitored.
I can easily see the problem by pinning a memory bound program to
CPU64, for instance.
I think the thread mask should be exposed to the user. If not
specified, then set the mask to
cover all CPUs of the CCX. That way you can pick and choose what you
want. And with one event/CCX
you can monitor for all CPUs. I can send a patch that does that.
With what you have now, you have to force the list of CPUs with -C to
work around
the cpumask. And forcing the cpumask to 0-255 does not make sense because not
all L3 events necessarily need the L3 mask, so you don't want to program them on
all CPUs especially with 8 cpus/CCX and only 6 counters.
On Fri, Mar 13, 2020 at 4:10 PM Kim Phillips <kim.phillips@xxxxxxx> wrote:
>
> In order to better accommodate the upcoming Family 19h support,
> given the 80-char line limit, move the existing code into a new
> l3_thread_slice_mask function.
>
> Signed-off-by: Kim Phillips <kim.phillips@xxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Michael Petlan <mpetlan@xxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: x86@xxxxxxxxxx
> ---
> v2: split into two parts, this one being the mechanical
> move based on Boris' review comments:
>
> https://lkml.org/lkml/2020/3/12/581
>
> arch/x86/events/amd/uncore.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
> index a6ea07f2aa84..dcdac001431e 100644
> --- a/arch/x86/events/amd/uncore.c
> +++ b/arch/x86/events/amd/uncore.c
> @@ -180,6 +180,20 @@ static void amd_uncore_del(struct perf_event *event, int flags)
> hwc->idx = -1;
> }
>
> +/*
> + * Convert logical cpu number to L3 PMC Config ThreadMask format
> + */
> +static u64 l3_thread_slice_mask(int cpu)
> +{
> + int thread = 2 * (cpu_data(cpu).cpu_core_id % 4);
> +
> + if (smp_num_siblings > 1)
> + thread += cpu_data(cpu).apicid & 1;
> +
> + return (1ULL << (AMD64_L3_THREAD_SHIFT + thread) &
> + AMD64_L3_THREAD_MASK) | AMD64_L3_SLICE_MASK;
> +}
> +
> static int amd_uncore_event_init(struct perf_event *event)
> {
> struct amd_uncore *uncore;
> @@ -209,15 +223,8 @@ static int amd_uncore_event_init(struct perf_event *event)
> * SliceMask and ThreadMask need to be set for certain L3 events in
> * Family 17h. For other events, the two fields do not affect the count.
> */
> - if (l3_mask && is_llc_event(event)) {
> - int thread = 2 * (cpu_data(event->cpu).cpu_core_id % 4);
> -
> - if (smp_num_siblings > 1)
> - thread += cpu_data(event->cpu).apicid & 1;
> -
> - hwc->config |= (1ULL << (AMD64_L3_THREAD_SHIFT + thread) &
> - AMD64_L3_THREAD_MASK) | AMD64_L3_SLICE_MASK;
> - }
> + if (l3_mask && is_llc_event(event))
> + hwc->config |= l3_thread_slice_mask(event->cpu);
>
> uncore = event_to_amd_uncore(event);
> if (!uncore)
> --
> 2.25.1
>