Re: [PATCH 2/2 v2 RESEND] x86/cpu/amd: Enable the fixed intructions retired free counter IRPERF

From: Borislav Petkov
Date: Tue Feb 11 2020 - 08:42:17 EST


On Fri, Feb 07, 2020 at 05:04:27PM -0600, Kim Phillips wrote:
> commit aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions
> Retired) performance counter") added support for 'perf -e msr/irperf/',
> but when exercised, we always get a 0 count:
>
> BEFORE:
>
> $ sudo perf stat -e instructions,msr/irperf/ true
>
> Performance counter stats for 'true':
>
> 624,833 instructions
> # 0.00 stalled cycles per insn
> 0 msr/irperf/
>
> It turns out it simply needs its enable bit - HWCR bit 30 - set. This patch

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> does just that.
>
> Enablement is restricted to all machines advertising IRPERF capability,
> except those susceptible to an erratum that makes the IRPERF return
> bad values.
>
> That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
> models 20h and above [2].
>
> AFTER (on a family 17h model 31h machine):
>
> $ sudo perf stat -e instructions,msr/irperf/ true
>
> Performance counter stats for 'true':
>
> 621,690 instructions
> # 0.00 stalled cycles per insn
> 622,490 msr/irperf/
>
> [1] "Revision Guide for AMD Family 17h Models 00h-0Fh Processors",
> currently available here:
>
> https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf
>
> [2] "Revision Guide for AMD Family 17h Models 30h-3Fh Processors",
> currently available here:
>
> https://developer.amd.com/wp-content/resources/56323-PUB_0.74.pdf

How stable are those links? Past experience shows not very.

Please upload those to a bugzilla.kernel.org entry and add that URL here
with a Link: tag.

> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Babu Moger <babu.moger@xxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> Cc: Frank van der Linden <fllinden@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Huang Rui <ray.huang@xxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@xxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Jiaxun Yang <jiaxun.yang@xxxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Luwei Kang <luwei.kang@xxxxxxxxx>
> Cc: Martin LiÅka <mliska@xxxxxxx>
> Cc: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
> Cc: Michael Petlan <mpetlan@xxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@xxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
> Signed-off-by: Kim Phillips <kim.phillips@xxxxxxx>
> ---
> RESEND, adding Michael Petlan to cc. Original v2:
>
> https://lore.kernel.org/lkml/20200121171232.28839-2-kim.phillips@xxxxxxx/
>
> v2: Based on Andi Kleen's review:
>
> https://lore.kernel.org/lkml/20200116040324.GI302770@xxxxxxxxxxxxxxxxxxxx/
>
> Add an amd_erratum_1054 and use cpu_has_bug infrastructure instead of
> open-coding the {family,model} check.
>
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/cpu/amd.c | 17 +++++++++++++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index f3327cb56edf..1c1600e7476b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -404,5 +404,6 @@
> #define X86_BUG_SWAPGS X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
> #define X86_BUG_TAA X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
> #define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
> +#define X86_BUG_AMD_E1054 X86_BUG(24) /* CPU is among the affected by Erratum 1054 */

That is visible in /proc/cpuinfo and the string "amd_e1054" means
nothing. Call that

X86_BUG_IRPERF

or so to at least give some hint as to what the bug is.

>
> #endif /* _ASM_X86_CPUFEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ebe1685e92dd..d5e517d1c3dd 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -512,6 +512,8 @@
> #define MSR_K7_HWCR 0xc0010015
> #define MSR_K7_HWCR_SMMLOCK_BIT 0
> #define MSR_K7_HWCR_SMMLOCK BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
> +#define MSR_K7_HWCR_IRPERF_EN_BIT 30
> +#define MSR_K7_HWCR_IRPERF_EN BIT_ULL(MSR_K7_HWCR_IRPERF_EN_BIT)
> #define MSR_K7_FID_VID_CTL 0xc0010041
> #define MSR_K7_FID_VID_STATUS 0xc0010042
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 62c30279be77..c067234a271f 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -28,6 +28,7 @@
>
> static const int amd_erratum_383[];
> static const int amd_erratum_400[];
> +static const int amd_erratum_1054[];
> static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
>
> /*
> @@ -701,6 +702,9 @@ static void early_init_amd(struct cpuinfo_x86 *c)
> if (cpu_has_amd_erratum(c, amd_erratum_400))
> set_cpu_bug(c, X86_BUG_AMD_E400);
>
> + if (cpu_has_amd_erratum(c, amd_erratum_1054))
> + set_cpu_bug(c, X86_BUG_AMD_E1054);
> +
> early_detect_mem_encrypt(c);
>
> /* Re-enable TopologyExtensions if switched off by BIOS */
> @@ -978,6 +982,15 @@ static void init_amd(struct cpuinfo_x86 *c)
> /* AMD CPUs don't reset SS attributes on SYSRET, Xen does. */
> if (!cpu_has(c, X86_FEATURE_XENPV))
> set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
> +
> + /*
> + * Turn on the Instructions Retired free counter on machines not
> + * susceptible to erratum #1054 "Instructions Retired Performance
> + * Counter May Be Inaccurate"
.
^
|--- fullstop.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette