Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0

From: Sean Christopherson
Date: Wed Apr 21 2021 - 21:39:07 EST


On Thu, Apr 22, 2021, Like Xu wrote:
> On 2021/4/21 23:30, Sean Christopherson wrote:
> > On Wed, Apr 21, 2021, Like Xu wrote:
> > > The Architecture LBR does not have MSR_LBR_TOS (0x000001c9).
> > > When ARCH_LBR we don't set lbr_tos, the failure from the
> > > check_msr() against MSR 0x000 will make x86_pmu.lbr_nr = 0,
> > > thereby preventing the initialization of the guest LBR.
> > >
> > > Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> > > Signed-off-by: Like Xu <like.xu@xxxxxxxxxxxxxxx>
> > > Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> > > ---
> > > arch/x86/events/intel/core.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index 5272f349dca2..5036496caa60 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
> > > u64 val_old, val_new, val_tmp;
> > > /*
> > > - * Disable the check for real HW, so we don't
> > > + * Disable the check for real HW or non-sense msr, so we don't
> >
> > I think this should be "undefined MSR" or something along those lines. MSR 0x0
> > is a "real" MSR, on Intel CPUs it's an alias for IA32_MC0_ADDR; at least it's
> > supposed to be, most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.
>
> Thank you, Sean.
>
> <idle>-0 [000] dN.. 38980.032347: read_msr: 0, value fff
>
> Do we have a historic story or specification for this kind of alias ?

It's kinda documented in the SDM under "2.1 ARCHITECTURAL MSRS":

  0H  0  IA32_P5_MC_ADDR (P5_MC_ADDR)  Pentium Processor (05_01H)
  1H  1  IA32_P5_MC_TYPE (P5_MC_TYPE)  DF_DM = 05_01H

The history is that very early machine check support only had a single "bank",
with MSR 0x0 holding the address and MSR 0x1 holding the type. When the MSRs were
relocated to the 0x400 range, presumably to have room to grow the list, the MSRs
were aliased to maintain backwards compatibility (again, an assumption).

Unfortunately, that backwards compatibility apparently didn't get tested, and MSR
0x0 ended up aliased to 0x400 instead of 0x402.
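
To make the mis-alias concrete, here is a toy model in plain C, not kernel code.
The helper names (alias_intended, alias_observed, toy_wrmsr, toy_rdmsr) and
struct toy_cpu are made up for illustration; the only facts it encodes are the
ones above: MSR 0x0 was meant to land on 0x402 and lands on 0x400 instead.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_P5_MC_ADDR    0x000u /* legacy Pentium index */
#define MSR_IA32_MC0_CTL  0x400u
#define MSR_IA32_MC0_ADDR 0x402u

/* Where MSR 0x0 is supposed to land. */
static uint32_t alias_intended(uint32_t msr)
{
	return msr == MSR_P5_MC_ADDR ? MSR_IA32_MC0_ADDR : msr;
}

/* Where MSR 0x0 lands in practice, per the description above. */
static uint32_t alias_observed(uint32_t msr)
{
	return msr == MSR_P5_MC_ADDR ? MSR_IA32_MC0_CTL : msr;
}

/* Toy register file covering only 0x400..0x403 (hypothetical). */
struct toy_cpu {
	uint64_t bank0[4];
};

static void toy_wrmsr(struct toy_cpu *cpu, uint32_t msr, uint64_t val)
{
	uint32_t idx = alias_observed(msr);

	/* The model only knows about bank 0. */
	assert(idx >= 0x400u && idx <= 0x403u);
	cpu->bank0[idx - 0x400u] = val;
}

static uint64_t toy_rdmsr(const struct toy_cpu *cpu, uint32_t msr)
{
	uint32_t idx = alias_observed(msr);

	assert(idx >= 0x400u && idx <= 0x403u);
	return cpu->bank0[idx - 0x400u];
}
```

In this model a toy_wrmsr(&cpu, 0x0, 0) clears the slot for 0x400, i.e. the
write that was meant for the address register hits the control register.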

The only reason I'm aware of all this is because SGX is soft disabled by ucode if
any of the machine check banks are disabled by writing MCn_CTL. Some folks found
out the hard way that doing WRMSR with an uninitialized index, i.e. WRMSR(0),
would disable SGX.

If you want a good giggle, you can verify on pretty much any Intel silicon:

  $ rdmsr 0x400
  ff
  $ wrmsr 0x0 0
  $ rdmsr 0x400
  0
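
If msr-tools isn't handy, the read side can be sketched in a few lines of
userspace C. This assumes the msr driver is loaded and you're root; the MSR
index is used as the file offset into /dev/cpu/N/msr and each read returns
8 bytes. read_msr_at() is a hypothetical helper name, not an existing API.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for pread() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one MSR through an msr device node, e.g. "/dev/cpu/0/msr".
 * Returns 0 and fills *val on success, -1 on any failure (node missing,
 * no permission, or the RDMSR itself faulting).
 */
static int read_msr_at(const char *path, uint32_t msr, uint64_t *val)
{
	ssize_t n;
	int fd;

	assert(val != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The file offset selects the MSR index. */
	n = pread(fd, val, sizeof(*val), (off_t)msr);
	close(fd);

	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

Calling read_msr_at("/dev/cpu/0/msr", 0x400, &val) before and after the
wrmsr above should show the same change as the rdmsr output.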

> #define MSR_IA32_MC0_ADDR 0x00000402
> #define MSR_IA32_MC0_CTL 0x00000400