Re: [PATCH] KVM: x86: Fix the initial value of mcg_cap

From: Tony Luck
Date: Tue Oct 25 2022 - 12:22:19 EST


On Mon, Oct 24, 2022 at 09:37:59AM +0800, Xiaoyao Li wrote:
> On 10/22/2022 2:35 AM, Sean Christopherson wrote:
> > On Fri, Oct 21, 2022, Xiaoyao Li wrote:
> > > On 10/21/2022 12:32 AM, Sean Christopherson wrote:
> > > > If we really want to clean up this code, I think the correct approach would be to
> > > > inject #GP on all relevant MSRs if CPUID.MCA==0, e.g.
> > >
> > > It's what I thought of as well. But I didn't find any statement in SDM of
> > > "Accessing Machine Check MSRs gets #GP if no CPUID.MCA"
> >
> > Ugh, stupid SDM. Really old SDMs, e.g. circa 1997, explicitly state in the
> > CPUID.MCA entry that:
> >
> > Processor supports the MCG_CAP MSR.
> >
> > But, when Intel introduced the "Architectural MSRs" section (2001 or so), the
> > wording was changed to be less explicit:
> >
> > The Machine Check Architecture, which provides a compatible mechanism for error
> > reporting in P6 family, Pentium 4, and Intel Xeon processors, and future processors,
> > is supported. The MCG_CAP MSR contains feature bits describing how many banks of
> > error reporting MSRs are supported.
> >
> > and the entry in the MSR index just lists P6 as the dependency:
> >
> > IA32_MCG_CAP (MCG_CAP) Global Machine Check Capability (R/O) 06_01H
> >
> > So I think it's technically true that MCG_CAP is supposed to exist iff CPUID.MCA=1,
> > but we'd probably need an SDM change to really be able to enforce that :-(
>
> I'll talk to Intel architects for this. :)

[I'm not a h/w architect ... but I do write/support the Linux machine
check code]

Current edition of the SDM describes the MCA bit in CPUID(EAX=1).EDX in
volume 2, Table 3-11:

Machine Check Architecture. A value of 1 indicates the Machine Check
Architecture of reporting machine errors is supported. The MCG_CAP MSR
contains feature bits describing how many banks of error reporting MSRs
are supported

So a value of 0 would mean Machine check architecture is NOT supported.

The only rational meaning for "Machine check architecture is supported"
is you get everything in Vol3B chapter 15 if MCA is supported, and you
don't get it if it isn't. The unsupported behaviour is not explicitly
defined ... so if you want to do something other than #GP, you could do
so ... but that sounds like a silly choice.
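
To make that concrete, here is a rough sketch of the "no CPUID.MCA, so
#GP on every machine check MSR" policy. It is standalone C, not KVM
code: the sketch_* names and the 32-bank upper bound are made up for
illustration, and the caller is assumed to pass in the guest's
CPUID(EAX=1).EDX value. The MSR numbers and the MCA bit position
(EDX bit 14) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_01_EDX_MCA   (1u << 14)  /* CPUID(EAX=1).EDX bit 14 */
#define MSR_IA32_MCG_CAP   0x179u
#define MSR_IA32_MCG_CTL   0x17bu
#define MSR_IA32_MC0_CTL   0x400u
#define SKETCH_MAX_BANKS   32u         /* assumption, not from the SDM */

/* Hypothetical helper: is this one of the machine check MSRs? */
static bool sketch_is_mc_msr(uint32_t msr)
{
	/* Global registers: MCG_CAP, MCG_STATUS, MCG_CTL. */
	if (msr >= MSR_IA32_MCG_CAP && msr <= MSR_IA32_MCG_CTL)
		return true;

	/* Per-bank registers: four MSRs (CTL, STATUS, ADDR, MISC) per bank. */
	return msr >= MSR_IA32_MC0_CTL &&
	       msr < MSR_IA32_MC0_CTL + 4u * SKETCH_MAX_BANKS;
}

/* Hypothetical helper: should this access be answered with #GP? */
static bool sketch_mc_msr_access_faults(uint32_t guest_cpuid_01_edx,
					uint32_t msr)
{
	return sketch_is_mc_msr(msr) &&
	       !(guest_cpuid_01_edx & CPUID_01_EDX_MCA);
}
```

With MCA clear, sketch_mc_msr_access_faults(0, MSR_IA32_MCG_CAP) is
true; with the MCA bit set the same access is allowed through.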

Ditto for accessing a machine check bank with number greater than that
specified in IA32_MCG_CAP.count. SDM doesn't say that this must #GP,
but #GP would be a sane and reasonable response. You could also read as
all zero and drop writes.
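
A sketch of the two options for an out-of-range bank read, again as
standalone C rather than anything from the kernel. The sketch_* names,
the policy enum and the flat bank_regs[] layout (four registers per
bank, indexed from IA32_MC0_CTL) are assumptions for illustration. The
bank count being bits 7:0 of IA32_MCG_CAP is architectural.

```c
#include <stdint.h>

#define MSR_IA32_MC0_CTL  0x400u
#define MCG_CAP_COUNT(c)  ((unsigned int)((c) & 0xffu)) /* bits 7:0 */

/* Hypothetical policy knob for banks beyond IA32_MCG_CAP.count. */
enum sketch_oob_policy {
	SKETCH_OOB_GP,   /* signal #GP */
	SKETCH_OOB_RAZ,  /* read as zero (and, for writes, drop them) */
};

/*
 * Hypothetical read handler. bank_regs must hold 4 * count entries.
 * Returns 0 on success with *val filled in, -1 if #GP should be raised.
 */
static int sketch_read_bank_msr(const uint64_t *bank_regs, uint64_t mcg_cap,
				uint32_t msr, enum sketch_oob_policy policy,
				uint64_t *val)
{
	uint32_t offset, bank;

	if (msr < MSR_IA32_MC0_CTL)
		return -1;              /* not a bank register at all */

	offset = msr - MSR_IA32_MC0_CTL;
	bank = offset / 4;              /* four MSRs per bank */

	if (bank >= MCG_CAP_COUNT(mcg_cap)) {
		if (policy == SKETCH_OOB_GP)
			return -1;
		*val = 0;               /* read as all zero */
		return 0;
	}

	*val = bank_regs[offset];
	return 0;
}
```

A write handler would mirror this: -1 for the #GP policy, or silently
return 0 without touching bank_regs[] for the drop-writes policy.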

-Tony