Re: [PATCH v2 1/3] KVM: x86: Allow CPU to force vendor-specific TDP level

From: Yu Zhang
Date: Tue Aug 10 2021 - 03:40:48 EST


On Mon, Aug 09, 2021 at 03:30:08PM +0000, Sean Christopherson wrote:
> On Mon, Aug 09, 2021, Yu Zhang wrote:
> > On Sun, Aug 08, 2021 at 11:33:44PM -0500, Wei Huang wrote:
> > >
> > > On 8/8/21 11:27 PM, Yu Zhang wrote:
> > > > On Sun, Aug 08, 2021 at 11:11:40PM -0500, Wei Huang wrote:
> > > > >
> > > > >
> > > > > On 8/8/21 10:58 PM, Yu Zhang wrote:
> > > > > > On Sun, Aug 08, 2021 at 02:26:56PM -0500, Wei Huang wrote:
> > > > > > > AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
> > > > > >
> > > > > > Sorry, but why? NPT is not indexed by HVA.
> > > > >
> > > > > NPT is not indexed by HVA - it is always indexed by GPA. What I meant is NPT
> > > > > page table level has to be the same as the host OS page table: if 5-level
> > > > > page table is enabled in host OS (CR4.LA57=1), guest NPT has to 5-level too.
> > > >
> > > > I know what you meant. But may I ask why?
> > >
> > > I don't have a good answer for it. From what I know, VMCB doesn't have a
> > > field to indicate guest page table level. As a result, hardware relies on
> > > host CR4 to infer NPT level.
> >
> > I guess you mean not even in the N_CR3 field of VMCB?
>
> Correct, nCR3 is a basically a pure representation of a regular CR3.
>
> > Then it's not a broken design - it's a limitation of SVM. :)
>
> That's just a polite way of saying it's a broken design ;-)
>
> Joking aside, NPT opted for a semblance of backwards compatibility at the cost of
> having to carry all the baggage that comes with a legacy design. Keeping the core
> functionality from IA32 paging presumably miminizes design and hardware costs, and
> required minimal enabling in hypervisors. The downside is that it's less flexible
> than EPT and has a few warts, e.g. shadowing NPT is gross because the host can't
> easily mirror L1's desired paging mode.

Thanks for your explaination, Sean. Everything has a cost, and now it's time to pay
the price. :)

As to the max level, it is calculated in kvm_init(). Though I do not see any chance
that host paging mode will be changed after kvm_init(), or any case that Linux uses
different paging levels in different pCPUs, I am wondering, should we do something,
e.g., to document this as an open?

About "host can't easily mirror L1's desired paging mode", could you please elaborate?
Thanks!

B.R.
Yu