Re: [ANNOUNCE] PUCK Notes - 2024.04.03 - TDX Upstreaming Strategy

From: Huang, Kai
Date: Thu Apr 11 2024 - 09:46:28 EST


On Wed, 2024-04-10 at 20:46 -0700, Isaku Yamahata wrote:
> On Wed, Apr 10, 2024 at 06:03:52PM -0700,
> Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:
>
> > On Wed, Apr 10, 2024 at 02:03:26PM +0000,
> > "Huang, Kai" <kai.huang@xxxxxxxxx> wrote:
> >
> > > On Tue, 2024-04-09 at 18:12 -0700, Isaku Yamahata wrote:
> > > > On Mon, Apr 08, 2024 at 06:51:40PM +0000,
> > > > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > >
> > > > > On Mon, Apr 08, 2024, Edgecombe, Rick P wrote:
> > > > > > On Mon, 2024-04-08 at 09:20 -0700, Sean Christopherson wrote:
> > > > > > > > Another option is that, KVM doesn't allow userspace to configure
> > > > > > > > CPUID(0x8000_0008).EAX[7:0]. Instead, it provides a gpaw field in struct
> > > > > > > > kvm_tdx_init_vm for userspace to configure directly.
> > > > > > > >
> > > > > > > > What do you prefer?
> > > > > > >
> > > > > > > Hmm, neither.  I think the best approach is to build on Gerd's series to have KVM
> > > > > > > select 4-level vs. 5-level based on the enumerated guest.MAXPHYADDR, not on
> > > > > > > host.MAXPHYADDR.
> > > > > >
> > > > > > So then GPAW would be coded to basically best fit the supported guest.MAXPHYADDR within KVM. QEMU
> > > > > > could look at the supported guest.MAXPHYADDR and use matching logic to determine GPAW.
> > > > >
> > > > > Off topic, any chance I can bribe/convince you to wrap your email replies closer
> > > > > to 80 chars, not 100? Yeah, checkpath no longer complains when code exceeds 80
> > > > > chars, but my brain is so well trained for 80 that it actually slows me down a
> > > > > bit when reading mails that are wrapped at 100 chars.
> > > > >
> > > > > > Or are you suggesting that KVM should look at the value of CPUID(0X8000_0008).eax[23:16] passed from
> > > > > > userspace?
> > > > >
> > > > > This. Note, my pseudo-patch incorrectly looked at bits 15:8, that was just me
> > > > > trying to go off memory.
> > > > >
> > > > > > I'm not following the code examples involving struct kvm_vcpu. Since TDX
> > > > > > configures these at a VM level, there isn't a vcpu.
> > > > >
> > > > > Ah, I take it GPAW is a VM-scope knob? I forget where we ended up with the ordering
> > > > > of TDX commands vs. creating vCPUs. Does KVM allow creating vCPU structures in
> > > > > advance of the TDX INIT call? If so, the least awful solution might be to use
> > > > > vCPU0's CPUID.
> > > >
> > > > The current order is, KVM vm creation (KVM_CREATE_VM),
> > > > KVM vcpu creation(KVM_CREATE_VCPU), TDX VM initialization (KVM_TDX_INIT_VM).
> > > > and TDX VCPU initialization(KVM_TDX_INIT_VCPU).
> > > > We can call KVM_SET_CPUID2 before KVM_TDX_INIT_VM. We can remove cpuid part
> > > > from struct kvm_tdx_init_vm by vcpu0 cpuid.
> > >
> > > What's the reason to call KVM_TDX_INIT_VM after KVM_CREATE_VCPU?
> >
> > The KVM_TDX_INIT_VM (it requires cpuids) doesn't requires any order between two,
> > KVM_TDX_INIT_VM and KVM_CREATE_VCPU. We can call KVM_TDX_INIT_VM before or
> > after KVM_CREATE_VCPU because there is no limitation between two.
> >
> > The v5 TDX QEMU happens to call KVM_CREATE_VCPU and then KVM_TDX_INIT_VM
> > because it creates CPUIDs for KVM_TDX_INIT_VM from qemu vCPU structures after
> > KVM_GET_CPUID2. Which is after KVM_CREATE_VCPU.
>
> Sorry, let me correct it. QEMU creates QEMU's vCPU struct with its CPUIDs.
> KVM_TDX_INIT_VM, KVM_CREATE_VCPU, and KVM_SET_CPUID2. QEMU passes CPUIDs as is
> to KVM_SET_CPUID2.
>
> The v19 KVM_TDX_INIT_VM checks if the KVM vCPU is not created yet. But it's can
> be relaxed.

OK. So in current implementation KVM_TDX_INIT_VM must be done before
KVM_CREATE_VCPU.

Which seems more reasonable conceptually, IMHO.

Doing KVM_TDX_INIT_VM after KVM_CREATE_VCPU (well KVM_SET_CPUID2 actually) looks
more like a workaround, and it isn't sufficient actually, if we want
KVM_TDX_INIT_VM to use vCPU0's CPUID.

For that we need to explicitly say:

KVM_TDX_INIT_VM must be called after KVM_SET_CPUID2 is called for vCPU0.

Which is kinda odd because AFAICT userspace should be able to legally do
KVM_SET_CPUID2 in random order all vCPUs.

In other words, making KVM_TDX_INIT_VM use vCPU0's CPUID looks more like a
workaround, and it is a little bit odd.

When something looks odd, it's error-prone.

A less odd way is to make KVM_TDX_INIT_VM must be called after KVM_SET_CPUID2 is
called for _ALL_ vCPUs. It's _looks_ better, but in practice I am not sure KVM
is able to do such check.

Also, because CPUID is just one part of the KVM_TDX_INITT_VM, to me we also need
ask questions like whether it should be done after other vCPU IOCTL()s (e.g.,
KVM_SET_MSR etc) because there are more than 1 IOCTLs() to initialize vCPU from
userspace.

Again, although probably all of above issues are theoretical thing, to me looks
it's better to just make KVM_TDX_INIT_VM must be called before creating any
vCPU.

For CPUID(0x8000_0008), we can request KVM_TDX_INIT_VM to pass it, and KVM can
get GPAW/guest.MAXPHYADDR from it. KVM will need to convert this CPUID entry to
what TDH.MNG.INIT can accept, but this is what KVM needs to do anyway even it
uses vCPU0's CPUID.

Sean, any feedback?