Re: [PATCH v3 00/21] TDX host kernel support

From: Kai Huang
Date: Mon May 02 2022 - 06:18:48 EST


On Fri, 2022-04-29 at 11:34 -0700, Dave Hansen wrote:
> On 4/29/22 10:48, Dan Williams wrote:
> > > But, neither of those really help with, say, a device-DAX mapping of
> > > TDX-*IN*capable memory handed to KVM. The "new syscall" would just
> > > throw up its hands and leave users with the same result: TDX can't be
> > > used. The new sysfs ABI for NUMA nodes wouldn't clearly apply to
> > > device-DAX because they don't respect the NUMA policy ABI.
> > They do have "target_node" attributes to associate node specific
> > metadata, and could certainly express target_node capabilities in its
> > own ABI. Then it's just a matter of making pfn_to_nid() do the right
> > thing so KVM kernel side can validate the capabilities of all inbound
> > pfns.
>
> Let's walk through how this would work with today's kernel on tomorrow's
> hardware, without KVM validating PFNs:
>
> 1. daxaddr = mmap("/dev/dax1234")
> 2. kvmfd = open("/dev/kvm")
> 3. ioctl(KVM_SET_USER_MEMORY_REGION, { daxaddr });
> 4. guest starts running
> 5. guest touches 'daxaddr'
> 6. Page fault handler maps 'daxaddr'
> 7. KVM finds new 'daxaddr' PTE
> 8. TDX code tries to add physical address to Secure-EPT
> 9. TDX "SEAMCALL" fails because page is not convertible
> 10. Guest dies
>
> All we can do to improve on that is call something that pledges to only
> map convertible memory at 'daxaddr'. We can't *actually* validate the
> physical addresses at mmap() time or even
> KVM_SET_USER_MEMORY_REGION-time because the memory might not have been
> allocated.
>
> Those pledges are hard for anonymous memory though. To fulfill the
> pledge, we not only have to validate that the NUMA policy is compatible
> at KVM_SET_USER_MEMORY_REGION, we also need to decline changes to the
> policy that might undermine the pledge.
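To make the per-PFN validation idea above concrete, here is a minimal userspace C sketch of a fault-time check. It assumes KVM has a table of the TDX module's convertible memory regions (CMRs) as physical address ranges. The names struct cmr, pfn_is_convertible() and sketch_map_private_pfn() are hypothetical and are not existing kernel APIs. With such a check, step 9 would turn into a clean error at fault time instead of a failed SEAMCALL, although it still only catches the problem at fault time, not at mmap() or KVM_SET_USER_MEMORY_REGION time.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical: one convertible memory region, as the half-open
 * physical address range [base, base + size). */
struct cmr {
    uint64_t base;
    uint64_t size;
};

/* Returns true if the 4K page frame 'pfn' lies entirely inside one
 * of the 'n' convertible regions. */
static bool pfn_is_convertible(uint64_t pfn, const struct cmr *cmrs, size_t n)
{
    uint64_t pa = pfn << SKETCH_PAGE_SHIFT;
    uint64_t end = pa + (1ULL << SKETCH_PAGE_SHIFT);

    for (size_t i = 0; i < n; i++) {
        if (pa >= cmrs[i].base && end <= cmrs[i].base + cmrs[i].size)
            return true;
    }
    return false;
}

/* Hypothetical fault-time hook (around step 7 above): refuse a PFN
 * that the TDX module would reject instead of letting the SEAMCALL fail. */
static int sketch_map_private_pfn(uint64_t pfn, const struct cmr *cmrs, size_t n)
{
    if (!pfn_is_convertible(pfn, cmrs, n))
        return -EINVAL;
    /* ... the page would be added to the Secure-EPT here ... */
    return 0;
}
```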

Hi Dave,

There's another series from Chao, "KVM: mm: fd-based approach for supporting
KVM guest private memory", which essentially allows KVM to ask the guest memory
backend to allocate pages without having to mmap() them into userspace. Please
see my reply here:

https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@xxxxxxxxx/T/#mf9bf10a63eaaf0968c46ab33bdaf06bd2cfdd948

My understanding is that, for TDX guests, KVM will be required to use the new
mechanism. So when TDX supports NVDIMM in the future, DAX can be extended to
support the new mechanism, so that it can be used as a TD guest memory backend.
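A rough sketch of why the fd-based approach helps here, assuming (hypothetically) that a backend can declare up front whether every page it hands out is TDX-convertible: the capability check moves from fault time to the time the memslot is bound, and the fault path asks the backend for a PFN directly instead of walking a userspace mapping. struct sketch_backend and the sketch_*() functions are made up for illustration and do not reflect the actual interface in Chao's series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest memory backend (toy model, not the real API). */
struct sketch_backend {
    bool tdx_capable;   /* assumption: backend only hands out convertible pages */
    uint64_t base_pfn;  /* toy model: backing pages are contiguous from here */
    uint64_t nr_pages;  /* size of the backing file in pages */
};

/* Binding time: reject a non-TDX-capable backend for a TD up front,
 * rather than discovering the problem when a SEAMCALL fails. */
static int sketch_bind_private_slot(const struct sketch_backend *b, bool is_td)
{
    if (is_td && !b->tdx_capable)
        return -EOPNOTSUPP;
    return 0;
}

/* Fault time: KVM asks the backend for the page at 'pgoff' directly;
 * no userspace mapping is involved. */
static int sketch_backend_get_pfn(const struct sketch_backend *b,
                                  uint64_t pgoff, uint64_t *pfn)
{
    if (pgoff >= b->nr_pages)
        return -EINVAL;
    *pfn = b->base_pfn + pgoff;
    return 0;
}
```

In this toy model, a future DAX backend would only need to report itself as TDX-capable once the hardware supports it; nothing on the KVM fault path would change.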

Sean, Paolo, Isaku, Chao,

Please correct me if I am wrong.

--
Thanks,
-Kai