Re: [PATCH v1 1/3] x86/tdx: Check for TDX partitioning during early TDX init

From: Huang, Kai
Date: Fri Dec 08 2023 - 05:51:36 EST


On Thu, 2023-12-07 at 20:35 +0100, Jeremi Piotrowski wrote:
> On 07/12/2023 18:21, Jeremi Piotrowski wrote:
> > On 07/12/2023 13:58, Huang, Kai wrote:
> > > >
> > > > That's how it currently works - all the enlightenments are in hypervisor/paravisor
> > > > specific code in arch/x86/hyperv and drivers/hv and the vm is not marked with
> > > > X86_FEATURE_TDX_GUEST.
> > >
> > > And I believe there's a reason that the VM is not marked as TDX guest.
> > Yes, as Elena said:
> > """
> > OK, so in your case it is a decision of L1 VMM not to set the TDX_CPUID_LEAF_ID
> > to reflect that it is a tdx guest and it is on purpose because you want to
> > drop into a special tdx guest, i.e. partitioned guest.
> > """
> > TDX does not provide a means to let the partitioned guest know that it needs to
> > cooperate with the paravisor (e.g. because TDVMCALLs are routed to L0), so this is
> > exposed in a paravisor-specific way (CPUIDs in patch 1).
> >
> > >
> > > >
> > > > But without X86_FEATURE_TDX_GUEST userspace has no unified way to discover that an
> > > > environment is protected by TDX and also the VM gets classified as "AMD SEV" in dmesg.
> > > > This is due to CC_ATTR_GUEST_MEM_ENCRYPT being set but X86_FEATURE_TDX_GUEST not.
> > >
> > > Can you provide more information about what _userspace_ does here?
> >
> > I gave one usecase in a different email. A workload scheduler like Kubernetes might want to
> > place a workload in a confidential environment, and needs a way to determine that a VM is
> > TDX protected (or SNP protected) to make that placement decision.
> >
> > >
> > > What's the difference if it sees a TDX guest or a normal non-coco guest in
> > > /proc/cpuinfo?
> > >
> > > It looks like the whole purpose of this series is to make userspace happy by
> > > advertising TDX guest in /proc/cpuinfo. But if we do that we get a bad
> > > side effect in the kernel, which is why we then need the changes in your
> > > patch 2/3.
> > >
> >
> > Yes, exactly. It's unifying the two approaches so that userspace doesn't have to
> > care.
> >
> > > That doesn't seem very convincing.
> >
> > Why not?
> > The whole point of the kernel is to provide a unified interface to userspace and
> > abstract away these small differences. 
> >

Agree it's better.

> > Yes, it requires some kernel code, but that's not a reason to force every
> > userspace to implement its own logic. This is what the flags in /proc/cpuinfo
> > are for.
> >

Agree /proc/cpuinfo _can_ be one choice, but I am not sure whether it is the
best choice (for your case).
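
(For concreteness, what userspace typically does with that flag today is just a
string match on the flags line of /proc/cpuinfo -- a minimal sketch in C; the
flag name "tdx_guest" is what X86_FEATURE_TDX_GUEST shows up as:)

    #include <stdio.h>
    #include <string.h>

    static int cpuinfo_has_flag(const char *flag)
    {
            FILE *f = fopen("/proc/cpuinfo", "r");
            char line[4096];
            int found = 0;

            if (!f)
                    return 0;
            while (fgets(line, sizeof(line), f)) {
                    /* Naive substring match on the "flags" line. */
                    if (!strncmp(line, "flags", 5) && strstr(line, flag)) {
                            found = 1;
                            break;
                    }
            }
            fclose(f);
            return found;
    }

    int main(void)
    {
            return cpuinfo_has_flag("tdx_guest") ? 0 : 1;
    }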

>
> So I feel like we're finally getting to the gist of the disagreements in this thread.
>
> Here's something I think we should all agree on (both a) and b)). X86_FEATURE_TDX_GUEST:
> a) is visible to userspace and not just some kernel-only construct
> b) means "this is a guest running in an Intel TDX Trust Domain, and said guest is aware
> of TDX"
>
> a) is obvious but I think needs restating. b) is what userspace expects, and excludes legacy
> (/unmodified) guests running in a TD. That's a reasonable definition.

I don't understand b). Firstly, a guest with X86_FEATURE_TDX_GUEST can simply
be a regular TDX guest. How can we distinguish a normal TDX guest from an
enlightened guest running under a TDX hypervisor/paravisor (L2)?

From userspace ABI's point of view, a "TDX guest" flag in /proc/cpuinfo just
means it is a TDX guest, nothing more.

Maybe in your case userspace only cares about L2, but what if in other cases
userspace wants to do different things for the above two cases?

The point is, from userspace ABI's view, I think each individual feature the
kernel provides should have its own ABI (if userspace needs that).

Your case is more like a normal L2 VM running inside a TDX L1
hypervisor/paravisor, plus some L1 hypervisor/paravisor-specific enlightenments.
Therefore IMHO it's more reasonable to introduce some L1
hypervisor/paravisor-specific entries as the userspace ABI.

For example, can we have something like below in sysfs (again, just an example)?

# cat /sys/kernel/coco/hyperv/isolation_type
tdx
# cat /sys/kernel/coco/hyperv/paravisor
true

Then your userspace can certainly do the desired things if it observes the above.
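
(And a rough sketch of what the kernel side backing such entries could look
like -- the kobject names, attribute names and values are all just the
hypothetical examples from above:)

    #include <linux/kobject.h>
    #include <linux/sysfs.h>
    #include <linux/init.h>

    static ssize_t isolation_type_show(struct kobject *kobj,
                                       struct kobj_attribute *attr, char *buf)
    {
            return sysfs_emit(buf, "tdx\n");
    }

    static ssize_t paravisor_show(struct kobject *kobj,
                                  struct kobj_attribute *attr, char *buf)
    {
            return sysfs_emit(buf, "true\n");
    }

    static struct kobj_attribute isolation_type_attr = __ATTR_RO(isolation_type);
    static struct kobj_attribute paravisor_attr = __ATTR_RO(paravisor);

    static struct attribute *hyperv_coco_attrs[] = {
            &isolation_type_attr.attr,
            &paravisor_attr.attr,
            NULL,
    };
    ATTRIBUTE_GROUPS(hyperv_coco);

    static int __init hyperv_coco_sysfs_init(void)
    {
            struct kobject *coco, *hv;

            /* /sys/kernel/coco, then /sys/kernel/coco/hyperv under it. */
            coco = kobject_create_and_add("coco", kernel_kobj);
            if (!coco)
                    return -ENOMEM;
            hv = kobject_create_and_add("hyperv", coco);
            if (!hv)
                    return -ENOMEM;
            return sysfs_create_groups(hv, hyperv_coco_groups);
    }
    late_initcall(hyperv_coco_sysfs_init);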

I am not familiar with SEV/SEV-ES/SEV-SNP guests, e.g., how they expose the
flag in /proc/cpuinfo or whether there are any other sysfs entries, but the
above should cover SEV* too I guess (e.g., isolation_type can just return
"sev-snp").

>
> For kernel-only checks we can rely on platform-specific CC_ATTRs checked
> through intel_cc_platform_has.

IMHO CC_ATTR_TDX_MODULE_CALLS actually doesn't make a lot of sense, because a
"TDX module call" is obviously TDX-specific. It doesn't make a lot of sense to
keep adding vendor-specific CC attributes unless there's no better way.

But this is just my opinion; CC attributes should be something generic if
possible.
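
(For context, such an attribute would just become another case in the Intel
branch of cc_platform_has() -- sketch below based on arch/x86/coco/core.c,
where CC_ATTR_TDX_MODULE_CALLS is the proposed/hypothetical attribute, not an
existing one -- which shows the check is already vendor-dispatched anyway:)

    static bool intel_cc_platform_has(enum cc_attr attr)
    {
            switch (attr) {
            case CC_ATTR_GUEST_UNROLL_STRING_IO:
            case CC_ATTR_HOTPLUG_DISABLED:
            case CC_ATTR_GUEST_MEM_ENCRYPT:
            case CC_ATTR_MEM_ENCRYPT:
                    return true;
            case CC_ATTR_TDX_MODULE_CALLS: /* proposed: inherently Intel-only */
                    return true;
            default:
                    return false;
            }
    }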

>
> @Borislav: does that sound reasonable to you?
> @Kai, @Kirill, @Elena: can I get you to agree with this compromise, for userspace' sake?
>

As said above, I totally agree that ideally the kernel should provide some
unified way to allow userspace to query whether it's a confidential VM or not.

Sure, /proc/cpuinfo can certainly be one such ABI, but my concern is that you
just want to use it for your own convenience at the cost of adding something
arguable to it, and, what's worse, then need to hack the kernel to work around
problems that could have been avoided in the first place.