Re: [PATCH v8 0/8] x86: Show in sysfs if a memory node is able to do encryption

From: Dave Hansen
Date: Mon May 09 2022 - 14:47:49 EST


... adding some KVM/TDX folks

On 5/6/22 12:02, Boris Petkov wrote:
>> This node attribute punts the problem back out to userspace. It
>> gives userspace the ability to steer allocations to compatible NUMA
>> nodes. If something goes wrong, they can use other NUMA ABIs to
>> inspect the situation, like /proc/$pid/numa_maps.
> That's all fine and dandy but I still don't see the *actual*,
> real-life use case of why something would request memory of
> particular encryption capabilities. Don't get me wrong - I'm not
> saying there are not such use cases - I'm saying we should go all the
> way and fully define properly *why* we're doing this whole hoopla.

Let's say TDX is running on a system with mixed encryption
capabilities*. Some NUMA nodes support TDX and some don't. If that
happens, your guest RAM can come from anywhere. When the host kernel
calls into the TDX module to add pages to the guest (via
TDH.MEM.PAGE.ADD) it might get an error back from the TDX module. At
that point, the host kernel is stuck. It's got a partially created
guest and no recourse to fix the error.

This new ABI provides a way to avoid that situation in the first place.
Userspace can look at sysfs to figure out which NUMA nodes support
"encryption" (a.k.a. TDX) and can use the existing NUMA policy ABI to
avoid TDH.MEM.PAGE.ADD failures.
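
To make the userspace side concrete, here is a rough sketch of what a
consumer might do. It is only an illustration under assumptions: the
attribute file name ("crypto_capable") and the "1"/"0" value format are
not stated in this mail, and the helper names are made up for the
example. It scans the per-node sysfs directories, collects the nodes that
report the capability, and builds a numactl --membind argument list so
that guest memory is only taken from those nodes.

```python
import os
import re

# Assumptions for illustration only: the attribute name and the "1"/"0"
# value format are not given in this mail.
ATTR = "crypto_capable"
SYSFS_NODES = "/sys/devices/system/node"


def encryption_capable_nodes(base=SYSFS_NODES, attr=ATTR):
    """Return the sorted NUMA node IDs whose attribute file reads '1'."""
    nodes = []
    try:
        entries = os.listdir(base)
    except OSError:
        return nodes  # no NUMA sysfs directory at all
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue  # skip non-node entries in the directory
        try:
            with open(os.path.join(base, name, attr)) as f:
                value = f.read().strip()
        except OSError:
            continue  # attribute missing: treat the node as not capable
        if value == "1":
            nodes.append(int(match.group(1)))
    return sorted(nodes)


def membind_args(nodes):
    """Build a numactl prefix that binds allocations to the given nodes."""
    if not nodes:
        raise RuntimeError("no encryption-capable NUMA nodes found")
    return ["numactl", "--membind=" + ",".join(str(n) for n in nodes)]
```

A launcher could prepend membind_args(encryption_capable_nodes()) to the
VMM command line, or apply the equivalent policy itself with
set_mempolicy()/mbind().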

So, here's the question for the TDX folks: are these mixed-capability
systems a problem for you? Does this ABI help you fix the problem?
Will your userspace (qemu and friends) actually consume this ABI?

* There are three ways we might hit a system with this issue:
1. NVDIMMs that don't support TDX, e.g. because they lack memory
integrity protection.
2. CXL-attached memory controllers that can't do encryption at all.
3. Nominally TDX-compatible memory that was not covered/converted by
the kernel for some reason (memory hot-add, or ran out of TDMR
resources)