Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO

From: Kirill A. Shutemov
Date: Thu Mar 10 2022 - 11:48:59 EST


On Wed, Mar 09, 2022 at 05:06:11PM -0800, Dave Hansen wrote:
> On 3/9/22 16:51, Kirill A. Shutemov wrote:
> > On Tue, Mar 08, 2022 at 01:26:28PM -0800, Dave Hansen wrote:
> >> Memory encryption has zero to do with this. The TDX isolation
> >> mechanisms are totally discrete from memory encryption, although they
> >> are "neighbors" of sorts.
> >
> > Hm. I don't see why you say encryption is not relevant. The VMM (host
> > kernel) has ultimate access to the guest memory ciphertext. It can read
> > it as ciphertext without any issue (using KeyID-0).
> >
> > Could you elaborate on the point?
>
> I think you're just confusing what TDX has with MKTME. The whitepaper says:
>
> > The TD-bit associated with the line in memory seeks to
> > detect software or devices attempting to read memory
> > encrypted with private KeyID, using a shared KeyID, to reveal
> > the ciphertext. On such accesses, the MKTME returns a fixed
> > pattern to prevent ciphertext analysis.
>
> I think several firstborn were sacrificed to get that bit. Let's not
> forget why we have it. :)

Okay, I missed the memo. I will drop the reference to encryption:

- The CPU disallows software other than the TDX module and TDs from
making memory accesses using the private key. Without the correct
key, the VMM has no way to access TD-private memory.

> >>> Rather than touching the entire kernel, it might also be possible to
> >>> just go after drivers that use MMIO in TDX guests. Right now, that's
> >>> limited only to virtio and some x86-specific drivers.
> >>>
> >>> All virtio MMIO appears to be done through a single function, which
> >>> makes virtio eminently easy to patch.
> >>>
> >>> This approach will be adopted in the future, removing the bulk of
> >>> MMIO #VEs. The #VE-based MMIO will remain serving non-virtio use cases.
> >>
> >> This still doesn't *quite* do it for me for a justification. Why can't
> >> the non-virtio cases be converted as well? Why doesn't the "patching
> >> MMIO sites" work for x86 code too?
> >>
> >> You really need to convince us that *this* approach will be required
> >> forever.
> >
> > What if I add:
> >
> > Many drivers can potentially be used inside a TDX guest (e.g. via device
> > passthrough or random device emulation by the VMM), but very few will.
> > Patching every possible driver is not practical. #VE-based MMIO provides
> > functionality for everybody. Performance-critical cases can be optimized
> > as needed.
>
> This problem was laid out as having three cases:
> 1. virtio
> 2. x86-specific drivers
> 3. random drivers (everything else)
>
> #1 could be done with paravirt
> #2 is unspecified and unknown
> #3 use doesn't as far as I know exist in TDX guests today

#2 doesn't matter from a performance point of view, and there is no
convenient place where these drivers can be intercepted since they are
scattered across the tree. Patching them brings no benefit, only pain.

#3: some customers have already declared that they will use device
passthrough (yes, it is not safe). A CSP may also want to emulate a random
device, depending on the setup. Like a power button or something.
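
To make the contrast concrete, "patching" means something like the sketch
below at a single accessor. Illustrative only: virtio_readl_tdx() and
tdx_mmio_read32() are made-up names standing in for a direct MMIO
hypercall. Virtio has one choke point where this can be hooked; the
drivers in #2/#3 open-code readl()/writel() all over the place, so there
is nothing comparable to patch.

/*
 * Illustrative sketch only: tdx_mmio_read32() is a hypothetical helper
 * that asks the VMM to do the access directly instead of taking a #VE
 * on a plain MMIO read.
 */
static inline u32 virtio_readl_tdx(void __iomem *addr)
{
	/* TDX guest: call into the VMM directly, no #VE round trip */
	if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
		return tdx_mmio_read32(addr);

	/* Everyone else: plain MMIO read */
	return readl(addr);
}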

> > BUG() here makes it clear that the handler itself is buggy. Returning
> > false and kicking in #GP-like logic indicates that something is wrong
> > with the code that triggered the #VE. I think it is an important
> > distinction.
>
> OK, then how about a WARN_ON() which is followed by the #GP?

You folks give mixed messages. Thomas was very unhappy when I tried to add
code that recovers from WBINVD:

https://lore.kernel.org/all/87y22uujkm.ffs@tglx

It is exactly the same scenario: kernel code is buggy and has to be fixed.

So, what is the policy?

> Let's say insn_decode_mmio() does something insane like:
>
> return -EINVAL;
>
> Should we really be killing the kernel for that?

Note that a #GP would most likely kill the kernel as well. We are handling
in-kernel MMIO here; there are not many chances to recover.

Is it really a big deal?
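
For reference, the WARN-and-fall-back variant you are suggesting would look
roughly like this (a simplified sketch, not the actual handler):

/*
 * Simplified sketch: on a decode failure, warn and return false so the
 * caller escalates the same way a #GP would, instead of BUG()ing here.
 */
static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
{
	struct insn insn = {};
	enum mmio_type mmio;
	int size;

	if (insn_decode(&insn, (void *)regs->ip, MAX_INSN_SIZE, INSN_MODE_64))
		return false;

	mmio = insn_decode_mmio(&insn, &size);
	if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
		return false;		/* caller raises the #GP-style failure */

	/* ... emulate the access via TDVMCALL using ve->gpa ... */
	return true;
}

Either way the context that did the bogus access is unlikely to survive;
the question is only whether we BUG() on the spot or go through the
#GP-style path first.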

--
Kirill A. Shutemov