Re: [RFC v2-fix 1/1] x86/tdx: Handle in-kernel MMIO

From: Andi Kleen
Date: Tue May 18 2021 - 16:28:12 EST



On 5/18/2021 11:22 AM, Sean Christopherson wrote:
> On Tue, May 18, 2021, Andi Kleen wrote:
>>> The extra bytes for .altinstructions is very different than the extra bytes for
>>> the code itself. The .altinstructions section is freed after init, so yes it
>>> bloats the kernel size a bit, but the runtime footprint is unaffected by the
>>> patching metadata.
>>>
>>> IIRC, patching read/write{b,w,l,q}() can be done with 3 bytes of .text overhead.
>>>
>>> The other option to explore is to hook/patch IO_COND(), which can be done with
>>> negligible overhead because the helpers that use IO_COND() are not inlined. In a
>>> TDX guest, redirecting IO_COND() to a paravirt helper would likely cover the
>>> majority of IO/MMIO since virtio-pci exclusively uses the IO_COND() wrappers.
>>> And if there are TDX VMMs that want to deploy virtio-mmio, hooking
>>> drivers/virtio/virtio_mmio.c directly would be a viable option.
>> Yes but what's the point of all that?
> Patching IO_COND() is relatively low effort. With some clever refactoring, I
> suspect the net lines of code added would be less than 10. That seems like a
> worthwhile effort to avoid millions of faults over the lifetime of the guest.

AFAIK IO_COND() is only used by iomap users, but most drivers don't use iomap at all. virtio doesn't, for example, and that's really the only case we currently care about.

Also, millions of faults over the lifetime of a guest is nothing for a CPU.
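Whether "millions of faults" matters comes down to simple arithmetic: the total overhead is the number of #VE exits times the cost of each one, compared against how long the guest runs. The thread gives no per-exit cost, so the inputs used here are placeholders, not measurements, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total CPU time, in nanoseconds, spent handling `faults` #VE exits that
 * each cost `ns_per_fault`. Both inputs are caller-supplied assumptions. */
static uint64_t ve_overhead_ns(uint64_t faults, uint64_t ns_per_fault)
{
	return faults * ns_per_fault;
}

/* The same overhead expressed in parts per million of `uptime_ns`.
 * Integer division truncates; very large inputs could overflow 64 bits. */
static uint64_t ve_overhead_ppm(uint64_t faults, uint64_t ns_per_fault,
				uint64_t uptime_ns)
{
	return ve_overhead_ns(faults, ns_per_fault) * 1000000u / uptime_ns;
}
```

With placeholder inputs of 5 million exits at 10 µs each over one day of uptime, ve_overhead_ns() gives 50 s and ve_overhead_ppm() gives 578 ppm, roughly 0.06%. Because the result scales with the fault rate, a hot path such as a virtio doorbell is where it can stop being negligible.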

The only case where I can see it making sense is the virtio (and vmbus) doorbells. Everything else should be slow path anyway.

But doing that now would be premature optimization, and that's usually a bad idea. If it turns out to be a problem, we can fix it later.
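Sean's IO_COND() suggestion can be sketched as a small userspace model. The PIO_* constants and the macro shape are modeled on lib/iomap.c; tdx_guest, tdx_pv_readl(), the fake_* accessors and model_ioread32() are hypothetical stand-ins, and the counters only record which path an access took. It illustrates the idea under those assumptions and is not a patch that was posted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modeled on lib/iomap.c: iomap cookies below PIO_RESERVED encode port
 * numbers (offset by PIO_OFFSET); everything above is treated as MMIO. */
#define PIO_OFFSET   0x10000UL
#define PIO_MASK     0x0ffffUL
#define PIO_RESERVED 0x40000UL

/* Hypothetical switch: true when running as a TDX guest. */
static bool tdx_guest;

/* Counters showing which path each access took. */
static unsigned long ve_faults, pv_calls, port_ios;

/* Stand-ins for inl()/readl(). In a real TDX guest, readl() on shared MMIO
 * would trap with #VE; here it just counts and echoes the address. */
static uint32_t fake_inl(unsigned long port) { port_ios++; return (uint32_t)port; }
static uint32_t fake_readl(uintptr_t addr) { ve_faults++; return (uint32_t)addr; }

/* Hypothetical paravirt helper that would issue the MMIO hypercall directly
 * instead of taking a #VE first. */
static uint32_t tdx_pv_readl(uintptr_t addr) { pv_calls++; return (uint32_t)addr; }

/* Same dispatch as the kernel macro, minus the bad_io_access() warning. */
#define IO_COND(addr, is_pio, is_mmio) do {		\
	uintptr_t port = (uintptr_t)(addr);		\
	if (port >= PIO_RESERVED) {			\
		is_mmio;				\
	} else if (port > PIO_OFFSET) {			\
		port &= PIO_MASK;			\
		is_pio;					\
	}						\
} while (0)

/* ioread32()-style accessor: the MMIO branch is the single place a TDX
 * guest would be redirected to the paravirt helper. */
static uint32_t model_ioread32(uintptr_t addr)
{
	IO_COND(addr, return fake_inl(port),
		return tdx_guest ? tdx_pv_readl(addr) : fake_readl(addr));
	return 0xffffffffu;	/* invalid cookie, as lib/iomap.c returns */
}
```

In this model only accesses made through the iomap accessors reach the hook; a driver calling readl() directly never passes through IO_COND(), which is the coverage gap pointed out above.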



>> Even if it's only 3 bytes we still have a lot of MMIO all over the kernel
>> which never needs it.
>>
>> And I don't even see what TDX (or SEV which already does the decoding and
>> has been merged) would get out of it. We handle all the #VEs just fine. And
>> the instruction handling code is fairly straightforward too.
>>
>> Besides, instruction decoding works fine for all the existing hypervisors.
>> All we really want to do is to do the same thing as KVM would do.
> Heh, trust me, you don't want to do the same thing KVM does :-)

We want the same behavior.

Yes, probably not the same code.
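The instruction decoding discussed above is what lets a #VE handler emulate an MMIO access: it has to learn the access width, the direction, and the instruction length (so it can skip the instruction afterwards). A deliberately tiny, hypothetical decoder for 64-bit-mode MOV between a register and memory (opcodes 0x88-0x8B, with optional 0x66 and REX prefixes) shows the shape of that work. The real kernel code relies on the full x86 decoder in arch/x86/lib/insn.c and covers many more instructions; decode_mmio_mov() and struct mmio_insn are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What an MMIO emulator needs to know about the faulting instruction. */
struct mmio_insn {
	unsigned int len;   /* total instruction length in bytes */
	unsigned int size;  /* access width: 1, 2, 4 or 8 bytes */
	bool write;         /* true for MOV to memory (store) */
};

/* Returns false for anything outside the tiny supported subset, where a
 * real #VE handler would have to give up or use a fuller decoder. */
static bool decode_mmio_mov(const uint8_t *p, size_t n, struct mmio_insn *out)
{
	size_t i = 0;
	unsigned int size = 4;          /* default operand size in 64-bit mode */
	bool rex_w = false;

	if (i < n && p[i] == 0x66) { size = 2; i++; }       /* operand-size prefix */
	if (i < n && (p[i] & 0xf0) == 0x40) {                /* REX prefix */
		rex_w = p[i] & 0x08;
		i++;
	}
	if (i + 1 >= n)                 /* need opcode + ModRM */
		return false;

	uint8_t op = p[i++];
	if (op < 0x88 || op > 0x8b)
		return false;
	if (op == 0x88 || op == 0x8a)
		size = 1;               /* byte forms ignore 0x66 and REX.W */
	else if (rex_w)
		size = 8;               /* REX.W overrides 0x66 */

	uint8_t modrm = p[i++];
	unsigned int mod = modrm >> 6, rm = modrm & 7;
	if (mod == 3)
		return false;           /* register operand: no memory access */

	size_t disp = 0;
	if (rm == 4) {                  /* SIB byte follows ModRM */
		if (i >= n)
			return false;
		uint8_t sib = p[i++];
		if (mod == 0 && (sib & 7) == 5)
			disp = 4;       /* no base register: disp32 */
	} else if (mod == 0 && rm == 5) {
		disp = 4;               /* RIP-relative disp32 */
	}
	if (mod == 1)
		disp = 1;
	else if (mod == 2)
		disp = 4;

	if (i + disp > n)
		return false;
	out->len = (unsigned int)(i + disp);
	out->size = size;
	out->write = (op == 0x88 || op == 0x89);
	return true;
}
```

For example, the bytes 48 89 47 08 (mov [rdi+8], rax) decode to an 8-byte store of length 4. A handler built on this would then forward a `size`-byte read or write to the hypervisor and advance RIP by `len`; that step is left out of the sketch.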


-Andi