The extra bytes for .altinstructions is very different than the extra bytes for
the code itself. The .altinstructions section is freed after init, so yes it
bloats the kernel size a bit, but the runtime footprint is unaffected by the
patching metadata.
IIRC, patching read/write{b,w,l,q}() can be done with 3 bytes of .text overhead.
The other option to explore is to hook/patch IO_COND(), which can be done with
neglible overhead because the helpers that use IO_COND() are not inlined. In a
TDX guest, redirecting IO_COND() to a paravirt helper would likely cover the
majority of IO/MMIO since virtio-pci exclusively uses the IO_COND() wrappers.
And if there are TDX VMMs that want to deploy virtio-mmio, hooking
drivers/virtio/virtio_mmio.c directly would be a viable option.