Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO

From: Dave Hansen
Date: Fri Mar 11 2022 - 13:01:57 EST


On 3/11/22 09:18, Kirill A. Shutemov wrote:
> On Thu, Mar 10, 2022 at 09:53:01AM -0800, Dave Hansen wrote:
>> On 3/10/22 08:48, Kirill A. Shutemov wrote:
>> I think this is good enough:
>>
>> - All guest code is expected to be in TD-private memory. Being
>> private to the TD, VMMs have no way to access TD-private memory and
>> no way to read the instruction to decode and emulate it.
>
> Looks good.
>
> One remark: executing from shared memory (or walking page tables in shared
> memory) triggers #PF.

Good point. I thought that little nugget was AMD only. Thanks for the
reminder.

- TDX does not allow guests to execute from shared memory. All
executed instructions are in TD-private memory. Being private to
the TD, VMMs have no way to access TD-private memory and no way to
read the instruction to decode and emulate it.

...
>> I'd feel a lot better if this was slightly better specified. Even
>> booting with a:
>>
>> printf("rip: %lx\n", regs->rip);
>>
>> in the #VE handler would give some hard data about these. This still
>> feels to me like something that Sean got booting two years ago and
>> nobody has really reconsidered.
>
> Here the list I see on boot. It highly depends on QEMU setup. Any form of
> device filtering will cut the further.
>
> MMIO: ahci_enable_ahci
> MMIO: ahci_freeze
> MMIO: ahci_init_controller
> MMIO: ahci_port_resume
> MMIO: ahci_postreset
> MMIO: ahci_reset_controller
> MMIO: ahci_save_initial_config
> MMIO: ahci_scr_read
> MMIO: ahci_scr_write
> MMIO: ahci_set_em_messages
> MMIO: ahci_start_engine
> MMIO: ahci_stop_engine
> MMIO: ahci_thaw

OK, so this is one of the random drivers that will probably be replaced
with virtio in practice in real TD guests.

> MMIO: ioapic_set_affinity
> MMIO: ioread16
> MMIO: ioread32
> MMIO: ioread8
> MMIO: iowrite16
> MMIO: iowrite32
> MMIO: iowrite8
> MMIO: mask_ioapic_irq
> MMIO: mp_irqdomain_activate
> MMIO: mp_irqdomain_deactivate
> MMIO: native_io_apic_read
> MMIO: __pci_enable_msix_range
> MMIO: pci_mmcfg_read
> MMIO: pci_msi_mask_irq
> MMIO: pci_msi_unmask_irq
> MMIO: __pci_write_msi_msg
> MMIO: restore_ioapic_entries
> MMIO: startup_ioapic_irq
> MMIO: update_no_reboot_bit_mem
>
> ioread*/iowrite* comes from virtio.

I think we *actually* have some real facts to go on now instead of just
random hand waving about unknown MMIO in the the depths of arch/x86.
Thanks for doing that.

Just spot-checking these, I think all of these end up going through some
->ops function pointers:

irq_chip
x86_apic_ops
pci_raw_ops

That doesn't really help your case, though. Presumably, with some
amount of work, we could paravirtualize these users. The list of things
doesn't seem very large at all. But, each of those things will need
TDX-specific code.

So, do we go patching those things with TDX-specific code? Or do we
just do this one, universal, slow #VE thing for now?

Kirill, I know what camp you are in. :)

Looking at the *actual* work that would be required, I'm a little more
on the fence and leaning toward being ok with the universal #VE.

Does anybody feel differently?