Re: [PATCH] x86/tdx: Generate SIGBUS on userspace MMIO

From: Dave Hansen
Date: Mon Jun 10 2024 - 10:00:47 EST


On 5/28/24 03:09, Kirill A. Shutemov wrote:
> Currently, attempting to perform MMIO from userspace in a TDX guest
> leads to a warning about an unexpected #VE and SIGSEGV being delivered
> to the process.

Does it _always_ result in a #VE? Or does it only happen when a guest
process mmap()s something, say from a driver, and the host doesn't back
the shared memory?

> Enlightened userspace may choose to handle MMIO on their own if the
> kernel does not emulate it.
>
> Handle the EPT_VIOLATION exit reason for userspace and deliver SIGBUS
> instead of SIGSEGV. SIGBUS is more appropriate for the MMIO situation.

Is any userspace _actually_ doing this? Sure, SIGBUS is more
appropriate, but in practice unprepared userspace crashes either way.
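
For reference, a minimal sketch of what "enlightened" userspace might
look like if it wanted to catch the SIGBUS from the patch. It assumes the
signal arrives with si_code == BUS_ADRERR and si_addr set to the faulting
address, as the force_sig_fault() call in the patch suggests. struct
mmio_window, is_mmio_fault() and install_sigbus_handler() are made-up
names for illustration, not an existing API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the MMIO range this process has mmap()ed. */
struct mmio_window {
	uintptr_t base;
	size_t len;
};

static struct mmio_window mmio_win;

/* Written from signal context, hence volatile. */
static volatile uintptr_t last_mmio_fault;

/* True if this is a SIGBUS/BUS_ADRERR whose address is inside the window. */
static bool is_mmio_fault(const siginfo_t *info, const struct mmio_window *win)
{
	uintptr_t addr;

	if (info->si_signo != SIGBUS || info->si_code != BUS_ADRERR)
		return false;

	addr = (uintptr_t)info->si_addr;
	return addr >= win->base && addr - win->base < win->len;
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	if (is_mmio_fault(info, &mmio_win))
		last_mmio_fault = (uintptr_t)info->si_addr;

	/*
	 * Only records the fault. Because RIP is not advanced, a real
	 * handler would have to emulate the access and skip the
	 * instruction via ucontext (or siglongjmp() away); simply
	 * returning would re-execute the same faulting instruction.
	 */
}

/* Remember the window and install the SA_SIGINFO handler for SIGBUS. */
static int install_sigbus_handler(const struct mmio_window *win)
{
	struct sigaction sa;

	mmio_win = *win;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Anything that does not install a handler along these lines gets killed
by SIGBUS just as it was by SIGSEGV, which is the point of the question.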

> @@ -641,17 +647,20 @@ static int virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
>  	switch (ve->exit_reason) {
>  	case EXIT_REASON_CPUID:
>  		return handle_cpuid(regs, ve);
> +	case EXIT_REASON_EPT_VIOLATION:
> +		if (is_private_gpa(ve->gpa))
> +			panic("Unexpected EPT-violation on private memory.");
> +
> +		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)ve->gla);
> +
> +		/* Return 0 to avoid incrementing RIP */
> +		return 0;

This _really_ needs a comment, probably even a helper function where you
can actually explain what is going on.

I could barely remember what this is for today. There's no hope for me
in a couple of years.

Just thinking through the possibilities here:

  Private => Private       : no #VE
  Private => Anything else : fatal shutdown
  Shared  => Shared        : no #VE
  Shared  => Private       : #VE (end up here)
  Shared  => !Present      : #VE (end up here)

So I think you're trying to differentiate between the last 2 cases.
"Shared => !Present" is the normal case where today the VM wants to
generate a VMEXIT. We'll probably get these from setups where somebody
is trying to do good ol' device emulation but in TDX.
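
The table above can be written down as a small standalone model, which
might be a starting point for the comment on a helper. This is only a
sketch: the enum and function names are made up, and "access" and
"target" are placeholders for the left and right columns of the table,
not kernel terminology.

```c
/* Left-hand column of the table (hypothetical naming). */
enum access_type { ACCESS_PRIVATE, ACCESS_SHARED };

/* Right-hand column of the table (hypothetical naming). */
enum target_type { TARGET_PRIVATE, TARGET_SHARED, TARGET_NOT_PRESENT };

enum outcome { OUTCOME_NO_VE, OUTCOME_FATAL_SHUTDOWN, OUTCOME_VE };

/* Map one row of the table to its outcome. */
static enum outcome classify_access(enum access_type access,
				    enum target_type target)
{
	/* Private => Private is fine, Private => anything else is fatal. */
	if (access == ACCESS_PRIVATE)
		return target == TARGET_PRIVATE ? OUTCOME_NO_VE
						: OUTCOME_FATAL_SHUTDOWN;

	/* Shared => Shared is fine, the two remaining rows raise a #VE. */
	return target == TARGET_SHARED ? OUTCOME_NO_VE : OUTCOME_VE;
}
```

Only the two OUTCOME_VE rows can ever reach virt_exception_user(), so
those are the only ones the new case has to tell apart.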

"Shared => Private" is an actual kernel bug. Why panic() though? Do we
*know* the system is unstable at this point? Why not just dump an
error, send a fatal signal, and move on?
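
A rough userspace model of the non-panic() alternative, to make the
suggestion concrete. Everything here is a stand-in: struct ve_info_model
mirrors only the two fields the patch uses, MODEL_SHARED_BIT is an
assumed bit position (the real one depends on the guest configuration),
and model_is_private_gpa() assumes "private" means the shared bit is
clear. In the kernel the two actions would presumably become an error
message plus a fatal signal versus the force_sig_fault() call from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for struct ve_info, limited to the fields used in the patch. */
struct ve_info_model {
	uint64_t gpa;
	uint64_t gla;
};

/* ASSUMPTION: position of the shared bit in the GPA, for illustration. */
#define MODEL_SHARED_BIT (1ULL << 51)

enum user_ve_action { MODEL_SEND_SIGBUS, MODEL_REPORT_AND_KILL };

/* ASSUMPTION: a GPA is private when the shared bit is clear. */
static bool model_is_private_gpa(uint64_t gpa)
{
	return !(gpa & MODEL_SHARED_BIT);
}

/* Decide what to do with an EPT-violation #VE that came from userspace. */
static enum user_ve_action
model_user_ept_violation(const struct ve_info_model *ve)
{
	if (model_is_private_gpa(ve->gpa)) {
		/*
		 * "Shared => Private": a kernel bug, but not obviously a
		 * reason to take the whole system down. Report it and
		 * kill only the offending task.
		 */
		fprintf(stderr,
			"unexpected EPT violation on private GPA 0x%llx\n",
			(unsigned long long)ve->gpa);
		return MODEL_REPORT_AND_KILL;
	}

	/*
	 * "Shared => !Present": userspace MMIO that the kernel does not
	 * emulate. Deliver SIGBUS at ve->gla and leave RIP alone so the
	 * signal points at the faulting instruction.
	 */
	return MODEL_SEND_SIGBUS;
}
```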