Re: [PATCHv5 03/30] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions

From: Kirill A. Shutemov
Date: Thu Mar 10 2022 - 16:20:54 EST


On Thu, Mar 10, 2022 at 04:30:57PM +0100, Borislav Petkov wrote:
> On Wed, Mar 02, 2022 at 05:27:39PM +0300, Kirill A. Shutemov wrote:
> > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> >
> > Guests communicate with VMMs with hypercalls. Historically, these
> > are implemented using instructions that are known to cause VMEXITs
> > like VMCALL, VMLAUNCH, etc. However, with TDX, VMEXITs no longer
> > expose the guest state to the host. This prevents the old hypercall
> > mechanisms from working. So, to communicate with VMM, TDX
> > specification defines a new instruction called TDCALL.
> >
> > In a TDX based VM, since the VMM is an untrusted entity, an intermediary
> > layer -- TDX module -- facilitates secure communication between the host
> > and the guest. TDX module is loaded like a firmware into a special CPU
> > mode called SEAM. TDX guests communicate with the TDX module using the
> > TDCALL instruction.
> >
> > A guest uses TDCALL to communicate with both the TDX module and VMM.
> > The value of the RAX register when executing the TDCALL instruction is
> > used to determine the TDCALL type. A variant of TDCALL used to communicate
> > with the VMM is called TDVMCALL.
> >
> > Add generic interfaces to communicate with the TDX module and VMM
> > (using the TDCALL instruction).
> >
> > __tdx_hypercall() - Used by the guest to request services from the
> > VMM (via TDVMCALL).
> > __tdx_module_call() - Used to communicate with the TDX module (via
> > TDCALL).
>
> Ok, you need to fix this: this sounds to me like there are two insns:
> TDCALL and TDVMCALL. But there's only TDCALL.
>
> And I'm not even clear on how the differentiation is done - I guess
> with %r11 which contains the VMCALL subfunction number in the
> __tdx_hypercall() case but I'm not sure.

TDVMCALL is a leaf of TDCALL. The leaf number is encoded in RAX: RAX==0 is
TDVMCALL.

I'm not completely sure what has to be fixed. Make it clear that TDVMCALL
is leaf of TDCALL? Something like this:

__tdx_module_call() - Used to communicate with the TDX module (via
TDCALL instruction).
__tdx_hypercall() - Used by the guest to request services from the
VMM (via TDVMCALL leaf of TDCALL).

?

> And when explaining this, pls put it in the comment over the function so
> that it is clear how the distinction is made.

But it's already there:

/*
* __tdx_module_call() - Used by TDX guests to request services from
* the TDX module (does not include VMM services).
*
* Transforms function call register arguments into the TDCALL
* register ABI. After TDCALL operation, TDX module output is saved
* in @out (if it is provided by the user)
*
...
*/

and

/*
* __tdx_hypercall() - Make hypercalls to a TDX VMM.
*
* Transforms values in function call argument struct tdx_hypercall_args @args
* into the TDCALL register ABI. After TDCALL operation, VMM output is saved
* back in @args.
*
*-------------------------------------------------------------------------
* TD VMCALL ABI:
*-------------------------------------------------------------------------
*
* Input Registers:
*
* RAX - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
.....
*/

Hm?

> > Also define an additional wrapper _tdx_hypercall(), which adds error
> > handling support for the TDCALL failure.
> >
> > The __tdx_module_call() and __tdx_hypercall() helper functions are
> > implemented in assembly in a .S file. The TDCALL ABI requires
> > shuffling arguments in and out of registers, which proved to be
> > awkward with inline assembly.
> >
> > Just like syscalls, not all TDVMCALL use cases need to use the same
> > number of argument registers. The implementation here picks the current
> > worst-case scenario for TDCALL (4 registers). For TDCALLs with fewer
> > than 4 arguments, there will end up being a few superfluous (cheap)
> > instructions. But, this approach maximizes code reuse.
> >
> > For registers used by the TDCALL instruction, please check TDX GHCI
> > specification, the section titled "TDCALL instruction" and "TDG.VP.VMCALL
> > Interface".
> >
> > Based on previous patch by Sean Christopherson.
> >
> > Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > ---
> > arch/x86/coco/Makefile | 2 +-
> > arch/x86/coco/tdcall.S | 188 ++++++++++++++++++++++++++++++++++
> > arch/x86/coco/tdx.c | 18 ++++
>
> Those should be
>
> arch/x86/coco/tdx/tdcall.S
> arch/x86/coco/tdx/tdx.c
>
> like we said:
>
> "- confidential computing guest stuff: arch/x86/coco/{sev,tdx}"

Okay, will change. But it is not what we agreed about before:

https://lore.kernel.org/all/Yg5q742GsjCRHXZL@xxxxxxx

> > arch/x86/include/asm/tdx.h | 27 +++++
> > arch/x86/kernel/asm-offsets.c | 10 ++
> > 5 files changed, 244 insertions(+), 1 deletion(-)
> > create mode 100644 arch/x86/coco/tdcall.S
>
> ...
>
> > +SYM_FUNC_START(__tdx_hypercall)
> > + FRAME_BEGIN
> > +
> > + /* Save callee-saved GPRs as mandated by the x86_64 ABI */
> > + push %r15
> > + push %r14
> > + push %r13
> > + push %r12
> > +
> > + /* Mangle function call ABI into TDCALL ABI: */
> > + /* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
> > + xor %eax, %eax
> > +
> > + /* Copy hypercall registers from arg struct: */
> > + movq TDX_HYPERCALL_r10(%rdi), %r10
> > + movq TDX_HYPERCALL_r11(%rdi), %r11
> > + movq TDX_HYPERCALL_r12(%rdi), %r12
> > + movq TDX_HYPERCALL_r13(%rdi), %r13
> > + movq TDX_HYPERCALL_r14(%rdi), %r14
> > + movq TDX_HYPERCALL_r15(%rdi), %r15
> > +
> > + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
> > +
> > + tdcall
> > +
> > + /*
> > + * RAX==0 indicates a failure of the TDVMCALL mechanism itself and that
> > + * something has gone horribly wrong with the TDX module.
> > + *
> > + * The return status of the hypercall operation is in a separate
> > + * register (in R10). Hypercall errors are a part of normal operation
> > + * and are handled by callers.
> > + */
> > + testq %rax, %rax
> > + jne .Lpanic
>
> Hm, can this call a C function which does the panic so that a proper
> error message is dumped to the user so that at least she knows where the
> panic comes from?

Sure we can. But it would look somewhat clunky.
Wouldn't backtrace be enough for this (never to be seen) case?

So far the panic would look like this:

invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc5-00181-geb9d1dde679a-dirty #1883
RIP: 0010:__tdx_hypercall+0x6d/0x70
Code: 00 00 74 17 4c 89 17 4c 89 5f 08 4c 89 67 10 4c 89 6f 18 4c 89 77 20 4c 89 7f 28 45 31 d2 45 31 db 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 00 55 53 41 54 41 55 41 56 41 57 48 89 a7 98 1b 00 00 48 8b
RSP: 0000:ffffffff82803e80 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000fc00
RDX: ff11000004c3d448 RSI: 0000000000000002 RDI: ffffffff82803ea8
RBP: ffffffff8283e840 R08: 0000000000000000 R09: 000000005eee98b1
R10: 0000000000000000 R11: 000000000000000c R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ff1100001c400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff1100001c9ff000 CR3: 0000000002834001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
tdx_safe_halt+0x46/0x80
default_idle_call+0x4f/0x90
do_idle+0xeb/0x250
cpu_startup_entry+0x15/0x20
start_kernel+0x372/0x3d1
secondary_startup_64_no_verify+0xe5/0xeb
</TASK>

To me "invalid opcode" at "RIP: 0010:__tdx_hypercall+0x6d/0x70" is pretty
clear where it comes from, no?

> > +
> > + /* TDVMCALL leaf return code is in R10 */
> > + movq %r10, %rax
> > +
> > + /* Copy hypercall result registers to arg struct if needed */
> > + testq $TDX_HCALL_HAS_OUTPUT, %rsi
> > + jz .Lout
> > +
> > + movq %r10, TDX_HYPERCALL_r10(%rdi)
> > + movq %r11, TDX_HYPERCALL_r11(%rdi)
> > + movq %r12, TDX_HYPERCALL_r12(%rdi)
> > + movq %r13, TDX_HYPERCALL_r13(%rdi)
> > + movq %r14, TDX_HYPERCALL_r14(%rdi)
> > + movq %r15, TDX_HYPERCALL_r15(%rdi)
> > +.Lout:
> > + /*
> > + * Zero out registers exposed to the VMM to avoid speculative execution
> > + * with VMM-controlled values. This needs to include all registers
> > + * present in TDVMCALL_EXPOSE_REGS_MASK (except R12-R15). R12-R15
> > + * context will be restored.
> > + */
> > + xor %r10d, %r10d
> > + xor %r11d, %r11d
> > +
> > + /* Restore callee-saved GPRs as mandated by the x86_64 ABI */
> > + pop %r12
> > + pop %r13
> > + pop %r14
> > + pop %r15
> > +
> > + FRAME_END
> > +
> > + retq
> > +.Lpanic:
> > + ud2
> > +SYM_FUNC_END(__tdx_hypercall)
>
> ...
>
> > diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> > index 7dca52f5cfc6..0b465e7d0a2f 100644
> > --- a/arch/x86/kernel/asm-offsets.c
> > +++ b/arch/x86/kernel/asm-offsets.c
> > @@ -74,6 +74,16 @@ static void __used common(void)
> > OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
> > OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
> >
> > +#ifdef CONFIG_INTEL_TDX_GUEST
>
> Those have ifdeffery around them - why don't the TDX_MODULE_* ones need
> it too?

I will drop the #ifdef. There's no harm in generating it for !TDX build.

> > + BLANK();
> > + OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_args, r10);
> > + OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_args, r11);
> > + OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_args, r12);
> > + OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_args, r13);
> > + OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_args, r14);
> > + OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_args, r15);
> > +#endif
> > +
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

--
Kirill A. Shutemov