Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

From: Dave Hansen
Date: Tue Apr 20 2021 - 15:59:09 EST


On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>> approach is, it adds a few extra instructions for every
>>> TDCALL use case when compared to distributed checks. Although
>>> it's a bit less efficient, it's worth it to make the code more
>>> readable.
>>
>> What's a "distributed check"?
>
> It should be "distributed TDVMCALL/TDCALL inline assembly calls"

It's still not clear to what that refers.

>> This also doesn't talk at all about why this approach was chosen versus
>> inline assembly.  You're going to be asked "why not use inline asm?"
> "To make the core more readable and less error prone." I have added this
> info in above paragraph. Do you think we need more argument to
> justify our approach?

Yes, you need much more justification. That's pretty generic and
non-specific.

>>> +    /*
>>> +     * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
>>> +     * defined in the GHCI.  Note, RAX and RCX are consumed, but
>>> only by
>>> +     * TDX-Module and so don't need to be listed in the mask.
>>> +     */
>>
>> "GCHI" is out of the blue here.  So is "TDX-Module".  There needs to be
>> more context.
> ok. will add it. Do you want GHCI spec reference with section id here?

Absolutely not. I dislike all of the section references as-is. Doesn't
a comment like this say what you said above and also add context?

Expose every register currently used in the
guest-to-host communication interface (GHCI).

>>> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>> +
>>> +    tdcall
>>> +
>>> +    /* Panic if TDCALL reports failure. */
>>> +    test %rax, %rax
>>> +    jnz 2f
>>
>> Why panic?
> As per spec, TDCALL should never fail. Note that it has nothing to do
> with TDVMCALL function specific failure (which is reported via R10).

You've introduced two concepts here, without differentiating them. You
need to work to differentiate these two kinds of failure somewhere. You
can't simply refer to both as "failure".

>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
>> in the C wrapper?
> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
> so added
> it here.

That's not a good reason. You could just as easily have a C wrapper
which all uses of TDVMCALL go through.

>>> +    /* Propagate TDVMCALL success/failure to return value. */
>>> +    mov %r10, %rax
>>
>> You just said it panic's on failure.  How can this propagate failure?
> we use panic for TDCALL failure. But, R10 content used to identify
> whether given
> TDVMCALL function operation is successful or not.

As I said above, please endeavor to differentiate the two classes of
failures.

Also, if the spec is violated, do you *REALLY* want to blow up the whole
world with a panic? I guess you can argue that it could have been
something security-related that failed. But, either way, you owe a
description of why panic'ing is a good idea, not just blindly deferring
to "the spec says this can't happen".

>>> +    xor %r10d, %r10d
>>> +    xor %r11d, %r11d
>>> +    xor %r12d, %r12d
>>> +    xor %r13d, %r13d
>>> +    xor %r14d, %r14d
>>> +    xor %r15d, %r15d
>>> +
>>> +    pop %r12
>>> +    pop %r13
>>> +    pop %r14
>>> +    pop %r15
>>> +
>>> +    FRAME_END
>>> +    ret
>>> +2:
>>> +    ud2
>>> +.endm
>>> +
>>> +SYM_FUNC_START(__tdvmcall)
>>> +    xor %r10, %r10
>>> +    tdvmcall_core
>>> +SYM_FUNC_END(__tdvmcall)
>>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>>> index 0d00dd50a6ff..1147e7e765d6 100644
>>> --- a/arch/x86/kernel/tdx.c
>>> +++ b/arch/x86/kernel/tdx.c
>>> @@ -3,6 +3,36 @@
>>>     #include <asm/tdx.h>
>>>   +/*
>>> + * Wrapper for the common case with standard output value (R10).
>>> + */
>>
>> ... and oddly enough there is no explicit mention of R10 anywhere.  Why?
> For Guest to Host call -> R10 holds TDCALL function id (which is 0 for
> TDVMCALL). so
> we don't need special argument.
> After TDVMCALL execution, R10 value is returned via RAX.

OK... so this is how it works. But why mention R10 in the comment? Why
is *THAT* worth mentioning?

>>> +static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
>>> +{
>>> +    u64 err;
>>> +
>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
>>> +
>>> +    WARN_ON(err);
>>> +
>>> +    return err;
>>> +}
>>
>> Are there really *ZERO* reasons for a TDVMCALL to return an error?
> No. Its useful for debugging TDVMCALL failures.
>> Won't this let a malicious VMM spew endless warnings into the guest
>> console?
> As per GHCI spec, R10 will hold error code details which can be used to
> determine
> the type of TDVMCALL failure.

I would encourage you to stop citing the GCCI spec. In all of these
conversations, you seem to rely on it without considering the underlying
reasons. The fact that R10 is the error code is 100% irrelevant for
this conversation.

It's also entirely possible that the host would have bugs, or forget to
clear a bit somewhere, even if the spec says, "don't do it".

> More warnings at-least show that we are working
> with malicious VMM.

That argument does not hold water for me.

You can argue that a counter can be kept, or that a WARN_ON_ONCE() is
appropriate, or that a printk_ratelimited() is nice. But, allowing an
untrusted software component to write unlimited warnings to the kernel
console is utterly nonsensical.

By the same argument, any userspace exploit attempts could spew warnings
to the console also so that we can tell we are working with malicious
userspace.

>>> +/*
>>> + * Wrapper for the semi-common case where we need single output
>>> value (R11).
>>> + */
>>> +static inline u64 tdvmcall_out_r11(u64 fn, u64 r12, u64 r13, u64
>>> r14, u64 r15)
>>> +{
>>> +
>>> +    struct tdvmcall_output out = {0};
>>> +    u64 err;
>>> +
>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, &out);
>>> +
>>> +    WARN_ON(err);
>>> +
>>> +    return out.r11;
>>> +}
>>> +
>>
>> But you introduced __tdvmcall and __tdcall assembly functions.  Why
>> aren't both of them used?
> This patch just adds helper functions. Its used by other patches in the
> series. __tdvmcall is used in this patch because we need to add more
> wrappers for it.

That needs to be mentioned in the changelog at least.