Re: [RFC PATCH v5 09/29] KVM: selftests: TDX: Add report_fatal_error test

From: Yan Zhao
Date: Mon Apr 15 2024 - 06:11:10 EST


On Mon, Apr 15, 2024 at 08:05:49AM +0000, Ackerley Tng wrote:
> Yan Zhao <yan.y.zhao@xxxxxxxxx> writes:
>
> > On Fri, Apr 12, 2024 at 04:56:36AM +0000, Ackerley Tng wrote:
> >> Yan Zhao <yan.y.zhao@xxxxxxxxx> writes:
> >>
> >> > ...
> >> >> diff --git a/tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h b/tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
> >> >> index b570b6d978ff..6d69921136bd 100644
> >> >> --- a/tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
> >> >> +++ b/tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
> >> >> @@ -49,4 +49,23 @@ bool is_tdx_enabled(void);
> >> >> */
> >> >> void tdx_test_success(void);
> >> >>
> >> >> +/**
> >> >> + * Report an error with @error_code to userspace.
> >> >> + *
> >> >> + * Return value from tdg_vp_vmcall_report_fatal_error is ignored since execution
> >> >> + * is not expected to continue beyond this point.
> >> >> + */
> >> >> +void tdx_test_fatal(uint64_t error_code);
> >> >> +
> >> >> +/**
> >> >> + * Report an error with @error_code to userspace.
> >> >> + *
> >> >> + * @data_gpa may point to an optional shared guest memory holding the error
> >> >> + * string.
> >> >> + *
> >> >> + * Return value from tdg_vp_vmcall_report_fatal_error is ignored since execution
> >> >> + * is not expected to continue beyond this point.
> >> >> + */
> >> >> +void tdx_test_fatal_with_data(uint64_t error_code, uint64_t data_gpa);
> >> > I found nowhere is using "data_gpa" as a gpa, even in patch 23, it's
> >> > usage is to pass a line number ("tdx_test_fatal_with_data(ret, __LINE__)").
> >> >
> >> >
> >>
> >> This function tdx_test_fatal_with_data() is meant to provide a generic
> >> interface for TDX tests to use TDG.VP.VMCALL<ReportFatalError>, and so
> >> the parameters of tdx_test_fatal_with_data() generically allow error_code and
> >> data_gpa to be specified.
> >>
> >> The tests just happen to use the data_gpa parameter to pass __LINE__ to
> >> the host VMM, but other tests in future that use the
> >> tdx_test_fatal_with_data() function in the TDX testing library could
> >> actually pass a GPA through using data_gpa.
> >>
> >> >> #endif // SELFTEST_TDX_TEST_UTIL_H
> >> >> diff --git a/tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c b/tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
> >> >> index c2414523487a..b854c3aa34ff 100644
> >> >> --- a/tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
> >> >> +++ b/tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
> >> >> @@ -1,8 +1,31 @@
> >> >> // SPDX-License-Identifier: GPL-2.0-only
> >> >>
> >> >> +#include <string.h>
> >> >> +
> >> >> #include "tdx/tdcall.h"
> >> >> #include "tdx/tdx.h"
> >> >>
> >> >> +void handle_userspace_tdg_vp_vmcall_exit(struct kvm_vcpu *vcpu)
> >> >> +{
> >> >> + struct kvm_tdx_vmcall *vmcall_info = &vcpu->run->tdx.u.vmcall;
> >> >> + uint64_t vmcall_subfunction = vmcall_info->subfunction;
> >> >> +
> >> >> + switch (vmcall_subfunction) {
> >> >> + case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
> >> >> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> >> >> + vcpu->run->system_event.ndata = 3;
> >> >> + vcpu->run->system_event.data[0] =
> >> >> + TDG_VP_VMCALL_REPORT_FATAL_ERROR;
> >> >> + vcpu->run->system_event.data[1] = vmcall_info->in_r12;
> >> >> + vcpu->run->system_event.data[2] = vmcall_info->in_r13;
> >> >> + vmcall_info->status_code = 0;
> >> >> + break;
> >> >> + default:
> >> >> + TEST_FAIL("TD VMCALL subfunction %lu is unsupported.\n",
> >> >> + vmcall_subfunction);
> >> >> + }
> >> >> +}
> >> >> +
> >> >> uint64_t tdg_vp_vmcall_instruction_io(uint64_t port, uint64_t size,
> >> >> uint64_t write, uint64_t *data)
> >> >> {
> >> >> @@ -25,3 +48,19 @@ uint64_t tdg_vp_vmcall_instruction_io(uint64_t port, uint64_t size,
> >> >>
> >> >> return ret;
> >> >> }
> >> >> +
> >> >> +void tdg_vp_vmcall_report_fatal_error(uint64_t error_code, uint64_t data_gpa)
> >> >> +{
> >> >> + struct tdx_hypercall_args args;
> >> >> +
> >> >> + memset(&args, 0, sizeof(struct tdx_hypercall_args));
> >> >> +
> >> >> + if (data_gpa)
> >> >> + error_code |= 0x8000000000000000;
> >> >>
> >> > So, why this error_code needs to set bit 63?
> >> >
> >> >
> >>
> >> The Intel GHCI Spec says in R12, bit 63 is set if the GPA is valid. As a
> > But above "__LINE__" is obviously not a valid GPA.
> >
> > Do you think it's better to check "data_gpa" is with shared bit on and
> > aligned in 4K before setting bit 63?
> >
>
> I read "valid" in the spec to mean that the value in R13 "should be
> considered as useful" or "should be passed on to the host VMM via the
> TDX module", and not so much as in "validated".
>
> We could validate the data_gpa as you suggested to check alignment and
> shared bit, but perhaps that could be a higher-level function that calls
> tdg_vp_vmcall_report_fatal_error()?
>
> If it helps, shall we rename "data_gpa" to "data" for this lower-level,
> generic helper function and remove these two lines
>
> if (data_gpa)
> error_code |= 0x8000000000000000;
>
> A higher-level function could perhaps do the validation as you suggested
> and then set bit 63.
This could be all right. But I'm not sure if it would be a burden for
higher-level function to set bit 63 which is of GHCI details.

What about adding another "data_gpa_valid" parameter and then test
"data_gpa_valid" rather than test "data_gpa" to set bit 63?

>
> Are you objecting to the use of R13 to hold extra test information, such
> as __LINE__?
>
> I feel that R13 is just another register that could be used to hold
> error information, and in the case of this test, we can use it to send
> __LINE__ to aid in debugging selftests. On the host side of the
> selftest we can printf() :).
>
Hmm, I just feel it's confusing to use R13 as error code holder and set
gpa_valid bit on at the same time.
As it looks complicated to convert __LINE__ to a string in a shared GPA,
maybe it's ok to pass it in R13 :)

> >> generic TDX testing library function, this check allows the user to use
> >> tdg_vp_vmcall_report_fatal_error() with error_code and data_gpa and not
> >> worry about setting bit 63 before calling
> >> tdg_vp_vmcall_report_fatal_error(), though if the user set bit 63 before
> >> that, there is no issue.
> >>
> >> >> + args.r11 = TDG_VP_VMCALL_REPORT_FATAL_ERROR;
> >> >> + args.r12 = error_code;
> >> >> + args.r13 = data_gpa;
> >> >> +
> >> >> + __tdx_hypercall(&args, 0);
> >> >> +}
> >>
> >> >> <snip>
> >>