Re: [RFC PATCH v5 09/29] KVM: selftests: TDX: Add report_fatal_error test
From: Sean Christopherson
Date: Mon Apr 22 2024 - 17:24:09 EST
On Thu, Apr 18, 2024, Yan Zhao wrote:
> On Tue, Apr 16, 2024 at 11:50:19AM -0700, Sean Christopherson wrote:
> > On Mon, Apr 15, 2024, Yan Zhao wrote:
> > > On Mon, Apr 15, 2024 at 08:05:49AM +0000, Ackerley Tng wrote:
> > > > >> The Intel GHCI Spec says in R12, bit 63 is set if the GPA is valid. As a
> > > > > But above "__LINE__" is obviously not a valid GPA.
> > > > >
> > > > > Do you think it's better to check "data_gpa" is with shared bit on and
> > > > > aligned in 4K before setting bit 63?
> > > > >
> > > >
> > > > I read "valid" in the spec to mean that the value in R13 "should be
> > > > considered as useful" or "should be passed on to the host VMM via the
> > > > TDX module", and not so much as in "validated".
> > > >
> > > > We could validate the data_gpa as you suggested to check alignment and
> > > > shared bit, but perhaps that could be a higher-level function that calls
> > > > tdg_vp_vmcall_report_fatal_error()?
> > > >
> > > > If it helps, shall we rename "data_gpa" to "data" for this lower-level,
> > > > generic helper function and remove these two lines
> > > >
> > > > if (data_gpa)
> > > > error_code |= 0x8000000000000000;
> > > >
> > > > A higher-level function could perhaps do the validation as you suggested
> > > > and then set bit 63.
> > > This could be all right. But I'm not sure if it would be a burden for
> > > higher-level function to set bit 63 which is of GHCI details.
> > >
> > > What about adding another "data_gpa_valid" parameter and then test
> > > "data_gpa_valid" rather than test "data_gpa" to set bit 63?
> >
> > Who cares what the GHCI says about validity? The GHCI is a spec for getting
> > random guests to play nice with random hosts. Selftests own both, and the goal
> > of selftests is to test that KVM (and KVM's dependencies) adhere to their relevant
> > specs. And more importantly, KVM is NOT inheriting the GHCI ABI verbatim[*].
> >
> > So except for the bits and bobs that *KVM* (or the TDX module) gets involved in,
> > just ignore the GHCI (or even deliberately abuse it). To put it differently, use
> > selftests verify *KVM's* ABI and functionality.
> >
> > As it pertains to this thread, while I haven't looked at any of this in detail,
> > I'm guessing that whether or not bit 63 is set is a complete "don't care", i.e.
> > KVM and the TDX Module should pass it through as-is.
> >
> > [*] https://lore.kernel.org/all/Zg18ul8Q4PGQMWam@xxxxxxxxxx
> Ok. It makes sense to KVM_EXIT_TDX.
> But what if the TDVMCALL is handled in TDX specific code in kernel in future?
> (not possible?)
KVM will "handle" ReportFatalError, and will do so before this code lands[*], but
I *highly* doubt KVM will ever do anything but forward the information to userspace,
e.g. as KVM_SYSTEM_EVENT_CRASH with data[] filled in with the raw register information.
> Should guest set bits correctly according to GHCI?
No. Selftests exist first and foremost to verify KVM behavior, not to verify
firmware behavior. We can and should use selftests to verify that *KVM* doesn't
*violate* the GHCI, but that doesn't mean that selftests themselves can't ignore
and/or abuse the GCHI, especially since the GHCI definition for ReportFatalError
is frankly awful.
E.g. the GHCI prescibes actual behavior for R13, but then doesn't say *anything*
about what's in the data page. Why!?!?! If the format in the data page is
completely undefined, what's the point of restricting R13 to only be allowed to
hold a GPA?
And the wording is just as awful:
The VMM must validate that this GPA has the Shared bit set. In other words,
that a shared-mapping is used, and that this is a valid mapping for the TD.
I'm pretty sure it's just saying that the TDX module isn't going to verify the
operate, i.e. that the VMM needs to protect itself, but it would be so much
better to simply state "The TDX Module does not verify this GPA", because saying
the VMM "must" do something leads to pointless discussions like this one, where
we're debating over whether or *our* VMM should inject an error into *our* guest.
Anyways, we should do what makes sense for selftests and ignore the stupidity of
the GHCI when doing so yields better code. If that means abusing R13, go for it.
If it's a sticking point for anyone, just use one of the "optional" registers.
Whatever we do, bury the host and guest side of selftests behind #defines or helpers
so that there are at most two pieces of code that care which register holds which
piece of information.
[*] https://lore.kernel.org/all/20240404230247.GU2444378@xxxxxxxxxxxxxxxxxxxxx