Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure

From: Zhi Wang
Date: Tue Jan 17 2023 - 16:27:08 EST


On Tue, 17 Jan 2023 15:55:53 +0000
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> On Sat, Jan 14, 2023, Zhi Wang wrote:
> > On Fri, 13 Jan 2023 15:16:08 +0000 > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > > On Fri, Jan 13, 2023, Zhi Wang wrote:
> > > > Better add a FIXME: here as this has to be fixed later.
> > >
> > > No, leaking the page is all KVM can reasonably do here. An improved
> > > comment would be helpful, but no code change is required.
> > > tdx_reclaim_page() returns an error if and only if there's an
> > > unexpected, fatal error, e.g. a SEAMCALL with bad params, incorrect
> > > concurrency in KVM, a TDX Module bug, etc. Retrying at a later point is
> > > highly unlikely to be successful.
> >
> > Hi:
> >
> > The word "leaking" sounds like a situation left unhandled temporarily.
> >
> > I checked the source code of the TDX module[1] for the possible reason to
> > fail when reviewing this patch:
> >
> > tdx-module-v1.0.01.01.zip\src\vmm_dispatcher\api_calls\tdh_phymem_page_reclaim.c
> > tdx-module-v1.0.01.01.zip\src\vmm_dispatcher\api_calls\tdh_phymem_page_wbinvd.c
> >
> > a. Invalid parameters. For example, page is not aligned, PA HKID is not zero...
> >
> > For invalid parameters, a WARN_ON_ONCE() + return value is good enough as
> > that is how kernel handles similar situations. The caller takes the
> > responsibility.
> >
> > b. Locks has been taken in TDX module. TDR page has been locked due to another
> > SEAMCALL, another SEAMCALL is doing PAMT walk and holding PAMT lock...
> >
> > This needs to be improved later either by retry or taking tdx_lock to avoid
> > TDX module fails on this.
>
> No, tdx_reclaim_page() already retries TDH.PHYMEM.PAGE.RECLAIM if the target page
> is contended (though I'd question the validity of even that), and TDH.PHYMEM.PAGE.WBINVD
> is performed only when reclaiming the TDR. If there's contention when reclaiming
> the TDR, then KVM effectively has a use-after-free bug, i.e. leaking the page is
> the least of our worries.
>

Hi:

Thanks for the reply. "Leaking" is the consquence of even failing in retry. I
agree with this. But I was questioning if "retry" is really a correct and only
solution when encountering lock contention in the TDX module as I saw that there
are quite some magic numbers are going to be introduced because of "retry" and
there were discussions about times of retry should be 3 or 1000 in TDX guest
on hyper-V patches. It doesn't sound right.

Compare to an typical *kernel lock* case, an execution path can wait on a
waitqueue and later will be woken up. We usually do contention-wait-and-retry
and we rarely just do contention and retry X times. In TDX case, I understand
that it is hard for the TDX module to provide similar solutions as an execution
path can't stay long in the TDX module.

1) We can always take tdx_lock (linux kernel lock) when calling a SEAMCALL
that touch the TDX internal locks. But the downside is we might lose some
concurrency.

2) As TDX module doesn't provide contention-and-wait, I guess the following
approach might have been discussed when designing this "retry".

KERNEL TDX MODULE

SEAMCALL A -> PATH A: Taking locks

SEAMCALL B -> PATH B: Contention on a lock

<- Return "operand busy"

SEAMCALL B -|
| <- Wait on a kernel waitqueue
SEAMCALL B <-|

SEAMCALL A <- PATH A: Return

SEAMCALL A -|
| <- Wake up the waitqueue
SEMACALL A <-|

SEAMCALL B -> PATH B: Taking the locks
...

Why not this scheme wasn't chosen?

>
> On Thu, Jan 12, 2023 at 8:34 AM <isaku.yamahata@xxxxxxxxx> wrote:
> > +static int tdx_reclaim_page(hpa_t pa, bool do_wb, u16 hkid)
> > +{
> > +       struct tdx_module_output out;
> > +       u64 err;
> > +
> > +       do {
> > +               err = tdh_phymem_page_reclaim(pa, &out);
> > +               /*
> > +                * TDH.PHYMEM.PAGE.RECLAIM is allowed only when TD is shutdown.
> > +                * state.  i.e. destructing TD.
> > +                * TDH.PHYMEM.PAGE.RECLAIM  requires TDR and target page.
> > +                * Because we're destructing TD, it's rare to contend with TDR.
> > +                */
> > +       } while (err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX));
> > +       if (WARN_ON_ONCE(err)) {
> > +               pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out);
> > +               return -EIO;
> > +       }
> > +
> > +       if (do_wb) {
> > +               /*
> > +                * Only TDR page gets into this path.  No contention is expected
> > +                * because of the last page of TD.
> > +                */
> > +               err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(pa, hkid));
> > +               if (WARN_ON_ONCE(err)) {
> > +                       pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL);
> > +                       return -EIO;
> > +               }
> > +       }
> > +
> > +       tdx_clear_page(pa);
> > +       return 0;
> > +}