Re: TDX/non-ACT: failed TDH.PHYMEM.PAGE.WBINVD after successful page remove can leave a page unreset
From: Edgecombe, Rick P
Date: Wed Apr 01 2026 - 10:04:56 EST
On Wed, 2026-04-01 at 19:51 +0800, 裴辰举 wrote:
>
> On non-ACT platforms, TDH.MEM.PAGE.REMOVE does not flush cachelines or
> initialize the removed page. KVM handles this by calling
> TDH.PHYMEM.PAGE.WBINVD after a private page is removed.
> The problem is the failure path after a successful remove:
>
> 1. KVM drops a private page.
> 2. TDH.MEM.PAGE.REMOVE succeeds, so the page is no longer assigned to
>    the TD.
> 3. KVM then calls TDH.PHYMEM.PAGE.WBINVD.
> 4. If TDH.PHYMEM.PAGE.WBINVD fails, KVM marks the VM/TD dead and
>    teardown follows.
> At that point, TDH.PHYMEM.PAGE.RECLAIM will not process the page that
> hit the WBINVD failure, because that page has already been removed from
> the TD. Normally TDH.PHYMEM.PAGE.RECLAIM clears/reinitializes TD pages
> during teardown, but this page is no longer in that set. This seems to
> create a state hole: the page has been removed from the TD, but if the
> WBINVD step failed, it may never be fully reset/cleared for safe host
> reuse. Depending on later host-side handling, this can result in either
> a leaked page or unsafe page reuse.
Not every SEAMCALL error is expected, given the constraints the code
operates under, so the code deliberately does not handle every
documented error. That is, the code is written to guarantee that certain
operations will succeed. If it sees unexpected behavior, it does a
KVM_BUG_ON() as a best-effort measure. That is not meant to be part of a
mechanism for cleanly handling every possible bug.
Instead, if the kernel does allow a specific KVM_BUG_ON() scenario to
trigger, the kernel should be fixed. If the TDX module starts to return
an unexpected error, then the TDX module should be fixed.