Re: [PATCH v2 00/21] Runtime TDX Module update support

From: Sean Christopherson

Date: Tue Oct 28 2025 - 20:56:27 EST


On Tue, Oct 28, 2025, Erdem Aktas wrote:
> On Mon, Oct 27, 2025 at 7:14 PM <dan.j.williams@xxxxxxxxx> wrote:
> >
> > Vishal Annapurve wrote:
> > [..]
> > > Problem 2 should be solved in the TDX module as it is the state owner
> > > and should be given a chance to ensure that nothing else can affect
> > > it's state. Kernel is just opting-in to toggle the already provided
> > > TDX module ABI. I don't think this is adding complexity to the kernel.
> >
> > It makes the interface hard to reason about, that is complexity.
>
> Could you clarify what you mean here? What interface do you need to
> reason about? TDX module has a feature as described in its spec, this
> is nothing to do with the kernel. Kernel executes the TDH.SYS.SHUTDOWN
> and if it fails, it will return the error code back to the user space.
> There is nothing here to reason about and it is not clear how it is
> adding the complexity to the kernel.

Userspace needs to reason about error codes and potential sources of those error
codes. That said, I agree that having the kernel set AVOID_COMPAT_SENSITIVE by
default (I vote for setting it unconditionally) doesn't add meaningful
complexity; the kernel would just need to document that the update mechanism can
return -EBUSY (or whatever), and why/when.
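
To illustrate how small the kernel side could be, here is a rough, untested
sketch. It is written as plain C with a fake stand-in for the SEAMCALL so that
it compiles outside the kernel. Everything named here is an assumption for
illustration: the flag's bit position, the status values, the function names,
and the -EIO fallback are all made up. Only the flag name, TDH.SYS.SHUTDOWN,
and -EBUSY come from this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical encodings. The real bit position and status codes are
 * defined by the TDX module spec, not by this sketch.
 */
#define SHUTDOWN_AVOID_COMPAT_SENSITIVE	(1ULL << 0)
#define STATUS_SUCCESS			0ULL
#define STATUS_COMPAT_SENSITIVE_BUSY	1ULL

/* Stand-in for module state: true while a compat-sensitive op is in flight. */
static bool sensitive_op_in_flight;

/* Fake TDH.SYS.SHUTDOWN: refuses only when the flag is set and the op is live. */
static uint64_t fake_tdh_sys_shutdown(uint64_t flags)
{
	if ((flags & SHUTDOWN_AVOID_COMPAT_SENSITIVE) && sensitive_op_in_flight)
		return STATUS_COMPAT_SENSITIVE_BUSY;

	return STATUS_SUCCESS;
}

/*
 * Hypothetical kernel-side helper: always set the flag, and translate the
 * module's refusal into -EBUSY so userspace has one documented error to
 * reason about.
 */
static int tdx_module_shutdown_for_update(void)
{
	uint64_t status = fake_tdh_sys_shutdown(SHUTDOWN_AVOID_COMPAT_SENSITIVE);

	if (status == STATUS_SUCCESS)
		return 0;
	if (status == STATUS_COMPAT_SENSITIVE_BUSY)
		return -EBUSY;

	return -EIO;	/* assumed catch-all for any other failure */
}
```

With something along these lines, the documentation burden is a single
sentence: the update can fail with -EBUSY while a compat-sensitive operation
is in progress, and userspace may retry later.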

For me, that seems far less daunting/complex than attempting to document what all
can go wrong if the kernel _doesn't_ set AVOID_COMPAT_SENSITIVE. Because IMO,
regardless of whether or not the kernel sets AVOID_COMPAT_SENSITIVE, the kernel
is making a decision and defining behavior, and that behavior needs to be
documented. If AVOID_COMPAT_SENSITIVE didn't exist, then I would agree this is
purely a userspace vs. TDX-Module problem, but it does exist, and not setting the
flag defines ABI just as much as setting the flag does.

The failure mode also matters, a lot. "Sorry dear customer, we corrupted your VM"
is very, very different than "A handful of machines in our fleet haven't completed
an (optional?) update".

> > Consider an urgent case where update is more important than the
> > consistency of ongoing builds. The kernel's job is its own self
> > consistency and security model, when that remains in tact root is
> > allowed to make informed decisions.
> >
> The whole update is initiated by the userspace, imo, it is not the
> kernel's job to decide what to do.

I think you and Dan are in violent agreement. I _think_ what Dan is saying is
that the kernel needs to protect itself, e.g. by rejecting an update if the kernel
knows the system is in a bad state. But other than that, userspace can do whatever.

AFAICT, the only disagreement is whether or not to set AVOID_COMPAT_SENSITIVE.

> It should try to update the TDX module and return error code back to the
> userspace if it fails.

+1. Unless there's a wrinkle I'm missing, failing with -EBUSY seems like the
obvious choice.

> > You might say, well add a --force option for that, and that is also
> > userspace prerogative to perform otherwise destructive operations with
> > the degrees of freedom the kernel allows.
>
> IMO, It is something userspace should decide, kernel's job is to
> provide the necessary interface about it.

I disagree, I don't think userspace should even get the option. IMO, not setting
AVOID_COMPAT_SENSITIVE is all kinds of crazy.