Re: [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step

From: Edgecombe, Rick P

Date: Mon May 18 2026 - 22:35:04 EST

On Wed, 2026-05-13 at 08:09 -0700, Chao Gao wrote:
> A TDX module update is a multi-step process, and any step can fail.
>
> The current update flow continues to later steps after an error.
> Continuing after a failure can leave the TDX module in an unrecoverable
> state.

I get what you are saying here, but "continuing" vs "leaving" is a tiny bit
confusing to me. Maybe: Continuing with subsequent update steps after a failure
can cause the TDX module to enter an unrecoverable state?

>
> One failure case must remain recoverable: update contention with an ongoing
> TD build. The agreed kernel behavior for this case [1] is to fail the
> update with -EBUSY so userspace can retry later.

The link to the discussion is nice, but the explanation of just that there was
an agreement is not saying much. But the reasoning around AVOID_COMPAT_SENSITIVE
*is* handled in later patch. So can we say future changes will want to return
errors to userspace for certain update failures? Then we can discuss the
specifics when code is actual error is added?

And why talk about EBUSY specifically? It is not in this patch. Stale log?

>
> Abort the update on any failure. This also makes the TD-build contention
> case recoverable, because that failure occurs before any TDX module state
> is changed.
>

Oh, maybe I didn't get what you meant above actually. The contention case is
only recoverable because we detect it at the first step? Does "Continuing after
a failure can leave the TDX module in an unrecoverable" really mean that any
failure after the first step is unrecoverable? Or can we put it in some other
more specific terms like that. Terms which are more specific but still not
overly complex description of TDX module update flows?

> Apply the same rule to all errors instead of special-casing
> -EBUSY.

It seems like actually it is not special cased...? The error returned is
whatever is returned from the step.

>
> Track per-step failures, stop the update loop once a failure is observed,
> and do not advance the state machine to the next step.

Hmm, so this is actually a bunch of generic handling for each step, that really
only works for the first one? Is the generic handling really needed?

>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> Reviewed-by: Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx>
> Reviewed-by: Tony Lindgren <tony.lindgren@xxxxxxxxxxxxxxx>
> Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>
> Reviewed-by: Kiryl Shutsemau (Meta) <kas@xxxxxxxxxx>
> Link: https://lore.kernel.org/linux-coco/aQFmOZCdw64z14cJ@xxxxxxxxxx/ # [1]
> ---
> v9:
> - Avoid nested if/else by deferring failure accounting to ack_state().
> - Reduce indentation of the main flow.
> - Convert the failed flag into a counter. This avoids a conditional
> update of the flag; the counter can simply accumulate failures.
> ---
> arch/x86/virt/vmx/tdx/seamldr.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
> index 7befe4a08f33..48fe71319fea 100644
> --- a/arch/x86/virt/vmx/tdx/seamldr.c
> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
> @@ -170,6 +170,7 @@ enum module_update_state {
> static struct update_ctrl {
> enum module_update_state state;
> int num_ack;
> + int num_failed;

Was there past discussion on why it keeps a failed count? All we need to know is
if anything failed right? So a bool is fine too?

> /*
> * Protect update_ctrl. Raw spinlock as it will be acquired from
> * interrupt-disabled contexts.
> @@ -187,12 +188,13 @@ static void __set_target_state(struct update_ctrl *ctrl,
> }
>
> /* Last one to ack a state moves to the next state. */
> -static void ack_state(struct update_ctrl *ctrl)
> +static void ack_state(struct update_ctrl *ctrl, int result)
> {
> raw_spin_lock(&ctrl->lock);
>
> + ctrl->num_failed += !!result;
> ctrl->num_ack++;
> - if (ctrl->num_ack == num_online_cpus())
> + if (ctrl->num_ack == num_online_cpus() && !ctrl->num_failed)
> __set_target_state(ctrl, ctrl->state + 1);
>
> raw_spin_unlock(&ctrl->lock);
> @@ -202,6 +204,7 @@ static void init_state(struct update_ctrl *ctrl)
> {
> raw_spin_lock_init(&ctrl->lock);
> __set_target_state(ctrl, MODULE_UPDATE_START + 1);
> + ctrl->num_failed = 0;
> }
>
> /*
> @@ -228,8 +231,8 @@ static int do_seamldr_install_module(void *seamldr_params)
> break;
> }
>
> - ack_state(&update_ctrl);
> - } while (curstate != MODULE_UPDATE_DONE);
> + ack_state(&update_ctrl, ret);
> + } while (curstate != MODULE_UPDATE_DONE && !READ_ONCE(update_ctrl.num_failed));
>
> return ret;
> }