Re: [PATCH v4 11/24] x86/virt/seamldr: Introduce skeleton for TDX Module updates

From: Chao Gao

Date: Fri Mar 13 2026 - 10:02:30 EST

>> > >
>> > > The TDX Module update process consists of several steps as described in
>> > > Intel® Trust Domain Extensions (Intel® TDX) Module Base Architecture
>> > > Specification, Revision 348549-007, Chapter 4.5 "TD-Preserving TDX Module
>> > > Update"
>> > >
>> > > - shut down the old module
>> > > - install the new module
>> > > - global and per-CPU initialization
>> > > - restore state information
>> > >
>> > > Some steps must execute on a single CPU, others must run serially across
>> > > all CPUs, and some can run concurrently on all CPUs. There are also
>> > > ordering requirements between steps, so all CPUs must work in a step-locked
>> > > manner.
>> >
>> > Does the fact that they can run on other CPUs add any synchronization
>> > requirements? If not I'd leave it off.
>>
>> I'm not sure I understand the concern.
>>
>> Lockstep synchronization is needed specifically because we have both multiple
>> CPUs and multiple steps.
>>
>> If updates only required a single CPU, stop_machine() would be sufficient.
>
>The last part "some can run concurrently on all CPUs", how does it affect the
>design? They can run concurrently, but don't have to... So it's a non-
>requirement?
>
>It seems the main argument here is, this thing has lots of complex ordering
>requirements. So we do it lockstep as a simple pattern to bring sanity. It's a
>fine fuzzy argument I think. The way you list the types of requirements all
>specifically has me trying to find the connection between each requirement and
>lockstep. That is where I get lost. If the reader doesn't need to do the work of
>understanding, don't ask them. And if they do, it probably needs to be clearer.

Got it. I'll keep it simple:

The TDX Module update process consists of several steps as described in
Intel® Trust Domain Extensions (Intel® TDX) Module Base Architecture
Specification, Revision 348549-007, Chapter 4.5 "TD-Preserving TDX Module
Update":

- shut down the old module
- install the new module
- global and per-CPU initialization
- restore state information

There are ordering requirements between steps which mandate lockstep
synchronization across all CPUs.

Or the step details might be irrelevant. Perhaps:

TDX module update consists of several steps. Ordering requirements between
steps mandate lockstep synchronization across all CPUs.

>> > > 1. The entire update process must use stop_machine() to synchronize with
>> > > other TDX workloads
>> > > 2. Update steps must be performed in a step-locked manner
>> > >
>> > > To prepare for implementing concrete TDX Module update steps, establish
>> > > the framework by mimicking multi_cpu_stop(), which is a good example of
>> > > performing a multi-step task in step-locked manner.
>> > >
>> >
>> > Offline, Chao pointed out that Paul suggested this after considering refactoring out
>> > the common code. I think it might still be worth mentioning why you can't use
>> > multi_cpu_stop() directly. I guess there are some differences. What are they?
>>
>> To be clear, Paul didn't actually suggest this approach. His feedback indicated
>> he wasn't concerned about duplicating some of multi_cpu_stop()'s code, i.e., no
>> need to refactor out some common code.
>
>Right, sorry for oversimplifying.
>
>>
>> https://lore.kernel.org/all/a7affba9-0cea-4493-b868-392158b59d83@paulmck-laptop/#t
>>
>> We can't use multi_cpu_stop() directly because it only provides lockstep
>> execution for its own infrastructure, not for the function it runs. If we
>> passed a function that performs steps A, B, and C to multi_cpu_stop(), there's
>> no guarantee that all CPUs complete step A before any CPU begins step B.
>
>If it could be said more concisely, it seems relevant.

How about:

multi_cpu_stop() executes in lockstep but doesn't synchronize steps within the
callback function it takes. So, implement a similar lockstep mechanism based on
its pattern.
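
To make the pattern concrete, here is a rough userspace sketch of the
ack-and-advance idea, using pthreads and C11 atomics in place of CPUs and
kernel primitives. It is not the code from this patch: all names (struct
lockstep, run_lockstep(), the STEP_* values) are made up for illustration,
and the per-step work is only a counter increment. The point is just that
no thread can start step N+1 until every thread has acked step N.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_THREADS 64

/* Hypothetical step names, loosely following the list in the changelog. */
enum step { STEP_IDLE, STEP_SHUTDOWN, STEP_INSTALL, STEP_INIT, STEP_RESTORE, STEP_DONE };

struct lockstep {
	atomic_int state;           /* current step, visible to all threads */
	atomic_int pending;         /* threads that have not acked the current step */
	atomic_int done[STEP_DONE]; /* per-step completion counters (for checking) */
	atomic_int violations;      /* ordering violations observed */
	int nthreads;
};

/* Re-arm the ack counter first, then publish the new step. */
static void set_state(struct lockstep *ls, int next)
{
	atomic_store(&ls->pending, ls->nthreads);
	atomic_store(&ls->state, next);
}

/* The last thread to ack the current step advances everyone to the next one. */
static void ack_state(struct lockstep *ls)
{
	if (atomic_fetch_sub(&ls->pending, 1) == 1)
		set_state(ls, atomic_load(&ls->state) + 1);
}

static void *worker(void *arg)
{
	struct lockstep *ls = arg;
	int cur = STEP_IDLE;

	for (;;) {
		int now = atomic_load(&ls->state);

		if (now == STEP_DONE)
			break;
		if (now == cur) {
			sched_yield();  /* userspace stand-in for a busy-wait relax */
			continue;
		}
		cur = now;

		/* Every thread must have completed the previous step by now. */
		if (cur > STEP_SHUTDOWN &&
		    atomic_load(&ls->done[cur - 1]) != ls->nthreads)
			atomic_fetch_add(&ls->violations, 1);

		atomic_fetch_add(&ls->done[cur], 1);  /* placeholder for real work */
		ack_state(ls);
	}
	return NULL;
}

/* Run all steps on nthreads threads; returns the number of violations seen. */
int run_lockstep(int nthreads)
{
	struct lockstep ls;
	pthread_t tid[MAX_THREADS];
	int i, rc;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);

	ls.nthreads = nthreads;
	atomic_init(&ls.state, STEP_IDLE);
	atomic_init(&ls.pending, 0);
	atomic_init(&ls.violations, 0);
	for (i = 0; i < STEP_DONE; i++)
		atomic_init(&ls.done[i], 0);

	set_state(&ls, STEP_SHUTDOWN);
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&tid[i], NULL, worker, &ls);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	return atomic_load(&ls.violations);
}
```

Calling run_lockstep(4) should return 0, i.e. no thread ever observed a
step starting before all threads had finished the previous one. A function
passed to multi_cpu_stop() as a plain callback would get no such guarantee
between its internal steps, which is the gap described above.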