Re: [PATCH v4 11/24] x86/virt/seamldr: Introduce skeleton for TDX Module updates
From: Chao Gao
Date: Thu Mar 12 2026 - 10:11:04 EST
On Thu, Mar 12, 2026 at 10:00:20AM +0800, Edgecombe, Rick P wrote:
>On Thu, 2026-02-12 at 06:35 -0800, Chao Gao wrote:
>> TDX Module updates require careful synchronization with other TDX
>> operations on the host. During updates, only update-related SEAMCALLs are
>> permitted; all other SEAMCALLs must be blocked.
>>
>> However, SEAMCALLs can be invoked from different contexts (normal and IRQ
>> context) and run in parallel across CPUs. And, all TD vCPUs must remain
>> out of guest mode during updates.
>>
>
>Above it says only update-related SEAMCALLs are permitted. Does that not already
>exclude SEAMCALLs that might allow entering the TD?
Those SEAMCALLs would return errors, and TDs would be killed if those errors
aren't handled properly.
One may argue that we can handle errors and retry after updates. But this just
provides a new form of synchronization, which is not as clean as the
well-defined synchronization provided by the kernel.
>
>> No single lock primitive can satisfy
>> all these synchronization requirements, so stop_machine() is used as the
>> only well-understood mechanism that can meet them all.
>>
>> The TDX Module update process consists of several steps as described in
>> Intel® Trust Domain Extensions (Intel® TDX) Module Base Architecture
>> Specification, Revision 348549-007, Chapter 4.5 "TD-Preserving TDX Module
>> Update"
>>
>> - shut down the old module
>> - install the new module
>> - global and per-CPU initialization
>> - restore state information
>>
>> Some steps must execute on a single CPU, others must run serially across
>> all CPUs, and some can run concurrently on all CPUs. There are also
>> ordering requirements between steps, so all CPUs must work in a step-locked
>> manner.
>
>Does the fact that they can run on other CPUs add any synchronization
>requirements? If not I'd leave it off.
I'm not sure I understand the concern.
Lockstep synchronization is needed specifically because we have both multiple
CPUs and multiple steps.
If updates only required a single CPU, stop_machine() would be sufficient.
>
>>
>> In summary, TDX Module updates create two requirements:
>
>The stop_machine() part seems more like a solution then a requirement.
>
>>
>> 1. The entire update process must use stop_machine() to synchronize with
>> other TDX workloads
>> 2. Update steps must be performed in a step-locked manner
>>
>> To prepare for implementing concrete TDX Module update steps, establish
>> the framework by mimicking multi_cpu_stop(), which is a good example of
>> performing a multi-step task in step-locked manner.
>>
>
>Offline Chao pointed that Paul suggested this after considering refactoring out
>the common code. I think it might still be worth mentioning why you can't use
>multi_cpu_stop() directly. I guess there are some differences. what are they.
To be clear, Paul didn't actually suggest this approach. His feedback indicated
he wasn't concerned about duplicating some of multi_cpu_stop()'s code, i.e., no
need to refactor out some common code.
https://lore.kernel.org/all/a7affba9-0cea-4493-b868-392158b59d83@paulmck-laptop/#t
We can't use multi_cpu_stop() directly because it only provides lockstep
execution for its own infrastructure, not for the function it runs. If we
passed a function that performs steps A, B, and C to multi_cpu_stop(), there's
no guarantee that all CPUs complete step A before any CPU begins step B.
>
>> Specifically, use a
>> global state machine to control each CPU's work and require all CPUs to
>> acknowledge completion before proceeding to the next step.
>
>Maybe add a bit more about the reasoning for requiring the other steps to ack.
>Tie it back to the lockstep part.
>
Ok. How about:
Specifically, add a global state machine where each state represents a step in
the above update flow. The state advances only after all CPUs acknowledge
completing their work in the current state. This acknowledgment mechanism is
what ensures lockstep execution.
<snip>
>> +static int do_seamldr_install_module(void *params)
>> +{
>> + enum tdp_state newstate, curstate = TDP_START;
>> + int ret = 0;
>> +
>> + do {
>> + /* Chill out and re-read tdp_data */
>> + cpu_relax();
>> + newstate = READ_ONCE(tdp_data.state);
>> +
>> + if (newstate != curstate) {
>> + curstate = newstate;
>> + switch (curstate) {
>
>Maybe a little comment here like "todo add the steps".
Sure.