[PATCH v2 1/6] x86/microcode: Introduce staging step to reduce late-loading time

From: Chang S. Bae
Date: Thu Mar 20 2025 - 19:41:28 EST


As microcode patch sizes continue to grow, late-loading latency spikes
can lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational
solution is needed to address the issue.

To mitigate the problem, a new staging feature is introduced. This option
processes most of the microcode update (excluding activation) on a
non-critical path, allowing CPUs to remain operational during the
majority of the update. By offloading work from the critical path,
staging can significantly reduces latency spikes.

Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.

Staging follows an opportunistic model:

* If successful, it reduces CPU rendezvous time
* Even though it fails, the process falls back to the legacy path to
finish the loading process but with potentially higher latency.

Extend struct microcode_ops to incorporate staging properties, which will
be implemented in the vendor code separately.

Signed-off-by: Chang S. Bae <chang.seok.bae@xxxxxxxxx>
---
V1 -> V2:
- Move invocation inside of load_late_stop_cpus() (Boris)
- Add more note about staging (Dave)

RFC-V1 -> V1:
- Rename the function name to the do_something() style (Boris).

Note:

Now the invocation is placed as part of the late-loading core function
following Boris' comment on the last posting [1], which I think can keep
the main late-loading logic simple from load_late_locked().

There was some brief discussion about the necessity of enforcing staging
as mandatory [2]. But it was not clearly justified yet, so I thought it
still makes sense to exclude such policy discussions out of this series.

[1] https://lore.kernel.org/lkml/20250218113634.GGZ7RwwkrrXADX0eRo@fat_crate.local/
[2] https://lore.kernel.org/lkml/526df712-6091-4b04-97d5-9007789dc750@xxxxxxxxx/
---
arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +++-
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b3658d11e7b6..c4aff44a7ffc 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -541,6 +541,17 @@ static int load_late_stop_cpus(bool is_safe)
pr_err("You should switch to early loading, if possible.\n");
}

+ /*
+ * Pre-load the microcode image into a staging device. This
+ * process is preemptible and does not require stopping CPUs.
+ * Successful staging simplifies the subsequent late-loading
+ * process, reducing rendezvous time.
+ *
+ * Even if the transfer fails, the update will proceed as usual.
+ */
+ if (microcode_ops->use_staging)
+ microcode_ops->stage_microcode();
+
atomic_set(&late_cpus_in, num_online_cpus());
atomic_set(&offline_in_nmi, 0);
loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 5df621752fef..4b983b4cddbd 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -31,10 +31,12 @@ struct microcode_ops {
* See also the "Synchronization" section in microcode_core.c.
*/
enum ucode_state (*apply_microcode)(int cpu);
+ void (*stage_microcode)(void);
int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
void (*finalize_late_load)(int result);
unsigned int nmi_safe : 1,
- use_nmi : 1;
+ use_nmi : 1,
+ use_staging : 1;
};

struct early_load_data {
--
2.45.2