[PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature

From: Chang S. Bae
Date: Tue Oct 01 2024 - 12:19:35 EST


Hi all,

I'd like to ask for initial feedback on this series, which enables the
staging feature. Thanks!

== Latency Spike Issue ==

As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [1]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.

== Staging for Latency Reduction ==

Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process -- loading, validation, and activation -- all of which contribute
to the latency during CPU halt. The staging feature mitigates this by
moving all but the activation step out of the critical path, allowing
CPUs to continue serving workloads while staging takes place.

== Cache Flush Removal ==

Before resolving this latency spike caused by larger images, another
major latency issue -- cache invalidation [2] -- must first be addressed.
Originally introduced to handle a specific erratum, this cache
invalidation is now unnecessary because the problematic microcode images
have been banned. This cache flush has been found to negate the benefits
of staging, so this patch series begins by removing the WBINVD
instruction.

== Validation ==

We internally established pseudocode to clearly define all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.

Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].

== Call for Review ==

This RFC series aims to present the proposed approach for community
review, to assess its soundness, and to discuss potential alternatives
if necessary. There are several key points to highlight for feedback:

1. Staging Integration Approach

In the core code, the high-level sequence for late loading is:

(1) request_microcode_fw(), and
(2) load_late_stop_cpus()->apply_microcode()

Staging doesn't fit neatly into either step, as it involves the
loading process but not the activation. Therefore, a new callback is
introduced:

    core::load_late_locked()
    -> intel::staging_microcode()
       -> intel_staging::staging_work()
          -> intel_staging::...
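
To illustrate the ordering only, here is a minimal userspace sketch of an
optional staging hook that runs before the activation step. It is not
code from this series: struct ucode_ops_sketch, load_late_sketch() and
the logging helpers are hypothetical names, and the real callback lives
in the kernel's microcode core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's struct microcode_ops. */
struct ucode_ops_sketch {
	void (*stage)(void);	/* optional hook, NULL if staging is unsupported */
	void (*apply)(void);	/* mandatory, stands in for apply_microcode() */
};

/* Records the order in which the steps ran: 'S' = stage, 'A' = apply. */
static char call_log[8];
static size_t call_cnt;

static void log_step(char c)
{
	if (call_cnt < sizeof(call_log) - 1)
		call_log[call_cnt++] = c;
}

static void stage_sketch(void) { log_step('S'); }
static void apply_sketch(void) { log_step('A'); }

/* Stands in for load_late_locked(): stage first, then activate. */
static void load_late_sketch(const struct ucode_ops_sketch *ops)
{
	assert(ops && ops->apply);

	if (ops->stage)
		ops->stage();	/* outside the critical path: CPUs keep running */

	ops->apply();		/* critical path: CPUs are halted in the real flow */
}
```

A vendor without staging support would simply leave the hook NULL, so
the existing two-step sequence is unchanged for it.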

2. Code Abstraction

The newly added intel_staging.c file contains all staging-related
code to keep it self-contained. Ideally, the entire firmware
interaction could eventually be abstracted into a single MSR write,
which remains a long-term goal. Fortunately, recent protocol
simplifications have made this more feasible.

3. Staging Policy (TODO)

While staging is always attempted, the system will fall back to the
legacy update method if staging fails. There is an open question
regarding staging policy: should it be mandatory, without fallback,
in certain usage scenarios? This could lead to further refinements in
the flow depending on feedback and use cases.
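
To make the open question concrete, the two candidate policies could be
modeled as below. This is only a sketch under assumptions: no such
policy knob exists in this series, and every name here
(staging_policy_sketch, next_action_sketch(), ...) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy knob; no such option exists in this series. */
enum staging_policy_sketch {
	STAGING_BEST_EFFORT,	/* current behavior: fall back on failure */
	STAGING_MANDATORY,	/* open question: refuse the legacy path */
};

/* A zero-initialized policy must mean today's best-effort behavior. */
static_assert(STAGING_BEST_EFFORT == 0, "best effort is the default");

enum load_action_sketch {
	ACTION_ACTIVATE_STAGED,	/* staging succeeded: activation only */
	ACTION_LEGACY_LOAD,	/* staging failed: full legacy update */
	ACTION_ABORT,		/* staging failed and fallback is not allowed */
};

static enum load_action_sketch
next_action_sketch(enum staging_policy_sketch policy, bool staging_ok)
{
	if (staging_ok)
		return ACTION_ACTIVATE_STAGED;

	return policy == STAGING_MANDATORY ? ACTION_ABORT : ACTION_LEGACY_LOAD;
}
```

Only the failure case differs between the two policies, which is why
the question can be settled later without reshaping the success path.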

4. Specification Updates

Recent specification updates have simplified the staging protocol
and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
with staging:

4.1. Protocol Simplification

The specification update [3] has significantly reduced the
complexity of staging code, trimming the kernel code down from the
~1K lines of preliminary implementations (intel_staging.c is now
154 lines). Thanks to Dave for guiding this redesign effort.

4.2. Clarification of Legacy Update Behavior

Chapter 5 of the specification adds further clarification on
MSR_IA32_UCODE_WRITE. Key points are summarized below:

(a) When staging has not been performed or has failed, a WRMSR will
still load the patch image, but with higher latency.

(b) During an active staging process, MSR_IA32_UCODE_WRITE can
load a new microcode image, again with higher latency.

(c) If the versions differ between the staged microcode and the
version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
the MSR takes precedence.
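
Points (a)-(c) can be summarized in a small model. This is a sketch of
my reading, not text from the specification: the state names, the
struct and ucode_write_sketch() are hypothetical, and treating a version
mismatch as the slow path is an assumption (the summary above only says
which version wins).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical staging states, for illustration only. */
enum staging_state_sketch {
	STAGING_NONE,		/* (a) staging was never performed */
	STAGING_FAILED,		/* (a) staging was attempted but failed */
	STAGING_IN_PROGRESS,	/* (b) staging is still running */
	STAGING_DONE,		/* staging completed successfully */
};

struct wrmsr_result_sketch {
	unsigned int loaded_rev;	/* revision active after the write */
	bool fast_path;			/* true if only activation was needed */
};

/* Models the outcome of a write to MSR_IA32_UCODE_WRITE. */
static struct wrmsr_result_sketch
ucode_write_sketch(enum staging_state_sketch state,
		   unsigned int staged_rev, unsigned int msr_rev)
{
	/* (a)-(c): the image handed to the MSR is always the one loaded. */
	struct wrmsr_result_sketch res = {
		.loaded_rev = msr_rev,
		.fast_path = false,
	};

	/* ASSUMPTION: only a completed staging of the same revision is fast. */
	if (state == STAGING_DONE && staged_rev == msr_rev)
		res.fast_path = true;

	return res;
}

/* Usage: one check per case, with made-up revision numbers. */
static void precedence_examples(void)
{
	assert(!ucode_write_sketch(STAGING_NONE, 0, 0x21).fast_path);
	assert(!ucode_write_sketch(STAGING_IN_PROGRESS, 0x21, 0x21).fast_path);
	assert(ucode_write_sketch(STAGING_DONE, 0x22, 0x21).loaded_rev == 0x21);
}
```

In every case the write succeeds and the MSR image wins. Staging only
changes how long the write takes.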

I'd also like to make sure there is no further ambiguity in this
documentation [3]. Feel free to provide feedback if anything seems
unclear or unreasonable.

As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.

This series is based on 6.12-rc1. You can also find it in this repo:
    git://github.com/intel-staging/microcode.git staging_rfc-v1

Thanks,
Chang

[1]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[2]: https://lore.kernel.org/all/20240701212012.21499-1-chang.seok.bae@xxxxxxxxx/
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
‘Uniform’ feature series.

Chang S. Bae (7):
  x86/microcode/intel: Remove unnecessary cache writeback and
    invalidation
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  77 ++++++++-
 arch/x86/kernel/cpu/microcode/intel_staging.c | 154 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   5 +-
 6 files changed, 247 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

--
2.43.0