Re: [PATCH v2 0/4] x86/mtrr: Allow MTRR updates on multiple CPUs in parallel
From: Jürgen Groß
Date: Fri Feb 13 2026 - 08:25:12 EST
On 12.02.26 17:54, H. Peter Anvin wrote:
On February 12, 2026 8:24:18 AM PST, "Jürgen Groß" <jgross@xxxxxxxx> wrote:
On 10.02.26 08:28, H. Peter Anvin wrote:
On February 9, 2026 10:51:04 PM PST, "Jürgen Groß" <jgross@xxxxxxxx> wrote:
On 09.02.26 19:37, H. Peter Anvin wrote:
On February 9, 2026 1:12:59 AM PST, Juergen Gross <jgross@xxxxxxxx> wrote:
Ping?
I'd really like to have this in 7.0, as it is fixing a real issue on
some machines ...
Juergen
On 30.01.26 12:36, Juergen Gross wrote:
Today MTRR updates are serialized to not happen on multiple CPUs at the
same time, as the related code uses global variables.
On huge machines with lots of CPUs this can result in problems, as such
updates are happening through stop_machine(), which will call the MTRR
update function with interrupts off on all CPUs at the same time. The
interrupts will be switched on only after the last CPU has finished
the MTRR update. As the update is required to run in uncached mode, it
can easily take several milliseconds on each CPU, so the whole process
can need several seconds. This in turn can cause the watchdog to
trigger and report a hard system lockup.
This series is changing the behavior by allowing the MTRR update to
happen on all CPUs in parallel.
Changes in V2:
- fix a function comment header in patch 2
Juergen Gross (4):
x86/mtrr: Move cache_enable() and cache_disable() to mtrr/generic.c
x86/mtrr: Introduce MTRR work state structure
x86/mtrr: Add a prepare_set hook to mtrr_ops
x86/mtrr: Drop cache_disable_lock
arch/x86/include/asm/cacheinfo.h | 2 -
arch/x86/include/asm/mtrr.h | 2 -
arch/x86/kernel/cpu/cacheinfo.c | 80 +----------------
arch/x86/kernel/cpu/mtrr/generic.c | 139 ++++++++++++++++++++++++-----
arch/x86/kernel/cpu/mtrr/mtrr.c | 3 +
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 +
6 files changed, 122 insertions(+), 106 deletions(-)
First of all, what machines are even needing MTRR updates these days?
I'm not aware this machine really needed an update.
This isn't a rhetorical question. It is important to understand what the underlying problem is.
It just took several seconds for all CPUs to check if there is an update
needed. It might be an issue with firmware, topology, whatever. It happened
in a test doing 300 cold boots in a row after roughly 70 loop iterations,
always on one of the last CPUs.
The issue shows that there IS a potential problem with doing the MTRR
update one CPU after the other, instead of just doing it in parallel (which
is the "official" recommendation anyway). See the comment in
cache_disable(). And it isn't as if the fix would be very complicated.
Juergen
You are assuming that it won't break any fragile systems. I'm much more concerned about why this is happening at all.
I'm having a hard time seeing why my series would break fragile systems.
It's not as if I would change anything regarding the handling on each
cpu.
My main suspect for why this is happening is the topology of the system
(8 socket NUMA machine), causing the uncached memory accesses to have a
rather high latency (multiple hops for accessing some memory), causing
each cpu to need some time for checking all MTRRs.
Juergen
Please stop avoiding the issue, which is WHY is this happening AT ALL on a recent production system.
The fastest way to do anything is to not do it at all.
What do the logs look like, with sufficient verbosity, for one thing?
I asked for more detailed logs. And indeed in the log with my patches applied
the following messages could be seen:
[ 72.301321][ T1] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 72.310541][ T1] mtrr: probably your BIOS does not setup all CPUs.
[ 72.318539][ T1] mtrr: corrected configuration.
While this probably is the reason the MTRR setup on the APs took so long,
I still think my patches make a lot of sense.
I have asked them to check their BIOS, of course.
Juergen