Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

From: Juergen Gross
Date: Mon Aug 22 2022 - 01:17:46 EST


On 21.08.22 23:41, Borislav Petkov wrote:
On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
Fix that by using percpu variables for saving the MSR contents.

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
---
I thought adding a "Fixes:" tag for the kernel's initial git commit
would maybe be entertaining, but without being really helpful.
The percpu variables were preferred over on-stack ones in order to
avoid more code churn in followup patches decoupling PAT from MTRR
support.

So if that thing has been broken for so long and no one noticed, we
could just as well not backport to stable at all...

Yeah, you can't do that.

The whole day today I kept thinking that something's wrong with this
here. As in, why hasn't it been reported until now.

You say above:

"... for all cpus is racy in case the MSR contents differ across cpus."

But they don't differ:

"7.7.5 MTRRs in Multi-Processing Environments

In multi-processing environments, the MTRRs located in all processors
must characterize memory in the same way. Generally, this means that
identical values are written to the MTRRs used by the processors. This
also means that values CR0.CD and the PAT must be consistent across
processors. Failure to do so may result in coherency violations or loss
of atomicity. Processor implementations do not check the MTRR settings
in other processors to ensure consistency. It is the responsibility of
system software to initialize and maintain MTRR consistency across all
processors."

And you can't have different fixed MTRR type on each CPU - that would
lead to all kinds of nasty bugs.

And here's from a big fat box:

$ rdmsr -a 0x2ff | uniq -c
256 c00

All 256 CPUs have the def type set to the same thing.

Now, if all CPUs go write that same deftype_lo variable in the
rendezvous handler, the only issue that could happen is if a read
sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes so I
*think* all CPUs see the same value being corrected by using mtrr_state
previously saved on the BSP.

As I said, we should've seen this exploding left and right otherwise...

And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c
which has a comment saying:

/* Some BIOS's are messed up and don't set all MTRRs the same! */

Yes, the chances are slim to hit such a box, but your reasoning suggests
I should remove the related code?


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature