Re: [PATCH v2] Subject: x86/PAT: Report PAT on CPUs that support PAT without MTRR

From: Chuck Zmudzinski
Date: Wed Jul 13 2022 - 04:51:50 EST

On 7/13/22 2:18 AM, Jan Beulich wrote:
> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> > *Add the necessary code to incorporate the "nopat" fix
> > *void init_cache_modes(void) -> void __init init_cache_modes(void)
> > *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> > *Expand the commit message to include relevant parts of the commit
> > message of Jan Beulich's proposed patch for this problem
> > *Fix 'else if ... {' placement and indentation
> > *Remove indication the backport to stable branches is only back to 5.17.y
> >
> > I think these changes address all the comments on the original patch
> >
> > I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> > include Jan's idea for fixing "nopat" that was missing from the first
> > version of the patch.
> You've sufficiently altered this change to clearly no longer want my
> S-o-b; unfortunately in fact I think you broke things:

Well, I hope we can come to an agreement so I have
your S-o-b. But that would probably require me to remove
Juergen's R-b.

> > @@ -292,7 +294,7 @@ void init_cache_modes(void)
> > rdmsrl(MSR_IA32_CR_PAT, pat);
> > }
> >
> > - if (!pat) {
> > + if (!pat || pat_force_disabled) {
> By checking the new variable here ...
> > /*
> > * No PAT. Emulate the PAT table that corresponds to the two
> > * cache bits, PWT (Write Through) and PCD (Cache Disable).
> > @@ -313,6 +315,16 @@ void init_cache_modes(void)
> > */
> > pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> > PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> ... you put in place a software view which doesn't match hardware. I
> continue to think that ...
> > + } else if (!pat_bp_enabled) {
> ... the variable wants checking here instead (at which point, yes,
> this comes quite close to simply being a v2 of my original patch).
> By using !pat_bp_enabled here you actually broaden where the change
> would take effect. Iirc Boris had asked to narrow things (besides
> voicing opposition to this approach altogether). Even without that
> request I wonder whether you aren't going to far with this.
> Jan

I thought about checking for the administrator's "nopat"
setting where you suggest which would limit the effect
of "nopat" to not reporting PAT as enabled to device
drivers who query for PAT availability using pat_enabled().
The main reason I did not do that is that due to the fact
that we cannot write to the PAT MSR, we cannot really
disable PAT. But we come closer to respecting the wishes
of the administrator by configuring the caching modes as
if PAT is actually disabled by the hardware or firmware
when in fact it is not.

What would you propose logging as a message when
we report PAT as disabled via pat_enabled()? The main
reason I did not choose to check the new variable in the
new 'else if' block is that I could not figure out what to
tell the administrator in that case. I think we would have
to log something like, "nopat is set, but we cannot disable
PAT, doing our best to disable PAT by not reporting PAT
as enabled via pat_enabled(), but that does not guarantee
that kernel drivers and components cannot use PAT if they
query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
instead of pat_enabled()." However, I acknowledge WC mappings
would still be disabled because arch_can_pci_mmap_wc() will
be false if pat_enabled() is false.

Perhaps we also need to log something if we keep the
check for "nopat" where I placed it. We could say something
like: "nopat is set, but we cannot disable hardware/firmware
PAT support, so we are emulating as if there is no PAT support
which puts in place a software view that does not match

No matter what, because we cannot write to PAT MSR in
the Xen PV case, we probably need to log something to
explain the problems associated with trying to honor the
administrator's request. Also, what log level should it be.
Should it be a pr_warn instead of a pr_info?

I will agree to implement your approach of checking the new
variable where you suggest and limiting the effect of "nopat"
to not reporting PAT as enabled via pat_enabled() if there is a
consensus about what we should log to the administrator in that
case and if Juergen and/or Boris agrees with it.