Re: [PATCH v2] Subject: x86/PAT: Report PAT on CPUs that support PAT without MTRR

From: Thorsten Leemhuis
Date: Fri Jul 15 2022 - 01:01:14 EST

On 15.07.22 04:07, Chuck Zmudzinski wrote:
> On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
>> On 13.07.22 03:36, Chuck Zmudzinski wrote:
>>> The commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>> incorrectly failed to account for the case in init_cache_modes() when
>>> CPUs do support PAT and falsely reported PAT to be disabled when in
>>> fact PAT is enabled. In some environments, notably in Xen PV domains,
>>> MTRR is disabled but PAT is still enabled, and that is the case
>>> that the aforementioned commit failed to account for.
>>> As an unfortunate consequnce, the pat_enabled() function currently does
>>> not correctly report that PAT is enabled in such environments. The fix
>>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
>>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
>>> to account for.
>>> This approach arranges for pat_enabled() to return true in the Xen PV
>>> environment without undermining the rest of PAT MSR management logic
>>> that considers PAT to be disabled: Specifically, no writes to the PAT
>>> MSR should occur.
>>> This patch fixes a regression that some users are experiencing with
>>> Linux as a Xen Dom0 driving particular Intel graphics devices by
>>> correctly reporting to the Intel i915 driver that PAT is enabled where
>>> previously it was falsely reporting that PAT is disabled. Some users
>>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
>>> Dom0 are experiencing reduced graphics performance because the keying of
>>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
>>> means that in particular graphics frame buffer accesses are quite a bit
>>> less performant than possible without this patch.
>>> Also, with the current code, in the Xen PV environment, PAT will not be
>>> disabled if the administrator sets the "nopat" boot option. Introduce
>>> a new boolean variable, pat_force_disable, to forcibly disable PAT
>>> when the administrator sets the "nopat" option to override the default
>>> behavior of using the PAT configuration that Xen has provided.
>>> For the new boolean to live in, init_cache_modes() also needs
>>> moving to .init.text (where it could/should have lived already before).
>>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>> BTW, "submitting-patches.rst" says it should just be "the first 12
>> characters of the SHA-1 ID"
> Actually it says "at least",

/me looks a bit confused, as he quoted above directly from that file

Ohh, that "at least" is in a different section of the document. :-D For
the fixes tag is explicitly "12 characters":

If your patch fixes a bug in a specific commit, e.g. you found an issue
using git bisect, please use the ‘Fixes:’ tag with the first 12
characters of the SHA-1 ID, and the one line summary.

> so more that 12 is It is probably safest
> to put the entire SHA-1 ID in because of the possibility of
> a collision.

Yes, sure, but it's a little mess if everybody uses something different
and 12 characters is what was agreed on as "good enough" for now.

> At least that's how I understand what
> submitting-patches.rst.
>>> Co-developed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Signed-off-by: Chuck Zmudzinski <brchuckz@xxxxxxx>
>> Sorry, have to ask: is this supposed to finally fix this regression?
> Yes that's the first report I saw to lkml about this isssue.

Thx for confirming.

> So if I submit a v3 I will include that.


> But my patch does not have a sign-off from the
> Co-developer as I mentioned in a message earlier today, and the
> discussion that has ensued leads me to think a better solution is to just
> revert the commit in the i915 driver instead, and leave the bigger questions
> for Juergen Gross and his plans to re-work the x86/PAT code in September,
> as he said on this thread in the last couple of days. This patch is dead
> now,
> as far as I can tell, because the Co-developer is not cooperating.

k, thx for the quick summary, this whole thing is a bit hard to follow
for me: I can only briefly look into most regressions, as there is only
so much time in a day...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.