Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

From: Jan Beulich
Date: Tue Jul 12 2022 - 02:04:55 EST


On 11.07.2022 19:41, Chuck Zmudzinski wrote:
> Moreover... (please move to the bottom of the code snippet
> for more information about my tests in the Xen PV environment...)
>
> void init_cache_modes(void)
> {
>     u64 pat = 0;
>
>     if (pat_cm_initialized)
>         return;
>
>     if (boot_cpu_has(X86_FEATURE_PAT)) {
>         /*
>          * CPU supports PAT. Set PAT table to be consistent with
>          * PAT MSR. This case supports "nopat" boot option, and
>          * virtual machine environments which support PAT without
>          * MTRRs. In specific, Xen has unique setup to PAT MSR.
>          *
>          * If PAT MSR returns 0, it is considered invalid and emulates
>          * as No PAT.
>          */
>         rdmsrl(MSR_IA32_CR_PAT, pat);
>     }
>
>     if (!pat) {
>         /*
>          * No PAT. Emulate the PAT table that corresponds to the two
>          * cache bits, PWT (Write Through) and PCD (Cache Disable).
>          * This setup is also the same as the BIOS default setup.
>          *
>          * PTE encoding:
>          *
>          *       PCD
>          *       |PWT  PAT
>          *       ||    slot
>          *       00    0    WB : _PAGE_CACHE_MODE_WB
>          *       01    1    WT : _PAGE_CACHE_MODE_WT
>          *       10    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
>          *       11    3    UC : _PAGE_CACHE_MODE_UC
>          *
>          * NOTE: When WC or WP is used, it is redirected to UC- per
>          * the default setup in __cachemode2pte_tbl[].
>          */
>         pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>     }
>
>     else if (!pat_bp_enabled) {
>         /*
>          * In some environments, specifically Xen PV, PAT
>          * initialization is skipped because MTRRs are
>          * disabled even though PAT is available. In such
>          * environments, set PAT to initialized and enabled to
>          * correctly indicate to callers of pat_enabled() that
>          * PAT is available and prevent PAT from being disabled.
>          */
>         pat_bp_enabled = true;
>         pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>     }
>
>     __init_cache_modes(pat);
> }
>
> This function, patched with the extra 'else if' block, fixes the
> regression on my Xen worksatation, and the pr_info message
> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
> when running this patched kernel in my Xen Dom0. This means
> that in the Xen PV environment on my Xen Dom0 workstation,
> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
> Xen Dom0 workstation. At least that is what I think my tests prove.
>
> So why is this not a valid way to test for the existence of
> PAT in the Xen PV environment? Are the existing comments
> in init_cache_modes() about supporting both the case when
> the "nopat" boot option is set and the specific case of Xen and
> MTRR disabled wrong? My testing confirms those comments are
> correct.

At the very least this ignores the possible "nopat" an admin may
have passed to the kernel.

Jan