Re: [tip:x86/urgent] x86/PAT: Fix Xorg regression on CPUs that don't support PAT

From: Mikulas Patocka
Date: Mon May 29 2017 - 18:51:19 EST




On Sun, 28 May 2017, Andy Lutomirski wrote:

> On Sun, May 28, 2017 at 11:18 AM, Bernhard Held <berny156@xxxxxx> wrote:
> > Hi,
> >
> > this patch breaks the boot of my kernel. The last message is "Booting
> > the kernel.".
> >
> > My setup might be unusual: I'm running a Xeon E5450 (LGA 771) in a
> > Gigabyte G33-DS3R board (LGA 775). The BIOS is patched with the
> > microcode of the E5450 and recognizes the CPU.
> >
> > Please find below the dmesg of the latest kernel w/o the PAT-patch.
> > I'm happy to provide more information or to test patches.

Hi

Please run the following three tests and check whether the kernel boots
in each case.

1. use the PAT patch and revert the change to the function pat_enabled()
   - i.e. change it to the original:
	bool pat_enabled(void)
	{
		return !!__pat_enabled;
	}

2. use the PAT patch and revert the change to the function pat_ap_init()
   - i.e. change it to the original:
	static void pat_ap_init(u64 pat)
	{
		if (!boot_cpu_has(X86_FEATURE_PAT)) {

3. use the full PAT patch and apply the patch below on top of it.

> I think this patch is bogus. pat_enabled() sure looks like it's
> supposed to return true if PAT is *enabled*, and these days PAT is
> "enabled" even if there's no HW PAT support. Even if the patch were
> somehow correct, it should have been split up into two patches, one to
> change pat_enabled() and one to use this_cpu_has().
>
> Ingo, I'd suggest reverting the patch, cc-ing stable on the revert so
> -stable knows not to backport it, and starting over with the fix.
> From very brief inspection, the right fix is to make sure that
> pat_init(), or at least init_cache_modes(), gets called on the

pat_init() needs to be called with the cache disabled, and the
cache-disabling code (the functions prepare_set() and post_set()) lives in
arch/x86/kernel/cpu/mtrr/generic.c, which may not be compiled if
CONFIG_MTRR is not set.

It is possible to call init_cache_modes(), though; see the patch below.
init_cache_modes() does nothing if it has already been called.
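
As a rough illustration of that run-once behavior, here is a minimal
userspace sketch, not the kernel code. It assumes a static flag similar to
the init_cm_done variable visible in the patch context; the names
init_cache_modes_sketch(), setup_cache_mode_table() and setup_run_count()
are hypothetical and exist only for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real table setup; counts how often it runs. */
static int setup_runs;

static void setup_cache_mode_table(void)
{
	setup_runs++;
}

/*
 * Run-once guard, modeled on the static init_cm_done flag that appears in
 * the patch context. This is an assumption about the shape of the guard,
 * not a copy of init_cache_modes().
 */
void init_cache_modes_sketch(void)
{
	static bool done;

	if (done)
		return;		/* every later call is a no-op */

	setup_cache_mode_table();
	done = true;
}

/* Helper for the example: how many times the setup actually ran. */
int setup_run_count(void)
{
	return setup_runs;
}
```

With a guard of this shape, an extra early call from setup_arch() would be
harmless even if a later code path called the function again.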

> affected CPUs.
>
> As a future cleanup, I think that pat_enabled() could be deleted
> outright and, if needed, replaced by functions like have_memtype_wc()
> or similar. (Do we already have helpers like that?) Toshi, am I
> right?
>
> --Andy


---
 arch/x86/include/asm/pat.h |    1 +
 arch/x86/kernel/setup.c    |    1 +
 arch/x86/mm/pat.c          |    3 +--
 3 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-stable/arch/x86/include/asm/pat.h
===================================================================
--- linux-stable.orig/arch/x86/include/asm/pat.h
+++ linux-stable/arch/x86/include/asm/pat.h
@@ -8,6 +8,7 @@

 void pat_disable(const char *reason);
 extern void pat_init(void);
+extern void init_cache_modes(void);
 
 extern int reserve_memtype(u64 start, u64 end,
 		enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);
Index: linux-stable/arch/x86/kernel/setup.c
===================================================================
--- linux-stable.orig/arch/x86/kernel/setup.c
+++ linux-stable/arch/x86/kernel/setup.c
@@ -1074,6 +1074,7 @@ void __init setup_arch(char **cmdline_p)

 	/* update e820 for memory not covered by WB MTRRs */
 	mtrr_bp_init();
+	init_cache_modes();
 	if (mtrr_trim_uncached_memory(max_pfn))
 		max_pfn = e820_end_of_ram_pfn();

Index: linux-stable/arch/x86/mm/pat.c
===================================================================
--- linux-stable.orig/arch/x86/mm/pat.c
+++ linux-stable/arch/x86/mm/pat.c
@@ -39,7 +39,6 @@
 static bool boot_cpu_done;
 
 static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
-static void init_cache_modes(void);
 
 void pat_disable(const char *reason)
 {
@@ -237,7 +236,7 @@ static void pat_ap_init(u64 pat)
 	wrmsrl(MSR_IA32_CR_PAT, pat);
 }
 
-static void init_cache_modes(void)
+void init_cache_modes(void)
 {
 	u64 pat = 0;
 	static int init_cm_done;