[v4.12-rc3] Early boot panic on Broadwell

From: Chris Wilson
Date: Thu Jun 01 2017 - 04:28:02 EST


Hi guys,

I hit an early boot panic on a Broadwell laptop (xps13-9343) that I
bisected to:

commit cbed27cdf0e3f7ea3b2259e86b9e34df02be3fe4
Author: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Date: Tue Apr 18 15:07:11 2017 -0400

    x86/PAT: Fix Xorg regression on CPUs that don't support PAT

    In the file arch/x86/mm/pat.c, there's a '__pat_enabled' variable. The
    variable is set to 1 by default and the function pat_init() sets
    __pat_enabled to 0 if the CPU doesn't support PAT.

    However, on AMD K6-3 CPUs, the processor initialization code never calls
    pat_init() and so __pat_enabled stays 1 and the function pat_enabled()
    returns true, even though the K6-3 CPU doesn't support PAT.

    The result of this bug is that a kernel warning is produced when attempting to
    start the Xserver and the Xserver doesn't start (fork() returns ENOMEM).
    Another symptom of this bug is that the framebuffer driver doesn't set the
    K6-3 MTRR registers:

    x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 3891 at arch/x86/mm/pat.c:1020 untrack_pfn+0x5c/0x9f
    ...
    x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining

    To fix the bug change pat_enabled() so that it returns true only if PAT
    initialization was actually done.

    Also, I changed boot_cpu_has(X86_FEATURE_PAT) to
    this_cpu_has(X86_FEATURE_PAT) in pat_ap_init(), so that we check the PAT
    feature on the processor that is being initialized.
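
For anyone skimming, the pat_enabled() part of that commit message can be
modelled in userspace roughly as below. This is only a sketch: struct
pat_model and every function name in it are hypothetical stand-ins, not
the code in arch/x86/mm/pat.c, and it assumes nothing beyond what the
commit message itself says (a flag that defaults to "enabled" versus a
flag that is only set once initialization really ran).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the code from arch/x86/mm/pat.c. */
struct pat_model {
	bool disabled;     /* models __pat_enabled having been cleared */
	bool initialized;  /* set only when PAT setup actually ran */
};

/* Old behaviour: report "enabled" unless it was explicitly turned off. */
bool pat_enabled_old(const struct pat_model *m)
{
	return !m->disabled;
}

/* Behaviour described by the commit: true only after initialization. */
bool pat_enabled_new(const struct pat_model *m)
{
	return m->initialized;
}

/*
 * Models pat_init(). A CPU whose setup path never calls this leaves
 * the model untouched, which is the K6-3 case in the commit message.
 */
void pat_model_init(struct pat_model *m, bool cpu_has_pat)
{
	if (!cpu_has_pat) {
		m->disabled = true;
		return;
	}
	m->initialized = true;
}

/* Walks through the K6-3 case: the init function is never called. */
void pat_model_demo(void)
{
	struct pat_model k6 = { false, false };

	assert(pat_enabled_old(&k6));   /* the bug: PAT reported as enabled */
	assert(!pat_enabled_new(&k6));  /* with the fix: reported as disabled */
}
```

In this model the old check goes wrong only when the init function is
skipped entirely, which matches the K6-3 description above.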

In my testing, I found that reverting the /boot_cpu_has/this_cpu_has/
change was enough to restore working behaviour:

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 83a59a6..c537bfb 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -234,7 +234,7 @@ static void pat_bsp_init(u64 pat)

 static void pat_ap_init(u64 pat)
 {
-	if (!this_cpu_has(X86_FEATURE_PAT)) {
+	if (!boot_cpu_has(X86_FEATURE_PAT)) {
 		/*
 		 * If this happens we are on a secondary CPU, but switched to
 		 * PAT on the boot CPU. We have no way to undo PAT.

It seems scary that different CPUs may report different features, but
perhaps that is just a symptom of the boot phase?
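
To illustrate what I mean by a boot-phase symptom, here is a rough
userspace model. It assumes the two helpers consult different capability
records, one for the boot CPU and one per CPU, and that the per-CPU
record may not be filled in yet when the check runs. All names and the
feature bit are hypothetical, and whether this is what actually happens
on the xps13 is a guess, not something I have confirmed.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS     4
#define MODEL_FEATURE_PAT (1u << 0)  /* hypothetical bit, not X86_FEATURE_PAT */

/* Hypothetical record of the feature bits known for one CPU. */
struct cpu_model {
	unsigned int caps;
};

static struct cpu_model boot_cpu_model;               /* filled in early */
static struct cpu_model percpu_model[MODEL_NR_CPUS];  /* zero until identified */

/* Stand-in for boot_cpu_has(): consults the boot CPU's record. */
bool model_boot_cpu_has(unsigned int bit)
{
	return (boot_cpu_model.caps & bit) != 0;
}

/* Stand-in for this_cpu_has(): consults the given CPU's own record. */
bool model_this_cpu_has(int cpu, unsigned int bit)
{
	return (percpu_model[cpu].caps & bit) != 0;
}

void model_identify_boot_cpu(unsigned int caps)
{
	boot_cpu_model.caps = caps;
}

void model_identify_cpu(int cpu, unsigned int caps)
{
	percpu_model[cpu].caps = caps;
}

/* Same hardware, different answers, depending on when the check runs. */
void model_demo(void)
{
	model_identify_boot_cpu(MODEL_FEATURE_PAT);

	/* Per-CPU record for CPU 0 has not been filled in yet. */
	assert(model_boot_cpu_has(MODEL_FEATURE_PAT));
	assert(!model_this_cpu_has(0, MODEL_FEATURE_PAT));

	/* Once it is filled in, both checks agree. */
	model_identify_cpu(0, MODEL_FEATURE_PAT);
	assert(model_this_cpu_has(0, MODEL_FEATURE_PAT));
}
```

If something like this is going on, the CPUs do not really differ; the
per-CPU check is just being asked too early.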
-Chris

--
Chris Wilson, Intel Open Source Technology Centre