[PATCH v2 2/2] x86/mm/KASLR: Do not adapt size of the direct mapping section for SGI UV system

From: Baoquan He
Date: Sat May 20 2017 - 08:03:07 EST


On SGI UV systems, the kernel occasionally hangs with KASLR enabled.

The backtrace is:

kernel BUG at arch/x86/mm/init_64.c:311!
invalid opcode: 0000 [#1] SMP
[...]
RIP: 0010:__init_extra_mapping+0x188/0x196
[...]
Call Trace:
init_extra_mapping_uc+0x13/0x15
map_high+0x67/0x75
map_mmioh_high_uv3+0x20a/0x219
uv_system_init_hub+0x12d9/0x1496
uv_system_init+0x27/0x29
native_smp_prepare_cpus+0x28d/0x2d8
kernel_init_freeable+0xdd/0x253
? rest_init+0x80/0x80
kernel_init+0xe/0x110
ret_from_fork+0x2c/0x40

The root cause is that the SGI UV system needs to map its MMIOH regions
into the direct mapping section, and that mapping happens in rest_init().
However, mm KASLR is done in kernel_randomize_memory(), which runs much
earlier than the SGI UV MMIOH mapping and does not take the MMIOH regions
into account. When KASLR is disabled, there is 64TB of space for direct
mapping, shared by system RAM and the SGI UV MMIOH regions. With KASLR
enabled, mm KASLR only reserves the actual size of system RAM plus 10TB
for the direct mapping. The later SGI UV MMIOH mapping can then go beyond
the upper bound of the direct mapping section and step into the VMALLOC or
VMEMMAP area, which triggers the BUG_ON() in __init_extra_mapping().

E.g., on the SGI UV3 machine where this bug was reported, there are two
MMIOH regions:

[ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
[ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000

They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G of RAM
is spread out over 1TB regions. The two SGI MMIOH regions above will then
also be mapped into the direct mapping section.
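
The bounds above can be checked with a small user-space sketch. It is
not kernel code: struct phys_range and both helpers are hypothetical
names used only for illustration. The 11TB limit assumes the RAM ends at
the 1TB boundary, so that the adapted size is 1TB plus the 10TB padding;
the log does not state this directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GB (1ULL << 30)
#define TB (1ULL << 40)

/* Hypothetical helper type: physical range [start, end), end exclusive. */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* The two MMIOH regions from the boot log quoted above. */
static const struct phys_range mmioh0 = { 0xffc00000000ULL, 0x100000000000ULL };
static const struct phys_range mmioh1 = { 0x100000000000ULL, 0x200000000000ULL };

/* Size of a range in bytes. */
static uint64_t range_size(struct phys_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start;
}

/* True if the range ends at or below limit_tb terabytes. */
static bool range_below_tb(struct phys_range r, uint64_t limit_tb)
{
	return r.end <= limit_tb * TB;
}
```

Under that assumption, range_below_tb(mmioh1, 11) is false while
range_below_tb(mmioh1, 64) is true, which matches the failure seen only
with KASLR enabled.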

To fix it, check whether this is an SGI UV system by calling
is_early_uv_system() in kernel_randomize_memory(). If so, do not adapt
the size of the direct mapping section.
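
As a rough user-space model of the intended behaviour (not the kernel
implementation), the size selection can be sketched as follows. The
function names, the is_uv flag standing in for is_early_uv_system(), and
the two constants are assumptions taken from the numbers in this
description (64TB default, 10TB padding).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TB_SHIFT		40
#define DEFAULT_DIRECT_MAP_TB	64ULL	/* default size, per the text above */
#define PADDING_TB		10ULL	/* padding value cited above */

/* Round a byte count up to whole terabytes. */
static uint64_t bytes_to_tb_round_up(uint64_t bytes)
{
	return (bytes + (1ULL << TB_SHIFT) - 1) >> TB_SHIFT;
}

/*
 * Model of the direct mapping size in TB under mm KASLR.
 * ram_end_bytes is the end of system RAM; is_uv stands in for
 * is_early_uv_system(). The section is shrunk to RAM plus padding
 * only when the machine is not an SGI UV system.
 */
static uint64_t direct_map_size_tb(uint64_t ram_end_bytes, bool is_uv)
{
	uint64_t memory_tb;

	/* RAM itself must fit into the default direct mapping section. */
	assert(ram_end_bytes <= (DEFAULT_DIRECT_MAP_TB << TB_SHIFT));

	memory_tb = bytes_to_tb_round_up(ram_end_bytes) + PADDING_TB;
	if (memory_tb < DEFAULT_DIRECT_MAP_TB && !is_uv)
		return memory_tb;
	return DEFAULT_DIRECT_MAP_TB;
}
```

With RAM ending at 1TB, this model yields an 11TB section on a non-UV
machine and keeps the full 64TB on SGI UV, leaving room for MMIOH
regions that end at 32TB.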

Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: Thomas Garnier <thgarnie@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
---
arch/x86/mm/kaslr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index aed2064..20b0456 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -27,6 +27,7 @@
#include <asm/pgtable.h>
#include <asm/setup.h>
#include <asm/kaslr.h>
+#include <asm/uv/uv.h>

#include "mm_internal.h"

@@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;

/* Adapt phyiscal memory region size based on available memory */
- if (memory_tb < kaslr_regions[0].size_tb)
+ if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
kaslr_regions[0].size_tb = memory_tb;

/* Calculate entropy available between regions */
--
2.5.5