Re: [PATCH v2 2/2] x86/mm/KASLR: Do not adapt size of the direct mapping section for SGI UV system

From: Baoquan He
Date: Thu Aug 31 2017 - 02:21:38 EST


Hi all,

This is a blocker bug that is found, and only happens, on SGI UV
systems. Mike Travis, an expert from the HPE SGI UV dev team, told me
in a private mail that I can add his Acked-by to this patchset if I
repost it, so I will repost with an updated patch log. Without this
fix, an SGI UV system is very likely to panic during boot.

On 05/20/17 at 08:02pm, Baoquan He wrote:
> On SGI UV systems, the kernel occasionally hangs with KASLR enabled.
>
> The back trace is:
>
> kernel BUG at arch/x86/mm/init_64.c:311!
> invalid opcode: 0000 [#1] SMP
> [...]
> RIP: 0010:__init_extra_mapping+0x188/0x196
> [...]
> Call Trace:
> init_extra_mapping_uc+0x13/0x15
> map_high+0x67/0x75
> map_mmioh_high_uv3+0x20a/0x219
> uv_system_init_hub+0x12d9/0x1496
> uv_system_init+0x27/0x29
> native_smp_prepare_cpus+0x28d/0x2d8
> kernel_init_freeable+0xdd/0x253
> ? rest_init+0x80/0x80
> kernel_init+0xe/0x110
> ret_from_fork+0x2c/0x40
>
> The root cause is that an SGI UV system needs to map its MMIOH regions
> into the direct mapping section, and that mapping happens in rest_init().
> However, mm KASLR is done in kernel_randomize_memory(), which runs much
> earlier than the SGI UV MMIOH mapping and does not take the MMIOH regions
> into account. With KASLR disabled, there is 64TB of space for the direct
> mapping, and system RAM and the SGI UV MMIOH regions share it. With KASLR
> enabled, mm KASLR only reserves the actual size of system RAM plus 10TB
> for the direct mapping. The later SGI UV MMIOH mapping can then go beyond
> the upper bound of the direct mapping section and step into the VMALLOC
> or VMEMMAP area, which triggers the BUG_ON() in __init_extra_mapping().
>
> E.g. on the SGI UV3 machine where this bug was reported, there are two
> MMIOH regions:
>
> [ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> [ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
>
> They are [16TB-16GB, 16TB) and [16TB, 32TB). On this machine, 512GB of RAM
> is spread out across 1TB regions. The two SGI MMIOH regions above also
> need to be mapped into the direct mapping section.
>
> To fix it, check whether this is an SGI UV system by calling
> is_early_uv_system() in kernel_randomize_memory(). If it is, do not adapt
> the size of the direct mapping section.
>
> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Thomas Garnier <thgarnie@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
> ---
> arch/x86/mm/kaslr.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index aed2064..20b0456 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -27,6 +27,7 @@
> #include <asm/pgtable.h>
> #include <asm/setup.h>
> #include <asm/kaslr.h>
> +#include <asm/uv/uv.h>
>
> #include "mm_internal.h"
>
> @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>
> /* Adapt phyiscal memory region size based on available memory */
> - if (memory_tb < kaslr_regions[0].size_tb)
> + if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
> kaslr_regions[0].size_tb = memory_tb;
>
> /* Calculate entropy available between regions */
> --
> 2.5.5
>