Re: [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system
From: Baoquan He
Date: Sat Feb 16 2019 - 21:09:56 EST
Hi Mike,
On 02/16/19 at 10:00pm, Baoquan He wrote:
> On SGI UV systems, the kernel often hangs when KASLR is enabled.
> Disabling KASLR makes the kernel work well.
I wrapped the code which calculates the size of the direct mapping
section into a new function, calc_direct_mapping_size(), as Ingo
suggested. This change has passed basic testing, but it hasn't been
verified on an SGI UV machine, because reproducing the bug needs a UV
machine with a large enough UV module installed.
To reproduce it, apply patches 0001~0005. If the bug reproduces, apply
patch 0006 on top to check whether it is fixed. Please help check
whether the code is OK; if you have a machine available, I can run the
test there.
Thanks
Baoquan
>
> The back trace is:
>
> kernel BUG at arch/x86/mm/init_64.c:311!
> invalid opcode: 0000 [#1] SMP
> [...]
> RIP: 0010:__init_extra_mapping+0x188/0x196
> [...]
> Call Trace:
> init_extra_mapping_uc+0x13/0x15
> map_high+0x67/0x75
> map_mmioh_high_uv3+0x20a/0x219
> uv_system_init_hub+0x12d9/0x1496
> uv_system_init+0x27/0x29
> native_smp_prepare_cpus+0x28d/0x2d8
> kernel_init_freeable+0xdd/0x253
> ? rest_init+0x80/0x80
> kernel_init+0xe/0x110
> ret_from_fork+0x2c/0x40
>
> This is because the SGI UV system needs to map its MMIOH regions into
> the direct mapping section, and that mapping happens in rest_init(),
> much later than the call to kernel_randomize_memory() which does mm
> KASLR. So mm KASLR can't take the size of the MMIOH regions into account
> when calculating the address space needed for the direct mapping section.
>
> When KASLR is disabled, there is 64TB of address space for system RAM
> and the MMIOH regions to share. When KASLR is enabled, the current mm
> KASLR code only reserves the actual size of system RAM plus an extra
> 10TB for the direct mapping. Thus the later MMIOH mapping can go beyond
> the upper bound of the direct mapping and step into the VMALLOC or
> VMEMMAP area, which triggers the BUG_ON() in __init_extra_mapping().
>
> E.g. on the SGI UV3 machine where this bug was reported, there are two
> MMIOH regions:
>
> [ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> [ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
>
> They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G of RAM
> is spread out across 1TB of physical address space. Both SGI MMIOH
> regions above are also mapped into the direct mapping section.
>
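Working through the numbers, assuming the end of RAM
(max_pfn << PAGE_SHIFT) lies at or below 1TB and the padding is the 10TB
mentioned above:

```latex
\begin{aligned}
\mathrm{memory\_tb} &= \left\lceil \frac{\mathrm{max\_pfn} \cdot 2^{\mathrm{PAGE\_SHIFT}}}{2^{40}} \right\rceil + 10 = 1 + 10 = 11 \\
\texttt{0xffc00000000} &= 2^{44} - 2^{34} = 16\,\mathrm{TB} - 16\,\mathrm{GB} \\
\texttt{0x100000000000} &= 2^{44} = 16\,\mathrm{TB} \\
\texttt{0x200000000000} &= 2^{45} = 32\,\mathrm{TB}
\end{aligned}
```

So the shrunk direct mapping ends at 11TB, while the MMIOH regions occupy
[16TB-16GB, 32TB), entirely above it but well inside the unadapted 64TB.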
> To fix it, check whether this is an SGI UV system by calling
> is_early_uv_system() from kernel_randomize_memory(). If so, do not adapt
> the size of the direct mapping section; just keep it as is, e.g. 64TB
> in 4-level paging mode.
>
> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
> ---
> arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 42 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index ca12ed4e5239..754b5da91d43 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -29,6 +29,7 @@
> #include <asm/pgtable.h>
> #include <asm/setup.h>
> #include <asm/kaslr.h>
> +#include <asm/uv/uv.h>
>
> #include "mm_internal.h"
>
> @@ -113,15 +114,51 @@ static inline bool kaslr_memory_enabled(void)
> return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
> }
>
> +/*
> + * Even though a huge virtual address space is reserved for the direct
> + * mapping of physical memory, e.g. 64TB in 4-level paging mode, few
> + * systems own enough physical memory to use it up; most have even
> + * less than 1TB. So with KASLR enabled, we adapt the size of the
> + * direct mapping area to the size of actual physical memory plus the
> + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> + * The remaining part is taken out to join memory randomization.
> + *
> + * Note that UV systems are an exception: their MMIOH regions need to be
> + * mapped into the direct mapping area too, while their size can't be
> + * known until rest_init() is called. Hence for UV systems, do not
> + * adapt the size of the direct mapping area.
> + */
> +static inline unsigned long calc_direct_mapping_size(void)
> +{
> + unsigned long size_tb, memory_tb;
> +
> + /*
> + * Update the physical memory mapping to the available memory and
> + * add padding if needed (especially for memory hotplug support).
> + */
> + memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +
> + size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> +
> + /*
> + * Adapt physical memory region size based on available memory if
> + * it's not a UV system.
> + */
> + if (memory_tb < size_tb && !is_early_uv_system())
> + size_tb = memory_tb;
> +
> + return size_tb;
> +}
> +
> /* Initialize base and padding for each memory region randomized with KASLR */
> void __init kernel_randomize_memory(void)
> {
> - size_t i;
> - unsigned long vaddr_start, vaddr;
> - unsigned long rand, memory_tb;
> - struct rnd_state rand_state;
> + unsigned long vaddr_start, vaddr, rand;
> unsigned long remain_entropy;
> unsigned long vmemmap_size;
> + struct rnd_state rand_state;
> + size_t i;
>
> vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> vaddr = vaddr_start;
> @@ -138,20 +175,10 @@ void __init kernel_randomize_memory(void)
> if (!kaslr_memory_enabled())
> return;
>
> - kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> + kaslr_regions[0].size_tb = calc_direct_mapping_size();
> kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
>
> - /*
> - * Update Physical memory mapping to available and
> - * add padding if needed (especially for memory hotplug support).
> - */
> BUG_ON(kaslr_regions[0].base != &page_offset_base);
> - memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> - CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> -
> - /* Adapt phyiscal memory region size based on available memory */
> - if (memory_tb < kaslr_regions[0].size_tb)
> - kaslr_regions[0].size_tb = memory_tb;
>
> /*
> * Calculate how many TB vmemmap region needs, and align to
> --
> 2.17.2
>