Re: [PATCH v4 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size

From: Thomas Gleixner
Date: Sun Mar 24 2019 - 16:59:06 EST


On Thu, 14 Mar 2019, Baoquan He wrote:
> In memory region KASLR, __PHYSICAL_MASK_SHIFT is taken to calculate
> the initial size of the direct mapping region. This is correct in
> the old code where __PHYSICAL_MASK_SHIFT was equal to MAX_PHYSMEM_BITS,
> 46 bits, and only 4-level mode was supported.
>
> Later, in commit:
> b83ce5ee91471d ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52"),
> __PHYSICAL_MASK_SHIFT was changed to be always 52 bits, no matter it's
> 5-level or 4-level. This is wrong for 4-level paging. Then when we
> adapt physical memory region size based on available memory, it
> will overflow if the amount of system RAM and the padding is bigger
> than 64 TB.

I have no idea what that sentence means and what will overflow. Neither I
have the time to stare at the code to figure it out. Changelogs need to be
self explanatory. Providing a simple example with numbers or an
illustration would be helpful.

> In fact, here MAX_PHYSMEM_BITS should be used instead. Fix it by
> replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
>
> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Reviewed-by: Thomas Garnier <thgarnie@xxxxxxxxxx>
> Acked-by: Kees Cook <keescook@xxxxxxxxxxxx>

So this is an actual bug fix, which is in the middle of this series. Bug
fixes go first and need to be independent of the rest of the series.

They also need a Fixes: tag.

> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index d7ea6b252594..ebf6d1d92385 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -131,7 +131,7 @@ void __init kernel_randomize_memory(void)
> if (!kaslr_memory_enabled())
> return;
>
> - kaslr_regions[0].size_tb = 1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT);
> + kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;

That said, I surely can understand why this change needs to be done, but
the changelog needs to explain the issue so someone with less experience or
someone looking at this in a year from now doesn't have to bang his head
against the wall.

Thanks,

tglx