Re: [PATCH v2 2/3] x86/mm/KASLR: Calculate the actual size of vmemmap region

From: Ingo Molnar
Date: Mon Sep 10 2018 - 02:11:59 EST



* Baoquan He <bhe@xxxxxxxxxx> wrote:

> Vmemmap region has different maximum size depending on paging mode.
> Now its size is hardcoded as 1TB in memory KASLR, this is not
> right for 5-level paging mode. It will cause overflow if vmemmap
> region is randomized to be adjacent to cpu_entry_area region and
> its actual size is bigger than 1TB.
>
> So here calculate how many TB by the actual size of vmemmap region
> and align up to 1TB boundary.
>
> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
> ---
> arch/x86/mm/kaslr.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 0988971069c9..1db8e166455e 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -51,7 +51,7 @@ static __initdata struct kaslr_memory_region {
> } kaslr_regions[] = {
> { &page_offset_base, 0 },
> { &vmalloc_base, 0 },
> - { &vmemmap_base, 1 },
> + { &vmemmap_base, 0 },
> };
>
> /* Get size in bytes used by the memory region */
> @@ -77,6 +77,7 @@ void __init kernel_randomize_memory(void)
> unsigned long rand, memory_tb;
> struct rnd_state rand_state;
> unsigned long remain_entropy;
> + unsigned long vmemmap_size;
>
> vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> vaddr = vaddr_start;
> @@ -108,6 +109,14 @@ void __init kernel_randomize_memory(void)
> if (memory_tb < kaslr_regions[0].size_tb)
> kaslr_regions[0].size_tb = memory_tb;
>
> + /*
> + * Calculate how many TB vmemmap region needs, and align to
> + * 1TB boundary.
> + * */

Yeah, so that's not the standard comment style ...

> + vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
> + sizeof(struct page);
> + kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);

So I tried to review what all this code does, and the comments aren't too great to explain the
concepts.

For example:

/*
* Memory regions randomized by KASLR (except modules that use a separate logic
* earlier during boot). The list is ordered based on virtual addresses. This
* order is kept after randomization.
*/
static __initdata struct kaslr_memory_region {
unsigned long *base;
unsigned long size_tb;
} kaslr_regions[] = {
{ &page_offset_base, 0 },
{ &vmalloc_base, 0 },
{ &vmemmap_base, 1 },
};

So I get the part where the 'base' pointer is essentially pointers to various global variables
used by the MM to get the virtual base address of the kernel, vmalloc and vmemmap areas from,
which base addresses can thus be modified by the very early KASLR code to dynamically shape the
virtual memory layout of these kernel memory areas on a per bootup basis.

(BTW., that would be a great piece of information to add for the uninitiated. It's not like
it's obvious!)

But what does 'size_tb' do? Nothing explains it and your patch doesn't make it clearer either.
Also, get_padding() looks like an unnecessary layer of obfuscation:

/* Get size in bytes used by the memory region */
static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
return (region->size_tb << TB_SHIFT);
}

It's used only twice and we do bit shifts in the parent function anyway so it's not like it's
hiding some uninteresting detail. (The style ugliness of the return statement makes it annoying
as well.)

So could we please first clean up this code, explain it properly, name the fields properly,
etc., before modifying it? Because it still looks unnecessarily hard to review. I.e. this early
boot code needs improvements of quality and neither the base code nor your patches give me the
impression of carefully created, easy to maintain code.

Thanks,

Ingo