Re: [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64)

From: Thomas Garnier
Date: Tue Jun 21 2016 - 12:47:07 EST


On Fri, Jun 17, 2016 at 3:26 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
>>
>> Don't change this unless you know what you are doing.
>>
>> +config RANDOMIZE_MEMORY
>> + bool "Randomize the kernel memory sections"
>> + depends on X86_64
>> + depends on RANDOMIZE_BASE
>> + default RANDOMIZE_BASE
>> + ---help---
>> + Randomizes the base virtual address of kernel memory sections
>> + (physical memory mapping, vmalloc & vmemmap). This security feature
>> + makes exploits relying on predictable memory locations less reliable.
>> +
>> + The order of allocations remains unchanged. Entropy is generated in
>> + the same way as RANDOMIZE_BASE. Current implementation in the optimal
>> + configuration have in average 30,000 different possible virtual
>> + addresses for each memory section.
>> +
>> + If unsure, say N.
>
> So this is really poor naming!
>
> The user, in first approximation, will see something like this:
>
> Randomize the kernel memory sections (RANDOMIZE_MEMORY) [Y/n/?] (NEW)
>
> ... and could naively conclude that this feature will randomize memory contents.
>
> Furthermore, there's no apparent connection to another, related kernel feature:
>
> Randomize the address of the kernel image (KASLR) (RANDOMIZE_BASE) [Y/n/?]
>
> A better naming would be something like:
>
> Randomize other static kernel memory addresses (RANDOMIZE_ALL) [Y/n/?] (NEW)
>
> (assuming it's truly 'all')
>
> But ... I don't see it explained anywhere why a user, once he expressed interest
> in randomization would even want to _not_ randomize as much as possible. I.e. why
> does this new option exist at all, shouldn't we just gradually extend
> randomization to more memory areas and control it via CONFIG_RANDOMIZE_BASE?
>
> Btw., CONFIG_RANDOMIZE_BASE is probably a misnomer, and after these patches it
> will be more of a misnomer - so we should rename it to something better. For
> example CONFIG_KASLR would have the advantage of being obviously named at a
> glance, to anyone who knows what 'ASLR' means. To those who don't know the short
> description will tell it:
>
> Kernel address space layout randomization (KASLR) [Y/n/?]
>
> This would also fit the kernel internal naming of kaslr.c/etc.
>
> What do you think?
>
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -0,0 +1,167 @@
>> +/*
>> + * This file implements KASLR memory randomization for x86_64. It randomizes
>> + * the virtual address space of kernel memory sections (physical memory
>> + * mapping, vmalloc & vmemmap) for x86_64. This security feature mitigates
>> + * exploits relying on predictable kernel addresses.
>> + *
>> + * Entropy is generated using the KASLR early boot functions now shared in
>> + * the lib directory (originally written by Kees Cook). Randomization is
>> + * done on PGD & PUD page table levels to increase possible addresses. The
>> + * physical memory mapping code was adapted to support PUD level virtual
>> + * addresses. This implementation on the best configuration provides 30,000
>> + * possible virtual addresses in average for each memory section. An
>> + * additional low memory page is used to ensure each CPU can start with a
>> + * PGD aligned virtual address (for realmode).
>> + *
>> + * The order of each memory section is not changed. The feature looks at
>> + * the available space for the sections based on different configuration
>> + * options and randomizes the base and space between each. The size of the
>> + * physical memory mapping is the available physical memory.
>> + *
>> + */
>
> (Stray newline.)
>
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/mm.h>
>> +#include <linux/smp.h>
>> +#include <linux/init.h>
>> +#include <linux/memory.h>
>> +#include <linux/random.h>
>> +
>> +#include <asm/processor.h>
>> +#include <asm/pgtable.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/e820.h>
>> +#include <asm/init.h>
>> +#include <asm/setup.h>
>> +#include <asm/kaslr.h>
>> +#include <asm/kasan.h>
>> +
>> +#include "mm_internal.h"
>
> How many of those include lines are truly unnecessary, or did this just got copied
> from another file?
>

No that's correct. I will clean that up.

>> +/*
>> + * Memory base and end randomization is based on different configurations.
>> + * We want as much space as possible to increase entropy available.
>> + */
>> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
>> +
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
> It's unclear to me from the naming and the description whether these are virtual
> or physical memory ranges. Nor is it explained why this laundry list of kernel
> options influences the randomization range - and it's not explained how new kernel
> features modifying the virtual memory layout should plug into this mechanism.
>
> Could we please get a bit more verbose introduction into why all of this is done?
>

The comment was describing that we want as much as possible for
entropy. I will make that clearer and I will add information on how to
change changing these variables.

>> +/* Default values */
>> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
>> +EXPORT_SYMBOL(page_offset_base);
>> +unsigned long vmalloc_base = __VMALLOC_BASE;
>> +EXPORT_SYMBOL(vmalloc_base);
>> +unsigned long vmemmap_base = __VMEMMAP_BASE;
>> +EXPORT_SYMBOL(vmemmap_base);
>> +
>> +/* Describe each randomized memory sections in sequential order */
>> +static __initdata struct kaslr_memory_region {
>> + unsigned long *base;
>> + unsigned short size_tb;
>> +} kaslr_regions[] = {
>> + { &page_offset_base, 64/* Maximum */ },
>> + { &vmalloc_base, VMALLOC_SIZE_TB },
>> + { &vmemmap_base, 1 },
>> +};
>
> So this comment is totally misleading. This area does not 'describe' each memory
> section, it actually puts live pointers to virtual memory address offset control
> variables used by other subsystems into this array - allowing us to randomize them
> programmatically.
>
> This patch should be split up into 4 components:
>
> - core patch adding all the infrastructure but not actually randomizing anything,
> the kaslr_regions[] array is empty.
>
> - a patch adding page_offset_base randomization.
>
> - a patch adding vmalloc_base randomization.
>
> - a patch adding vmemmap_base randomization.
>
> This way if any breakage is introduced by this randomization it can be bisected to
> in a finegrained fashion. It would also have saved 5 minutes from my life trying
> to interpret the code here :-/
>
> In fact I'd suggest a series of 7 patches, with each randomization patch split
> into two further patches:
>
> - a patch making address offset variables dynamic.
>
> - a patch actually enabling the randomization of that virtual address range.
>
> This structure allows us to carefully evaluate the code size and runtime impact of
> randomizing a given range.
>

Make sense. I will build a 4 component patch a described.

>> +
>> +/* Size in Terabytes + 1 hole */
>> +static __init unsigned long get_padding(struct kaslr_memory_region *region)
>> +{
>> + return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
>> +}
>
> So why is size_tb a short, forcing ugly type casts like this? Also, since this
> appears to be a trivial shift, why isn't this an inline?
>

Changed.

>> +/* Initialize base and padding for each memory section randomized with KASLR */
>> +void __init kernel_randomize_memory(void)
>> +{
>> + size_t i;
>> + unsigned long addr = memory_rand_start;
>> + unsigned long padding, rand, mem_tb;
>> + struct rnd_state rnd_st;
>> + unsigned long remain_padding = memory_rand_end - memory_rand_start;
>
> Yeah, so these variable names are awful :-/
>
> Look at how postfix and prefix naming is mixed in the same function:
> 'memory_rand_start', but 'remain_padding'. Please use postfixes for all of them.
>
> Also why is one variable named 'rand', another 'rnd'? Did we run out of 'a'
> characters?
>
> Similarly, we have 'memory' and 'mem'. Pick one variant and harmonize!
>
> This patch is an object lesson in how fragile, insecure code is written.
>

Will change.

>> +
>> + /*
>> + * All these BUILD_BUG_ON checks ensures the memory layout is
>> + * consistent with the current KASLR design.
>> + */
>> + BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
>> + BUILD_BUG_ON(config_enabled(CONFIG_KASAN) &&
>> + memory_rand_end >= ESPFIX_BASE_ADDR);
>> + BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> + config_enabled(CONFIG_X86_ESPFIX64)) &&
>> + memory_rand_end >= EFI_VA_START);
>> + BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> + config_enabled(CONFIG_X86_ESPFIX64) ||
>> + config_enabled(CONFIG_EFI)) &&
>> + memory_rand_end >= __START_KERNEL_map);
>> + BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);
>> +
>> + if (!kaslr_enabled())
>> + return;
>> +
>> + BUG_ON(kaslr_regions[0].base != &page_offset_base);
>> + mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
>
> Why is the granularity of the randomization 1TB? I don't see this explained
> anywhere. Some of the randomization may have PUD-granularity limitation - but PUD
> entries have a size of 512 GB.
>
> I think we should extend the kaslr_regions[] table with a minimum required
> alignment field instead of random granularity limitations.
>

I will change it so that the next region can start on the next PUD
entry with minimum padding.

>> +/*
>> + * Create PGD aligned trampoline table to allow real mode initialization
>> + * of additional CPUs. Consume only 1 low memory page.
>> + */
>> +void __meminit init_trampoline(void)
>
> This too should probably be a separate patch, for bisectability and reviewability
> reasons.
>

Will do.

>> + unsigned long addr, next;
>> + pgd_t *pgd;
>> + pud_t *pud_page, *tr_pud_page;
>> + int i;
>
> I had to look twice to understand that 'tr' stands for 'trampoline'. Please rename
> to 'pud_page_tramp' or so.
>

Will do

>> + if (!kaslr_enabled())
>> + return;
>> +
>> + tr_pud_page = alloc_low_page();
>> + set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
>
> Shouldn't we set it _after_ we've initialized the tr_pud_page page table?
>
> Also, why _PAGE_TABLE? AFAICS using _PAGE_TABLE adds _PAGE_USER.
>

It is used by pgd_populate in kernel_physical_mapping_init. I will
update to _KERNPG_TABLE.

>> +
>> + addr = 0;
>> + pgd = pgd_offset_k((unsigned long)__va(addr));
>
> Why the type cast?
>
> Also, similar complaint to the first patch: why is a physical address named in an
> ambiguous fashion, as 'addr'?
>

Will adapt as the other patch.

>> + pud_page = (pud_t *) pgd_page_vaddr(*pgd);
>
> Side note: it appears all users of pgd_page_vaddr() are casting it immediately to
> a pointer type. Could we change the return type of this function to 'void *' or
> 'pud_t *'? AFAICS it's mostly used in platform code.
>

I agree but I think that should be a separate change.

>> +
>> + for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
>> + pud_t *pud, *tr_pud;
>> +
>> + tr_pud = tr_pud_page + pud_index(addr);
>> + pud = pud_page + pud_index((unsigned long)__va(addr));
>
> Why is this type cast needed?
>

Because __va return a pointer and pud_index expect an unsigned long. I
will edit in a similar way as the other patch.

>> + next = (addr & PUD_MASK) + PUD_SIZE;
>> +
>> + /* Needed to copy pte or pud alike */
>> + BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
>> + *tr_pud = *pud;
>
> So I don't understand what brought the size of 'pte' into this code. We are
> copying pud entries around AFAICS, why should the size of the pte entry matter?
>

To support PSE. Depending of the page_size_mask this entry can be a
pud entry or a pte. I will remove the BUILD_BUG_ON, it is confusing.

> Thanks,
>
> Ingo>