Re: [PATCH v3 3/5] arm64/mm: Create initial page tables in init_pg_dir

From: James Morse
Date: Fri Jul 06 2018 - 04:58:27 EST


Hi Jun,

On 02/07/18 12:16, Jun Yao wrote:
> Create initial page tables in init_pg_dir and then create final
> page tables in swapper_pg_dir directly.

This is what the patch does, but doesn't preserve the why for the git-log. Could
you expand it to describe why we're doing this.


The series so far fails to boot from me. This is because the kaslr code tries to
read the kaslr-seed from the DT, via the fixmap. To do this, it needs the fixmap
tables installed, which early_fixmap_init() does.

early_fixmap_init() calls pgd_offset_k(), which assumes init_mm.pgd is in use,
so we hit a BUG_ON() when trying to setup the fixmap because we added the tables
to the wrong page tables.

If you enable 'CONFIG_RANDOMIZE_BASE', even without EFI you should see the same
thing happen.


I think we should move this hunk:
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 30ad2f085d1f..b065c08d4008 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -249,6 +249,7 @@ void __init setup_arch(char **cmdline_p)
> init_mm.end_code = (unsigned long) _etext;
> init_mm.end_data = (unsigned long) _edata;
> init_mm.brk = (unsigned long) _end;
> + init_mm.pgd = init_pg_dir;
>
> *cmdline_p = boot_command_line;
>

into early_fixmap_init(), which is the first thing setup_arch() calls.
Something like [0] fixes it.

This looks to be the only thing that goes near init_mm.pgd before paging_init().


> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7c4c8f318ba9..3b408f21fe2e 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -718,6 +718,8 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> }
> #endif
>
> +extern pgd_t init_pg_dir[PTRS_PER_PGD] __section(.init.data);
> +extern pgd_t init_pg_end[] __section(.init.data);

normally we'd spell this '__initdata', but if we move them out of the
'.init.data' section we shouldn't, otherwise the extern definition doesn't match
where the symbol appears. (Looks like I was wrong to think that tools like
sparse pick this up, that's just the __iomem/__user stuff.)


> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 2dbb2c9f1ec1..a7ab0010ff80 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -628,26 +628,10 @@ static void __init map_kernel(pgd_t *pgdp)
> */
> void __init paging_init(void)
> {
> - phys_addr_t pgd_phys = early_pgtable_alloc();
> - pgd_t *pgdp = pgd_set_fixmap(pgd_phys);
> -
> - map_kernel(pgdp);
> - map_mem(pgdp);
> -
> - /*
> - * We want to reuse the original swapper_pg_dir so we don't have to
> - * communicate the new address to non-coherent secondaries in
> - * secondary_entry, and so cpu_switch_mm can generate the address with
> - * adrp+add rather than a load from some global variable.
> - *
> - * To do this we need to go via a temporary pgd.
> - */
> - cpu_replace_ttbr1(__va(pgd_phys));
> - memcpy(swapper_pg_dir, pgdp, PGD_SIZE);
> - cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
> -
> - pgd_clear_fixmap();
> - memblock_free(pgd_phys, PAGE_SIZE);
> + map_kernel(swapper_pg_dir);
> + map_mem(swapper_pg_dir);

> + cpu_replace_ttbr1(swapper_pg_dir);

The lm_alias() here is important: cpu_replace_ttbr1() calls virt_to_phys() on
its argument, virt_to_phys() is only intended for addresses in the linear-map
region of the VA space, as it works by doing some arithmetic with the address.

swapper_pg_dir is the name of the kernel symbol, so its address will be in the
kernel-text region of the VA space.

Today virt_to_phys() catches this happening and fixes it, CONFIG_DEBUG_VIRTUAL
will give you a warning, at some point virt_to_phys()'s safety net will go-away.

The original call that did this was wrapped in lm_alias(), which gives you the
linear-map address of a symbol in the kernel-text.


Thanks,

James


[0] make sure fixmap tables go in the init page tables
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 117d080639b3..e097c78a66f8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -770,6 +771,13 @@ void __init early_fixmap_init(void)
pmd_t *pmdp;
unsigned long addr = FIXADDR_START;

+ /*
+ * During early setup we use init_pg_dir, update init_mm so its
+ * implicit use by pgd_offset_k() gets the live page tables.
+ * swapper_pg_dir is restored by paging_init().
+ */
+ init_mm.pgd = init_pg_dir;
+
pgdp = pgd_offset_k(addr);
pgd = READ_ONCE(*pgdp);
if (CONFIG_PGTABLE_LEVELS > 3 &&