Re: [PATCH v3] RISC-V: enable XIP
From: Anup Patel
Date: Tue Dec 22 2020 - 06:27:07 EST
On Tue, Dec 22, 2020 at 3:55 PM Vitaly Wool <vitaly.wool@xxxxxxxxxxxx> wrote:
>
> Hi Anup,
>
> On Tue, Dec 22, 2020 at 6:16 AM Anup Patel <anup@xxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Dec 22, 2020 at 2:08 AM Vitaly Wool <vitaly.wool@xxxxxxxxxxxx> wrote:
> > >
> > > Introduce XIP (eXecute In Place) support for RISC-V platforms.
> > > It allows code to be executed directly from non-volatile storage
> > > directly addressable by the CPU, such as QSPI NOR flash which can
> > > be found on many RISC-V platforms. This makes way for significant
> > > optimization of RAM footprint. The XIP kernel is not compressed
> > > since it has to run directly from flash, so it will occupy more
> > > space on the non-volatile storage to The physical flash address
> > > used to link the kernel object files and for storing it has to
> > > be known at compile time and is represented by a Kconfig option.
> > >
> > > XIP on RISC-V will currently only work on MMU-enabled kernels.
> > >
> > > Changed in v2:
> > > - dedicated macro for XIP address fixup when MMU is not enabled yet
> > > o both for 32-bit and 64-bit RISC-V
> > > - SP is explicitly set to a safe place in RAM before __copy_data call
> > > - removed redundant alignment requirements in vmlinux-xip.lds.S
> > > - changed long -> uintptr_t typecast in __XIP_FIXUP macro.
> > >
> > > Changed in v3:
> > > - rebased against latest for-next
> > > - XIP address fixup macro now takes an argument
> > > - SMP related fixes
> > > Signed-off-by: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> > > ---
> > > arch/riscv/Kconfig | 46 ++++++++-
> > > arch/riscv/Makefile | 8 +-
> > > arch/riscv/boot/Makefile | 13 +++
> > > arch/riscv/include/asm/pgtable.h | 56 +++++++++--
> > > arch/riscv/kernel/cpu_ops_sbi.c | 3 +
> > > arch/riscv/kernel/head.S | 69 +++++++++++++-
> > > arch/riscv/kernel/head.h | 3 +
> > > arch/riscv/kernel/setup.c | 8 +-
> > > arch/riscv/kernel/vmlinux-xip.lds.S | 132 ++++++++++++++++++++++++++
> > > arch/riscv/kernel/vmlinux.lds.S | 6 ++
> > > arch/riscv/mm/init.c | 142 +++++++++++++++++++++++++---
> > > 11 files changed, 460 insertions(+), 26 deletions(-)
> > > create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S
> >
> > If possible please break down this patch into smaller patches
> > which are bisect-able and easy to reivew.
>
> I was thinking about that but any such split would look artificial to
> me. It would obscure the idea in my opinion rather than help
> understand it.
>
> > >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 2b41f6d8e458..fabafdf763da 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -398,7 +398,7 @@ config EFI_STUB
> > >
> > > config EFI
> > > bool "UEFI runtime support"
> > > - depends on OF
> > > + depends on OF && !XIP_KERNEL
> > > select LIBFDT
> > > select UCS2_STRING
> > > select EFI_PARAMS_FROM_FDT
> > > @@ -415,12 +415,52 @@ config EFI
> > > allow the kernel to be booted as an EFI application. This
> > > is only useful on systems that have UEFI firmware.
> > >
> > > +config XIP_KERNEL
> > > + bool "Kernel Execute-In-Place from ROM"
> > > + depends on MMU
> > > + help
> > > + Execute-In-Place allows the kernel to run from non-volatile storage
> > > + directly addressable by the CPU, such as NOR flash. This saves RAM
> > > + space since the text section of the kernel is not loaded from flash
> > > + to RAM. Read-write sections, such as the data section and stack,
> > > + are still copied to RAM. The XIP kernel is not compressed since
> > > + it has to run directly from flash, so it will take more space to
> > > + store it. The flash address used to link the kernel object files,
> > > + and for storing it, is configuration dependent. Therefore, if you
> > > + say Y here, you must know the proper physical address where to
> > > + store the kernel image depending on your own flash memory usage.
> > > +
> > > + Also note that the make target becomes "make xipImage" rather than
> > > + "make zImage" or "make Image". The final kernel binary to put in
> > > + ROM memory will be arch/riscv/boot/xipImage.
> > > +
> > > + If unsure, say N.
> > > +
> > > +config XIP_PHYS_ADDR
> > > + hex "XIP Kernel Physical Location"
> > > + depends on XIP_KERNEL
> > > + default "0x21000000"
> > > + help
> > > + This is the physical address in your flash memory the kernel will
> > > + be linked for and stored to. This address is dependent on your
> > > + own flash usage.
> > > +
> > > +config XIP_PHYS_RAM_BASE
> > > + hex "Platform Physical RAM address"
> > > + depends on XIP_KERNEL
> > > + default "0x80000000"
> > > + help
> > > + This is the physical address of RAM in the system. It has to be
> > > + explicitly specified to run early relocations of read-write data
> > > + from flash to RAM.
> > > endmenu
> > >
> > > config BUILTIN_DTB
> > > - def_bool n
> > > - depends on RISCV_M_MODE
> > > + bool
> > > + depends on (RISCV_M_MODE || XIP_KERNEL)
> > > depends on OF
> > > + default y if XIP_KERNEL
> > > + default n if !XIP_KERNEL
> >
> > The "default n" case is not required because by default it is disabled.
>
> Thanks, will fix.
>
> > > menu "Power management options"
> > >
> > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > index 8c29e553ef7f..b89b6897b0be 100644
> > > --- a/arch/riscv/Makefile
> > > +++ b/arch/riscv/Makefile
> > > @@ -70,7 +70,11 @@ CHECKFLAGS += -D__riscv -D__riscv_xlen=$(BITS)
> > >
> > > # Default target when executing plain make
> > > boot := arch/riscv/boot
> > > +ifeq ($(CONFIG_XIP_KERNEL),y)
> > > +KBUILD_IMAGE := $(boot)/xipImage
> > > +else
> > > KBUILD_IMAGE := $(boot)/Image.gz
> > > +endif
> > >
> > > head-y := arch/riscv/kernel/head.o
> > >
> > > @@ -83,12 +87,14 @@ PHONY += vdso_install
> > > vdso_install:
> > > $(Q)$(MAKE) $(build)=arch/riscv/kernel/vdso $@
> > >
> > > +ifneq ($(CONFIG_XIP_KERNEL),y)
> > > ifeq ($(CONFIG_RISCV_M_MODE)$(CONFIG_SOC_KENDRYTE),yy)
> > > KBUILD_IMAGE := $(boot)/loader.bin
> > > else
> > > KBUILD_IMAGE := $(boot)/Image.gz
> > > endif
> > > -BOOT_TARGETS := Image Image.gz loader loader.bin
> > > +endif
> > > +BOOT_TARGETS := Image Image.gz loader loader.bin xipImage
> > >
> > > all: $(notdir $(KBUILD_IMAGE))
> > >
> > > diff --git a/arch/riscv/boot/Makefile b/arch/riscv/boot/Makefile
> > > index 03404c84f971..6bf299f70c27 100644
> > > --- a/arch/riscv/boot/Makefile
> > > +++ b/arch/riscv/boot/Makefile
> > > @@ -17,8 +17,21 @@
> > > KCOV_INSTRUMENT := n
> > >
> > > OBJCOPYFLAGS_Image :=-O binary -R .note -R .note.gnu.build-id -R .comment -S
> > > +OBJCOPYFLAGS_xipImage :=-O binary -R .note -R .note.gnu.build-id -R .comment -S
> > >
> > > targets := Image Image.* loader loader.o loader.lds loader.bin
> > > +targets := Image Image.* loader loader.o loader.lds loader.bin xipImage
> > > +
> > > +ifeq ($(CONFIG_XIP_KERNEL),y)
> > > +
> > > +quiet_cmd_mkxip = $(quiet_cmd_objcopy)
> > > +cmd_mkxip = $(cmd_objcopy)
> > > +
> > > +$(obj)/xipImage: vmlinux FORCE
> > > + $(call if_changed,mkxip)
> > > + @$(kecho) ' Physical Address of xipImage: $(CONFIG_XIP_PHYS_ADDR)'
> > > +
> > > +endif
> > >
> > > $(obj)/Image: vmlinux FORCE
> > > $(call if_changed,objcopy)
> > > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > index 183f1f4b2ae6..ec3360363b71 100644
> > > --- a/arch/riscv/include/asm/pgtable.h
> > > +++ b/arch/riscv/include/asm/pgtable.h
> > > @@ -11,6 +11,42 @@
> > >
> > > #include <asm/pgtable-bits.h>
> > >
> > > +#ifdef CONFIG_MMU
> > > +
> > > +#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
> > > +#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
> > > +
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +/*
> > > + * Since we use sections to map it, this macro replaces the physical address
> > > + * with its virtual address while keeping offset from the base section.
> > > + */
> > > +#define XIP_OFFSET SZ_8M
> > > +#define XIP_MASK (SZ_8M - 1)
> > > +#define XIP_VIRT_ADDR(physaddr) (PAGE_OFFSET - XIP_OFFSET + ((physaddr) & XIP_MASK))
> > > +#define XIP_PHYS_ADDR(va) \
> > > + ((uintptr_t)(va) - PAGE_OFFSET + XIP_OFFSET + CONFIG_XIP_PHYS_ADDR)
> > > +#define VMALLOC_END (PAGE_OFFSET - SZ_16M - 1)
> > > +
> > > +#define XIP_VIRT_ADDR_START XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR)
> > > +#define __PHYS_RAM_BASE CONFIG_XIP_PHYS_RAM_BASE
> > > +
> > > +#define __XIP_FIXUP(addr) \
> > > + (((uintptr_t)(addr) >= CONFIG_XIP_PHYS_ADDR && \
> > > + (uintptr_t)(addr) <= CONFIG_XIP_PHYS_ADDR + SZ_16M) ? \
> > > + (uintptr_t)(addr) - CONFIG_XIP_PHYS_ADDR + __PHYS_RAM_BASE - XIP_OFFSET : \
> > > + (uintptr_t)(addr))
> > > +
> > > +#else
> > > +#define VMALLOC_END (PAGE_OFFSET - 1)
> > > +#endif /* CONFIG_XIP_KERNEL */
> > > +
> > > +#else
> > > +
> > > +#define XIP_VIRT_ADDR(physaddr) physaddr
> > > +
> > > +#endif /* CONFIG_MMU */
> > > +
> >
> > Lot of defines above will not work for assembly sources.
> >
> > Keep only assembly friendly defines outside "#ifndef __ASSEMBLY__"
>
> Fair enough, will do.
>
> > > #ifndef __ASSEMBLY__
> > >
> > > /* Page Upper Directory not used in RISC-V */
> > > @@ -21,10 +57,6 @@
> > >
> > > #ifdef CONFIG_MMU
> > >
> > > -#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
> > > -#define VMALLOC_END (PAGE_OFFSET - 1)
> > > -#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
> > > -
> > > #define BPF_JIT_REGION_SIZE (SZ_128M)
> > > #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> > > #define BPF_JIT_REGION_END (VMALLOC_END)
> > > @@ -467,8 +499,20 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
> > >
> > > #define kern_addr_valid(addr) (1) /* FIXME */
> > >
> > > -extern void *dtb_early_va;
> > > -extern uintptr_t dtb_early_pa;
> > > +extern void *_dtb_early_va;
> > > +extern uintptr_t _dtb_early_pa;
> > > +#if defined(CONFIG_XIP_KERNEL) && defined(CONFIG_MMU)
> > > +
> > > +#define dtb_early_va (*(void **)__XIP_FIXUP(&_dtb_early_va))
> > > +#define dtb_early_pa (*(uintptr_t *)__XIP_FIXUP(&_dtb_early_pa))
> > > +
> > > +#else
> > > +
> > > +#define dtb_early_va _dtb_early_va
> > > +#define dtb_early_pa _dtb_early_pa
> > > +
> > > +#endif /* CONFIG_XIP_KERNEL */
> > > +
> > > void setup_bootmem(void);
> > > void paging_init(void);
> > >
> > > diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
> > > index 685fae72b7f5..9c8d6dab55b8 100644
> > > --- a/arch/riscv/kernel/cpu_ops_sbi.c
> > > +++ b/arch/riscv/kernel/cpu_ops_sbi.c
> > > @@ -59,6 +59,9 @@ static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)
> > > unsigned long boot_addr = __pa_symbol(secondary_start_sbi);
> > > int hartid = cpuid_to_hartid_map(cpuid);
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + boot_addr = XIP_PHYS_ADDR(secondary_start_sbi);
> > > +#endif
> >
> > You can avoid "#ifdef CONFIG_XIP_KERNEL" here by defining
> > XIP_PHYS_ADDR() to do nothing when XIP_KERNEL is disabled.
>
> That's right but I think it actually makes things harder to
> understand. Instead, I can implement a static inline helper like
> get_secondary_start_phys() which will return the appropriate value
> depending on the actual configuration. Does that sound reasonable?
Yes, that would be better.
Overall, I would also suggest to use more pre-processor tricks to
minimize "#ifdef" both in assembly and C sources.
>
> > > cpu_update_secondary_bootdata(cpuid, tidle);
> > > rc = sbi_hsm_hart_start(hartid, boot_addr, 0);
> > >
> > > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > > index 45dbdae930bf..6f065a5502d3 100644
> > > --- a/arch/riscv/kernel/head.S
> > > +++ b/arch/riscv/kernel/head.S
> > > @@ -9,11 +9,31 @@
> > > #include <linux/linkage.h>
> > > #include <asm/thread_info.h>
> > > #include <asm/page.h>
> > > +#include <asm/pgtable.h>
> > > #include <asm/csr.h>
> > > #include <asm/hwcap.h>
> > > #include <asm/image.h>
> > > #include "efi-header.S"
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +.macro XIP_FIXUP_OFFSET reg
> > > +#ifdef CONFIG_64BIT
> > > + ld t0, _xip_fixup
> > > +#else
> > > + lw t0, _xip_fixup
> > > +#endif
> > > + add \reg, \reg, t0
> > > +.endm
> > > +
> > > +_xip_fixup:
> > > +#ifdef CONFIG_64BIT
> > > + .dword CONFIG_XIP_PHYS_RAM_BASE - CONFIG_XIP_PHYS_ADDR - XIP_OFFSET
> > > +#else
> > > + .word CONFIG_XIP_PHYS_RAM_BASE - CONFIG_XIP_PHYS_ADDR - XIP_OFFSET
> > > +#endif
> > > +
> > > +#endif /* CONFIG_XIP_KERNEL */
> > > +
> > > __HEAD
> > > ENTRY(_start)
> > > /*
> > > @@ -64,7 +84,11 @@ pe_head_start:
> > > #ifdef CONFIG_MMU
> > > relocate:
> > > /* Relocate return address */
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + li a1, XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR)
> > > +#else
> > > li a1, PAGE_OFFSET
> > > +#endif
> > > la a2, _start
> > > sub a1, a1, a2
> > > add ra, ra, a1
> > > @@ -85,7 +109,12 @@ relocate:
> > > * full fence here because setup_vm() just wrote these PTEs and we need
> > > * to ensure the new translations are in use.
> > > */
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + la a0, _trampoline_pg_dir
> > > + XIP_FIXUP_OFFSET a0
> > > +#else
> > > la a0, trampoline_pg_dir
> > > +#endif
> > > srl a0, a0, PAGE_SHIFT
> > > or a0, a0, a1
> > > sfence.vma
> > > @@ -140,6 +169,10 @@ secondary_start_sbi:
> > > slli a3, a0, LGREG
> > > la a4, __cpu_up_stack_pointer
> > > la a5, __cpu_up_task_pointer
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + XIP_FIXUP_OFFSET a4
> > > + XIP_FIXUP_OFFSET a5
> > > +#endif
> > > add a4, a3, a4
> > > add a5, a3, a5
> > > REG_L sp, (a4)
> > > @@ -233,10 +266,24 @@ pmp_done:
> > >
> > > /* Pick one hart to run the main boot sequence */
> > > la a3, hart_lottery
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + XIP_FIXUP_OFFSET a3
> > > +#endif
> > > li a2, 1
> > > amoadd.w a3, a2, (a3)
> > > bnez a3, .Lsecondary_start
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + la sp, _end + THREAD_SIZE
> > > + XIP_FIXUP_OFFSET sp
> > > + mv s0, a0
> > > + call __copy_data
> > > +
> > > + /* Restore a0 copy */
> > > + mv a0, s0
> > > +#endif
> > > +
> > > +#ifndef CONFIG_XIP_KERNEL
> > > /* Clear BSS for flat non-ELF images */
> > > la a3, __bss_start
> > > la a4, __bss_stop
> > > @@ -246,19 +293,35 @@ clear_bss:
> > > add a3, a3, RISCV_SZPTR
> > > blt a3, a4, clear_bss
> > > clear_bss_done:
> > > -
> > > +#endif
> > > /* Save hart ID and DTB physical address */
> > > mv s0, a0
> > > mv s1, a1
> > > +
> > > la a2, boot_cpu_hartid
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + XIP_FIXUP_OFFSET a2
> > > +#endif
> > > REG_S a0, (a2)
> > >
> > > /* Initialize page tables and relocate to virtual addresses */
> > > la sp, init_thread_union + THREAD_SIZE
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + XIP_FIXUP_OFFSET sp
> > > +#endif
> > > +#if defined (CONFIG_XIP_KERNEL) && defined (CONFIG_BUILTIN_DTB)
> > > + la a0, __dtb_start
> > > +#else
> > > mv a0, s1
> > > +#endif
> > > call setup_vm
> > > #ifdef CONFIG_MMU
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + la a0, _early_pg_dir
> > > + XIP_FIXUP_OFFSET a0
> > > +#else
> > > la a0, early_pg_dir
> > > +#endif
> > > call relocate
> > > #endif /* CONFIG_MMU */
> > >
> > > @@ -284,6 +347,10 @@ clear_bss_done:
> > > slli a3, a0, LGREG
> > > la a1, __cpu_up_stack_pointer
> > > la a2, __cpu_up_task_pointer
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + XIP_FIXUP_OFFSET a1
> > > + XIP_FIXUP_OFFSET a2
> > > +#endif
> > > add a1, a3, a1
> > > add a2, a3, a2
> > >
> > > diff --git a/arch/riscv/kernel/head.h b/arch/riscv/kernel/head.h
> > > index b48dda3d04f6..aabbc3ac3e48 100644
> > > --- a/arch/riscv/kernel/head.h
> > > +++ b/arch/riscv/kernel/head.h
> > > @@ -12,6 +12,9 @@ extern atomic_t hart_lottery;
> > >
> > > asmlinkage void do_page_fault(struct pt_regs *regs);
> > > asmlinkage void __init setup_vm(uintptr_t dtb_pa);
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +asmlinkage void __init __copy_data(void);
> > > +#endif
> > >
> > > extern void *__cpu_up_stack_pointer[];
> > > extern void *__cpu_up_task_pointer[];
> > > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > > index a834deab31d3..876d4ff0ecc4 100644
> > > --- a/arch/riscv/kernel/setup.c
> > > +++ b/arch/riscv/kernel/setup.c
> > > @@ -50,7 +50,11 @@ struct screen_info screen_info __section(".data") = {
> > > * This is used before the kernel initializes the BSS so it can't be in the
> > > * BSS.
> > > */
> > > -atomic_t hart_lottery __section(".sdata");
> > > +atomic_t hart_lottery __section(".sdata")
> > > +#ifdef CONFIG_XIP_KERNEL
> > > += ATOMIC_INIT(1)
> > > +#endif
> > > +;
> > > unsigned long boot_cpu_hartid;
> > > static DEFINE_PER_CPU(struct cpu, cpu_devices);
> > >
> > > @@ -244,7 +248,7 @@ void __init setup_arch(char **cmdline_p)
> > > #if IS_ENABLED(CONFIG_BUILTIN_DTB)
> > > unflatten_and_copy_device_tree();
> > > #else
> > > - if (early_init_dt_verify(__va(dtb_early_pa)))
> > > + if (early_init_dt_verify(dtb_early_va))
> >
> > We can't use dtb_early_va here because early DTB mappings are only done
> > in early_pg_dir but not available in swapper_pg_dir.
>
> Well, as you can see if you look further down the patch, I do set
> dtb_early_va to __va(dtb_early_pa) for the non-XIP case.
Ahh, I see.
>
> > > unflatten_device_tree();
> > > else
> > > pr_err("No DTB found in kernel mappings\n");
> > > diff --git a/arch/riscv/kernel/vmlinux-xip.lds.S b/arch/riscv/kernel/vmlinux-xip.lds.S
> > > new file mode 100644
> > > index 000000000000..9f0f08c34cd3
> > > --- /dev/null
> > > +++ b/arch/riscv/kernel/vmlinux-xip.lds.S
> > > @@ -0,0 +1,132 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (C) 2012 Regents of the University of California
> > > + * Copyright (C) 2017 SiFive
> > > + * Copyright (C) 2020 Vitaly Wool, Konsulko AB
> > > + */
> > > +
> > > +#define LOAD_OFFSET XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR)
> > > +/* No __ro_after_init data in the .rodata section - which will always be ro */
> > > +#define RO_AFTER_INIT_DATA
> > > +
> > > +#include <asm/vmlinux.lds.h>
> > > +#include <asm/page.h>
> > > +#include <asm/pgtable.h>
> > > +#include <asm/cache.h>
> > > +#include <asm/thread_info.h>
> > > +
> > > +OUTPUT_ARCH(riscv)
> > > +ENTRY(_start)
> > > +
> > > +jiffies = jiffies_64;
> > > +
> > > +SECTIONS
> > > +{
> > > + /* Beginning of code and text segment */
> > > + . = XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR);
> > > + _xiprom = .;
> > > + _start = .;
> > > + HEAD_TEXT_SECTION
> > > + INIT_TEXT_SECTION(PAGE_SIZE)
> > > + /* we have to discard exit text and such at runtime, not link time */
> > > + .exit.text :
> > > + {
> > > + EXIT_TEXT
> > > + }
> > > +
> > > + .text : {
> > > + _text = .;
> > > + _stext = .;
> > > + TEXT_TEXT
> > > + SCHED_TEXT
> > > + CPUIDLE_TEXT
> > > + LOCK_TEXT
> > > + KPROBES_TEXT
> > > + ENTRY_TEXT
> > > + IRQENTRY_TEXT
> > > + SOFTIRQENTRY_TEXT
> > > + *(.fixup)
> > > + _etext = .;
> > > + }
> > > + RO_DATA(L1_CACHE_BYTES)
> > > + .srodata : {
> > > + *(.srodata*)
> > > + }
> > > + .init.rodata : {
> > > + INIT_SETUP(16)
> > > + INIT_CALLS
> > > + CON_INITCALL
> > > + INIT_RAM_FS
> > > + }
> > > + _exiprom = .; /* End of XIP ROM area */
> > > +
> > > +
> > > +/*
> > > + * From this point, stuff is considered writable and will be copied to RAM
> > > + */
> > > + __data_loc = ALIGN(16); /* location in file */
> > > + . = PAGE_OFFSET; /* location in memory */
> > > +
> > > + _sdata = .; /* Start of data section */
> > > + _data = .;
> > > + RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
> > > + _edata = .;
> > > + __start_ro_after_init = .;
> > > + .data.ro_after_init : AT(ADDR(.data.ro_after_init) - LOAD_OFFSET) {
> > > + *(.data..ro_after_init)
> > > + }
> > > + __end_ro_after_init = .;
> > > +
> > > + . = ALIGN(PAGE_SIZE);
> > > + __init_begin = .;
> > > + .init.data : {
> > > + INIT_DATA
> > > + }
> > > + .exit.data : {
> > > + EXIT_DATA
> > > + }
> > > + . = ALIGN(8);
> > > + __soc_early_init_table : {
> > > + __soc_early_init_table_start = .;
> > > + KEEP(*(__soc_early_init_table))
> > > + __soc_early_init_table_end = .;
> > > + }
> > > + __soc_builtin_dtb_table : {
> > > + __soc_builtin_dtb_table_start = .;
> > > + KEEP(*(__soc_builtin_dtb_table))
> > > + __soc_builtin_dtb_table_end = .;
> > > + }
> > > + PERCPU_SECTION(L1_CACHE_BYTES)
> > > +
> > > + . = ALIGN(PAGE_SIZE);
> > > + __init_end = .;
> > > +
> > > + .sdata : {
> > > + __global_pointer$ = . + 0x800;
> > > + *(.sdata*)
> > > + *(.sbss*)
> > > + }
> > > +
> > > + BSS_SECTION(PAGE_SIZE, PAGE_SIZE, 0)
> > > + EXCEPTION_TABLE(0x10)
> > > +
> > > + .rel.dyn : AT(ADDR(.rel.dyn) - LOAD_OFFSET) {
> > > + *(.rel.dyn*)
> > > + }
> > > +
> > > + /*
> > > + * End of copied data. We need a dummy section to get its LMA.
> > > + * Also located before final ALIGN() as trailing padding is not stored
> > > + * in the resulting binary file and useless to copy.
> > > + */
> > > + .data.endmark : AT(ADDR(.data.endmark) - LOAD_OFFSET) { }
> > > + _edata_loc = LOADADDR(.data.endmark);
> > > +
> > > + . = ALIGN(PAGE_SIZE);
> > > + _end = .;
> > > +
> > > + STABS_DEBUG
> > > + DWARF_DEBUG
> > > +
> > > + DISCARDS
> > > +}
> >
> > Atish's patch series "[PATCH v3 0/5] Improve kernel section protections"
> > updates vmlinux.lds.S so take that as reference for your vmlinux-xip.lds.S
>
> vmlinux-xip loaded script doesn't have to be a carbon copy of the
> original one. Is there something special that you believe is needed
> for the XIP version?
Atish's series improves kernel strict mappings and in the process those
patches end-up re-organizing sections in linker script.
Make sure you test XIP + strict kernel mappings.
>
> > > diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
> > > index de03cb22d0e9..6745ec325930 100644
> > > --- a/arch/riscv/kernel/vmlinux.lds.S
> > > +++ b/arch/riscv/kernel/vmlinux.lds.S
> > > @@ -4,7 +4,12 @@
> > > * Copyright (C) 2017 SiFive
> > > */
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +#include "vmlinux-xip.lds.S"
> > > +#else
> > > +
> > > #define LOAD_OFFSET PAGE_OFFSET
> > > +
> > > #include <asm/vmlinux.lds.h>
> > > #include <asm/page.h>
> > > #include <asm/cache.h>
> > > @@ -132,3 +137,4 @@ SECTIONS
> > >
> > > DISCARDS
> > > }
> > > +#endif /* CONFIG_XIP_KERNEL */
> > > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > > index 87c305c566ac..8636432400df 100644
> > > --- a/arch/riscv/mm/init.c
> > > +++ b/arch/riscv/mm/init.c
> > > @@ -30,8 +30,8 @@ EXPORT_SYMBOL(empty_zero_page);
> > >
> > > extern char _start[];
> > > #define DTB_EARLY_BASE_VA PGDIR_SIZE
> > > -void *dtb_early_va __initdata;
> > > -uintptr_t dtb_early_pa __initdata;
> > > +void *_dtb_early_va __initdata;
> > > +uintptr_t _dtb_early_pa __initdata;
> > >
> > > struct pt_alloc_ops {
> > > pte_t *(*get_pte_virt)(phys_addr_t pa);
> > > @@ -87,6 +87,10 @@ static void print_vm_layout(void)
> > > (unsigned long)VMALLOC_END);
> > > print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
> > > (unsigned long)high_memory);
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + print_mlm("xip", (unsigned long)XIP_VIRT_ADDR_START,
> > > + (unsigned long)XIP_VIRT_ADDR_START + SZ_16M);
> > > +#endif /* CONFIG_XIP_KERNEL */
> > > }
> > > #else
> > > static void print_vm_layout(void) { }
> > > @@ -163,6 +167,10 @@ void __init setup_bootmem(void)
> > > phys_addr_t vmlinux_start = __pa_symbol(&_start);
> > > u64 i;
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + vmlinux_start = __pa_symbol(&_sdata);
> > > +#endif
> > > +
> > > /* Find the memory region containing the kernel */
> > > for_each_mem_range(i, &start, &end) {
> > > phys_addr_t size = end - start;
> > > @@ -194,11 +202,18 @@ void __init setup_bootmem(void)
> > > setup_initrd();
> > > #endif /* CONFIG_BLK_DEV_INITRD */
> > >
> > > - /*
> > > - * Avoid using early_init_fdt_reserve_self() since __pa() does
> > > - * not work for DTB pointers that are fixmap addresses
> > > - */
> > > - memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + if (dtb_early_pa < CONFIG_XIP_PHYS_ADDR ||
> > > + dtb_early_pa > CONFIG_XIP_PHYS_ADDR + SZ_16M)
> > > +#endif
> > > + {
> > > + /*
> > > + * Avoid using early_init_fdt_reserve_self() since __pa() does
> > > + * not work for DTB pointers that are fixmap addresses
> > > + */
> > > + memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));
> > > + dtb_early_va = __va(dtb_early_pa);
> >
> > Drop this "dtb_early_va" assignment.
>
> See above.
This assignment deserve's a detailed comment for non-XIP case. Also, this
assignemt should be moved to setup_vm_final() where the switch from
early_pg_dir to swapper_pg_dir for boot CPU happens.
>
> > > + }
> > >
> > > early_init_fdt_scan_reserved_mem();
> > > dma_contiguous_reserve(dma32_phys_limit);
> > > @@ -206,21 +221,47 @@ void __init setup_bootmem(void)
> > > memblock_dump_all();
> > > }
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +
> > > +extern char _xiprom[], _exiprom[];
> > > +extern char _sdata[], _edata[];
> > > +
> > > +#endif /* CONFIG_XIP_KERNEL */
> > > +
> > > #ifdef CONFIG_MMU
> > > -static struct pt_alloc_ops pt_ops;
> > > +static struct pt_alloc_ops _pt_ops;
> > > +
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +#define pt_ops (*(struct pt_alloc_ops *)__XIP_FIXUP(&_pt_ops))
> > > +#else
> > > +#define pt_ops _pt_ops
> > > +#endif
> > >
> > > unsigned long va_pa_offset;
> > > EXPORT_SYMBOL(va_pa_offset);
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +#define va_pa_offset (*((unsigned long *)__XIP_FIXUP(&va_pa_offset)))
> > > +#endif
> > > unsigned long pfn_base;
> > > EXPORT_SYMBOL(pfn_base);
> > >
> > > pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> > > -pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> > > -pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
> > >
> > > #define MAX_EARLY_MAPPING_SIZE SZ_128M
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +pgd_t _trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> > > +pte_t _fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
> > > +pgd_t _early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> > > +
> > > +#define trampoline_pg_dir ((pgd_t *)__XIP_FIXUP(_trampoline_pg_dir))
> > > +#define fixmap_pte ((pte_t *)__XIP_FIXUP(_fixmap_pte))
> > > +#define early_pg_dir ((pgd_t *)__XIP_FIXUP(_early_pg_dir))
> > > +#else
> > > +pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> > > +pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
> > > pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> > > +#endif /* CONFIG_XIP_KERNEL */
> > >
> > > void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> > > {
> > > @@ -292,15 +333,27 @@ static void __init create_pte_mapping(pte_t *ptep,
> > >
> > > #ifndef __PAGETABLE_PMD_FOLDED
> > >
> > > -pmd_t trampoline_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > > -pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > > -
> > > #if MAX_EARLY_MAPPING_SIZE < PGDIR_SIZE
> > > #define NUM_EARLY_PMDS 1UL
> > > #else
> > > #define NUM_EARLY_PMDS (1UL + MAX_EARLY_MAPPING_SIZE / PGDIR_SIZE)
> > > #endif
> > > +
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +pmd_t _trampoline_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > > +pmd_t _fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > > +pmd_t _xip_pmd[PTRS_PER_PGD * NUM_EARLY_PMDS] __page_aligned_bss;
> > > +pmd_t _early_pmd[PTRS_PER_PMD * NUM_EARLY_PMDS] __initdata __aligned(PAGE_SIZE);
> > > +
> > > +#define trampoline_pmd ((pmd_t *)__XIP_FIXUP(_trampoline_pmd))
> > > +#define fixmap_pmd ((pmd_t *)__XIP_FIXUP(_fixmap_pmd))
> > > +#define xip_pmd ((pmd_t *)__XIP_FIXUP(_xip_pmd))
> > > +#define early_pmd ((pmd_t *)__XIP_FIXUP(_early_pmd))
> > > +#else
> > > +pmd_t trampoline_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > > +pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > > pmd_t early_pmd[PTRS_PER_PMD * NUM_EARLY_PMDS] __initdata __aligned(PAGE_SIZE);
> > > +#endif
> > >
> > > static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
> > > {
> > > @@ -420,6 +473,19 @@ static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
> > > return PMD_SIZE;
> > > }
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > +/* called from head.S with MMU off */
> > > +asmlinkage void __init __copy_data(void)
> > > +{
> > > + void *from = (void *)(&_sdata);
> > > + void *end = (void *)(&_end);
> > > + void *to = (void *)CONFIG_XIP_PHYS_RAM_BASE;
> > > + size_t sz = (size_t)(end - from);
> > > +
> > > + memcpy(to, from, sz);
> > > +}
> > > +#endif
> > > +
> > > /*
> > > * setup_vm() is called from head.S with MMU-off.
> > > *
> > > @@ -448,6 +514,19 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> > > pmd_t fix_bmap_spmd, fix_bmap_epmd;
> > > #endif
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + uintptr_t xiprom = (uintptr_t)CONFIG_XIP_PHYS_ADDR;
> > > + uintptr_t xiprom_sz = (uintptr_t)(&_exiprom) - (uintptr_t)(&_xiprom);
> > > + uintptr_t xip_map_size = best_map_size(xiprom,
> > > + MAX_EARLY_MAPPING_SIZE);
> > > +
> > > + BUG_ON((xiprom % xip_map_size) != 0);
> > > + BUG_ON(xiprom_sz > MAX_EARLY_MAPPING_SIZE);
> > > +
> > > + load_pa = (uintptr_t)CONFIG_XIP_PHYS_RAM_BASE;
> > > + load_sz = (uintptr_t)(&_end) - (uintptr_t)(&_sdata);
> > > + map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
> > > +#endif
> >
> > Add new line here.
> >
> > > va_pa_offset = PAGE_OFFSET - load_pa;
> > > pfn_base = PFN_DOWN(load_pa);
> > >
> > > @@ -487,6 +566,21 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> > > load_pa, PGDIR_SIZE, PAGE_KERNEL_EXEC);
> > > #endif
> > >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + create_pgd_mapping(trampoline_pg_dir, XIP_VIRT_ADDR_START,
> > > + (uintptr_t)xip_pmd, PGDIR_SIZE, PAGE_TABLE);
> > > + for (va = XIP_VIRT_ADDR_START;
> > > + va < XIP_VIRT_ADDR_START + xiprom_sz;
> > > + va += PMD_SIZE) {
> > > + create_pmd_mapping(xip_pmd, va,
> > > + xiprom + (va - XIP_VIRT_ADDR_START),
> > > + PMD_SIZE, PAGE_KERNEL_EXEC);
> > > + }
> > > +
> > > + create_pgd_mapping(early_pg_dir, XIP_VIRT_ADDR_START,
> > > + (uintptr_t)xip_pmd, PGDIR_SIZE, PAGE_TABLE);
> > > +#endif
> > > +
> > > /*
> > > * Setup early PGD covering entire kernel which will allows
> > > * us to reach paging_init(). We map all memory banks later
> > > @@ -543,6 +637,9 @@ static void __init setup_vm_final(void)
> > > uintptr_t va, map_size;
> > > phys_addr_t pa, start, end;
> > > u64 i;
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + uintptr_t xiprom_sz = (uintptr_t)(&_exiprom) - (uintptr_t)(&_xiprom);
> > > +#endif
> > >
> > > /**
> > > * MMU is enabled at this point. But page table setup is not complete yet.
> > > @@ -574,6 +671,25 @@ static void __init setup_vm_final(void)
> > > map_size, PAGE_KERNEL_EXEC);
> > > }
> > > }
> >
> > Add new line here.
> >
> > > +#ifdef CONFIG_XIP_KERNEL
> > > + for (va = XIP_VIRT_ADDR_START;
> > > + va < XIP_VIRT_ADDR_START + xiprom_sz;
> > > + va += PMD_SIZE)
> > > + create_pgd_mapping(swapper_pg_dir, va,
> > > + CONFIG_XIP_PHYS_ADDR + (va - XIP_VIRT_ADDR_START),
> > > + PMD_SIZE, PAGE_KERNEL_EXEC);
> > > +
> > > + /* if DT is in NOR, keep the early mapping */
> > > + if (dtb_early_pa >= CONFIG_XIP_PHYS_ADDR &&
> > > + dtb_early_pa <= CONFIG_XIP_PHYS_ADDR + SZ_16M) {
> > > + pa = dtb_early_pa & ~(PGDIR_SIZE - 1);
> > > + va = DTB_EARLY_BASE_VA;
> > > + create_pgd_mapping(swapper_pg_dir, va,
> > > + pa, PGDIR_SIZE, PAGE_KERNEL);
> > > + create_pgd_mapping(swapper_pg_dir, va + PGDIR_SIZE,
> > > + pa + PGDIR_SIZE, PGDIR_SIZE, PAGE_KERNEL);
> >
> > This is wasted of virtual address space for XIP kernel.
>
> For XIP cases we shouldn't care too much about virtual address space.
> We are saving RAM (and power, FWIW, but that's unrelated) :-)
>
> > Also, DTB_EARLY_BASE_VA is an adhoc address falling in user-space
> > address range.
>
> That's fair. Would you agree with me changing the mapping to point to
> somewhere between VMALLOC_END and XIP_VIRT_ADDR_START?
Please don't change DTB_EARLY_BASE_VA for the non-XIP case.
The mapping done by setup_vm() is really temporary and once boot CPU
switches to swapper_pg_dir we don't need DTB early mappings.
>
> > I would recommend XIP kernel to:
> > 1. Allocate memory for DTB using memblock_phys_alloc()
> > 2. The virtual addres of allocated memory can be found using __va()
> > 3. Copy DTB to allocated memory
> > 4. Set dtb_early_pa to new DTB location
>
> No, I don't want to copy DTB to RAM at all as long as I can avoid
> that. See also above, we have a goal of *saving* RAM. It is directly
> contradictory to this goal to waste up to 100+K of RAM.
>
> Again, what if I change DTB_EARLY_BASE_VA to be something like
> (XIP_VIRT_ADDR_START-2*PMD_SIZE) in the XIP case?
Maybe you can keep DTB_EARLY_BASE_VA as-is and introduce
XIP_DTB_VA define for mapping DTB in setup_vm_final() for the XIP case.
The non-XIP case does not need any DTB mappings in setup_vm_final().
Also, setup_vm_final() should do smaller mappings for the DTB instead
of PGDIR_SIZE mappings.
>
>
> > > + }
> > > +#endif
> > >
> > > /* Clear fixmap PTE and PMD mappings */
> > > clear_fixmap(FIX_PTE);
> > > --
> > > 2.29.2
> > >
> >
> > Regards,
> > Anup
>
> ...and thanks for the thorough review of course.
>
> Best regards,
> Vitaly
Regards,
Anup