Re: [PATCH v3 4/4] RISC-V: Allow booting kernel from any 4KB aligned address

From: Anup Patel
Date: Mon Mar 25 2019 - 12:17:13 EST


On Mon, Mar 25, 2019 at 8:29 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 25, 2019 at 06:18:45PM +0530, Anup Patel wrote:
> > We are trying to address the following issues in the current code:
> > 1. The current setup_vm() maps all possible kernel virtual addresses (128GB
> > on 64bit system and 1GB on 32bit system). The amount of RAM present on
> > real systems might be much less so we should not have kernel mappings for
> > non-existent RAM. Of course, we don't know the amount of RAM available in
> > setup_vm() so we have to split page table setup in two parts and do minimal
> > required mapping in setup_vm().
>
> Why do you even care about kernel mappings for non-existent RAM?

We care because there will always be some buggy kernel driver or code that
goes out of bounds and accesses non-existent RAM. If we map all possible
kernel virtual addresses by default, the behaviour of such buggy accesses
will be unpredictable.
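
To illustrate the point, here is a minimal sketch (not code from this patch)
of the check that becomes implicit once only real RAM is mapped: an access
outside every known memory region has no translation and faults immediately.
The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one bank of physical RAM. */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [pa, pa + len) lies entirely inside one region.
 * With mappings restricted to existing RAM, any access for which
 * this is false would trap instead of touching undefined memory.
 */
static bool pa_is_backed(const struct mem_region *regions, size_t count,
			 uint64_t pa, uint64_t len)
{
	for (size_t i = 0; i < count; i++) {
		const struct mem_region *r = &regions[i];

		/* Written to avoid overflow in pa + len. */
		if (pa >= r->base && len <= r->size &&
		    pa - r->base <= r->size - len)
			return true;
	}
	return false;
}
```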

Further, I think we should also make the .text and .rodata sections of the
kernel read-only. This will protect kernel code and read-only data.

These changes make the kernel easier to debug and also help us catch bugs
faster.

>
> > 2. NOMMU kernel requires a swapper_pg_dir with identity mapping (VA == PA)
>
> Bullshit. nommu per definition does not have page tables, and thus

Yes, I know, but we did see a kernel crash in the absence of a kernel page
table with identity mappings.

>
> > 3. For both NOMMU with tiny memory and MMU with tiny memory, the current
> > setup_vm() is not allowing us to place kernel on non-2M (or non-4M) aligned
> > addresses, thereby causing memory below kernel to be wasted.
>
> As mentioned a few times - nommu per definition does not have page
> tables, and in my uptodate nommu port none of this code is even compiled
> in. If someone still compiles that code in some codebase that is just
> a bug that needs to be fixed.
>
> For MMU with tiny memory it is a theoretical case, but I still haven't
> seen a good rationale why we'd care for that case.

A lot of RISC-V systems being built today target resource-constrained
use cases (such as IoT or embedded), where more RAM means more
cost and more power.

It is very useful to have Linux on RISC-V run in just a few MBs (if possible).
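
As a rough illustration of the waste on a small system, the sketch below
computes how much memory sits unused below the kernel for a given load
alignment. The helper names and the example addresses are assumptions made
up for this illustration, not values from the patch.

```c
#include <stdint.h>

#define EX_SZ_4K (1ULL << 12)
#define EX_SZ_2M (1ULL << 21)

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t ex_align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes lost between the first free physical address and the
 * address where the kernel can actually be placed.
 */
static uint64_t wasted_below_kernel(uint64_t first_free_pa, uint64_t align)
{
	return ex_align_up(first_free_pa, align) - first_free_pa;
}
```

For example, assuming RAM at 0x80000000 with the first 64KB taken by
firmware, wasted_below_kernel(0x80010000, EX_SZ_2M) is 0x1F0000 (just under
2MB), while with 4KB alignment nothing is wasted.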

>
> > 4. For MMU based kernel, the current setup_vm() is hard-wired for fixed 2M
> > mapping size. It will require more changes if we want to do 1G mappings.
>
> And why do we care? Even if we come up with a good reason for 1G
> mappings we'd better find a way to select it at run time and not require
> a gazillion of options to select how to map the kernel.

1G mappings will give better performance than 2M mappings.

This will be very useful for performance-hungry systems with a lot of RAM.

This patch selects 1G or 2M mapping at runtime based on load address
alignment.
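
Roughly, the runtime selection can be sketched as below. This is a
simplified illustration and not the exact code in the patch; the function
name is hypothetical, and the 4K fallback is my assumption based on the
patch subject (any 4KB aligned address).

```c
#include <stdint.h>

#define MAP_SZ_4K (1ULL << 12)
#define MAP_SZ_2M (1ULL << 21)
#define MAP_SZ_1G (1ULL << 30)

/*
 * Pick the largest mapping size the load address is aligned to.
 * Hypothetical helper for illustration only.
 */
static uint64_t pick_map_size(uint64_t load_pa)
{
	if ((load_pa & (MAP_SZ_1G - 1)) == 0)
		return MAP_SZ_1G;
	if ((load_pa & (MAP_SZ_2M - 1)) == 0)
		return MAP_SZ_2M;
	/* Assumed fallback for a load address that is only 4K aligned. */
	return MAP_SZ_4K;
}
```

So a kernel loaded at 0x80000000 would get 1G mappings, one loaded at
0x80200000 would get 2M mappings, and nothing has to be chosen at build time.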

Regards,
Anup