Re: [PATCH v2 3/5] RISC-V: Allow booting kernel from any 4KB aligned address

From: Anup Patel
Date: Sun Mar 24 2019 - 00:16:53 EST


On Sat, Mar 23, 2019 at 10:54 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 23, 2019 at 05:40:12PM +0200, Mike Rapoport wrote:
> > I have no general objection, but I presume the patch will be significantly
> > simplified if the addition of 4K pages support will follow the removal of
> > the trampoline_pd_dir.
> >
> > That said, I didn't look into the details, since they will change
> > substantially, only some comments on the Kconfig part.
> >
> > On the high level, have you considered using large pages in setup_vm() and
> > then remapping everything with 4K pages in setup_vm_final()? This might
> > save you the whole ops-> churn.
>
> That would be a great start. That being said the current tiny memory
> RISC-V devices don't even have a MMU, so the kernel pagetable mapping
> isn't even relevant for them. I'm just not sure adding more complexity
> in the early boot path for a borderline case (MMU and tiny memory
> with a tiny kernel image) is really worth all the complexity.

It's not just for addressing a borderline case (MMU and tiny memory with tiny
kernel image).

We are trying to address the following issues in the current code:
1. The current setup_vm() maps all possible kernel virtual addresses (128GB
on 64bit system and 1GB on 32bit system). The amount of RAM present on
real systems might be much less, so we should not have kernel mappings for
non-existent RAM. Of course, we don't know the amount of RAM available in
setup_vm(), so we have to split page table setup in two parts and do only
the minimal required mapping in setup_vm().
2. The NOMMU kernel requires a swapper_pg_dir with identity mapping (VA == PA).
Without it we get a boot-time crash, so we cannot skip it for the NOMMU case.
For NOMMU, the PAGE_OFFSET will typically be 0x80020000 (or 0x80xxxxxx). This
means the swapper_pmd array (which is sized using -PAGE_OFFSET) will be
over-sized, causing compile errors.
3. For both NOMMU with tiny memory and MMU with tiny memory, the current
setup_vm() does not allow us to place the kernel at non-2M (or non-4M) aligned
addresses, thereby causing memory below the kernel to be wasted.
4. For an MMU-based kernel, the current setup_vm() is hard-wired for a fixed
2M mapping size. It will require more changes if we want to do 1G mappings.
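
To make issue 2 concrete, here is a rough user-space sketch of the sizing
arithmetic. It assumes the array is sized as
PTRS_PER_PMD * (-PAGE_OFFSET / PGDIR_SIZE) with Sv39 constants (1G per PGDIR
entry, 512 entries per PMD table). The helper name swapper_pmd_entries() is
made up for illustration and is not part of the patch.

```c
#include <stdint.h>

/* Assumed Sv39 constants: each PGDIR entry covers 1G, 512 entries per table. */
#define PGDIR_SHIFT   30
#define PGDIR_SIZE    (UINT64_C(1) << PGDIR_SHIFT)
#define PTRS_PER_PMD  512

/*
 * Hypothetical helper: number of PMD entries a statically sized swapper_pmd
 * array would need when its size is derived from -PAGE_OFFSET.
 */
uint64_t swapper_pmd_entries(uint64_t page_offset)
{
	/* -PAGE_OFFSET, computed modulo 2^64 as the compiler would */
	uint64_t span = UINT64_C(0) - page_offset;

	return PTRS_PER_PMD * (span / PGDIR_SIZE);
}
```

With PAGE_OFFSET = 0xffffffe000000000 (128GB below the top of the address
space) this gives 65536 entries, i.e. 512KB of 8-byte entries. With
PAGE_OFFSET = 0x80020000 it gives roughly 8.8e12 entries, which is far beyond
anything that can be a static array.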

The above issues motivated us to re-write setup_vm().

We are trying to make initial page table setup more flexible and robust so that:
1. We don't have any unwanted mappings pointing to non-existent RAM
2. We can have any value of PAGE_OFFSET for NOMMU case without the
page table arrays becoming oversized
3. We can create mappings of best possible size to get good performance
4. We can boot from any 4K/2M/1G (or just 4K) aligned load address
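
As an illustration of goals 3 and 4, the choice of mapping size could look
roughly like the sketch below. This is not the code from the patch. The name
pick_map_size() and the SZ_* macros are hypothetical, and it assumes the three
Sv39 mapping sizes (4K, 2M, 1G).

```c
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/* A mapping of size sz is usable only if VA and PA are both sz-aligned
 * and at least sz bytes remain to be mapped. */
static int fits(uint64_t va, uint64_t pa, uint64_t len, uint64_t sz)
{
	return ((va | pa) & (sz - 1)) == 0 && len >= sz;
}

/* Hypothetical helper: largest mapping size usable at this (va, pa, len). */
uint64_t pick_map_size(uint64_t va, uint64_t pa, uint64_t len)
{
	if (fits(va, pa, len, SZ_1G))
		return SZ_1G;
	if (fits(va, pa, len, SZ_2M))
		return SZ_2M;
	return SZ_4K;	/* always possible for a 4K aligned load address */
}
```

A kernel loaded at 0x80020000 would start with 4K mappings under this scheme,
and a caller looping over the range could switch to 2M or 1G mappings once the
addresses reach a suitable alignment.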

Also, the end result of all this is much more readable page table setup code
shared between setup_vm() and setup_vm_final(), where the differences are
abstracted via mapping ops.
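
To give a feel for the mapping ops idea, here is a small user-space sketch. It
is not the actual ops structure from the patch: struct mapping_ops, its
alloc_table member, map_pages() and the two ops instances are all hypothetical
names, and calloc() merely stands in for a real boot-time allocator. The only
thing it is meant to show is that the table-filling code is shared while the
allocation strategy differs between the early and final phases.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT      12
#define PTRS_PER_TABLE  512
#define PTE_PPN_SHIFT   10	/* RISC-V PTE: PPN starts at bit 10 */
#define PTE_LEAF_FLAGS  0xF	/* V | R | W | X */

/* Hypothetical ops: the phases differ only in where table pages come from. */
struct mapping_ops {
	uint64_t *(*alloc_table)(void);
};

/* Early phase: no allocator yet, so use a small static pool. */
static uint64_t early_pool[2][PTRS_PER_TABLE];
static size_t early_used;

static uint64_t *early_alloc_table(void)
{
	return early_used < 2 ? early_pool[early_used++] : NULL;
}

/* Final phase: calloc() stands in for a real boot-time allocator. */
static uint64_t *final_alloc_table(void)
{
	return calloc(PTRS_PER_TABLE, sizeof(uint64_t));
}

const struct mapping_ops early_ops = { .alloc_table = early_alloc_table };
const struct mapping_ops final_ops = { .alloc_table = final_alloc_table };

/* Shared code: build one leaf table mapping npages 4K pages starting at pa. */
uint64_t *map_pages(const struct mapping_ops *ops, uint64_t pa, size_t npages)
{
	uint64_t *table;
	size_t i;

	if (npages > PTRS_PER_TABLE)
		return NULL;
	table = ops->alloc_table();
	if (!table)
		return NULL;
	for (i = 0; i < npages; i++)
		table[i] = (((pa >> PAGE_SHIFT) + i) << PTE_PPN_SHIFT) |
			   PTE_LEAF_FLAGS;
	return table;
}
```

Calling map_pages(&early_ops, ...) and map_pages(&final_ops, ...) with the
same arguments produces identical table contents; only the source of the
table memory changes.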

Regards,
Anup