Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

From: Palmer Dabbelt
Date: Tue Jul 21 2020 - 19:48:16 EST


On Tue, 21 Jul 2020 16:12:58 PDT (-0700), benh@xxxxxxxxxxxxxxxxxxx wrote:
On Tue, 2020-07-21 at 12:05 -0700, Palmer Dabbelt wrote:

* We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
* On 64-bit systems the VA space around the kernel is precious because it's the
only place we can place text (modules, BPF, whatever).

Why ? Branch distance limits ? You can't use trampolines ?

Nothing fundamental, it's just that we don't have a large code model in the C
compiler. As a result all the global symbols are resolved as 32-bit
PC-relative accesses. We could fix this with a fast large code model, but then
the kernel would need to relax global symbol references in modules and we don't
even do that for the simple code models we have now. FWIW, some of the
proposed large code models are essentially just split-PLT/GOT and therefore
don't require relaxation, but at that point we're essentially PIC until we
have more than 2GiB of kernel text -- and even then, we keep all the
performance issues.
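
To make the reach limit concrete, here is a minimal sketch of how a
PC-relative offset splits into the immediates of an auipc/addi pair, which is
where the roughly +/-2GiB window comes from. The function names are made up
for illustration and are not taken from the kernel or the toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: split a PC-relative byte offset into the 20-bit
 * immediate of auipc and the signed 12-bit immediate of the following addi.
 * Returns false when the target cannot be reached by such a pair.
 */
bool pcrel_split(int64_t offset, int32_t *hi20, int32_t *lo12)
{
    /* auipc contributes a signed 20-bit value shifted left by 12, and addi
     * a signed 12-bit value, so the reachable window is slightly skewed
     * around +/-2GiB. */
    const int64_t min_off = -(INT64_C(1) << 31) - 2048;
    const int64_t max_off = (INT64_C(1) << 31) - 2049;

    if (offset < min_off || offset > max_off)
        return false;

    /* Sign-extend the low 12 bits; the remainder is a multiple of 4KiB. */
    int64_t lo = ((offset & 0xfff) ^ 0x800) - 0x800;
    int64_t hi = (offset - lo) / 4096;

    *hi20 = (int32_t)hi;
    *lo12 = (int32_t)lo;
    return true;
}

/* Hypothetical helper: can code at pc reach target with one auipc pair? */
bool pcrel_reachable(uint64_t pc, uint64_t target)
{
    int32_t hi, lo;

    /* Unsigned subtraction wraps; the cast assumes two's complement. */
    return pcrel_split((int64_t)(target - pc), &hi, &lo);
}
```

Anything that has to reference kernel symbols this way (modules, BPF) needs
pcrel_reachable() to hold for every symbol it touches, which is why the VA
space near the kernel image is the only usable place for it.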

If we start putting
the kernel in the vmalloc space then we either have to pre-allocate a bunch
of space around it (essentially making it a fixed mapping anyway) or it
becomes likely that we won't be able to find space for modules as they're
loaded into running systems.

I dislike the kernel being in the vmalloc space (see my other email)
but I don't understand the specific issue with modules.

Essentially what's above: the modules smell the same as the rest of the
kernel's code and therefore have a similar set of restrictions. If we build PIC
modules and have the PLT entries do GOT loads (as do our shared libraries) then
we could break this restriction, but that comes with some performance
implications. Like I said in the other email, I'm less worried about the
instruction side of things so maybe that's the right way to go.

* Relying on a relocatable kernel for sv48 support introduces a fairly large
performance hit.

Out of curiosity why would relocatable kernels introduce a significant
hit ? Where about do you see the overhead coming from ?

Our PIC codegen, probably better addressed by my other email and above.


Roughly, my proposal would be to:

* Leave the 32-bit memory map alone. On 32-bit systems we can load modules
anywhere and we only have one VA width, so we're not really solving any
problems with these changes.
* Statically allocate a 2GiB portion of the VA space for all our text, as its own
region. We'd link/relocate the kernel here instead of around PAGE_OFFSET,
which would decouple the kernel from the physical memory layout of the system.
This would have the side effect of sorting out a bunch of bootloader headaches
that we currently have.
* Sort out how to maintain a linear map as the canonical hole moves around
between the VA widths without adding a bunch of overhead to virt2phys and
friends. This is probably going to be the trickiest part, but I think if we
just change the page table code to essentially lie about VAs when an sv39
system runs an sv48+sv39 kernel we could make it work -- there'd be some
logical complexity involved, but it would remain fast.
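
As a rough illustration of the second and third points, the sketch below
keeps a fixed 2GiB text region separate from the linear map and translates
each with its own offset. Every name here is hypothetical, and placing the
text region in the top 2GiB of the address space is an assumption made for
the example, not something the proposal specifies. It also does not attempt
the hard part (keeping the linear map cheap when one kernel image serves both
sv39 and sv48); it only shows why the text mapping no longer depends on where
RAM starts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed placement: a fixed 2GiB window at the very top of the VA space. */
#define TEXT_REGION_SIZE (UINT64_C(2) << 30)
#define TEXT_REGION_BASE (UINT64_C(0) - TEXT_REGION_SIZE)

/* Hypothetical boot-time layout description. */
struct layout {
    uint64_t linear_base;    /* VA where the linear map starts */
    uint64_t ram_phys_base;  /* PA of the first byte of RAM */
    uint64_t text_phys_base; /* PA the kernel image was loaded at */
};

/* First address of the upper canonical half for a given VA width:
 * 39 bits for sv39, 48 bits for sv48. */
uint64_t upper_half_start(unsigned va_bits)
{
    return ~UINT64_C(0) << (va_bits - 1);
}

bool is_text_va(uint64_t va)
{
    return va >= TEXT_REGION_BASE;
}

/* Translate a kernel VA: the text region and the linear map use
 * independent offsets, so the image can be loaded at any PA. */
uint64_t sketch_virt_to_phys(const struct layout *l, uint64_t va)
{
    if (is_text_va(va))
        return va - TEXT_REGION_BASE + l->text_phys_base;
    return va - l->linear_base + l->ram_phys_base;
}
```

In this sketch the upper half starts at 0xffffffc000000000 on sv39 and
0xffff800000000000 on sv48, so linear_base has to move with the VA width,
while TEXT_REGION_BASE stays valid in both.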

This doesn't solve the problem of virtually relocatable kernels, but it does
let us decouple that from the sv48 stuff. It also lets us stop relying on a
fixed physical address the kernel is loaded into, which is another thing I
don't like.

I know this may be a more complicated approach, but there aren't any sv48
systems around right now so I just don't see the rush to support them,
particularly when there's a cost to what already exists (for those who haven't
been watching, so far all the sv48 patch sets have imposed a significant
performance penalty on all systems).