Re: [PATCH 1/3] module: Introduce module_alloc_type

From: Song Liu
Date: Fri May 26 2023 - 20:03:43 EST


On Fri, May 26, 2023 at 4:39 PM Kent Overstreet
<kent.overstreet@xxxxxxxxx> wrote:
>
[...]

>
> But it should be an internal implementation detail, I don't think we
> want the external interface tied to vmalloc -
>
> > These two APIs allow the core code work with all archs. They won't break
> > sub-page allocations. (not all archs will have sub-page allocations)
>
> So yes, text_poke() doesn't work on all architectures, and we need a
> fallback.
>
> But if the fallback needs to go the unprotect/protect route, I think we
> need to put that in the helper, and not expose it. Part of what people
> are wanting is to limit or eliminate pages that are RWX, so we
> definitely shouldn't be creating new interfaces to flipping page
> permissions: that should be pushed down to as low a level as possible.
>
> E.g., with my jitalloc_update(), a more complete version would look like
>
> void jitalloc_update(void *dst, void *src, size_t len)
> {
> if (text_poke_available) {
> text_poke(...);
> } else {
> unprotect();
> memcpy();
> protect();
> }
> }

I think this doesn't work for sub page allocation?

>
> See? We provide a simpler, safer interface, and this also lets us handle
> multiple threads racing to update/flip permissions on the same page in a
> single place (e.g. with sticking a lock somewhere in the jitalloc
> structures).
[...]
>
> This must have been part of the previous discussion since you started
> with execmem_alloc(), but I did scan through that and I'm still not
> seeing the justification for why this needs to handle non-text
> allocations.
>
> If I had to hazard a guess it would be because of tglx wanting to solve
> tlb fragmentation for modules, but to me that doesn't sound like a good
> enough reason and it looks like we're ending up with a messy "try to be
> all things to all people" interface as a result :/
>
> Do we _really_ need that?
>
> Mike was just saying at LSF that direct map fragmentation turned out to
> be a non issue for performance for non-text, so maybe we don't.

At the end of all this, we will have modules running from huge pages, data
and text. It will give significant performance benefit when some key driver
cannot be compiled into the kernel.

>
> Also - module_memory_fill_type(), module_memory_invalidate_type() - I
> know these are for BPF, but could you explain why we need them in the
> external interface here? Could they perhaps be small helpers in the bpf
> code that use something like jitalloc_update()?

These are used by all users, not just BPF. 1/3 uses them in
kernel/module/main.c. I didn't use them in 3/3 as it is arch code, but I can
use them instead of text_poke_* (and that is probably better code style).
As I am writing the email, I think I should use it in ftrace.c (2/3) to avoid
the __weak function.

>
> > OTOH, jit_alloc (as-is) solves a subset of the problem (only the text).
> > We can probably use it (with some updates) instead of the vmap_area
> > based allocator. But I guess we need to align the direction first.
>
> Let's see if we can get tglx to chime in again, since he was pretty
> vocal in the last discussion.
>
> It's also good practice to try to summarize all the relevant "whys" from
> the discussion that went into a feature and put that in at least the
> commit message - or even better, header file comments.
>
> Also, organization: the module code is a huge mess, let's _please_ do
> split this out and think about organization a bit more, not add to the
> mess :)

I don't really think module code is very messy at the moment. If
necessary, we can put module_alloc_type related code in a
separate file.

Thanks,
Song