Re: [PATCH 1/3] module: Introduce module_alloc_type

From: Kent Overstreet
Date: Fri May 26 2023 - 23:20:02 EST

On Fri, May 26, 2023 at 05:03:18PM -0700, Song Liu wrote:
> On Fri, May 26, 2023 at 4:39 PM Kent Overstreet
> <kent.overstreet@xxxxxxxxx> wrote:
> >
> [...]
> >
> > But it should be an internal implementation detail, I don't think we
> > want the external interface tied to vmalloc -
> >
> > > These two APIs allow the core code work with all archs. They won't break
> > > sub-page allocations. (not all archs will have sub-page allocations)
> >
> > So yes, text_poke() doesn't work on all architectures, and we need a
> > fallback.
> >
> > But if the fallback needs to go the unprotect/protect route, I think we
> > need to put that in the helper, and not expose it. Part of what people
> > are wanting is to limit or eliminate pages that are RWX, so we
> > definitely shouldn't be creating new interfaces to flipping page
> > permissions: that should be pushed down to as low a level as possible.
> >
> > E.g., with my jitalloc_update(), a more complete version would look like
> >
> > void jitalloc_update(void *dst, void *src, size_t len)
> > {
> > if (text_poke_available) {
> > text_poke(...);
> > } else {
> > unprotect();
> > memcpy();
> > protect();
> > }
> > }
> I think this doesn't work for sub page allocation?

Perhaps I elided too much - it does if you add a single lock. You can't
do that if it's not a common helper.

> At the end of all this, we will have modules running from huge pages, data
> and text. It will give significant performance benefit when some key driver
> cannot be compiled into the kernel.

Yeah, I've seen the numbers for the perf impact of running as a module
due to the extra TLB overhead - but Mike's recent data was showing that
this doesn't matter nearly as much as data as it does for text.

> > Also - module_memory_fill_type(), module_memory_invalidate_type() - I
> > know these are for BPF, but could you explain why we need them in the
> > external interface here? Could they perhaps be small helpers in the bpf
> > code that use something like jitalloc_update()?
> These are used by all users, not just BPF. 1/3 uses them in
> kernel/module/main.c. I didn't use them in 3/3 as it is arch code, but I can
> use them instead of text_poke_* (and that is probably better code style).
> As I am writing the email, I think I should use it in ftrace.c (2/3) to avoid
> the __weak function.

Ok. Could we make it clearer why they're needed and what they're for?

I know bpf fills in invalid instructions initially; I didn't read
through enough code to find the "why", and let's document that to save
other people the same effort.

And do they really need to be callbacks in mod_alloc_params...?

Do we need the other things in mod_alloc_params/vmalloc_params?
- your granularity field says it's for specifying PAGE/PMD size: we
definitely do not want that. We've had way too much code along the
lines of "implement hugepages for x", we need to stop doing that.
It's an internal mm/ detail.

- vmalloc_params: we need gfp_t and pgprot_t, but those should be
normal arguments. start/end/vm_flags? They seem like historical
module baggage, can we get rid of them?

> > > OTOH, jit_alloc (as-is) solves a subset of the problem (only the text).
> > > We can probably use it (with some updates) instead of the vmap_area
> > > based allocator. But I guess we need to align the direction first.
> >
> > Let's see if we can get tglx to chime in again, since he was pretty
> > vocal in the last discussion.
> >
> > It's also good practice to try to summarize all the relevant "whys" from
> > the discussion that went into a feature and put that in at least the
> > commit message - or even better, header file comments.
> >
> > Also, organization: the module code is a huge mess, let's _please_ do
> > split this out and think about organization a bit more, not add to the
> > mess :)
> I don't really think module code is very messy at the moment. If
> necessary, we can put module_alloc_type related code in a
> separate file.

Hey, it's been organized since I last looked at it :) But I tihink this
belongs in mm/, not module code.