Re: [RFC v2 3/4] modules, x86: use vmalloc_exec for module core

From: Song Liu
Date: Fri Oct 14 2022 - 14:26:14 EST




> On Oct 14, 2022, at 8:42 AM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Fri, 2022-10-07 at 16:43 -0700, Song Liu wrote:
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index a4e4d84b6f4e..b44806e31a56 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -53,6 +53,7 @@
>> #include <linux/bsearch.h>
>> #include <linux/dynamic_debug.h>
>> #include <linux/audit.h>
>> +#include <linux/bpf.h>
>> #include <uapi/linux/module.h>
>> #include "internal.h"
>>
>> @@ -1203,7 +1204,7 @@ static void free_module(struct module *mod)
>> lockdep_free_key_range(mod->data_layout.base, mod-
>>> data_layout.size);
>>
>> /* Finally, free the core (containing the module structure)
>> */
>> - module_memfree(mod->core_layout.base);
>> + vfree_exec(mod->core_layout.base);
>> #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>> vfree(mod->data_layout.base);
>> #endif
>> @@ -1321,7 +1322,8 @@ static int simplify_symbols(struct module *mod,
>> const struct load_info *info)
>> ksym = resolve_symbol_wait(mod, info, name);
>> /* Ok if resolved. */
>> if (ksym && !IS_ERR(ksym)) {
>> - sym[i].st_value =
>> kernel_symbol_value(ksym);
>> + unsigned long val =
>> kernel_symbol_value(ksym);
>> + bpf_arch_text_copy(&sym[i].st_value,
>> &val, sizeof(val));
>
> Why bpf_arch_text_copy()? This of course won't work for other
> architectures. So there needs to be fallback method. That RFC broke the
> operation into two stages: Loading and finalized. When loading, on non-
> x86 the writes would simply be to the allocation mapped as writable.
> When it was finalized it changed it to it's final permission (RO, etc).
> Then for x86 it does text_pokes() for the writes and has it RO from the
> beginning.

Yeah, this one (3/4) is really a prototype to show vmalloc_exec could
work for modules (with a lot more work of course). And something to
replace bpf_arch_text_copy() is one of the issues we need to address in
the future.

>
> I ended up needing a staging buffer for modules too, so that the code
> could operate on it directly. I can't remember why that was, it might
> be unneeded now since you moved data out of the core allocation.

Both bpf_jit and bpf_dispather uses a staging buffer with bpf_prog_pack.
The benefit of this approach is that it minimizes the number of
text_poke/copy() calls. OTOH, it is quite a pain to make all the
relative calls correct, as the staging buffer has different address to
the final allocation.

I think we may not need the staging buffer for modules, as module
load/unload happens less often than BPF program JITs (so it is ok for
it to be slightly slower).

btw: I cannot take credit for split module data out of core allocation,
Christophe Leroy did the work. :)

Thanks,
Song